Cryptographic Hash Functions

2018, Aug 16    

I recently enrolled in a course called Blockchain and Cryptocurrencies Technologies in a Computer Science Masters program, which is a copy from the Princeton’s course freely available on Coursera.

As a way to take the most out of the course, I’ll be making small and simple posts about topics from each class.

Cryptographic Hash Functions

In order to explain what are cryptographic hash functions, I first need to explain what are hash functions, and which properties they should abide to.

Simply put, a hash function is one that receives a variable-size input, and returns a fixed-size output, usually 256 or 512 bits. Another requirement is that the function must be efficiently computable.

To be considered a cryptographic hash function a hash function must also implement three security properties:

  • Collision-free: this property means that it should be impossible to find two different input values that produce the same output hash, that is, x != z and H(x) = H(z), where H(x) represents the hash value of x. Of course, that’s impossible to achieve, given that the space from which inputs can be drawn is infinitely bigger than the space of the outputs, so this property demands only that it should be probabilistically impossible to achieve such result.
  • Hiding: it states that it should be infeasible to find the input for the hash function, given the output, that is, given H(x) it shouldn’t be feasible to find x.
  • Puzzle-friendly: given that the input for the hash function is divided in two parts, x and r, and the hash function is determined by H(x | r) = y, it should be very hard to find x if you have access to y and r.

Hash Pointers and Data Structures

Using regular data structures plus hash functions, it’s possible to create “new” data structures that make so that’s impossible to tamper with the data contained within it.

Let’s use as an example the image below, which represents a linked list:

Hash linked list

It’s like a regular linked list, except that the pointers to the previous elements in the list are actually hash pointers: they hold the hash value of the whole previous element.

If a malicious actor were to change the data of the first element (index 0), its hash value would be changed as well, and by checking the second element’s hash pointer to the previous element, you’d be able to tell that the list has been tampered with.

I won’t go into detail, but another useful data structure with hash pointers is a balanced binary tree, called Merkle Tree.

Digital Signature

Digital Signature is a protocol to sign and validate documents of any form. A given person can sign a document, and everyone else will be able to verify that the produced document indeed came from the person that signed it.

To use digital signatures, there must be a unique matching pair of keys, one private and one public. The private key is used by the person that’s signing the document, and the public key is used by whoever wants to verify the signed document.

The API for digital signatures has three methods:

  • (privateKey, publicKey) = generateKeys(keySize): returns a unique matching pair of private and public keys.

  • signature = sign(privateKey, document): signs a document with the private key.

  • isValid = verify(publicKey, document, signature): verifies if the signature matches the document, i.e., if the signature was produced by signing the document with the private key that matches the given public key.

Public Keys as Identities

Expanding on Digital Signatures, it’s possible to use the same public key defined above as a unique identity. Because the match between private and public keys is unique, any public key that verifies a signed document represents the identity of whoever possesses the private key. This way, a person can have multiple identities on the internet, but any given identity (that is, private key) will belong to one and only person.