In the previous post, I explained the uses of cryptography in blockchains. This post gives an overview of how hash functions are used in blockchains.
The electronic coin is an essential feature of every blockchain, even when exchanging cryptocurrency is not the use case. Blockchains can be used for other purposes, such as storing land titles or vehicle ownership records. In such cases a currency is not appropriate; instead, tokens are exchanged to form records of events.
What is a hash function?

A hash function takes an arbitrary amount of data and maps it to a fixed-size output. Some examples are shown above. In blockchain technology we use cryptographic hash functions, which add properties that matter for security: it is infeasible to recover an input from its hash, or to find two different inputs that produce the same hash.
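As a quick illustration, here is a minimal sketch using Python's standard `hashlib` module with SHA-256, the hash function Bitcoin uses. It shows that inputs of any length produce a 64-hex-digit (256-bit) digest, and that a tiny change to the input gives a completely different digest. The helper name `sha256_hex` is just for illustration.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a 64-character hex string."""
    return hashlib.sha256(data).hexdigest()

# Inputs of very different lengths all map to a 256-bit (64 hex digit) output.
for message in [b"", b"socks", b"a much longer message " * 1000]:
    digest = sha256_hex(message)
    print(len(digest), digest[:16], "...")

# Changing a single character produces an unrelated-looking digest.
print(sha256_hex(b"Alice pays Bob 1 coin"))
print(sha256_hex(b"Alice pays Bob 2 coin"))
```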
Blocks in a Chain
When a new block is added to the existing blockchain, it must include some data from the previous block, so that earlier blocks cannot be altered without the change being noticed (immutability). This is done by applying a hash function to two pieces of information:
- Hash of the previous block
- Data of the new block
The hash function generates a hash for the new block. This hash is usually written in hexadecimal, but it is simply a (very large) number.
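The linking described above can be sketched in a few lines of Python. This is a toy model, not any real blockchain's format: it assumes a block's hash is SHA-256 over the previous hash concatenated with the block's data as a UTF-8 string, and that the first block uses a hypothetical all-zeros "previous hash". The helper names and sample records are hypothetical.

```python
import hashlib

GENESIS_PREV = "0" * 64  # hypothetical placeholder "previous hash" for the first block

def block_hash(prev_hash: str, data: str) -> str:
    """Hash the previous block's hash together with this block's data (toy encoding)."""
    return hashlib.sha256((prev_hash + data).encode("utf-8")).hexdigest()

def build_chain(records):
    """Return a list of (data, hash) pairs, each hash linked to the one before it."""
    chain = []
    prev_hash = GENESIS_PREV
    for data in records:
        h = block_hash(prev_hash, data)
        chain.append((data, h))
        prev_hash = h
    return chain

def is_valid(chain) -> bool:
    """Recompute every hash; an edited record breaks the link from that block onward."""
    prev_hash = GENESIS_PREV
    for data, stored_hash in chain:
        if block_hash(prev_hash, data) != stored_hash:
            return False
        prev_hash = stored_hash
    return True

chain = build_chain(["land title #1 -> Carol", "car 42 -> Dave", "land title #1 -> Erin"])
print(is_valid(chain))  # True

# Tamper with the middle record but keep its old hash.
tampered = list(chain)
tampered[1] = ("car 42 -> Mallory", tampered[1][1])
print(is_valid(tampered))  # False
```

Because each stored hash feeds into the next block's hash, an attacker who also recomputed the tampered block's hash would still break the link to every later block, and would have to recompute all of them.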
An example
Suppose Alice needs a pair of socks and has decided to buy them from Bob. Let Alice and Bob each have a key pair, that is, a public key and a private key. Public key cryptography was explained in my previous post.
Since Alice and Bob want to perform this transaction using Bitcoin, Alice first asks Bob for his public key. The Bitcoin software then applies the SHA-256 hashing algorithm to the hash of the previous transaction (the one through which Alice received the coin) together with Bob's public key. To authorize the transfer, Alice signs this hash with her private key, forming a signature that anyone can check using her public key. The transaction is then broadcast to all other nodes on the network, and the nodes try to add it to the block they are building. A block contains a set of transactions along with the hash of the previous block in the chain.
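A minimal sketch of the signing step, under heavy simplifying assumptions: it uses textbook RSA with tiny primes in place of the ECDSA scheme Bitcoin actually uses (Python's standard library has no ECDSA), a single SHA-256 pass, and plain string concatenation instead of Bitcoin's transaction serialization. The key values, the `bob_pub` placeholder and the function names are hypothetical. It only shows the idea: hash the previous transaction plus Bob's public key, sign with Alice's private key, verify with her public key.

```python
import hashlib

# Toy textbook-RSA key pair for Alice. The tiny primes make it completely insecure;
# Bitcoin really uses ECDSA over secp256k1.
P, Q = 61, 53
N = P * Q                           # public modulus
E = 17                              # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent (modular inverse, Python 3.8+)

def message_digest(prev_tx_hash: str, bob_public_key: str) -> int:
    """SHA-256 of the previous transaction's hash plus Bob's public key, reduced mod N."""
    data = (prev_tx_hash + bob_public_key).encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N

def sign(digest: int) -> int:
    """Alice signs with her private exponent D."""
    return pow(digest, D, N)

def verify(digest: int, signature: int) -> bool:
    """Anyone holding Alice's public key (N, E) can check the signature."""
    return pow(signature, E, N) == digest

prev_tx = hashlib.sha256(b"the transaction that gave Alice the coin").hexdigest()
bob_pub = "bob-public-key-placeholder"  # hypothetical stand-in for Bob's real public key

digest = message_digest(prev_tx, bob_pub)
signature = sign(digest)
print(verify(digest, signature))  # True
```

Changing either input changes the digest, so with a realistic key size the old signature would no longer verify.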
In the Bitcoin network, a new block is added to the blockchain roughly every 10 minutes. Since every node keeps its own copy of the blockchain, the question arises of which copy is correct. Nodes can end up with different versions because transactions are broadcast on a best-effort basis, so not every node hears about every transaction. The solution is to let the nodes reach a consensus on the true history of transactions.
For this purpose, blockchains use consensus algorithms that allow nodes to agree on a single history of the blockchain. Cryptographic hashes are used here too, but I will explain this in a later post.
Timestamp:20200527…
In the transaction between Alice and Bob, Bob can verify that Alice actually sent him the electronic coin, but he cannot check whether Alice has already spent the same coin elsewhere (a double-spend). To prevent this, blockchains use a peer-to-peer timestamp server. A timestamp server can prove that a transaction, record, or document existed at a given time. If the nodes record a transaction such as:
d02264297573876c86e24f84b96f7d69f8cfe5733aea4ab2da53be15cba86b34 sent 10.52407 tokens to fbb9f5acd42980cc1c21111e6ae3de904a0d76c47e40fdd77cbf8e9d6d5d0071
and immediately afterwards the same sender tries to send the same tokens to someone else:
d02264297573876c86e24f84b96f7d69f8cfe5733aea4ab2da53be15cba86b34 sent 10.52407 tokens to 5e1ef9ee776c5da5c0d06d4a9f69c374a7c5e279d491c0c0e2fd4a98b1b8e5d4
then the second transaction will be rejected, because the first one occurred earlier. Both transactions first enter a pool of unconfirmed transactions; in any blockchain, transactions typically wait in this unconfirmed state before they are added to a block. Once miners include the first transaction in a block and start working on the next block, the hash of that block is used to create the hash of the new block. From then on, any transaction that tries to spend the same tokens again is rejected, because the first transaction is already recorded in an earlier, timestamped block. This is how the timestamp server works: it uses the hash of the previous block and the current block's data to generate a hash for the current block, and this timestamping is performed by the nodes of the network itself. The very first block of a blockchain is called the genesis block. As new blocks are added, each linked to the one before it, the result is the structure we call the blockchain.
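The first-seen rule can be sketched as a toy ledger that remembers which coins have already been spent. This is an illustrative model, not Bitcoin's actual validation logic: the `Transfer` and `Ledger` classes and the `"coin-1"` identifier for the coin being spent are hypothetical, while the addresses are the ones from the example above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Transfer:
    sender: str
    recipient: str
    amount: float
    spends: str  # hypothetical id of the earlier coin (transaction output) being spent

class Ledger:
    """Toy ledger: accepts transfers in arrival order and rejects re-spends of a coin."""

    def __init__(self):
        self.spent = set()
        self.accepted = []

    def submit(self, tx: Transfer) -> bool:
        if tx.spends in self.spent:
            return False  # the coin was already spent by an earlier transaction
        self.spent.add(tx.spends)
        self.accepted.append(tx)
        return True

sender = "d02264297573876c86e24f84b96f7d69f8cfe5733aea4ab2da53be15cba86b34"
first_recipient = "fbb9f5acd42980cc1c21111e6ae3de904a0d76c47e40fdd77cbf8e9d6d5d0071"
second_recipient = "5e1ef9ee776c5da5c0d06d4a9f69c374a7c5e279d491c0c0e2fd4a98b1b8e5d4"

ledger = Ledger()
print(ledger.submit(Transfer(sender, first_recipient, 10.52407, spends="coin-1")))   # True
print(ledger.submit(Transfer(sender, second_recipient, 10.52407, spends="coin-1")))  # False
```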
Mining in Blockchain
In a blockchain, mining refers to the activity of adding new records, in the form of blocks, to the existing record of transactions. In Bitcoin, mining can be done by individual miners or by a group of miners that combine their computing power, called a mining pool, typically using specialized hardware called ASICs (application-specific integrated circuits). The primary purpose of mining is to extend the shared history of transactions so that every node can see it but no one can easily modify it. Mining also introduces new coins/tokens to the network and keeps transactions secure by giving miners an incentive to keep adding blocks to the blockchain. These incentives come in the form of newly generated coins/tokens and the transaction fees collected for including a transaction in a block. Here too, cryptographic hash functions play a role. A block contains transactions; how many depends on how many are waiting when the miner assembles the block, subject to a hard limit on block size, and the block is accepted only once the miner has solved a cryptographic puzzle.
Proof-of-Work
To produce a new block, the miner hashes the new block's header, which includes the hash of the previous block's header. In Bitcoin the hash function is SHA-256, applied twice. The result is the hash of the new block: a 256-bit number, usually written in hexadecimal. Before this block can be added, the miner must first solve the puzzle mentioned above. This puzzle is called proof-of-work, but not all blockchains use it. Some, such as Ethereum, use proof-of-stake instead because it is far less computationally expensive.
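For a concrete case, the sketch below hashes a real header: the publicly known fields of Bitcoin's genesis block. It assumes Bitcoin's 80-byte header layout (version, previous block hash, Merkle root, timestamp, compact target `bits`, nonce, with integers little-endian and hashes stored in reversed byte order) and Bitcoin's double SHA-256. The function names are illustrative.

```python
import hashlib
import struct

def double_sha256(data: bytes) -> bytes:
    """SHA-256 applied twice, as Bitcoin does for block headers."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def header_hash(version, prev_hash_hex, merkle_root_hex, timestamp, bits, nonce) -> str:
    """Serialize an 80-byte block header and return its hash in display (hex) order."""
    header = (
        struct.pack("<I", version)                     # 4 bytes, little-endian
        + bytes.fromhex(prev_hash_hex)[::-1]           # 32 bytes, reversed byte order
        + bytes.fromhex(merkle_root_hex)[::-1]         # 32 bytes, reversed byte order
        + struct.pack("<III", timestamp, bits, nonce)  # 3 x 4 bytes, little-endian
    )
    assert len(header) == 80
    return double_sha256(header)[::-1].hex()

genesis = header_hash(
    version=1,
    prev_hash_hex="00" * 32,  # the genesis block has no predecessor
    merkle_root_hex="4a5e1e4baab89f3a32518a88c31bc87f618f76673e2cc77ab2127b7afdeda33b",
    timestamp=1231006505,
    bits=0x1D00FFFF,
    nonce=2083236893,
)
print(genesis)  # 000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f
```

The long run of leading zeros is the visible trace of the proof-of-work puzzle.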
The puzzle is to find a hash for the new block that is numerically less than a target value. All nodes on the Bitcoin network agree on this target, which is simply a 256-bit (32-byte) integer. Every new block added to the chain must have a block hash below this number; the comparison works because a hash is itself just a number written in hexadecimal. But you may ask: "How can we change the hash, since the transactions are fixed?" The transactions are indeed not changed; instead, a small number in the header called a nonce is incremented until the resulting hash satisfies the target (that is, until the hash, not the nonce, is less than the target). Once such a hash is found, the block is broadcast to all the nodes, which verify the transactions and the proof-of-work and add the block to their existing chains. Finally, the difficulty of the puzzle is adjusted dynamically: every 2016 blocks, the target is recalculated so that blocks keep arriving about every 10 minutes.
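A toy version of the nonce search and the difficulty adjustment, as a sketch under stated assumptions: the "header" is an arbitrary hypothetical byte string, the nonce is appended as 4 little-endian bytes, a single SHA-256 is used, and the target is made very easy so the search finishes quickly. `retarget` follows the idea of Bitcoin's rule (scale the target by actual time over expected time, clamped to a factor of 4) but omits details of the real implementation.

```python
import hashlib

def mine(header_without_nonce: bytes, target: int, max_nonce: int = 2**32):
    """Try nonces in order until SHA-256(header + nonce) is numerically below target."""
    for nonce in range(max_nonce):
        data = header_without_nonce + nonce.to_bytes(4, "little")
        h = int.from_bytes(hashlib.sha256(data).digest(), "big")
        if h < target:
            return nonce, h
    return None  # nonce space exhausted; a real miner would change other header fields

def retarget(old_target: int, actual_seconds: int, blocks: int = 2016, spacing: int = 600) -> int:
    """Scale the target so the next period averages about `spacing` seconds per block."""
    expected = blocks * spacing
    # Clamp the measured time so the target changes by at most a factor of 4.
    actual = max(expected // 4, min(actual_seconds, expected * 4))
    return old_target * actual // expected

# An easy toy target: the hash needs 16 leading zero bits (about 65,536 tries on average).
easy_target = 1 << (256 - 16)
nonce, h = mine(b"hypothetical header fields", easy_target)
print(nonce, f"{h:064x}")

# If the last 2016 blocks took 7 days instead of 14, the target halves (difficulty doubles).
print(retarget(easy_target, 7 * 24 * 3600) == easy_target // 2)  # True
```

A lower target means fewer acceptable hashes, so more nonces must be tried on average; that is why shrinking the target raises the difficulty.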
Merkle Trees
Until now, I haven't mentioned how transactions are stored in the blockchain. The transactions in a block take up a considerable amount of storage. Although transactions are broadcast publicly, only miners include them in the blockchain. Inside every block, the transactions are summarized by a variant of the tree data structure called a Merkle tree. It looks a lot like a binary tree whose nodes store cryptographic hashes.
The idea behind storing transactions in a Merkle tree is that, given the root of the tree, you can verify the authenticity of the transactions. All transactions are included in the block in raw form, and a transaction ID (txid) is generated by hashing each raw transaction. These txids form the leaves of the Merkle tree: they are paired with each other and each pair is hashed, then the resulting hashes are paired and hashed again. If a level has an odd number of hashes, the last one is paired with itself. This process repeats until only a single hash remains: the Merkle root, which represents all of the block's transactions. The Merkle root is part of the block header.
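The Merkle root construction can be sketched as follows. Following Bitcoin, it uses double SHA-256 and duplicates the last hash at any level with an odd count; the raw transactions here are hypothetical byte strings rather than real serialized transactions, and the function names are illustrative.

```python
import hashlib

def double_sha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def merkle_root(txids):
    """Pair up hashes level by level until one remains; an odd one out is paired with itself."""
    if not txids:
        raise ValueError("a block always contains at least one transaction")
    level = list(txids)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last hash
        level = [double_sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

raw_txs = [b"tx: Alice -> Bob", b"tx: Bob -> Carol", b"tx: Carol -> Dave"]
txids = [double_sha256(tx) for tx in raw_txs]
print(merkle_root(txids).hex())

# Changing any transaction changes its txid and therefore the root stored in the header.
tampered = [double_sha256(b"tx: Alice -> Mallory")] + txids[1:]
print(merkle_root(tampered) != merkle_root(txids))  # True
```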
I hope this post has made the role of cryptographic hash functions in blockchains clear. For a brief discussion of consensus algorithms, stay tuned.