Hashing vs. Encryption: Understanding the Key Differences and Use Cases with code examples
Introduction
Hashing and encryption are fundamental concepts in programming, and a beginner should understand the clear distinction between them.
In this post we will go through hashing and encryption in detail and look at the differences between them. I will be adding code examples as well, implemented in Node.js. I will also share my own experiences of using them in my software career.
Hashing
Hashing means passing an input, such as a string, to a hashing function, which produces a fixed-size output irrespective of the length of the input. The output is called the hash code or hash value.
The same input always produces the same output.
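The two properties above (fixed-size output, same input gives same output) can be checked with a minimal sketch. It assumes SHA-256 from Node's built-in `crypto` module; the helper name `sha256Hex` is made up for this illustration.

```javascript
const crypto = require('node:crypto');

// Hash a string with SHA-256 and return the digest as a hex string.
function sha256Hex(input) {
  return crypto.createHash('sha256').update(input, 'utf8').digest('hex');
}

const shortHash = sha256Hex('hi');
const longHash = sha256Hex('a much longer input string '.repeat(100));

// Both digests are 64 hex characters (256 bits), whatever the input length.
console.log(shortHash.length, longHash.length); // 64 64

// Hashing the same input again gives exactly the same value.
console.log(sha256Hex('hi') === shortHash); // true
```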
Hashing is used in many places, but I will focus on the most popular one: storing passwords in a database. Passwords are never stored directly in the database; they are always hashed first, and only the hash is stored.
The password is hashed because, if the database is compromised, we don't want the attacker to have access to the passwords.
Hashing is a one-way, irreversible process, so even if an attacker gets access to the database, they won't get the passwords in plain text.
The output of a hashing function cannot be converted back to its original input.
An analogy for this is frying an egg: once an egg is fried, it cannot be made into a raw egg again.
So if the password is hashed and the hash is irreversible, how do we verify the password during login? We calculate the hash of the password typed into the input field and check whether it is the same as the hash stored in the database for that user. As we already know, the same input always gives the same hash value.
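The login check described above could be sketched as follows. This is only an illustration under a few assumptions: it uses scrypt from Node's built-in `crypto` module so that it runs without extra packages, it adds a random per-user salt (not discussed above) that is stored next to the hash, and the function names `hashPassword` and `verifyPassword` are hypothetical.

```javascript
const crypto = require('node:crypto');

// Hash a password with a fresh random salt; both salt and hash get stored.
function hashPassword(password) {
  const salt = crypto.randomBytes(16);
  const hash = crypto.scryptSync(password, salt, 64);
  return { salt: salt.toString('hex'), hash: hash.toString('hex') };
}

// Re-hash the login attempt with the stored salt and compare the two hashes.
function verifyPassword(attempt, stored) {
  const salt = Buffer.from(stored.salt, 'hex');
  const expected = Buffer.from(stored.hash, 'hex');
  const actual = crypto.scryptSync(attempt, salt, expected.length);
  // timingSafeEqual compares in constant time to avoid leaking information.
  return crypto.timingSafeEqual(actual, expected);
}

const record = hashPassword('correct horse battery staple');
console.log(verifyPassword('correct horse battery staple', record)); // true
console.log(verifyPassword('wrong password', record));               // false
```

Note that the original password is never stored or recovered; only the hashes are compared.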
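But did you know that two different inputs can also give the same hash value?
Yes, that is possible, and it is called a collision. Because the output has a fixed size while the number of possible inputs is unlimited, collisions can never be ruled out entirely; a good hashing algorithm makes them extremely hard to find.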
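To see a collision in action, here is a deliberately weak toy hash, invented for this illustration and not a real algorithm. It only adds up character codes, so any two strings with the same characters in a different order collide.

```javascript
// Toy hash (NOT secure): sum the character codes and keep the result in 0..255.
function toyHash(input) {
  let sum = 0;
  for (const ch of input) {
    sum = (sum + ch.codePointAt(0)) % 256;
  }
  return sum;
}

// 'a' is 97 and 'b' is 98, so both orderings add up to 195.
console.log(toyHash('ab'), toyHash('ba')); // 195 195
```

With only 256 possible outputs, collisions are unavoidable here. Real hash functions have a vastly larger output space and mix the input bits so that collisions are impractical to find.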
Popular hashing functions are MD5, SHA-1, SHA-2, SHA-3, Bcrypt, Scrypt, and Argon2.
MD5 and SHA-1 were widely used in the past, but practical collision attacks have been demonstrated against both, so they should not be used where security matters.
SHA-2 and SHA-3 are the recommended general-purpose hash functions, for example for integrity checks, as they are far more collision resistant. On their own, however, they are too fast to be a good choice for password hashing.
Bcrypt, Scrypt, and Argon2 are specially designed for password hashing: they deliberately slow down the hashing process to make brute-force attacks expensive.
You can check this link to see how an MD5 converter works; all the other algorithms work in a similar way.
Another place where hashing is widely used is checking the integrity of a file. Whenever you download a file from the internet, you might have an option to verify that the file was downloaded properly. This is sometimes done internally by the operating system. The site publishes the hash value of the file alongside the download. After downloading, the published hash value is compared with the hash value computed from the downloaded file. If they match, we can be confident that the downloaded file is intact.
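A minimal sketch of such an integrity check is shown below. It assumes SHA-256 as the checksum algorithm, and it simulates the download by writing a small temporary file. The names `fileChecksum`, `downloaded`, and `publishedChecksum` are made up for the example.

```javascript
const crypto = require('node:crypto');
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Compute the SHA-256 checksum of a file as a hex string.
// readFileSync keeps the sketch short; a stream would suit large files better.
function fileChecksum(filePath) {
  return crypto.createHash('sha256').update(fs.readFileSync(filePath)).digest('hex');
}

// Stand-in for a download: write a small file to a temporary directory.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'checksum-demo-'));
const downloaded = path.join(dir, 'download.bin');
fs.writeFileSync(downloaded, 'file contents from the server');

// Stand-in for the checksum that the download page would publish.
const publishedChecksum = crypto
  .createHash('sha256')
  .update('file contents from the server')
  .digest('hex');

console.log(fileChecksum(downloaded) === publishedChecksum); // true: file is intact

// Simulate corruption by changing the file, then check again.
fs.appendFileSync(downloaded, '!');
console.log(fileChecksum(downloaded) === publishedChecksum); // false: file was altered
```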
Hashing is also widely used in data structures known as hashmaps. These structures are similar to objects in JavaScript, where we store key-value pairs. A hashmap uses a hash function to compute an index from the key. The values are kept in an array, and each value is stored at the index calculated by the hash function. This allows for efficient retrieval: when we search for a key, the hash function calculates its index, which locates the position in memory directly, and the stored value is retrieved.
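The idea can be sketched with a small class. This is a simplified illustration, not how JavaScript engines implement objects or `Map`; the class name `SimpleHashMap`, its hash function, and the choice to handle collisions by keeping a small list per array slot are all assumptions made for the example.

```javascript
// Minimal hash map sketch using separate chaining (each bucket is a list of pairs).
class SimpleHashMap {
  constructor(size = 16) {
    this.buckets = Array.from({ length: size }, () => []);
  }

  // Turn a string key into an array index in the range 0..size-1.
  indexFor(key) {
    let hash = 0;
    for (const ch of String(key)) {
      hash = (hash * 31 + ch.codePointAt(0)) % this.buckets.length;
    }
    return hash;
  }

  set(key, value) {
    const bucket = this.buckets[this.indexFor(key)];
    const entry = bucket.find((pair) => pair[0] === key);
    if (entry) entry[1] = value; // key already present: overwrite
    else bucket.push([key, value]);
  }

  get(key) {
    const bucket = this.buckets[this.indexFor(key)];
    const entry = bucket.find((pair) => pair[0] === key);
    return entry ? entry[1] : undefined;
  }
}

const map = new SimpleHashMap();
map.set('name', 'Alice');
map.set('language', 'JavaScript');
console.log(map.get('name'));    // Alice
console.log(map.get('missing')); // undefined
```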
Databases can also make use of hashing when indexing a column, in the form of hash indexes.
Caches also use hashing to store and look up data in memory.
Personal Use Case:
I once implemented password storage in a database. Initially, when I had no experience and had just started with software development, I stored passwords as plaintext. I realized that if the database were compromised, this would be a major issue. So I changed the implementation to hash the passwords before storing them. I used bcrypt, a popular library in Node.js, for this.
Encryption
Encryption means converting readable plaintext into an unreadable format, called ciphertext.
This is achieved using an encryption key, and it ensures that data is protected from unauthorized access.
Encryption is widely used to store sensitive information, such as bank details and user information, in databases.
A key distinction between hashing and encryption is that with encryption we can recover the original text from the unreadable format, given the key, whereas this is not possible with hashing.
There are two main types of encryption:
- Symmetric Encryption
Symmetric encryption means encryption and decryption are done using the same key. Storage and management of the key is critical in this method because, if the key is leaked, everything encrypted with it can be decrypted. The best-known symmetric algorithms are AES (Advanced Encryption Standard) and DES (Data Encryption Standard). DES is now considered insecure because of its short 56-bit key, and AES is the usual choice today.
Symmetric encryption is used to encrypt data both at rest and in transit. By at rest I mean data that is not moving over a network, such as data present in a database or on a hard disk. Hard disk encryption uses symmetric encryption, and so does the data transferred over a VPN.
Data stored in cloud storage is also protected using symmetric encryption.
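Here is a minimal sketch of symmetric encryption with Node's built-in `crypto` module. It assumes AES-256 in GCM mode, which also detects tampering; the helper names `encrypt` and `decrypt` are made up for this example, and the key is generated on the spot purely for demonstration.

```javascript
const crypto = require('node:crypto');

// Encrypt a UTF-8 string with AES-256-GCM; returns everything needed to decrypt.
function encrypt(plaintext, key) {
  const iv = crypto.randomBytes(12); // a fresh nonce for every message
  const cipher = crypto.createCipheriv('aes-256-gcm', key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, 'utf8'), cipher.final()]);
  return { iv, ciphertext, tag: cipher.getAuthTag() };
}

// Decrypt with the same key; throws if the key is wrong or the data was modified.
function decrypt({ iv, ciphertext, tag }, key) {
  const decipher = crypto.createDecipheriv('aes-256-gcm', key, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString('utf8');
}

const key = crypto.randomBytes(32); // 256-bit key; in practice kept in a key store
const encrypted = encrypt('account number 1234', key);

console.log(encrypted.ciphertext.toString('hex')); // unreadable bytes
console.log(decrypt(encrypted, key));              // account number 1234
```

The same `key` is passed to both functions, which is exactly what makes the scheme symmetric.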
- Asymmetric Encryption
Here two keys are involved: a public key and a private key. The public key, as the name suggests, is publicly available and not kept secret. The private key has to stay private and be stored securely.
The asymmetric encryption works as follows:
A public key is used to encrypt the data, whereas a private key is used to decrypt it. Suppose two parties want to communicate via asymmetric encryption. If party A starts the conversation, it asks party B for its public key and uses this public key to encrypt a message. The encrypted message is then sent to party B, who uses its own private key to decrypt it.
When party B wants to send a message to party A, a similar process happens, but this time the message is encrypted using party A's public key, and party A reads it by decrypting it with its own private key.
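The exchange from party A to party B can be sketched with Node's built-in `crypto` module. The sketch assumes a 2048-bit RSA key pair and OAEP padding with SHA-256; both parties live in one script here only to keep the example short.

```javascript
const crypto = require('node:crypto');

// Party B generates a key pair and shares only the public key.
const { publicKey, privateKey } = crypto.generateKeyPairSync('rsa', {
  modulusLength: 2048,
});

// Party A encrypts the message with party B's public key.
const encryptedMessage = crypto.publicEncrypt(
  { key: publicKey, padding: crypto.constants.RSA_PKCS1_OAEP_PADDING, oaepHash: 'sha256' },
  Buffer.from('hello party B', 'utf8')
);

// Party B decrypts it with the matching private key.
const decryptedMessage = crypto
  .privateDecrypt(
    { key: privateKey, padding: crypto.constants.RSA_PKCS1_OAEP_PADDING, oaepHash: 'sha256' },
    encryptedMessage
  )
  .toString('utf8');

console.log(encryptedMessage.length); // 256 (bytes, for a 2048-bit key)
console.log(decryptedMessage);        // hello party B
```

For a reply from B to A, the roles are simply swapped: A would generate its own key pair and B would encrypt with A's public key.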
Asymmetric encryption is much slower than symmetric encryption, as the keys involved are much longer than symmetric keys. Also, the mathematical operations used for asymmetric encryption are more complex and computationally intensive.
The public and private keys in a generated key pair are mathematically related to each other, but it is computationally infeasible to derive the private key from the public key.
Personal Use Case:
Personally, I have used both types of encryption in my professional career. I used symmetric encryption when I had to encrypt data stored in a database. The data was sensitive: it contained financial and personal information. The encryption method used was AES with a 256-bit key. We didn't store the actual decryption key in our database. This was crucial because, if our database was compromised, the attacker wouldn't have immediate access to the decryption key. Instead, we stored an encrypted version of the decryption key in the database.
We used Azure Key Vault, a cloud service for securely storing and accessing secrets. The master key used to decrypt the stored encrypted key was kept in Azure Key Vault. When we needed to decrypt data, our application would:
a. Retrieve the encrypted key from our database
b. Use Azure Key Vault to decrypt this key
c. Use the resulting decrypted key to finally decrypt the sensitive data
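The three steps above can be illustrated with a local sketch. This is not the code from that project and it does not call Azure Key Vault: the `fakeVault` object, its `wrapKey` and `unwrapKey` methods, the `seal` and `unseal` helpers, and the sample data are all hypothetical stand-ins, with a master key held in memory where the real setup would call the vault service.

```javascript
const crypto = require('node:crypto');

// AES-256-GCM helpers that pack iv + auth tag + ciphertext into one buffer.
function seal(plainBuffer, key) {
  const iv = crypto.randomBytes(12);
  const cipher = crypto.createCipheriv('aes-256-gcm', key, iv);
  const body = Buffer.concat([cipher.update(plainBuffer), cipher.final()]);
  return Buffer.concat([iv, cipher.getAuthTag(), body]);
}

function unseal(sealed, key) {
  const iv = sealed.subarray(0, 12);
  const tag = sealed.subarray(12, 28); // GCM auth tag is 16 bytes by default
  const decipher = crypto.createDecipheriv('aes-256-gcm', key, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(sealed.subarray(28)), decipher.final()]);
}

// Hypothetical stand-in for the vault: the master key never leaves this object.
const fakeVault = (() => {
  const masterKey = crypto.randomBytes(32);
  return {
    wrapKey: (dataKey) => seal(dataKey, masterKey),
    unwrapKey: (wrappedKey) => unseal(wrappedKey, masterKey),
  };
})();

// Write path: encrypt the data with a data key, store only the wrapped data key.
const dataKey = crypto.randomBytes(32);
const databaseRow = {
  encryptedKey: fakeVault.wrapKey(dataKey),
  encryptedData: seal(Buffer.from('sample sensitive record', 'utf8'), dataKey),
};

// Read path: (a) load the encrypted key, (b) ask the vault to decrypt it,
// (c) use the decrypted key to decrypt the sensitive data.
const unwrappedKey = fakeVault.unwrapKey(databaseRow.encryptedKey);
const recovered = unseal(databaseRow.encryptedData, unwrappedKey).toString('utf8');
console.log(recovered); // sample sensitive record
```

An attacker who copies only `databaseRow` gets two encrypted blobs and no usable key, which is the point of keeping the master key outside the database.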
In one of my projects, we implemented asymmetric encryption to secure password transmission from the frontend to the backend. Here's how it worked:
We generated a public-private key pair on the backend server.
The public key was shared with the frontend team.
During login, the frontend used this public key to encrypt the user's password.
The encrypted password was then sent over the network to the backend.
On the backend, we used the private key (securely stored on the server) to decrypt the password.
This approach ensured that even if the network traffic was intercepted, the password would remain secure, as only our backend server with the private key could decrypt it.
The RSA algorithm was used for the above implementation.
Below is a comparison of hashing and encryption:
| Characteristic | Hashing | Encryption |
| --- | --- | --- |
| Purpose | Data integrity, password storage, file verification | Secure data transmission and storage |
| Reversibility | One-way (irreversible) | Two-way (reversible with key) |
| Output | Fixed-length output regardless of input size | Variable-length output based on input size |
| Key Usage | No key required | Requires encryption/decryption key(s) |
| Deterministic | Yes (same input always produces same output) | Can be deterministic or non-deterministic depending on the algorithm |
| Collision Resistance | Aims to minimize collisions | Not a primary concern |
| Common Algorithms | MD5, SHA-1, SHA-2, SHA-3, bcrypt, Argon2 | AES, DES, RSA, ECC |
| Use Cases | Password storage, data integrity checks, hash tables | Secure communication, data protection at rest |
| Speed | Generally fast (except for intentionally slow password hashing algorithms) | Can be slower, especially for asymmetric encryption |
| Security Concerns | Collision attacks, preimage attacks | Key management, algorithm vulnerabilities |
Code examples:
I will add code examples for hashing, symmetric encryption, and asymmetric encryption implemented in Node.js.
You can play around with the code. Change the input values and check how the hash value changes. Do the same for encryption, and check the output lengths of the encrypted text in both examples.
1. Hashing
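A small script to experiment with is sketched below. It assumes Node's built-in `crypto` module and a handful of algorithm names it commonly supports; the helper name `hashWith` and the sample inputs are made up. Try changing the strings in `inputs` and compare how the digests change.

```javascript
const crypto = require('node:crypto');

// Return the hex digest of `input` under the given algorithm name.
function hashWith(algorithm, input) {
  return crypto.createHash(algorithm).update(input, 'utf8').digest('hex');
}

// Two inputs that differ by a single character.
const inputs = ['hello world', 'hello world!'];

for (const algorithm of ['md5', 'sha1', 'sha256', 'sha512']) {
  for (const input of inputs) {
    const digest = hashWith(algorithm, input);
    // Each hex character is 4 bits, so length * 4 is the digest size in bits.
    console.log(`${algorithm} (${digest.length * 4} bits) "${input}" -> ${digest}`);
  }
}
```

For each algorithm the digest length stays the same for both inputs, while the digest itself changes completely even though only one character was added.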
Conclusion:
In this blog, we looked at the difference between hashing and encryption.
We saw examples from my own professional journey.
I shared code snippets for both, which you can try. Do share your comments and words of encouragement. Signing off, Saish.