Fast hashing, it's not that simple

Whether you are developing a new application or defining a new protocol, you may have a hard time deciding which hash function to use. Which one is safe? Which one is fast?

Let's find out!

Why do we need fast hashing?

But Sylvain, I thought that hashing should be slow in order to stop hackers from cracking passwords in the event our database gets breached.

Not this kind of hash function. Those are actually called Password-Based Key Derivation Functions (PBKDFs) by cryptographers, such as Argon2id or scrypt. Their purpose is to turn a password (which is cryptographically weak) into a secure hash (or key). The slower they are, the more resistant they are to cracking.
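For the record, here is a minimal sketch of deriving a key from a password with Argon2id in Go, using the golang.org/x/crypto/argon2 package (the parameters are illustrative; tune them to your own latency budget):

```go
package main

import (
	"crypto/rand"
	"fmt"

	"golang.org/x/crypto/argon2"
)

func main() {
	password := []byte("correct horse battery staple")

	// A unique, random salt for each password.
	salt := make([]byte, 16)
	if _, err := rand.Read(salt); err != nil {
		panic(err)
	}

	// 1 pass over 64 MiB of memory with 4 threads, producing a 32-byte key.
	key := argon2.IDKey(password, salt, 1, 64*1024, 4, 32)
	fmt.Printf("%x\n", key)
}
```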

Today, we will study cryptographic hash functions designed to ensure the integrity of data. These functions should be collision-resistant: it must be virtually impossible to find two messages m1 and m2 where hash(m1) == hash(m2). They should also be preimage-resistant: it must be virtually impossible to find the original message m for a given hash(m).
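Concretely, here is a minimal sketch of hashing a file with SHA-256 to later verify its integrity, using Go's standard crypto/sha256 package (backup.tar is a hypothetical file name):

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"io"
	"os"
)

func main() {
	file, err := os.Open("backup.tar") // hypothetical file
	if err != nil {
		panic(err)
	}
	defer file.Close()

	// Stream the file through the hasher so we never load it all in memory.
	hasher := sha256.New()
	if _, err := io.Copy(hasher, file); err != nil {
		panic(err)
	}

	fmt.Printf("SHA-256: %x\n", hasher.Sum(nil))
}
```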

There are also fast non-cryptographic hash algorithms, such as xxHash, capable of reaching a throughput of 31.5 GB/s per core, sometimes faster than memcpy. But because they are not collision-resistant (it is feasible to find a message m2 such that hash(m2) == hash(m1)), they should only be used in a few, very specific cases and will not be studied here.

Let's say that you have 150 GB of data to hash (a backup of all your pictures and videos) and you want to check its integrity. Here is how much time it would take depending on the hashing speed (the sketch after the table reproduces the arithmetic).

| Throughput / core | 1 core       | 4 cores      | 8 cores     |
| ----------------- | ------------ | ------------ | ----------- |
| 300 MB/s          | 500s (8m20s) | 125s (2m05s) | 63s (1m03s) |
| 800 MB/s          | 188s (3m08s) | 47s          | 24s         |
| 1700 MB/s         | 88s (1m28s)  | 22s          | 11s         |
| 4000 MB/s         | 38s          | 10s          | 5s          |
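These numbers are straightforward arithmetic: total size divided by aggregate throughput, assuming (optimistically) that hashing scales linearly across cores. A minimal Go sketch reproducing the table:

```go
// Reproduce the table above: time to hash 150 GB at a given
// per-core throughput, assuming linear scaling across cores.
package main

import "fmt"

func main() {
	const dataMB = 150_000.0 // 150 GB expressed in MB

	for _, throughput := range []float64{300, 800, 1700, 4000} { // MB/s per core
		for _, cores := range []int{1, 4, 8} {
			seconds := dataMB / (throughput * float64(cores))
			fmt.Printf("%4.0f MB/s x %d core(s): %3.0fs\n", throughput, cores, seconds)
		}
	}
}
```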

Thus, the question is: how much time do you (or your users) want to lose waiting for your application to complete its computation?

There is a lot of debate online about which algorithm is faster due to its clever design, but what about real life?

The numbers

I've benchmarked the most famous hashing algorithms available today, on both amd64 and arm64 processors, and the results are... interesting.

Without further ado, let's see the raw numbers. Here I tested the fastest Go implementations, but you should be able to find similar results for other ecosystems.
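For reference, here is a minimal sketch of how such a throughput benchmark can be written with Go's built-in testing package (the real harness lives in the repository linked at the end of this article); b.SetBytes is what makes `go test -bench .` report MB/s:

```go
package bench

import (
	"crypto/sha256"
	"testing"
)

// benchmarkSHA256 hashes a fixed-size buffer in a loop so the
// testing package can derive a throughput figure.
func benchmarkSHA256(b *testing.B, size int) {
	buf := make([]byte, size)
	b.SetBytes(int64(size)) // report results as a rate
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		sha256.Sum256(buf)
	}
}

func BenchmarkSHA256_64B(b *testing.B)  { benchmarkSHA256(b, 64) }
func BenchmarkSHA256_64KB(b *testing.B) { benchmarkSHA256(b, 64*1024) }
func BenchmarkSHA256_10MB(b *testing.B) { benchmarkSHA256(b, 10*1024*1024) }
```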

amd64 (AMD EPYC 7R13 - AWS c6a.4xlarge):

CPU:
- arch: amd64
- physical cores: 8
- logical cores: 16

CPU features:
- AVX2: true
- AVX: true
- SSE: true
- SSE2: true
- AES: true
- SHA1: false
- SHA2: false
- SHA512: false

| Algorithm   | 64 B        | 64 KB        | 10 MB        |
| ----------- | ----------- | ------------ | ------------ |
| SHA2-256    | 617.84 MB/s | 1776.48 MB/s | 1776.27 MB/s |
| SHA2-512    | 295.78 MB/s | 760.26 MB/s  | 760.94 MB/s  |
| SHA3-256    | 73.30 MB/s  | 375.60 MB/s  | 377.05 MB/s  |
| Blake2b-256 | 387.09 MB/s | 909.01 MB/s  | 908.23 MB/s  |
| Blake3-256  | 812.69 MB/s | 3903.07 MB/s | 4010.50 MB/s |

arm64 (Graviton 3 - AWS c7g.4xlarge):

CPU:
- arch: arm64
- physical cores: 16
- logical cores: 16

CPU features:
- AES: true
- SHA1: true
- SHA2: true
- SHA512: true

| Algorithm   | 64 B        | 64 KB        | 10 MB        |
| ----------- | ----------- | ------------ | ------------ |
| SHA2-256    | 478.46 MB/s | 1625.15 MB/s | 1628.37 MB/s |
| SHA2-512    | 133.36 MB/s | 299.19 MB/s  | 299.85 MB/s  |
| SHA3-256    | 80.34 MB/s  | 368.37 MB/s  | 369.60 MB/s  |
| Blake2b-256 | 267.29 MB/s | 594.58 MB/s  | 595.25 MB/s  |
| Blake3-256  | 400.77 MB/s | 462.32 MB/s  | 464.36 MB/s  |

Want to learn more about applied cryptography such as how to implement end-to-end encryption in practice? Get my book Black Hat Rust where we build our own end-to-end encrypted protocol to secure the communication of a Remote Access Tool in Rust.

Wow! Blake3 is incredibly fast... on amd64. Today, there is no efficient arm64 implementation for Go. I've benchmarked the official Rust implementation on arm64, which uses NEON instructions, and it has a throughput of roughly 900 MB/s per core, still less than SHA-256.

To reach these speeds, the Blake3 implementation uses Intel's AVX-512 instructions, while the SHA-256 implementation uses various Intel SIMD instructions on amd64 and the dedicated SHA2 instructions on arm64.
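As an aside, here is a rough sketch of how a CPU feature listing like the ones above can be produced in Go, using the golang.org/x/sys/cpu package:

```go
// Print a few of the CPU features relevant to hashing performance.
// The fields for the other architecture simply read as false.
package main

import (
	"fmt"

	"golang.org/x/sys/cpu"
)

func main() {
	// amd64
	fmt.Println("AVX2:   ", cpu.X86.HasAVX2)
	fmt.Println("AVX512F:", cpu.X86.HasAVX512F)

	// arm64
	fmt.Println("AES:    ", cpu.ARM64.HasAES)
	fmt.Println("SHA2:   ", cpu.ARM64.HasSHA2)
	fmt.Println("SHA512: ", cpu.ARM64.HasSHA512)
}
```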

But AVX-512 instructions are known to lose performance as the number of threads increases, at least on some Intel processors, due to frequency scaling. SHA-256 also has an AVX-512-backed implementation capable of reaching up to 3.5 GB/s per core.

Also, even if I trust the authors and the design looks sound, Blake3 is still a young algorithm which hasn't received a lot of analysis so far, compared to the others.

So, for me, the answer is clear: there is no reason not to choose SHA-256 for hashing.

1.7 GB/s per core is not going to be your bottleneck; I/O (disk, network) is.

For example, the 16-inch MacBook Pro with M2 Pro in a 2 TB storage configuration achieves read/write speeds of roughly 5-6 GB/s, which means that we can saturate its SSD with only 3 of the processor's 12 cores. 1.7 GB/s is also enough to saturate a 10 Gb/s network link (1.7 GB/s * 8 = 13.6 Gb/s).

Some Closing Thoughts

While not making headlines, SHA-256 was, is, and will continue to be a safe bet for any new application or protocol in the foreseeable future, as you will easily find an efficient implementation using hardware acceleration for your programming language of choice. And because it is ubiquitous (Bitcoin, most software update mechanisms, antiviruses...), you will know for sure if its security is ever compromised, which is less certain for lesser-known algorithms.

Furthermore, with the boom of arm64 processors, which have dedicated SHA2 instructions, it's actually the fastest algorithm when we consider today's fractured computing landscape.

And this is before even talking about NIST-approved algorithms.

Choose boring, choose safe, choose fast. Choose SHA-256.

As always, you can find the code on GitHub: github.com/skerkour/go-benchmarks


