Open Source Weekly #7 - SIMD

Today we are going to talk about high performance. Not the standard distributed cloud blahblah but high performance on a single core on a single machine. Please welcome SIMD instructions.

Did you know?

What are SIMD instructions?

SIMD (Single Instruction, Multiple Data) instructions are special CPU and GPU instructions that apply one operation to several data elements (for example sixteen bytes or four 32-bit floats) at once.
This is often called vectorization, because the operation is applied to a whole vector of data with a single instruction.
It makes it possible to implement very fast algorithms on a single thread on general-purpose hardware, as opposed to relying on special-purpose hardware acceleration like AES-NI.
It is particularly used in machine learning, cryptography, databases and content processing (video / image / audio encoding), and can be a good alternative to multithreading.
MMX, SSE, AVX and AVX-512 are SIMD instruction sets for x86 processors (Intel and AMD). NEON is the equivalent for ARM.
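
To make the idea concrete, here is a minimal sketch in C that adds two arrays of floats four elements at a time using SSE intrinsics. The function name add_f32 is made up for this example. It assumes an x86 compiler that defines __SSE__ (GCC and Clang do on x86-64); on any other target only the plain scalar loop is compiled.

```c
#include <stddef.h>

#if defined(__SSE__)
#include <xmmintrin.h> /* SSE intrinsics: __m128, _mm_add_ps, ... */
#endif

/* Hypothetical helper: out[i] = a[i] + b[i] for i in [0, n). */
void add_f32(const float *a, const float *b, float *out, size_t n) {
    size_t i = 0;
#if defined(__SSE__)
    /* Vector part: one _mm_add_ps adds 4 floats with a single instruction. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i); /* unaligned load of 4 floats */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#endif
    /* Scalar part: leftover elements, or everything when SSE is unavailable. */
    for (; i < n; i++) {
        out[i] = a[i] + b[i];
    }
}
```

The scalar loop at the end is the usual pattern for input lengths that are not a multiple of the vector width. A NEON version for ARM would have the same shape with different intrinsics.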


ripgrep (MIT / Unlicense)
A really (really) fast CLI search tool which in a lot of situations can replace grep. It is modern and user-friendly. For example, it respects .gitignore files by default.

simdjson-go (Apache-2.0)
A Go port of simdjson, by MinIO, which can parse gigabytes of JSON per second 🚀

pikkr (MIT)
“JSON parser which picks up values directly without performing tokenization in Rust”

faster (MPL 2.0)
“Easy, powerful, portable, absurdly fast numerical calculations. Includes static dispatch with inlining based on your platform and vector types, zero-allocation iteration, vectorized loading/storing, and support for uneven collections.” For Rust.

ncnn (BSD 3-Clause)
A high-performance neural network inference framework by Tencent, optimized for mobile platforms.

mandel-simd (public domain)
Mandelbrot Set in SSE, AVX, and NEON. Great to learn how to use SIMD with a small project.

fastlwc (MIT)
SIMD-enhanced wc (word count).
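
To give a flavor of what SIMD looks like in a text-processing tool, here is a minimal sketch in C that counts newline bytes sixteen at a time with SSE2. It is only an illustration and makes no claim about how fastlwc or ripgrep are actually implemented. The function name count_newlines is made up, and the code assumes a compiler that defines __SSE2__ on x86; elsewhere it falls back to the scalar loop.

```c
#include <stddef.h>

#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 integer intrinsics */
#endif

/* Hypothetical helper: counts '\n' bytes in buf[0..n). */
size_t count_newlines(const char *buf, size_t n) {
    size_t count = 0;
    size_t i = 0;
#if defined(__SSE2__)
    const __m128i nl = _mm_set1_epi8('\n'); /* 16 copies of '\n' */
    for (; i + 16 <= n; i += 16) {
        __m128i chunk = _mm_loadu_si128((const __m128i *)(buf + i));
        /* Compare all 16 bytes at once: matching bytes become 0xFF. */
        __m128i eq = _mm_cmpeq_epi8(chunk, nl);
        /* Pack the top bit of each byte into a 16-bit mask. */
        unsigned mask = (unsigned)_mm_movemask_epi8(eq);
        /* Count set bits by clearing the lowest one until none are left. */
        while (mask) {
            mask &= mask - 1;
            count++;
        }
    }
#endif
    /* Scalar tail (or the whole buffer when SSE2 is unavailable). */
    for (; i < n; i++) {
        if (buf[i] == '\n') {
            count++;
        }
    }
    return count;
}
```

The compare-then-movemask pattern is a common building block for byte searching: a single comparison answers "is this byte a newline?" for sixteen positions at once.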

Project of the community

QuestDB (Apache 2.0)
“QuestDB is a NewSQL relational database designed to process time-series data, faster. Our approach comes from low-latency trading. QuestDB’s stack is engineered from scratch, zero-GC Java and dependency-free. The whole database and console fits in a 3.5Mb package.”
They are using SIMD to achieve extreme performance (see below).


On the dangers of Intel’s frequency scaling
Unfortunately, SIMD is not a silver bullet: it may slow down your multithreaded programs because of how some Intel processors throttle their frequency.

Using SIMD to aggregate billions of rows per second
With their new 4.2 release, QuestDB (a time-series database) introduced SIMD, which made their (already fast) aggregations 100x faster.

Parsing gigabytes of JSON per second in Go
This is the detailed explanation of why and how MinIO ported simdjson to Go.

SIMD < SIMT < SMT: parallelism in NVIDIA GPUs
The difference between SIMD, SIMT and SMT, and how they work in NVIDIA GPUs.

Towards fearless SIMD
A good write-up about how to transpose the safe and zero-cost abstraction philosophy of Rust to the SIMD world.

SIMD instructions in Go
An overview of using SIMD instructions with Go for different processor architectures.

Base64 encoding and decoding at almost the speed of a memory copy
This research paper describes how its authors achieved base64 encoding and decoding at almost the speed of a memcpy. Really impressive.

Learning SIMD with Rust by finding planets
A concrete guide to using SIMD to speed up an algorithm in a portable way (in Rust).


This week has been particularly intense regarding security. Major vulnerabilities have been patched in Firefox (twice!), Ubuntu’s Linux Kernel, Red Hat’s Linux Kernel, SUSE’s Linux Kernel, Android, and more.
Update right now (browsers need to be restarted to apply auto-updates).

You can read more here.

Stay safe ✌️
