rust/beginners-guide.md
2020-09-21 21:56:09 -06:00

Beginner's Guide To SIMD

Hello and welcome to our SIMD basics guide!

Because SIMD is a subject that many programmers haven't worked with before, we thought it best to outline some terms and other basics to get you started.

Quick Background

SIMD stands for Single Instruction, Multiple Data. In other words, SIMD is when the CPU performs a single action on more than one logical piece of data at the same time. Instead of adding two registers that each contain one f32 value and getting an f32 as the result, you might add two registers that each contain an f32x4 (four f32 values, 128 bits of data) and get an f32x4 as the output.
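To make the difference concrete, here is a minimal sketch that models an f32x4 as a plain `[f32; 4]` array. The names `F32x4`, `add_scalar`, and `add_f32x4` are made up for this illustration and are not part of any library. Whether the compiler turns the four-lane version into a single SIMD instruction depends on the target and optimization settings, so treat this as a picture of the data flow rather than a guarantee about the generated code.

```rust
/// Stand-in for a 4-lane `f32` vector (hypothetical name, not a library type).
pub type F32x4 = [f32; 4];

/// Scalar add: one operation produces one `f32`.
pub fn add_scalar(a: f32, b: f32) -> f32 {
    a + b
}

/// Four-lane add: lane `i` of the output is `a[i] + b[i]`.
/// Conceptually this is one operation that produces four `f32` values.
pub fn add_f32x4(a: F32x4, b: F32x4) -> F32x4 {
    [a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3]]
}
```

Calling `add_f32x4([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0])` gives `[11.0, 22.0, 33.0, 44.0]`, the same four sums you would get from four separate calls to `add_scalar`.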

This might seem a tiny bit weird at first, but there's a good reason for it. Back in the day, as CPUs got faster and faster, eventually they got so fast that the CPU would just melt itself. The heat management (heat sinks, fans, etc) simply couldn't keep up with how much electricity was going through the metal. Two main strategies were developed to help get around the limits of physics.

  • One of them you're probably familiar with: Multi-core processors. By giving a processor more than one core, each core can do its own work, and because they're physically distant (at least on the CPU's scale) the heat can still be managed. Unfortunately, not all tasks can just be split up across cores in an efficient way.
  • The second strategy is SIMD. If you can't make the register go any faster, you can still make the register wider. This lets you process more data at a time, which is almost as good as just having a faster CPU. As with multi-core programming, SIMD doesn't fit every kind of task, so you have to know when it will improve your program.

Terms

SIMD has a few special vocabulary terms you should know:

  • Vector: A SIMD value is called a vector. This shouldn't be confused with the Vec<T> type. A SIMD vector has a fixed size, known at compile time. All of the elements within the vector are of the same type. This makes vectors similar to arrays. One difference is that a vector is generally aligned to its entire size (eg: 16 bytes, 32 bytes, etc), not just the size of an individual element. Sometimes vector data is called "packed" data.

  • Lane: A single element position within a vector is called a lane. If you have N lanes available then they're numbered from 0 to N-1 when referring to them, again like an array. The biggest difference between an array element and a vector lane is that it is relatively costly to access an individual lane value. Generally, the vector has to be pushed out of a register onto the stack, and the individual lane is then accessed while it's on the stack. For this reason, when working with SIMD you should avoid reading or writing the value of an individual lane during hot loops.

  • Bit Widths: When talking about SIMD, the bit widths used are the bit size of the vectors involved, not the individual elements. So "128-bit SIMD" has 128-bit vectors, and that might be f32x4, i32x4, i16x8, or other variations. While 128-bit SIMD is the most common, there's also 64-bit, 256-bit, and even 512-bit on the newest CPUs.

  • Vertical: When an operation is "vertical", each lane is processed individually, without regard to the other lanes in the same vector. For example, a "vertical add" between two vectors would add lane 0 of a to lane 0 of b and put the sum in lane 0 of out, and then do the same thing for lanes 1, 2, etc. Most SIMD operations are vertical operations, so if your problem is a vertical problem then you can probably solve it with SIMD.

  • Horizontal: When an operation is "horizontal", the lanes within a single vector interact in some way. A "horizontal add" might add lane 0 of a to lane 1 of a and put the sum in lane 0 of out.
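The terms above can be sketched in ordinary Rust. The type `F32x4` and the functions `vertical_add` and `horizontal_add_pairs` are hypothetical names invented for this example, and the loops are scalar code that only models what the lanes do. In particular, the choice to sum adjacent pairs and to zero the unused output lanes in the horizontal version is an assumption made here; real horizontal instructions differ in what they put in the remaining lanes.

```rust
/// Hypothetical 4-lane vector type, for illustration only.
/// `align(16)` aligns it to its whole 16-byte size, not to one 4-byte `f32`.
#[derive(Clone, Copy, Debug, PartialEq)]
#[repr(align(16))]
pub struct F32x4(pub [f32; 4]);

/// Vertical add: output lane `i` depends only on lane `i` of `a` and `b`.
pub fn vertical_add(a: F32x4, b: F32x4) -> F32x4 {
    let mut out = [0.0f32; 4];
    for lane in 0..4 {
        out[lane] = a.0[lane] + b.0[lane];
    }
    F32x4(out)
}

/// One possible horizontal add: lanes within the single vector `a` interact.
/// Lane 0 of the output is `a[0] + a[1]`, lane 1 is `a[2] + a[3]`.
/// Assumption: the two unused output lanes are set to zero.
pub fn horizontal_add_pairs(a: F32x4) -> F32x4 {
    F32x4([a.0[0] + a.0[1], a.0[2] + a.0[3], 0.0, 0.0])
}
```

With `a = F32x4([1.0, 2.0, 3.0, 4.0])` and `b = F32x4([10.0, 20.0, 30.0, 40.0])`, `vertical_add(a, b)` gives `F32x4([11.0, 22.0, 33.0, 44.0])`, while `horizontal_add_pairs(a)` gives `F32x4([3.0, 7.0, 0.0, 0.0])`. Both `std::mem::size_of::<F32x4>()` and `std::mem::align_of::<F32x4>()` are 16, whereas a plain `[f32; 4]` is only 4-byte aligned.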