Don't forget to add the niche

Rust is my favorite programming language. One of the many things I love about Rust is its layout optimizations that allow you to use high-level data types while still getting the same, or even better size and cache optimality than you would with C/C++. Possibly Rust's most impressive layout optimization is the famous niche optimization for enums, which is also, unfortunately, not the best advertised or the most obvious to exploit.

When I'm writing Rust code, I use enums, especially Option and Result, all the time, so I've gotten somewhat keen to the mechanics of the optimization. In my opinion, it could stand to be easier to trigger; if you don't know the rules of niche optimization, you're almost certainly leaving cheap or free performance on the table. In some cases, the unused space is already there in your enum, going unused, but the compiler will not perform niche optimization unless you explicitly add a niche to your struct.

Playground link with all examples from this post.

Niche optimization primer

Here's a brief explanation of the basic concept for the uninitiated. In Rust, when an enum variant contains fields that have unused bits or illegal bit patterns, the compiler will try to exploit this "free" space to optimize the layout of your enum. In the general case, an enum in Rust is represented as a tagged union. For example, this enum:
enum MyEnum {
    First(u16),
    Second(u32),
}
... has (roughly) the same layout as this C struct:
struct MyEnum {
    uint8_t tag;
    union {
        uint16_t first;
        uint16_t second;
    };
};

When one variant contains a field whose type has illegal values, there may be a better representation. For example, this enum:

enum MyEnum2 {
    First(u16),
    Second(u16, NonZeroU16),
}
... actually has the same layout as the tuple (u16, u16). Instead of adding a "tag" to the layout, the compiler uses a special check to detect which variant the tuple represents: if tuple.0 != 0, the tuple cannot be an instance of the second variant, so it must be an instance of MyEnum::First. The pseudocode for this check looks like this:
let (x, y): MyEnum2 = ...;
if y == 0 {
    // Cast to First(x)
} else {
    // Cast to Second(x, y)
}
All cases of niche optimization are various generalizations of this simple trick.

Triggering niche optimization

A bit pattern that is invalid for a particular type, such as zero for NonZeroU16 is called a niche. The following (non-exhaustive) list gives the primitive types that have niches:
  • bool
  • char
  • &T
  • &mut T
  • NonNull<T>
  • NonZero<T>
  • function pointetplrs
  • structs whose fields have niches
Different types have different numbers of niches. There is only one niche for NonZeroU16, but bool can only take the values zero and one, leaving seven bits worth of niche representations in which the compiler can store a tag. So this enum only takes up two bytes of memory:
enum MyEnum3 {
    First(u8, bool),
    Second(u8),
    Third(u8),
    Fourth(u8),
    Fifth(u8),
    Sixth(u8),
}
The niche found by the compiler also cannot overlap with other variants. Compare these two enums:
// Triggers niche optimization
enum MyEnum4 {
    First(u16),
    Second(u16, bool),
}

enum MyEnum5 {
	First(u16, bool),
    Second(u16, bool),
}
In the case of Option, the None variant contains no fields, so Option<T> always triggers niche optimization as long as T contains a niche.

Caveat: padding

It seems obvious that a struct with padding bytes (due to alignment) would trigger niche optimization. But, surprisingly, this is not the case. Rust treats padding bytes as fair game for unsafe code to overwrite, so padding does not count as a niche. In order to trigger niche optimization for a type with padding bytes, you have to explicitly fill those padding bytes with a dummy field that has a niche, such as bool. Here's an example:
// Does not trigger niche optimization
struct Padded {
    x: u16,
    y: u8,
}

// Triggers niche optimization
struct PaddedWithNiche {
    x: u16,
    y: u8,
    _pad: bool,
}
This is a pretty silly piece of code to have to write to trigger an important optimization. I am not the first to point out this flaw.

This is only one case where niche optimization could be made easier to trigger. Some improvements that would be useful in future Rust versions:

  • padding is a niche by default (requires a new Rust edition)
  • NonMax<T> (integers less than INT_MAX)
  • NonNegative<T> (a.k.a. u7, u15, etc.)
  • fully-featured bitfields

Conclusion

That's the end of the post. Happy niche-hunting! If you have questions, message me on Mastodon or BlueSky.

Comments

Popular posts from this blog

Between a block and a hard place

Computing the probability of generating conflicting random strings