articles

Rust: Optimizing integer to string conversions

Doing integer to string conversions is very common when doing development (logging, or even showing frames per second (FPS) in video games...), not teaching you anything new here. As such, performance in this area is very important.

Explanations of the initial situation

If a type implements the Display trait, it also gets the ToString trait implementation "for free" because of this blanket implementation:

Runimpl<T> ToString for T
where
    T: Display + ?Sized
{}

Internally, ToString is implemented as such:

  1. We create a String.
  2. We create a Formatter and we give it this new String as buffer.
  3. We call Display::fmt and we pass T (as self) and the Formatter as arguments.
  4. The buffer (our String) was now filled, we can return it.

So based on this, it's difficult to see where the issue could be. As usual, devil is in the details.

Display implementation

For integers in particular, the Display implementation needs to check a few things to correctly generate a string representation:

There are quite a lot of checks. You can see the code here if you want the technical details.

Anyway, all these checks have a cost, so why not skip them? Well, that's exactly what was done.

First improvement was #136264. It specialized the implementation of ToString for all integers except i128/u128. The results were (interesting result for u8/i8):

name 1.87.0-nightly (3ea711f17 2025-03-09) With #136264 diff
bench_i16 32.06 ns/iter (+/- 0.12) 17.62 ns/iter (+/- 0.03) -45%
bench_i32 31.61 ns/iter (+/- 0.04) 15.10 ns/iter (+/- 0.06) -52%
bench_i64 31.71 ns/iter (+/- 0.07) 15.02 ns/iter (+/- 0.20) -52%
bench_i8 13.21 ns/iter (+/- 0.14) 14.93 ns/iter (+/- 0.16) +13%
bench_u16 31.20 ns/iter (+/- 0.06) 16.14 ns/iter (+/- 0.11) -48%
bench_u32 33.27 ns/iter (+/- 0.05) 16.18 ns/iter (+/- 0.10) -51%
bench_u64 31.44 ns/iter (+/- 0.06) 16.62 ns/iter (+/- 0.21) -47%
bench_u8 10.57 ns/iter (+/- 0.30) 13.00 ns/iter (+/- 0.43) +22%

Second improvement was #142294 for i128/u128. It's exactly the same idea as the first improvement, we specialize ToString implementation. Here's the result:

bench name 1.89 nightly with #142294 diff
bench_i128 29.25 ns/iter (+/- 0.66) 17.52 ns/iter (+/- 0.7) -40.1%
bench_u128 34.06 ns/iter (+/- 0.21) 16.1 ns/iter (+/- 0.6) -52.7%

Improving the conversion itself (not 128 bits)

Specializing the ToString implementations was an easy win, but now it's time to actually take a look at the conversion itself for integers smaller than 128 bits.

Let's take a look at #128204.

First thing to note: we have a buffer [MaybeUninit::<u8>::uninit(); SIZE] where SIZE is the maximum number of digits which can represent the integer. This buffer is filled starting by the end and then we convert the content to a &str starting at the current offset.

So if you have a u8 of value 9, SIZE is 3 (because the maximum value is 256, which is 3 digits), the buffer will look like this [<uninit>, <uninit>, b'9'], so the offset is 2.

Initially, SIZE was always the number of digits of the biggest 128 bits integer (so 40), which is pretty bad, especially for the smallest integers which only need a size of 3. So the first obvious things to do was to make the array size to depend on the integer maximum number of digits. Luckily for us, there is a (const!) method for that: ilog:

Runconst SIZE: usize = usize::MAX.ilog(10) as usize + 1;
let mut buf = [MaybeUninit::<u8>::uninit(); SIZE];

You can replace usize with any integer and you will always have the maximum of digits for this integer.

So that's the first improvement, you can see this change here.

The second improvement was to remove a loop at compile time for u8/i8 since it only works if the value is bigger or equal to 10.000.

You can see this change here.

Unsurprisingly, very small performance improvements there:

bench name current std with #128204 diff
bench_std_fmt::bench_i16_0 16.45 ns/iter (+/- 0.25) 16.50 ns/iter (+/- 0.15) 0.3%
bench_std_fmt::bench_i16_max 17.83 ns/iter (+/- 0.66) 17.58 ns/iter (+/- 0.10) -1.4%
bench_std_fmt::bench_i16_min 20.97 ns/iter (+/- 0.49) 20.50 ns/iter (+/- 0.28) -2.3%
bench_std_fmt::bench_i32_0 16.63 ns/iter (+/- 0.06) 16.62 ns/iter (+/- 0.07) -0.1%
bench_std_fmt::bench_i32_max 19.79 ns/iter (+/- 0.43) 19.55 ns/iter (+/- 0.14) -1.2%
bench_std_fmt::bench_i32_min 22.97 ns/iter (+/- 0.50) 22.08 ns/iter (+/- 0.08) -4.0%
bench_std_fmt::bench_i64_0 16.63 ns/iter (+/- 0.39) 16.69 ns/iter (+/- 0.44) 0.4%
bench_std_fmt::bench_i64_half 19.60 ns/iter (+/- 0.05) 19.10 ns/iter (+/- 0.05) -2.6%
bench_std_fmt::bench_i64_max 25.22 ns/iter (+/- 0.34) 24.43 ns/iter (+/- 0.02) -3.2%
bench_std_fmt::bench_i8_0 16.27 ns/iter (+/- 0.32) 15.80 ns/iter (+/- 0.17) -3.0%
bench_std_fmt::bench_i8_max 16.71 ns/iter (+/- 0.09) 16.25 ns/iter (+/- 0.01) -2.8%
bench_std_fmt::bench_i8_min 20.07 ns/iter (+/- 0.22) 19.80 ns/iter (+/- 0.30) -1.4%
bench_std_fmt::bench_u128_0 21.37 ns/iter (+/- 0.24) 21.35 ns/iter (+/- 0.35) -0.1%
bench_std_fmt::bench_u128_max 48.13 ns/iter (+/- 0.20) 48.78 ns/iter (+/- 0.29) 1.3%
bench_std_fmt::bench_u16_0 16.48 ns/iter (+/- 0.46) 16.03 ns/iter (+/- 0.39) -2.8%
bench_std_fmt::bench_u16_max 17.31 ns/iter (+/- 0.32) 17.41 ns/iter (+/- 0.32) 0.6%
bench_std_fmt::bench_u16_min 16.40 ns/iter (+/- 0.45) 16.02 ns/iter (+/- 0.39) -2.4%
bench_std_fmt::bench_u32_0 16.17 ns/iter (+/- 0.04) 16.29 ns/iter (+/- 0.16) 0.7%
bench_std_fmt::bench_u32_max 19.00 ns/iter (+/- 0.10) 19.16 ns/iter (+/- 0.28) 0.8%
bench_std_fmt::bench_u32_min 16.16 ns/iter (+/- 0.09) 16.28 ns/iter (+/- 0.11) 0.7%
bench_std_fmt::bench_u64_0 16.22 ns/iter (+/- 0.22) 16.14 ns/iter (+/- 0.18) -0.5%
bench_std_fmt::bench_u64_half 19.25 ns/iter (+/- 0.07) 18.95 ns/iter (+/- 0.05) -1.6%
bench_std_fmt::bench_u64_max 24.31 ns/iter (+/- 0.08) 24.18 ns/iter (+/- 0.08) -0.5%
bench_std_fmt::bench_u8_0 15.76 ns/iter (+/- 0.08) 15.66 ns/iter (+/- 0.08) -0.6%
bench_std_fmt::bench_u8_max 16.53 ns/iter (+/- 0.03) 16.29 ns/iter (+/- 0.02) -1.5%
bench_std_fmt::bench_u8_min 15.77 ns/iter (+/- 0.06) 15.67 ns/iter (+/- 0.02) -0.6%

It was followed by a small cleanup which simplified the implementations in #133247. In short: instead of having the same implementation for signed and unsigned integers, we convert signed into unsigned and then call the unsigned formatting.

Lastly there was #135265 which made the following improvements (quoting the pull request author):

The speed gain is because (1) the loop decision no longer depends on the digit calculation, (2) it produces fewer instructions, and (3) it no longer needs a check for special case zero, which has a "significant" leading zero.

So in short:

Sadly there is no benchmark numbers I can show off...

Improving the conversion itself (128 bits)

Let's start with #136594. It did the following improvements:

About the second one, before this pull request we used pointers like this:

Runlet buf_ptr = MaybeUninit::slice_as_mut_ptr(buf);
*buf_ptr.add(curr) = nb + b'0';

Now we do like this:

Runbuf[curr].write(nb + b'0');

And here are the results:

bench name current std with #128204 diff
fmt::write_u128_max 51.44 ns/iter (+/- 0.44) 46.65 ns/iter (+/- 0.39) -11.8%
fmt::write_u128_min 30.63 ns/iter (+/- 0.92) 26.77 ns/iter (+/- 0.58) -15.6%

There was also a small cleanup to replace the manually written number of digits of u128 (39) with .ilog(10) done in #142114.

Adding extra tests

Working on performance is nice, but having tests to ensure we didn't break the world is always a good thing to have. We added new tests in #138546 to ensure that the formatting was working as expected.

What's next?

Before all these changes, if you wanted to have the fastest integer to string conversion possible in Rust, you needed to use the itoa crate. Now, the main difference is that it uses a buffer you need to provide everytime to store the generated string, which allows to skip the buffer allocations.

Such an API is planned to be added to the core library. A pull request (#142098) is already open to implement it. The new code will look like this:

Runuse core::fmt::NumBuffer;

let nb = 0u32;
let mut buf = NumBuffer::new();
let string_repr: &str = n.format_into(&mut buf);

You can reuse buf as many times as you want as long as it's for the same integer size.

Once all these pull requests will be merged, I think we'll be able to have some very nice performance for integer to string conversions in Rust.

What remains to be done

The Display trait implementation for integers is still slow because of all the checks (as a reminder, you can see the code here). Improving this part seems like the logical next step. Hopefully someone will get to that at some point. :)

Words of the end

All this was done thanks(?) to my relaxing cat:


my cat sleeping on a bench in front of a pond
Posted on the 19/06/2025 at 15:30 by @GuillaumeGomez

Previous article

Rustdoc merged doctests (solved) issue on stable
Back to articles list
RSS feedRSS feed