Some days ago I noticed that on a Mac, doing
snprintf calls from multiple threads shows
a curious lack of scaling (see tweet).
Replacing snprintf with the fmt library can speed up the OBJ exporter
in Blender 3.2 by 3-4 times. This could have been the end of the story, filed under a
“eh, sprintf is bad!” drawer, but I started to wonder why it shows this lack of scaling.
A simple test: convert two million integers into strings. And then try to do the same on multiple
threads at once, i.e. each thread converts the same two million integers. If the number of threads
is below the number of CPU cores, this should take about the same time – each thread would just
happily be converting its own numbers, without interfering with the other threads.
Yes, the reality is more complicated, with CPU thermals, shared caches and whatnot coming into play,
but we’re interested in broad patterns, not exact science here!
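For reference, the test described above could be sketched roughly like this. This is a minimal reconstruction and not the actual benchmark code; the function names `convert_numbers` and `time_conversion` are made up for illustration, and the `"%i"` format and 32-byte buffer are assumptions.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <thread>
#include <vector>

// Convert `count` integers into strings with snprintf.
// Returns the total number of characters produced, so the work has a
// visible result and is harder for the compiler to optimize away.
static std::size_t convert_numbers(int count)
{
    char buf[32];
    std::size_t total = 0;
    for (int i = 0; i < count; ++i)
        total += static_cast<std::size_t>(std::snprintf(buf, sizeof(buf), "%i", i));
    return total;
}

// Run the same conversion on `thread_count` threads at once and
// return the elapsed wall-clock time in milliseconds.
static double time_conversion(int thread_count, int count = 2000000)
{
    std::vector<std::size_t> results(thread_count);
    std::vector<std::thread> threads;
    auto t0 = std::chrono::steady_clock::now();
    for (int t = 0; t < thread_count; ++t)
        threads.emplace_back([&results, t, count] { results[t] = convert_numbers(count); });
    for (auto& th : threads)
        th.join();
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

Calling `time_conversion(n)` for n from 1 up to the number of cores and printing the results is enough to see the pattern: with perfect scaling the time would stay roughly flat as n grows.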
And here’s what happens on an Apple M1 Max laptop (vertical axis is log scale):
Converting two million numbers into strings takes 100 milliseconds when one CPU core is doing it.
When all eight “performance” cores are doing it, it takes 1.8 seconds, or 18 times as long.
That’s, like, not great!
Yo dude, you should not use sprintf
“Well duh” you say, “obviously you should not use sprintf, you should use C++ iostreams”. Okay.
Here’s converting integers into strings via a