Curious lack of sprintf scaling


Some days ago I noticed that on a Mac, calling snprintf from multiple threads shows a
curious lack of scaling (see tweet).
Replacing snprintf with the {fmt} library can speed up the OBJ exporter
in Blender 3.2 by 3-4 times. This could have been the end of the story, filed under an
“eh, sprintf is bad!” drawer, but I started to wonder why it shows this lack of scaling.

Test case

A simple test: convert two million integers into strings. And then try to do the same on multiple
threads at once, i.e. each thread converts the same two million integers. If the number of threads
is below the number of CPU cores, this should take about the same time as a single thread – each thread would just
happily convert its own numbers, and not interfere with the other threads.

Yes, the reality is more complicated, with CPU thermals, shared caches and whatnot coming into play,
but we’re interested in broad patterns, not exact science here!
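As a rough sketch of the test described above (not the exact benchmark code; the names `convert_numbers` and `time_conversion_ms` are made up for illustration), each thread formats the same range of integers with snprintf, and we measure the wall-clock time until all threads finish:

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <thread>
#include <vector>

// Hypothetical helper: formats the integers 0..count-1 with snprintf and
// returns the total number of characters written, so the compiler cannot
// throw the work away.
std::size_t convert_numbers(int count)
{
    char buf[32];
    std::size_t total_chars = 0;
    for (int i = 0; i < count; ++i)
        total_chars += static_cast<std::size_t>(std::snprintf(buf, sizeof(buf), "%d", i));
    return total_chars;
}

// Hypothetical helper: runs convert_numbers(count) on thread_count threads
// at once and returns the wall-clock time in milliseconds until all are done.
double time_conversion_ms(int thread_count, int count)
{
    assert(thread_count > 0);
    std::vector<std::size_t> totals(static_cast<std::size_t>(thread_count), 0);
    std::vector<std::thread> threads;
    const auto start = std::chrono::steady_clock::now();
    for (int t = 0; t < thread_count; ++t)
        threads.emplace_back([&totals, t, count] {
            totals[static_cast<std::size_t>(t)] = convert_numbers(count);
        });
    for (auto& th : threads)
        th.join();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count();
}
```

Calling `time_conversion_ms(n, 2000000)` for n from 1 up to the number of cores gives one timing per thread count; with perfect scaling those timings would stay roughly flat.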

And here’s what happens on an Apple M1 Max laptop (vertical axis is log scale):

Converting two million numbers into strings takes 100 milliseconds when one CPU core is doing it.
When all eight “performance” cores are doing it, it takes 1.8 seconds, or 18 times as long.
That’s, like, not great!

Yo dude, you should not use sprintf

“Well duh” you say, “obviously you should not use sprintf, you should use C++ iostreams”. Okay.
Here’s converting integers into strings via a std::stringstream
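A minimal sketch of what such a conversion could look like, assuming a fresh stream per number; `int_to_string_stream` is a hypothetical name, not something taken from the actual test code:

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: formats one integer by streaming it into a
// std::stringstream and copying out the resulting string.
std::string int_to_string_stream(int value)
{
    std::stringstream ss;
    ss << value;
    return ss.str();
}
```

Dropping this into the per-thread loop in place of the snprintf call keeps the rest of the test unchanged.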


Written by Vanic