UCSB: Extending the Final Yahoo NoSQL Benchmark

2021 was a terrific year for DataBase Management Software and for startups in general.
While classical SQL is stagnating, the data-management market as a whole is booming at 17% CAGR and could reach $150 Billion in 2026, according to Gartner.
That and the hype allowed dozens of DBMS startups to raise more capital last year alone than in their whole preceding decade-long history.
For 13 companies in our previous long comparison, it meant swallowing $4.5 Billion of VC money.

With so many players and such high stakes, there must be a standard metric: a way to separate the wheat from the chaff.
There are two:

  • YCSB: Yahoo Cloud Serving Benchmark,
  • TPC: Transaction Processing Performance Council.

These cover different workloads.
The first is for Key-Value Stores (KVS), and the second is mostly for SQL DBMSs built on top of KVS.
So if you are building a DBMS, it is wise to use both: one for the persistent data structures and one for the higher-level logic.
As expected, we use both and outperform many players in each, but we will skip TPC for now.

With around 4K ⭐ on GitHub, YCSB is the default choice.
In the past, we have used it extensively, and our previous article covers a lot that we can skip this time:

  • How 🦄 are built on top of open-source RocksDB and WiredTiger,
  • The liquid-cooled 👹 monster hardware we use for benchmarking,
  • 100 MB, 1 GB, 10 GB and 100 GB results.

As we have previously promised, we are back with expanded datasets and new optimizations, but they are not just inside UnumDB!
After careful review, we decided to rewrite the original YCSB package, extending and updating it along the way!
Oh, and it's open-source: check it out on GitHub 🤗
If you just want to see the new results, here you go.

UCSB Benchmark Duration for RocksDB, MongoDB and UnumDB

Generally, designing new benchmarks isn't considered good form.
Especially if you're going to measure your own (hopefully upcoming) product, it makes it too easy to prioritize the operations you are good at and downplay the others.
So we preserved the most important part of YCSB: its canonical random key generators and the three most misleading letters of the name 😅

We will talk about many things, including:

  • A benchmark for High-Performance Software must be High-Performance Software itself.
  • Monitoring hardware resource usage from a separate process, Valgrind style.
  • ACID guarantees and multithreading in Key-Value Stores.
  • The cost of running a DBMS in a Docker container.
  • The impact of SLC vs MLC vs TLC flash on DBMS speed.
  • 1 TB results for RocksDB, UnumDB and the others.

If that sounds exciting, let's jump in!

Performance is a Feature

The original YCSB was published over 10 years ago and targeted standalone DBMS applications.
Those run in a separate process, in a different address space, and communicate through sockets, often using plain-text commands.
It was simple enough to be understandable and diverse enough to be broadly applicable, so it took off.
People like us have applied it to systems that are much more "low-level" than, let's say, Amazon DynamoDB, Apache Cassandra or ElasticSearch.

In those 10 years, the hardware has changed.
Let's compare AMD CPUs from those two eras:

Spec | 2012 | 2022
Top CPU Model | Athlon II X4 651K | EPYC 7773X
Lithography | 32 nm | 7 nm
TDP | 100 W | 280 W
Core Count | 4 | 64
Clock Frequency | 3.0 GHz | 2.2 – 3.5 GHz
Cache Size | 4 MB | 804 MB
PCIe Lanes | 20x Gen2 | 128x Gen4
PCIe Bandwidth | 10 GB/s | 256 GB/s
RAM | 2-channel DDR3-1866 | 8-channel DDR4-3200
RAM Bandwidth | 30 GB/s | 204 GB/s
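The bandwidth rows follow from simple arithmetic. The sketch below recomputes them under standard assumptions (64-bit DDR channels; PCIe Gen2 at 5 GT/s with 8b/10b encoding, Gen4 at 16 GT/s with 128b/130b encoding; one direction). The function and constant names are illustrative only.

```cpp
// Peak DRAM bandwidth in GB/s. Each DDR channel is 64 bits (8 bytes) wide,
// so bandwidth = channels x mega-transfers per second x 8 bytes / 1000.
constexpr double dram_bandwidth_gb_s(int channels, double mega_transfers_per_s) {
    return channels * mega_transfers_per_s * 8.0 / 1000.0;
}

// Usable PCIe throughput per lane and direction in GB/s:
// raw rate (GT/s) x line-encoding efficiency / 8 bits per byte.
constexpr double pcie_lane_gb_s(double giga_transfers_per_s, double encoding_efficiency) {
    return giga_transfers_per_s * encoding_efficiency / 8.0;
}

constexpr double kPcieGen2Lane = pcie_lane_gb_s(5.0, 8.0 / 10.0);      // 8b/10b: 0.5 GB/s per lane
constexpr double kPcieGen4Lane = pcie_lane_gb_s(16.0, 128.0 / 130.0);  // 128b/130b: ~1.97 GB/s per lane
```

With these inputs, 2 × DDR3-1866 gives about 29.9 GB/s, 8 × DDR4-3200 gives 204.8 GB/s, and 20 Gen2 lanes give 10 GB/s. 128 Gen4 lanes come out at roughly 252 GB/s per direction, so the 256 GB/s figure corresponds to rounding each lane up to 2 GB/s.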

Of course, not all of that theoretical bandwidth is always available, but I guess you don't need cpu-world.com to agree that CPUs have changed!

The same applies to SSDs and GPUs.
Storage-level technologies are heavily underutilizing the latter.
The software must harness all of that speed and parallelism, but that's only possible in low-level languages.

Java & Java-like

All performant KVS are implemented in C++, and YCSB is implemented in Java.
This means that you'll need some form of a "Foreign Function Interface" to interact with the KVS.
This immediately adds unnecessary work for our CPU, but it's a minor issue compared to the rest.
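To make the FFI point concrete, here is a minimal sketch of the kind of boundary a Java benchmark would call into. Everything in it is hypothetical: KeyValueStore is a toy in-memory map, and kvs_open, kvs_put, kvs_get and kvs_close are illustrative names, not the API of RocksDB, UnumDB or any YCSB binding. The Java side, not shown, would typically also have to convert its Strings to bytes before each call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <map>
#include <string>
#include <utility>

// Hypothetical C++ key-value store standing in for a real KVS.
class KeyValueStore {
  public:
    void put(std::string key, std::string value) { data_[std::move(key)] = std::move(value); }

    const std::string* get(const std::string& key) const {
        auto it = data_.find(key);
        return it == data_.end() ? nullptr : &it->second;
    }

  private:
    std::map<std::string, std::string> data_;
};

static KeyValueStore* as_store(void* db) {
    assert(db != nullptr && "handle must come from kvs_open()");
    return static_cast<KeyValueStore*>(db);
}

// Hypothetical C ABI facade. Java cannot call C++ methods directly, so bindings
// (JNI, JNA or the Panama FFM API) go through functions with C linkage: the store
// hides behind an opaque handle, and every key and value crosses the boundary as a
// pointer + length pair that this sketch copies into or out of a std::string.
extern "C" {

void* kvs_open() { return new KeyValueStore(); }

void kvs_close(void* db) { delete as_store(db); }

void kvs_put(void* db, const char* key, std::size_t key_len, const char* value, std::size_t value_len) {
    as_store(db)->put(std::string(key, key_len), std::string(value, value_len));
}

// Copies up to out_capacity bytes of the value into out.
// Returns the full value length, or -1 if the key is missing.
long kvs_get(void* db, const char* key, std::size_t key_len, char* out, std::size_t out_capacity) {
    const std::string* value = as_store(db)->get(std::string(key, key_len));
    if (value == nullptr) return -1;
    std::size_t n = value->size() < out_capacity ? value->size() : out_capacity;
    std::memcpy(out, value->data(), n);
    return static_cast<long>(value->size());
}

}  // extern "C"
```

Even in this toy, a single put copies the key and value on the native side before the store does any real work, on top of whatever conversions happen in the JVM.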

Example 1

Every language and its ecosystem have different priorities.
Java focuses on simplicity of development, while C++ trades it for higher efficiency.

private static String getRowKey(String db, String table, String key) {
    return db + ":" + table + ":" + key;
}

The above snippet is from the Apple & Snowflake FoundationDB adapter inside YCSB, but similar code appears throughout the whole repo.
It's responsible for generating keys for queries.
Here's what a modern idiomatic C++ version would look like:

#include <format>
#include <string_view>

auto get_row_key(std::string_view db, std::string_view table, std::string_view key) {
    return std::format("{}:{}:{}", db, table, key);
}

My whole Java experience is about 1 week long and happened over 10 years ago.
So take the next section with a grain of salt.

From Java 7 onwards, the Java String Pool lives in the Heap space, which is garbage-collected by the JVM.
This code will create a StringBuilder, a heap-allocated character buffer that the pieces are copied into, later materialized into the final concatenated String.
Of course, on the heap again.
And if we know anything about High-Performance Computing, it's that the heap is expensive, and together with Garbage Collection and multithreading, it becomes completely unbearable.
The same applies to the C++ version.
Yes, we are doing only 1 allocation there, but it is still too slow to be called HPC.
We would need to replace std::format with std::format_to and write the result into a reusable buffer.
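A minimal sketch of that reusable-buffer idea, assuming C++17. To avoid depending on <format> support, it swaps std::format_to for plain std::string::append; on C++20, std::format_to(std::back_inserter(buffer), "{}:{}:{}", db, table, key) could fill the same buffer. RowKeyBuilder is a hypothetical name, not part of UCSB.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical helper: builds "db:table:key" row keys into a buffer that lives
// as long as the builder (e.g. one builder per benchmark thread), so repeated
// calls reuse the same allocation once it is large enough.
class RowKeyBuilder {
  public:
    std::string_view build(std::string_view db, std::string_view table, std::string_view key) {
        buffer_.clear();  // drops the contents; mainstream implementations keep the capacity
        buffer_.reserve(db.size() + table.size() + key.size() + 2);  // no-op once large enough
        buffer_.append(db).append(1, ':').append(table).append(1, ':').append(key);
        return buffer_;  // the view stays valid only until the next build() call
    }

  private:
    std::string buffer_;
};
```

The returned std::string_view points into the builder's own buffer, so a caller must copy the key if it needs it after the next build(); keeping one builder per thread avoids both sharing and locking.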

Example 2

If one example is not enough, below is a code snippet that produces random integers before they are packed into a String key.

long nextLong(long itemcount) {
    // from "Quickly Generating Billion-Record Synthetic Databases", Jim Gray et al, SIGMOD 1994
    if (itemcount != countforzeta) {
        synchronized (this) {
            if (itemcount > countforzeta) {
                // ...
            }
        }
    }

    double u = ThreadLocalRandom.current().nextDouble();
    double uz = u * zetan;

    if (uz < 1.0)
        // ...
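For comparison, here is a sketch of the same Gray et al. method in C++, assuming a fixed item count. It drops YCSB's incremental zeta recomputation and, instead of a synchronized block, makes next() const and takes the caller's random engine, so threads can share one generator while each owns its own engine. ZipfianGenerator and its parameter names are illustrative; theta = 0.99 mirrors YCSB's default Zipfian constant.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <random>

// Zipfian integers after Gray et al., SIGMOD 1994 (the method cited in YCSB's nextLong()).
// Simplified sketch: the item count is fixed at construction, so zeta is computed once.
class ZipfianGenerator {
  public:
    explicit ZipfianGenerator(std::uint64_t items, double theta = 0.99)
        : items_(items),
          theta_(theta),
          alpha_(1.0 / (1.0 - theta)),
          zetan_(zeta(items, theta)),
          eta_((1.0 - std::pow(2.0 / static_cast<double>(items), 1.0 - theta)) /
               (1.0 - zeta(2, theta) / zetan_)) {
        assert(items >= 2 && theta > 0.0 && theta < 1.0);
    }

    // Returns a value in [0, items); small values are drawn far more often.
    // No locking needed: next() only reads the generator, and each thread passes its own engine.
    template <typename Engine>
    std::uint64_t next(Engine& engine) const {
        const double u = std::uniform_real_distribution<double>(0.0, 1.0)(engine);
        const double uz = u * zetan_;
        if (uz < 1.0) return 0;
        if (uz < 1.0 + std::pow(0.5, theta_)) return 1;
        const auto ret = static_cast<std::uint64_t>(
            static_cast<double>(items_) * std::pow(eta_ * u - eta_ + 1.0, alpha_));
        return ret < items_ ? ret : items_ - 1;  // guard against u rounding up to 1.0
    }

  private:
    // zeta(n, theta) = sum of 1 / i^theta for i = 1..n; O(n), done once per generator.
    static double zeta(std::uint64_t n, double theta) {
        double sum = 0.0;
        for (std::uint64_t i = 1; i <= n; ++i) sum += 1.0 / std::pow(static_cast<double>(i), theta);
        return sum;
    }

    std::uint64_t items_;
    double theta_;
    double alpha_;
    double zetan_;
    double eta_;
};
```

A typical use is one std::mt19937_64 per benchmark thread, all calling next() on a shared generator; turning the result into a key string is then a separate step.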
