Speeding up Go’s builtin JSON encoder for large arrays of objects


Published on March 3, 2022 by Phil Eaton


I was looking into a few of
octosql’s benchmarks the other
day and noticed that a large chunk of time in
DataStation/dsq
is spent encoding JSON objects. JSON is an intermediate format in
DataStation and it is pretty inefficient. But the reason it is used is
that almost every scripting language supported by DataStation has a
builtin library for reading/writing JSON.

All code for these benchmarks is available on GitHub.

The resulting JSON encoder library is also available on GitHub.

Useful datasets

I threw together a quick CLI for generating fake data,
fakegen. And generated
two datasets: one with 20 columns and 1M rows, and one with 1K columns
and 10K rows.

$ mkdir -p ~/tmp/benchmarks && cd ~/tmp/benchmarks
$ go install github.com/multiprocessio/fakegen@latest
$ fakegen --rows 1000000 --cols 20 > long.json
$ fakegen --rows 10000 --cols 1000 > large.json
$ ls -lah *.json
-rw-r--r-- 1 phil phil 1.2G Mar  3 15:42 long.json
-rw-r--r-- 1 phil phil 1.6G Mar  3 15:44 large.json
$ wc *.json
   1999999  114514109 1214486728 long.json
     19999  213613856 1666306735 large.json

A benchmark program

Then I started looking into what Go’s JSON encoder is actually
spending time doing.

First I wrote a program that reads and decodes a JSON file, picks an
encoder (just the existing standard library encoder for now), and
encodes the JSON object back into another file. I used
pkg/profile to simplify the process
of hooking into pprof so that I could get a CPU profile of execution.

$ go mod init main
$ cat main.go
package main

import (
        "encoding/json"
        "os"

        "github.com/pkg/profile"
)

func stdlibEncoder(out *os.File, obj interface{}) error {
        encoder := json.NewEncoder(out)
        return encoder.Encode(obj)
}

func main() {
        var in string
        encoderArg := "stdlib"
        encoder := stdlibEncoder

        for i, arg := range os.Args {
                if arg == "--in" {
                        in = os.Args[i+1]
                        i += 1
                        continue
                }

                if arg == "--encoder" {
                        encoderArg = os.Args[i+1]
                        switch encoderArg {
                        case "stdlib":
                                encoder = stdlibEncoder
                        default:
                                panic("Unknown encoder: " + encoderArg)
                        }
                        i += 1
                        continue
                }
        }

        fr, err := os.Open(in + ".json")
        if err != nil {
                panic(err)
        }
        defer fr.Close()

        decoder := json.NewDecoder(fr)
        var o interface{}
        err = decoder.Decode(&o)
        if err != nil {
                panic(err)
        }

        fw, err := os.OpenFile(in+"-"+encoderArg+".json", os.O_TRUNC|os.O_WRONLY|os.O_CREATE, os.ModePerm)
        if err != nil {
                panic(err)
        }
        defer fw.Close()

        p := profile.Start()
        defer p.Stop()
        err = encoder(fw, o)
        if err != nil {
                panic(err)
        }
}

Compile and run it:

$ go mod tidy
$ go build -o main main.go
$ ./main --in long
2022/03/03 15:49:00 profile: cpu profiling enabled, /tmp/profile2956118756/cpu.pprof
2022/03/03 15:49:08 profile: cpu profiling disabled, /tmp/profile2956118756/cpu.pprof

Examining pprof results

Now we can run go tool pprof against this profile to see where we’re
spending the most time:

$ go tool pprof -top /tmp/profile2956118756/cpu.pprof
File: main
Type: cpu
Time: Mar 3, 2022 at 3:49pm (UTC)
Duration: 8.15s, Total samples = 9.66s (118.54%)
Showing nodes accounting for 8.75s, 90.58% of 9.66s total
Dropped 95 nodes (cum 

Roughly 8.2 seconds. Now let’s also run against the large JSON dataset
and profile that result.

$ ./main --in large
2022/03/03 15:50:30 profile: cpu profiling enabled, /tmp/profile800187419/cpu.pprof
2022/03/03 15:50:36 profile: cpu profiling disabled, /tmp/profile800187419/cpu.pprof
$ go tool pprof -top /tmp/profile800187419/cpu.pprof
File: main
Type: cpu
Time: Mar 3, 2022 at 3:50pm (UTC)
Duration: 6.36s, Total samples = 7.11s (111.88%)
Showing nodes accounting for 6.67s, 93.81% of 7.11s total
Dropped 61 nodes (cum 

Roughly 6.4 seconds.

Sorting object keys

Now one thing we notice is that it spends a good chunk of time in sort
functions. In fact, it is hardcoded in Go’s JSON implementation to
require sorting of object keys. The most common data in DataStation is
arrays of objects representing rows of data. This JSON is an
intermediate representation so there’s no value in DataStation/dsq to
having keys sorted.
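
To see the sorting behavior in isolation, here is a small sketch that marshals a single row with Go's encoding/json. The helper name encodeRow and the sample row are made up for illustration. Whatever order the map is built or iterated in, the encoded keys come out sorted.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// encodeRow is a hypothetical helper that marshals one row with the
// standard library encoder and returns the result as a string.
func encodeRow(row map[string]interface{}) (string, error) {
	bs, err := json.Marshal(row)
	return string(bs), err
}

func main() {
	// Keys are deliberately listed out of alphabetical order.
	row := map[string]interface{}{"zip": "10001", "name": "Ada", "age": 36}

	out, err := encodeRow(row)
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // {"age":36,"name":"Ada","zip":"10001"}
}
```

For a row with 1K columns that means sorting 1K strings for every one of the 10K rows, which is work the intermediate format never benefits from.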

Could we improve performance if we wrote a specialization of the
builtin JSON library that skips sorting JSON object keys if the
overall object to be written is an array of objects? We’ll only care
about the top-level object keys. If there are nested objects we won’t
bother about that. Within nested objects we’ll just use the existing
Go JSON encoder. Having a fallback and internally using the Go JSON
encoder makes this a pretty safe and simple approach.

Let’s try a basic implementation.

$ cp main.go nosort.go
$ diff -u main.go nosort.go
--- main.go     2022-03-03 14:25:01.530812750 +0000
+++ nosort.go   2022-03-03 18:58:35.227829357 +0000
@@ -2,11 +2,94 @@

 import (
        "encoding/json"
+       "log"
        "os"
+       "strconv"

        "github.com/pkg/profile"
 )

+func nosortEncoder(out *os.File, obj interface{}) error {
+       a, ok := obj.([]interface{})
+       // Fall back to normal encoder
+       if !ok {
+               log.Println("Falling back to stdlib")
+               return stdlibEncoder(out, obj)
+       }
+
+       _, err := out.Write([]byte("["))
+       if err != nil {
+               return err
+       }
+
+       for i, row := range a {
+               // Write a comma before the current object
+               if i > 0 {
+                       _, err = out.Write([]byte(",\n"))
+                       if err != nil {
+                               return err
+                       }
+               }
+
+               r, ok := row.(map[string]interface{})
+               if !ok {
+                       log.Println("Falling back to stdlib")
+                       bs, err := json.Marshal(row)
+                       if err != nil {
+                               return err
+                       }
+
+                       _, err = out.Write(bs)
+                       if err != nil {
+                               return err
+                       }
+
+                       continue
+               }
+
+               _, err := out.Write([]byte("{"))
+               if err != nil {
+                       return err
+               }
+
+               j := -1
+               for col, val := range r {
+                       j += 1
+
+                       // Write a comma before the current key-value
+                       if j > 0 {
+                               _, err = out.Write([]byte(","))
+                               if err != nil {
+                                       return err
+                               }
+                       }
+
+                       _, err = out.Write([]byte(strconv.QuoteToASCII(col) + ":"))
+                       if err != nil {
+                               return err
+                       }
+
+                       bs, err := json.Marshal(val)
+                       if err != nil {
+                               return err
+                       }
+
+                       _, err = out.Write(bs)
+                       if err != nil {
+                               return err
+                       }
+               }
+
+               _, err = out.Write([]byte("}"))
+               if err != nil {
+                       return err
+               }
+       }
+
+       _, err = out.Write([]byte("]"))
+       return err
+}
+
 func stdlibEncoder(out *os.File, obj interface{}) error {
        encoder := json.NewEncoder(out)
        return encoder.Encode(obj)
@@ -29,6 +112,8 @@
                        switch encoderArg {
                        case "stdlib":
                                encoder = stdlibEncoder
+                       case "nosort":
+                               encoder = nosortEncoder
                        default:
                                panic("Unknown encoder: " + encoderArg)
                        }

Very simple code that just does some type checking and mostly spends
time writing JSON wrapper syntax, with calls to Go’s builtin JSON
library inside of objects. The only funky thing in there you might
notice is the strconv.QuoteToASCII call. This simply quotes the key and
escapes nested quotes. This is important since escaped nested quotes
are valid inside a JSON object key.

Let’s build and run, passing the new encoder name to main.

$ go build -o main nosort.go
$ ./main --in large --encoder nosort
2022/03/03 15:53:51 profile: cpu profiling enabled, /tmp/profile1940788787/cpu.pprof
2022/03/03 15:54:40 profile: cpu profiling disabled, /tmp/profile1940788787/cpu.pprof

Roughly 49 seconds. Woah. That’s way slower than the builtin JSON
library. But let’s dig in with pprof to understand why. Since we
should be doing exactly what the Go library does but not sorting, it
shouldn’t be possible that we’re slower.

$ go tool pprof -top /tmp/profile1940788787/cpu.pprof
File: main
Type: cpu
Time: Mar 3, 2022 at 3:53pm (UTC)
Duration: 48.86s, Total samples = 47.64s (97.51%)
Showing nodes accounting for 45.44s, 95.38% of 47.64s total
Dropped 87 nodes (cum 

Buffered I/O

Ok so in this case we spend a huge amount of time in the write
syscall. The traditional way to get around this is to use buffered I/O
so you’re not actually calling the write syscall all the time. Let’s
give that a shot.

$ cp nosort.go bufio.go
$ diff -u nosort.go bufio.go
--- nosort.go   2022-03-03 18:58:35.227829357 +0000
+++ bufio.go    2022-03-03 19:02:03.913590177 +0000
@@ -1,6 +1,7 @@
 package main

 import (
+       "bufio"
        "encoding/json"
        "log"
        "os"
@@ -17,7 +18,9 @@
                return stdlibEncoder(out, obj)
        }

-       _, err := out.Write([]byte("["))
+       bo := bufio.NewWriter(out)
+       defer bo.Flush()
+       _, err := bo.Write([]byte("["))
        if err != nil {
                return err
        }
@@ -25,7 +28,7 @@
        for i, row := range a {
                // Write a comma before the current object
                if i > 0 {
-                       _, err = out.Write([]byte(",\n"))
+                       _, err = bo.Write([]byte(",\n"))
                        if err != nil {
                                return err
                        }
@@ -39,15 +42,14 @@
                                return err
                        }

-                       _, err = out.Write(bs)
+                       _, err = bo.Write(bs)
                        if err != nil {
                                return err
                        }
-
                        continue
                }

-               _, err := out.Write([]byte("{"))
+               _, err := bo.Write([]byte("{"))
                if err != nil {
                        return err
                }
@@ -58,13 +60,13 @@

                        // Write a comma before the current key-value
                        if j > 0 {
-                               _, err = out.Write([]byte(","))
+                               _, err = bo.Write([]byte(","))
                                if err != nil {
                                        return err
                                }
                        }

-                       _, err = out.Write([]byte(strconv.QuoteToASCII(col) + ":"))
+                       _, err = bo.Write([]byte(strconv.QuoteToASCII(col) + ":"))
                        if err != nil {
                                return err
                        }
@@ -74,19 +76,19 @@
                                return err
                        }

-                       _, err = out.Write(bs)
+                       _, err = bo.Write(bs)
                        if err != nil {
                                return err
                        }
                }

-               _, err = out.Write([]byte("}"))
+               _, err = bo.Write([]byte("}"))
                if err != nil {
                        return err
                }
        }

-       _, err = out.Write([]byte("]"))
+       _, err = bo.Write([]byte("]"))
        return err
 }

Build it and run it:

$ go build -o main bufio.go
$ ./main --in large --encoder nosort
2022/03/03 19:11:12 profile: cpu profiling enabled, /tmp/profile1195717494/cpu.pprof
2022/03/03 19:11:19 profile: cpu profiling disabled, /tmp/profile1195717494/cpu.pprof

Roughly 7 seconds. Not bad, down from 49 seconds! But let’s see where
we’re spending time now.

$ go tool pprof -top /tmp/profile1195717494/cpu.pprof
File: main
Type: cpu
Time: Mar 3, 2022 at 7:11pm (UTC)
Duration: 6.41s, Total samples = 6.26s (97.60%)
Showing nodes accounting for 5.79s, 92.49% of 6.26s total
Dropped 47 nodes (cum 
