Please note: this code is still alpha / pre-production. Everything herein should be considered preliminary.
If you like ZSVlib, please give it a star!
ZSVlib is a fast CSV parser library. It achieves high performance using SIMD operations,
efficient memory use and various other optimization techniques.
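For readers unfamiliar with the technique: SIMD parsers get their speed by classifying many input bytes per instruction instead of one byte per loop iteration. The sketch below is not ZSVlib's code. It swaps real SIMD intrinsics for portable SWAR ("SIMD within a register") on 64-bit words, and builds a bitmask of the positions in an 8-byte block that hold a comma, quote or newline. `special_mask8` and `count_special` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy byte value c into all 8 byte lanes of a 64-bit word. */
static uint64_t broadcast(unsigned char c) {
  return 0x0101010101010101ULL * c;
}

/* For each byte lane of x, set that lane's high bit iff the lane is zero.
   Exact (no false positives), because no addition carries across lanes. */
static uint64_t zero_lanes(uint64_t x) {
  const uint64_t low7 = 0x7F7F7F7F7F7F7F7FULL;
  uint64_t t = (x & low7) + low7; /* high bit set iff low 7 bits are nonzero */
  return ~(t | x | low7);         /* high bit set iff the whole byte is zero */
}

/* Bit i of the result is 1 iff p[i] is ',', '"' or '\n' (8 bytes per call). */
unsigned special_mask8(const unsigned char *p) {
  uint64_t w = 0;
  for (int i = 0; i < 8; i++) /* assemble in little-endian lane order */
    w |= (uint64_t)p[i] << (8 * i);
  uint64_t hits = zero_lanes(w ^ broadcast(',')) |
                  zero_lanes(w ^ broadcast('"')) |
                  zero_lanes(w ^ broadcast('\n'));
  unsigned mask = 0;
  for (int i = 0; i < 8; i++) /* gather each lane's high bit into bit i */
    mask |= (unsigned)((hits >> (8 * i + 7)) & 1) << i;
  return mask;
}

/* Count the special bytes in buf: 8 at a time, then a scalar tail. */
size_t count_special(const char *buf, size_t len) {
  const unsigned char *p = (const unsigned char *)buf;
  size_t n = 0, i = 0;
  for (; i + 8 <= len; i += 8)
    for (unsigned m = special_mask8(p + i); m; m &= m - 1)
      n++; /* one iteration per set bit */
  for (; i < len; i++)
    n += (p[i] == ',' || p[i] == '"' || p[i] == '\n');
  return n;
}
```

A production parser would more likely use vector compare and movemask instructions (for example SSE2's `_mm_cmpeq_epi8` and `_mm_movemask_epi8`) to classify 16 or more bytes per step, and then visit only the set bits of the mask.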
Preliminary performance results compare favorably vs other fast CSV parsers.
The results below were from a pre-M1 OSX MacBook Air; results on other platforms were generally similar, though on Windows
the advantage was much smaller (~20%, but still in the same direction):
See the 12/19 update regarding the M1 processor at https://github.com/liquidaty/zsv/blob/main/app/benchmark/README.md
ZSV (zsv) is an extensible CSV utility, built on ZSVlib,
for tasks such as slicing and dicing, querying with SQL,
combining, converting, serializing, flattening and more.
ZSV is streamlined for easy development of custom dynamic extensions, one of which is
available here and offers added features such as stratification and validation reporting,
automated column mapping and transformation, and github-like capabilities for sharing.
ZSVlib and ZSV are written in C, but since ZSVlib is a library, and ZSV
extensions are just shared libraries, you can use ZSVlib with
your own code in any programming language, as long as it has been compiled
into a shared library that implements the expected interface.
- Available as BOTH a library and an application
- Open-source, permissively licensed
- Handles real-world CSV the same way that spreadsheet programs do (including
edge cases). Gracefully handles (and can “clean”) real-world data that may be “dirty”
- Runs on OSX (tested on clang/gcc), Linux (gcc), Windows (mingw),
BSD (gcc-only) and in-browser (emscripten/wasm)
- Fast (maybe the fastest ever?); see the benchmark README linked above
- Low memory usage (regardless of how big your data is)
- Easy to use as a library in a few lines of code
- Includes the ZSV command-line app, with batteries included:
- select, count, sql query, desc, flatten, serialize and more
- Easy to extend/customize zsv with a few lines of code via a modular plug-in framework.
Just write a few custom functions and compile them into a distributable DLL that any existing zsv
installation can use
- ZSVlib and ZSV are permissively licensed
- Coming soon: a free extension with added capabilities:
- generate multi-tab XLSX validation break-out reports
- generate multi-table XLSX or CSV stratifications
- automate column mapping and transformations
- create, describe and share re-usable data domains using github-like features
Pre-built binaries for OSX, Windows and Linux are available at https://zsvhub.com/bag
zsv runs best (by far) as a desktop CLI. However, you can also try an extended
ZSV version in the browser (though it runs much slower) at
https://zsvhub.com/playground. A tutorial that demonstrates a small subset of the
capabilities of ZSV and the ZSVHub extension is available at
Why another CSV parser / utility?
Our objectives, which we were unable to find in a pre-existing project, are:
- Reasonably high performance
- Available as both a library and a standalone executable / command-line interface utility (CLI)
- Memory-efficient, with configurable resource limits
- Handles real-world CSV cases the same way that Excel does, including all edge cases
(quote handling, newline handling (either \n or \r), embedded newlines,
odd quoting (e.g. aaa"aaa,bbb…))
- Handles other “dirty” data issues:
- Assumes valid UTF8, but does not misbehave if input contains bad UTF8
- Option to specify multi-row headers
- Does not fail or stop working when rows have inconsistent numbers of columns
- Easy-to-use library, and an easy-to-extend/customize CLI
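Most of the edge cases listed above can be handled by a small state machine that remembers where it is between input chunks; keeping that state, rather than buffering whole files, is also what allows bounded memory use. The sketch below is a simplified, byte-at-a-time illustration, not ZSVlib's implementation or API. The rules it encodes (a quote opens a quoted cell only at the start of a cell, text after a closing quote is kept literally, and `\r`, `\n` and `\r\n` all end a row) are our assumptions about spreadsheet-like behavior, and every name in it is hypothetical.

```c
#include <stddef.h>

/* Parser states; START_CELL must be 0 so a zero-initialized struct is ready. */
enum csv_state { START_CELL = 0, UNQUOTED, QUOTED, QUOTE_IN_QUOTED };

struct csv_sketch {
  enum csv_state st;
  int skip_lf;     /* previous byte was '\r': swallow a following '\n' */
  char cell[256];  /* fixed cap stands in for a configurable cell-size limit */
  size_t cell_len;
  size_t cells_in_row;
  void (*on_cell)(void *ctx, const char *s, size_t len);
  void (*on_row)(void *ctx, size_t ncells);
  void *ctx;
};

static void put(struct csv_sketch *p, char c) {
  if (p->cell_len < sizeof p->cell) p->cell[p->cell_len++] = c; /* truncate overlong cells */
}

static void end_cell(struct csv_sketch *p) {
  p->on_cell(p->ctx, p->cell, p->cell_len);
  p->cell_len = 0;
  p->cells_in_row++;
}

static void end_row(struct csv_sketch *p) {
  end_cell(p);
  p->on_row(p->ctx, p->cells_in_row); /* any column count is accepted */
  p->cells_in_row = 0;
}

/* Feed any chunk of input; state carries over, so chunks may split cells or rows. */
void csv_feed(struct csv_sketch *p, const char *buf, size_t len) {
  for (size_t i = 0; i < len; i++) {
    char c = buf[i];
    if (p->skip_lf) {
      p->skip_lf = 0;
      if (c == '\n') continue;
    }
    switch (p->st) {
    case QUOTED:
      if (c == '"') p->st = QUOTE_IN_QUOTED;
      else put(p, c); /* commas and newlines are literal inside quotes */
      break;
    case QUOTE_IN_QUOTED:
      if (c == '"') { put(p, '"'); p->st = QUOTED; break; } /* "" means one quote */
      p->st = UNQUOTED; /* the quote closed the field; handle c below */
      /* fall through */
    case START_CELL:
    case UNQUOTED:
      if (p->st == START_CELL && c == '"') p->st = QUOTED; /* quote opens only at cell start */
      else if (c == ',') { end_cell(p); p->st = START_CELL; }
      else if (c == '\n' || c == '\r') {
        end_row(p);
        p->st = START_CELL;
        p->skip_lf = (c == '\r');
      } else { put(p, c); p->st = UNQUOTED; } /* a stray '"' mid-cell is kept as-is */
      break;
    }
  }
}

/* Flush the last row if the input did not end with a newline. */
void csv_finish(struct csv_sketch *p) {
  if (p->st != START_CELL || p->cells_in_row > 0) end_row(p);
  p->st = START_CELL;
  p->skip_lf = 0;
}
```

Feeding the input in arbitrary pieces (for example, fixed-size `fread` buffers) yields the same cells as feeding it all at once. Blank lines come out as one-cell rows and cells longer than 256 bytes are truncated; both are simplifications of this sketch.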
There are several excellent tools that achieve high performance. Among those we
considered were xsv and tsv-utils. While they met our performance
objective, both were designed primarily as a utility and not a library, and
were not easy enough, for our needs, to customize. This was because they were not designed
for modular customizations that could be maintained (or licensed) independently
of the related project (in addition to the fact that they were written in Rust
and D, respectively, which happen to be languages with which we lacked deep
experience). Others we considered were Miller (mlr), csvkit and Python (csv module), which did not meet our performance objective.
We also considered various libraries using SIMD, but none seemed to (yet) meet the “real-world CSV” objective.
Hence zsv was created as a library and a versatile application, both optimized for speed
and ease of development for extending and/or customizing to your needs.
ZSV comes with several built-in commands:
echo: read CSV from stdin and write it back out to stdout. This is mostly
useful for demonstrating how to use the API and also how to create a plug-in,
and has some limited utility beyond that, e.g. for adding/removing the UTF8 BOM,
or cleaning up bad UTF8
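To illustrate the kind of clean-up described for `echo`, here is a self-contained sketch that drops a leading UTF-8 BOM (`EF BB BF`) and replaces every byte that does not start a well-formed UTF-8 sequence with `?`. The replacement character, the function names and the whole-buffer (non-streaming) design are our assumptions, not zsv's documented behavior.

```c
#include <stddef.h>

/* Length of the well-formed UTF-8 sequence at s (n bytes available), or 0 if invalid.
   Rejects overlong forms, UTF-16 surrogates and code points above U+10FFFF. */
static size_t utf8_seq_len(const unsigned char *s, size_t n) {
  unsigned char c = s[0], lo = 0x80, hi = 0xBF; /* allowed range for the 2nd byte */
  size_t len;
  if (c < 0x80) return 1;
  else if (c >= 0xC2 && c <= 0xDF) len = 2;
  else if (c >= 0xE0 && c <= 0xEF) {
    len = 3;
    if (c == 0xE0) lo = 0xA0; /* no overlong 3-byte forms */
    if (c == 0xED) hi = 0x9F; /* no surrogates */
  } else if (c >= 0xF0 && c <= 0xF4) {
    len = 4;
    if (c == 0xF0) lo = 0x90; /* no overlong 4-byte forms */
    if (c == 0xF4) hi = 0x8F; /* nothing above U+10FFFF */
  } else return 0;            /* stray continuation byte or invalid lead byte */
  if (n < len || s[1] < lo || s[1] > hi) return 0;
  for (size_t i = 2; i < len; i++)
    if (s[i] < 0x80 || s[i] > 0xBF) return 0;
  return len;
}

/* Copy in -> out, dropping a leading BOM and writing '?' for each invalid byte.
   out needs room for n bytes; returns the number of bytes written. */
size_t clean_utf8(const char *in, size_t n, char *out) {
  const unsigned char *s = (const unsigned char *)in;
  size_t i = 0, o = 0;
  if (n >= 3 && s[0] == 0xEF && s[1] == 0xBB && s[2] == 0xBF) i = 3;
  while (i < n) {
    size_t len = utf8_seq_len(s + i, n - i);
    if (len == 0) { out[o++] = '?'; i++; continue; }
    for (size_t k = 0; k < len; k++) out[o++] = in[i + k];
    i += len;
  }
  return o;
}
```

A streaming version would also need to carry an incomplete multi-byte sequence at the end of one chunk over to the next, instead of treating it as invalid.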
select: re-shape CSV by skipping leading garbage, combining header rows into
a single header, selecting or excluding specified columns, removing duplicate
columns, sampling, searching and more
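Combining several header rows into one can be pictured as joining, column by column, the non-empty cells of the first k rows. The sketch below is a hypothetical helper, not the code behind `select`; it assumes a single space as the separator and 64-byte output slots.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* cells holds nrows header rows of ncols cells each, row-major.
   out[c] receives the combined name for column c (truncated to 63 bytes). */
void combine_headers(const char *const *cells, size_t nrows, size_t ncols,
                     char out[][64]) {
  for (size_t c = 0; c < ncols; c++) {
    out[c][0] = '\0';
    for (size_t r = 0; r < nrows; r++) {
      const char *s = cells[r * ncols + c];
      if (s == NULL || *s == '\0') continue; /* skip empty header cells */
      size_t used = strlen(out[c]);
      /* append a space (if needed) and the cell text, never overflowing the slot */
      snprintf(out[c] + used, 64 - used, "%s%s", used ? " " : "", s);
    }
  }
}
```

For example, header rows `Price,,Qty` and `USD,Name,` combine into `Price USD,Name,Qty`.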
sql: run ad-hoc SQL queries on a CSV file
desc: provide a quick description of your table data
pretty: format for console (fixed-width) display, or convert to markdown
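Fixed-width display comes down to two passes: measure the widest cell in each column, then pad every cell to that width. The sketch below is illustrative only. It measures width in bytes rather than display columns (so multi-byte or wide characters would misalign), caps the table at 32 columns, and `render_fixed_width` is a hypothetical name.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MAX_COLS 32 /* illustrative cap */

/* cells: nrows x ncols, row-major. Writes padded rows into buf; returns bytes written. */
size_t render_fixed_width(const char *const *cells, size_t nrows, size_t ncols,
                          char *buf, size_t bufsize) {
  size_t width[MAX_COLS] = {0};
  size_t pos = 0;
  if (ncols > MAX_COLS || bufsize == 0) return 0;
  buf[0] = '\0';
  /* pass 1: widest cell per column, in bytes */
  for (size_t r = 0; r < nrows; r++)
    for (size_t c = 0; c < ncols; c++) {
      size_t w = strlen(cells[r * ncols + c]);
      if (w > width[c]) width[c] = w;
    }
  /* pass 2: left-align each cell in its column, two spaces between columns */
  for (size_t r = 0; r < nrows; r++)
    for (size_t c = 0; c < ncols; c++) {
      const char *s = cells[r * ncols + c];
      int k = (c + 1 == ncols)
                  ? snprintf(buf + pos, bufsize - pos, "%s\n", s) /* no trailing pad */
                  : snprintf(buf + pos, bufsize - pos, "%-*s  ", (int)width[c], s);
      if (k < 0 || (size_t)k >= bufsize - pos) return pos; /* out of room */
      pos += (size_t)k;
    }
  return pos;
}
```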
2json, 2tsv: convert CSV to JSON or TSV, respectively
serialize (inverse of flatten): convert an NxM table to a single 3x(Nx(M-1))
table with columns: Row, Column Name, Column Value
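To make the shape concrete: with N data rows and M columns, where (we assume) column 0 holds each row's identifier, every other cell becomes one output row of (Row, Column Name, Column Value), giving N x (M-1) rows of 3 columns. The sketch below writes those triples as CSV into a buffer. It does no CSV quoting of values, and `serialize_table` is a hypothetical name, not zsv's API.

```c
#include <stddef.h>
#include <stdio.h>

/* header: ncols column names. data: nrows x ncols cells, row-major, where
   column 0 is assumed to be the row identifier. Returns bytes written. */
size_t serialize_table(const char *const *header, const char *const *data,
                       size_t nrows, size_t ncols, char *buf, size_t bufsize) {
  int k = snprintf(buf, bufsize, "Row,Column Name,Column Value\n");
  if (k < 0 || (size_t)k >= bufsize) return 0;
  size_t pos = (size_t)k;
  for (size_t r = 0; r < nrows; r++)
    for (size_t c = 1; c < ncols; c++) { /* c = 0 is the identifier itself */
      k = snprintf(buf + pos, bufsize - pos, "%s,%s,%s\n",
                   data[r * ncols], header[c], data[r * ncols + c]);
      if (k < 0 || (size_t)k >= bufsize - pos) return pos; /* out of room */
      pos += (size_t)k;
    }
  return pos;
}
```

With header `id,name,age` and rows `1,Ann,30` and `2,Bob,41` (N = 2, M = 3), this produces 2 x (3-1) = 4 triples.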
flatten (inverse of serialize): flatten a table by combining rows that share
a common value in a specified identifier column
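Going the other way, a flatten step can collect (identifier, column name, value) triples into a grid with one row per distinct identifier and one column per distinct name, in first-seen order. This is a sketch under our own assumptions (at most 16 identifiers and 16 names, later duplicates overwrite earlier ones, missing cells are left empty, no CSV quoting); `struct triple` and `flatten_triples` are hypothetical names.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MAX_IDS 16   /* illustrative limits */
#define MAX_NAMES 16

struct triple { const char *id, *name, *value; };

/* Index of s in list, appending it if new; returns max if the list is full. */
static size_t index_of(const char **list, size_t *n, size_t max, const char *s) {
  for (size_t i = 0; i < *n; i++)
    if (strcmp(list[i], s) == 0) return i;
  if (*n == max) return max;
  list[*n] = s;
  return (*n)++;
}

/* Append s to buf at *pos; returns 0 if it does not fit. */
static int append(char *buf, size_t bufsize, size_t *pos, const char *s) {
  int k = snprintf(buf + *pos, bufsize - *pos, "%s", s);
  if (k < 0 || (size_t)k >= bufsize - *pos) return 0;
  *pos += (size_t)k;
  return 1;
}

size_t flatten_triples(const struct triple *t, size_t nt, const char *id_header,
                       char *buf, size_t bufsize) {
  const char *ids[MAX_IDS], *names[MAX_NAMES];
  const char *grid[MAX_IDS][MAX_NAMES] = {{NULL}};
  size_t nids = 0, nnames = 0, pos = 0;
  if (bufsize == 0) return 0;
  for (size_t i = 0; i < nt; i++) {
    size_t r = index_of(ids, &nids, MAX_IDS, t[i].id);
    size_t c = index_of(names, &nnames, MAX_NAMES, t[i].name);
    if (r < MAX_IDS && c < MAX_NAMES) grid[r][c] = t[i].value; /* later duplicates win */
  }
  /* header row: identifier column, then each distinct name */
  if (!append(buf, bufsize, &pos, id_header)) return pos;
  for (size_t c = 0; c < nnames; c++)
    if (!append(buf, bufsize, &pos, ",") || !append(buf, bufsize, &pos, names[c])) return pos;
  if (!append(buf, bufsize, &pos, "\n")) return pos;
  /* one output row per distinct identifier; missing cells stay empty */
  for (size_t r = 0; r < nids; r++) {
    if (!append(buf, bufsize, &pos, ids[r])) return pos;
    for (size_t c = 0; c < nnames; c++)
      if (!append(buf, bufsize, &pos, ",") ||
          !append(buf, bufsize, &pos, grid[r][c] ? grid[r][c] : "")) return pos;
    if (!append(buf, bufsize, &pos, "\n")) return pos;
  }
  return pos;
}
```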
stack: merge CSV files vertically
Each of these can also be built as an independent executable.
Building and installing the CLI
./configure && sudo make install
See INSTALL.md for more details.
In addition to the above extensions, at least one third-party extension will be made
available. If you would like to add your extension to this list, please contact the project maintainers.
Creating your own extension
You can extend ZSV by providing a pre-compiled shared or static library that
defines the functions specified in
extension_template.h and which ZSV loads in
one of three ways:
- as a static library that is statically linked at compile time
- as a dynamic library that is linked at compile time and located in any
library search path
- as a dynamic library that is located in the same folder as the ZSV executable
and loaded at runtime if/as/when the custom mode is invoked
Example and template
You can build and run a sample extension by running
make test from app/ext_example.
The easiest way to implement your own extension is to
copy and customize the template files in app/ext_template.
Alpha release limitations
This alpha release does not yet implement the full range of core features
that are planned for implementation prior to beta release. If you are interested in
helping, please post an issue.
Possible next steps:
- online “playground”
- optimize search; add search with hyperscan or re2 regex matching, possibly parallelize?
- auto-generated documentation, and better documentation in general
- Additional benchmarking. It would be great to use https://bitbucket.org/ewanhiggs/csv-game/src/master/ as a springboard for benchmarking a variety of different tasks