Show HN: Run SQL Queries Against JSON, CSV, Excel, Parquet, and More

Install

Install Go 1.17+ and then run:

$ go install github.com/multiprocessio/datastation/runner/cmd/dsq@latest

Usage

You can either pipe data to dsq or pass a file name to it.

When piping data to dsq, you need to specify the file extension or MIME type.

For example:

$ cat testdata.csv | dsq csv "SELECT * FROM {} LIMIT 1"

Or:

$ cat testdata.parquet | dsq parquet "SELECT COUNT(1) FROM {}"

If you are passing a file, it must have the usual extension for its
content type.

For example:

$ dsq testdata.json "SELECT * FROM {} WHERE x > 10"

Or:

$ dsq testdata.ndjson "SELECT name, AVG(time) FROM {} GROUP BY name ORDER BY AVG(time) DESC"
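The two rules above (an explicit extension or MIME type when piping, the file's own extension otherwise) can be sketched as a small lookup. This is only an illustration, not dsq's actual code: `KNOWN_TYPES` and `resolve_type` are hypothetical names, and the entries are copied from the supported-types table in this post.

```python
import os

# Extensions and MIME types taken from the supported-types table in this post.
# The dictionary itself is an illustrative assumption, not dsq's internals.
KNOWN_TYPES = {
    "csv": "CSV",
    "json": "JSON",
    "ndjson": "Newline-delimited JSON",
    "jsonl": "Newline-delimited JSON",
    "parquet": "Parquet",
    "xlsx": "Excel",
    "xls": "Excel",
    "text/apache2error": "Apache Error Logs",
    "text/apache2access": "Apache Access Logs",
    "text/nginxaccess": "Nginx Access Logs",
}


def resolve_type(arg):
    """Hypothetical helper: map a command-line argument to a data type name.

    `arg` is either an explicit extension or MIME type (the piped case)
    or a file name, in which case its extension decides the type.
    """
    if arg in KNOWN_TYPES:
        return KNOWN_TYPES[arg]
    ext = os.path.splitext(arg)[1].lstrip(".").lower()
    if ext in KNOWN_TYPES:
        return KNOWN_TYPES[ext]
    raise ValueError(f"unrecognized type or extension: {arg!r}")


print(resolve_type("csv"))              # explicit type, as when piping
print(resolve_type("testdata.ndjson"))  # type inferred from the file name
```

A file with an unusual extension fails the lookup, which is why a passed file needs the usual extension for its content type.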

Supported Data Types

| Name | File Extension(s) | Notes |
|---|---|---|
| CSV | csv | |
| JSON | json | Must be an array of objects. Nested object fields are ignored. |
| Newline-delimited JSON | ndjson, jsonl | |
| Parquet | parquet | |
| Excel | xlsx, xls | Currently only works if there is only one sheet. |
| Apache Error Logs | text/apache2error | Currently only works if being piped in. |
| Apache Access Logs | text/apache2access | Currently only works if being piped in. |
| Nginx Access Logs | text/nginxaccess | Currently only works if being piped in. |
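To make the JSON restriction concrete, here is a minimal Python sketch of what "an array of objects, with nested object fields ignored" could look like. The helper `load_rows` is hypothetical and is not taken from dsq or DataStation. It also leaves array-valued fields untouched, which is an assumption, since this post does not say how those are handled.

```python
import json


def load_rows(text):
    """Hypothetical helper: parse JSON text into a list of flat rows."""
    data = json.loads(text)
    # The input has to be an array whose elements are all objects.
    if not isinstance(data, list) or not all(isinstance(r, dict) for r in data):
        raise ValueError("expected a JSON array of objects")
    # Drop any field whose value is itself an object, mirroring the note
    # that nested object fields are ignored.
    return [
        {key: value for key, value in row.items() if not isinstance(value, dict)}
        for row in data
    ]


rows = load_rows('[{"name": "a", "x": 12, "meta": {"tag": "t"}}, {"name": "b", "x": 3}]')
print(rows)  # the "meta" field is gone; "name" and "x" remain
```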

Engine

Under the hood, dsq uses DataStation as a library, and under that hood
DataStation uses SQLite to power these kinds of SQL queries on
arbitrary (structured) data.
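The general idea of querying structured rows through SQLite can be sketched in a few lines of Python with the standard `sqlite3` module. This is a sketch under stated assumptions, not DataStation's implementation: `query_rows` and the table name `t_0` are hypothetical, and replacing `{}` with a table name is only a guess at how the placeholder in the examples above might be resolved.

```python
import sqlite3


def query_rows(rows, sql, table="t_0"):
    """Hypothetical helper: load dict rows into in-memory SQLite and run `sql`.

    `{}` in the query stands for the loaded table, as in the dsq examples.
    """
    # Collect column names in first-seen order across all rows.
    columns = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)

    conn = sqlite3.connect(":memory:")
    try:
        # SQLite accepts columns without declared types.
        quoted = ", ".join('"' + c.replace('"', '""') + '"' for c in columns)
        conn.execute(f'CREATE TABLE "{table}" ({quoted})')
        placeholders = ", ".join("?" for _ in columns)
        conn.executemany(
            f'INSERT INTO "{table}" VALUES ({placeholders})',
            [tuple(row.get(c) for c in columns) for row in rows],
        )
        return conn.execute(sql.replace("{}", f'"{table}"')).fetchall()
    finally:
        conn.close()


rows = [
    {"name": "a", "time": 10},
    {"name": "a", "time": 20},
    {"name": "b", "time": 5},
]
result = query_rows(
    rows,
    "SELECT name, AVG(time) FROM {} GROUP BY name ORDER BY AVG(time) DESC",
)
print(result)  # [('a', 15.0), ('b', 5.0)]
```

Because the query is handed to SQLite unchanged apart from the placeholder, anything SQLite supports (aggregates, GROUP BY, ORDER BY) is available without a custom SQL engine.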

Comparisons

The speed column is based on rough benchmarks modeled on q's
benchmarks. Eventually I will do a more thorough and public benchmark.

| Name | Link | Speed | Supported File Types | Engine | Maturity |
|---|---|---|---|---|---|
| q | http://harelba.github.io/q/ | Fast | CSV, TSV | Uses SQLite | Mature |
| textql | https://github.com/dinedal/textql | Ok | CSV, TSV | Uses SQLite | Mature |
| octosql | https://github.com/cube2222/octosql | Slow | JSON, CSV, Excel, Parquet | Custom engine missing many features from SQLite | Mature |
| dsq | Here | Ok | CSV, JSON, Newline-delimited JSON, Parquet, Excel, Logs | Uses SQLite | Not mature |

License, support, community, whatnot

See the repo's main README.md for the details.

Written by Ava Chan

I'm a researcher at Utokyo :) and a big fan of Ava Max