We use grep, cut, sort, uniq, and join at the command line all the time for data analysis. They work great, but they have shortcomings. For example, you have to give column numbers to each tool, while our files are often wide (many columns) with a header row that gives column names. In fact, our files look a lot like SQL tables. I'm sure there is a driver (ODBC?) that can operate on delimited text files, and some query engine that can use that driver, so we could just run SQL queries against our text files. Since the analysis is usually ad hoc, the setup for querying new files would have to be minimal (just use the files I specify in this directory) rather than declaring particular tables in some config.
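For instance, counting distinct values in the third column of a tab-separated file currently looks something like this (the column number and file name are just placeholders):

$ cut -f3 data.tsv | sort | uniq -c | sort -rn

What I'd like is to write the equivalent GROUP BY query against the same file using its column names.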
Practically speaking, what's the easiest option? That is, which SQL engine and driver is simplest to set up and point at plain text files?
David Malcolm wrote a little tool named "squeal" (formerly "show"), which allows you to use SQL-like command-line syntax to parse text files of various formats, including CSV.
An example from squeal's home page:
$ squeal "count(*)", source from /var/log/messages* group by source order by "count(*)" desc
count(*)|source |
--------+--------------------+
1633 |kernel |
1324 |NetworkManager |
98 |ntpd |
70 |avahi-daemon |
63 |dhclient |
48 |setroubleshoot |
39 |dnsmasq |
29 |nm-system-settings |
27 |bluetoothd |
14 |/usr/sbin/gpm |
13 |acpid |
10 |init |
9 |pcscd |
9 |pulseaudio |
6 |gnome-keyring-ask |
6 |gnome-keyring-daemon|
6 |gnome-session |
6 |rsyslogd |
5 |rpc.statd |
4 |vpnc |
3 |gdm-session-worker |
2 |auditd |
2 |console-kit-daemon |
2 |libvirtd |
2 |rpcbind |
1 |nm-dispatcher.action|
1 |restorecond |
Another option is q, which runs SQL directly on CSV or TSV files:
https://github.com/harelba/q
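As a rough sketch of what that looks like (the file and column names here are just illustrative, not from the question): q treats the file name as the table name, -H tells it the first line is a header so columns can be referenced by name, and -d sets the field delimiter (it defaults to whitespace).

$ q -H -d , "SELECT city, COUNT(*) FROM ./people.csv GROUP BY city ORDER BY COUNT(*) DESC"

That gives you the cut | sort | uniq -c style aggregation, but with named columns and no per-table configuration.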