Is there an equivalent of R’s data.table fread cmd keyword in pandas?

One very useful aspect of R’s data.table reading workhorse “fread” is the “cmd” keyword with which one can programmatically build a shell command and pass it to fread to read the output of the shell command in as a data.table.

This is very powerful for interactive use because the command can be any string, e.g. an ssh command that runs on a remote host and defers basic parsing to a simple grep/sed/awk, all in one line. It avoids having to create temporary directories and files and take extra steps to fetch remote files.

From what I can tell looking at the latest pandas docs there does not appear to be an equivalent in any of the pd.read_* methods. Is it not a feature? Is there maybe an easy equivalent people use instead?

Palace Chan asked May 09 '20 21:05

1 Answer

As @sammywemmy pointed out, there are two alternatives. The first, slightly more verbose than the R equivalent, is to use subprocess like this:

import subprocess
import pandas as pd
with subprocess.Popen("shell_cmd", shell=True, stdout=subprocess.PIPE) as p:
    df = pd.read_csv(p.stdout)
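
If you use this pattern often, it may be worth wrapping it in a small helper. The following is only a sketch: read_cmd is a hypothetical name, not a pandas function, and it swaps Popen for subprocess.run with check=True so that a failing command raises an error instead of being parsed as an empty or partial table. It assumes the output fits in memory and is CSV-like.

```python
import io
import subprocess

import pandas as pd


def read_cmd(cmd, **read_csv_kwargs):
    """Run a shell command and parse its standard output with pd.read_csv.

    read_cmd is a hypothetical helper, not part of pandas.
    """
    # check=True raises subprocess.CalledProcessError on a non-zero exit
    # status, so a failed command is not silently read as a table.
    result = subprocess.run(
        cmd,
        shell=True,
        check=True,
        stdout=subprocess.PIPE,
        text=True,
    )
    # The whole output is held in memory as a string before parsing.
    return pd.read_csv(io.StringIO(result.stdout), **read_csv_kwargs)
```

Any keyword arguments are passed through to pd.read_csv, so something like read_cmd("shell_cmd", sep="\t") should work for tab-separated output. As with any shell=True call, the command string must come from a trusted source.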

A more efficient and less verbose alternative is to use the datatable package and do something like this:

import datatable as dt
df = dt.fread(cmd="shell_cmd").to_pandas()

You can also opt to work natively with the datatable Frame type.
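
Both approaches take a plain command string, so the remote use case from the question comes down to building that string safely. A minimal sketch using the standard library's shlex.quote, where build_remote_cmd, the host name, and the file path are all hypothetical and only there for illustration:

```python
import shlex


def build_remote_cmd(host, path, pattern):
    """Build an ssh command that greps a remote file (hypothetical helper)."""
    # Quote each argument for the remote shell first...
    remote = f"grep {shlex.quote(pattern)} {shlex.quote(path)}"
    # ...then quote the whole remote pipeline for the local shell.
    return f"ssh {shlex.quote(host)} {shlex.quote(remote)}"


# Hypothetical host and path, for illustration only.
cmd = build_remote_cmd("myhost", "/var/log/app.csv", "ERROR")
print(cmd)
```

The resulting string can then be used in place of "shell_cmd" in either approach above.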

Palace Chan answered Oct 02 '22 16:10
