One very useful aspect of R’s data.table reading workhorse “fread” is the “cmd” keyword with which one can programmatically build a shell command and pass it to fread to read the output of the shell command in as a data.table.
This is very powerful for interactive use because the command can be any string. For example, an ssh command can run on a remote host and defer basic parsing to a simple grep/sed/awk, all in one line, with no need to create temporary directories and files or take additional steps to fetch remote files.
From what I can tell looking at the latest pandas docs there does not appear to be an equivalent in any of the pd.read_* methods. Is it not a feature? Is there maybe an easy equivalent people use instead?
As @sammywemmy pointed out, there are two alternatives. The first, slightly more verbose than the R equivalent, is to use subprocess like this:
import subprocess
import pandas as pd

# Run the shell command and parse its standard output as CSV
with subprocess.Popen("shell_cmd", shell=True, stdout=subprocess.PIPE) as p:
    df = pd.read_csv(p.stdout)
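If you do this often, the subprocess approach can be wrapped in a small helper so the call site looks closer to fread's cmd argument. The following is only a sketch: read_cmd is a hypothetical name, not part of pandas, and it assumes the command's output is small enough to buffer in memory. The printf command is a local stand-in for a real ssh/grep pipeline.

```python
import io
import subprocess

import pandas as pd


def read_cmd(cmd, **read_csv_kwargs):
    """Run a shell command and parse its standard output with pandas.read_csv.

    Hypothetical helper for illustration; not a pandas API.
    """
    # check=True raises CalledProcessError on a non-zero exit status,
    # so a failed command does not silently produce an empty frame.
    result = subprocess.run(
        cmd, shell=True, check=True, capture_output=True, text=True
    )
    # Wrap the captured text in a file-like object for read_csv.
    return pd.read_csv(io.StringIO(result.stdout), **read_csv_kwargs)


# Local stand-in for a real command such as an ssh/grep pipeline.
df = read_cmd("printf 'a,b\\n1,2\\n3,4\\n'")
print(df)
```

Extra keyword arguments such as sep or names are forwarded to pd.read_csv. Unlike the Popen version, this collects the whole output before parsing, and because it uses shell=True the command string should only be built from input you trust.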
A more efficient and less verbose alternative is to use the datatable package and do something like this:
import datatable as dt
df = dt.fread(cmd="shell_cmd").to_pandas()
You can also opt to work natively with the datatable Frame type.