
fread - skip lines starting with certain character - "#"

I am using the fread function in R to read files into data.table objects.

However, when reading a file I'd like to skip lines that start with #. Is that possible?

I could not find any mention of this in the documentation.

Dnaiel asked Sep 20 '13 15:09



2 Answers

fread can read from a piped command that filters out such lines, like this:

fread("grep -v '^#' filename")
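A minimal sketch of this filtering idea, assuming a reasonably recent data.table release (one that provides the `cmd=` and `text=` arguments of fread) and a `grep` binary on the PATH. The sample file and its contents are made up for illustration. The second call is a pure-R variant for systems without grep: it filters the lines with base R and hands the result to fread.

```r
library(data.table)

# Hypothetical sample file with '#' lines at the top and in the middle.
tmp <- tempfile(fileext = ".csv")
writeLines(c("# generated by some tool", "a,b", "1,2", "# a stray note", "3,4"), tmp)

# Variant 1: let grep drop the '#' lines before fread parses the stream.
# cmd= states explicitly that the string is a shell command.
dt_cmd <- fread(cmd = paste("grep -v '^#'", shQuote(tmp)))

# Variant 2: filter in R, then pass the remaining lines via text=.
lines  <- readLines(tmp)
dt_txt <- fread(text = lines[!startsWith(lines, "#")])

print(dt_cmd)
print(dt_txt)
```

The grep variant avoids loading the raw lines into R first, which may matter for large files. The base R variant is portable but holds the whole file in memory as a character vector.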
malcook answered Nov 13 '22 16:11



Not currently, but it's on the to-do list.

Are the # lines at the top forming a header which is more than 30 lines long?

If so, that's come up before and the solution is:

fread("filename", autostart=60)

where 60 is chosen to be inside the block of data to be read.

From ?fread:

Once the separator is found on line autostart, the number of columns is determined. Then the file is searched backwards from autostart until a row is found that doesn't have that number of columns. Thus, the first data row is found and any human readable banners are automatically skipped. This feature can be particularly useful for loading a set of files which may not all have consistently sized banners. Setting skip>0 overrides this feature by setting autostart=skip+1 and turning off the search upwards step.

The default autostart=30 might just need bumping up a bit in your case.

Or maybe skip=n or skip="string" helps:

If -1 (default) use the procedure described below starting on line autostart to find the first data row. skip>=0 means ignore autostart and take line skip+1 as the first data row (or column names according to header="auto"|TRUE|FALSE as usual). skip="string" searches for "string" in the file (e.g. a substring of the column names row) and starts on that line (inspired by read.xls in package gdata).
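As a rough illustration of the two skip forms described above, here is a small sketch using a made-up file with a two-line # banner. The file contents and column names are assumptions for the example only. Note that skip only helps when the # lines form a block at the top of the file; it does not remove # lines scattered through the data.

```r
library(data.table)

# Hypothetical file: a two-line '#' banner followed by a header row and data.
tmp <- tempfile(fileext = ".csv")
writeLines(c("# banner line 1", "# banner line 2", "id,value", "1,10", "2,20"), tmp)

# skip = n: ignore the first n lines, so line n + 1 holds the column names.
dt_n <- fread(tmp, skip = 2)

# skip = "string": start at the first line containing this substring.
dt_s <- fread(tmp, skip = "id,value")

print(dt_n)
print(dt_s)
```

The string form is convenient when the banner length varies between files but the column-name row is known.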

Matt Dowle answered Nov 13 '22 16:11
