I am using the fread function in R for reading files to data.tables objects.
However, when reading the file I'd like to skip lines that start with #, is that possible?
I could not find any mention to that in the documentation.
fread can read from a piped command that filters out such lines, like this:
fread("grep -v '^#' filename")
Not currently, but it's on the list to do.
Are the # lines at the top forming a header which is more than 30 lines long?
If so, that's come up before and the solution is :
fread("filename", autostart=60)
where 60 is chosen to be inside the block of data to be read.
From ?fread
:
Once the separator is found on line autostart, the number of columns is determined. Then the file is searched backwards from autostart until a row is found that doesn't have that number of columns. Thus, the first data row is found and any human readable banners are automatically skipped. This feature can be particularly useful for loading a set of files which may not all have consistently sized banners. Setting skip>0 overrides this feature by setting autostart=skip+1 and turning off the search upwards step.
The default autostart=30
might just need bumping up a bit in your case.
Or maybe skip=n
or skip="string"
helps :
If -1 (default) use the procedure described below starting on line autostart to find the first data row. skip>=0 means ignore autostart and take line skip+1 as the first data row (or column names according to header="auto"|TRUE|FALSE as usual). skip="string" searches for "string" in the file (e.g. a substring of the column names row) and starts on that line (inspired by read.xls in package gdata).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With