I came across a file like this:
COL1 COL2 COL3
weqw asrg qerhqetjw
weweg ethweth rqerhwrtjw
rhqerhqerhq qergqer qerhqew5h
qerh qergqer wetjwryerj
I could not load it directly with fread, so I replaced \s+ with , using sed and then passed the result to fread, which solved it. But is there a built-in way of reading this kind of data with data.table?
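For reference, the same whitespace-to-comma substitution can be done without leaving R. This is only a minimal sketch using base R: the sample lines are copied from the file above, and the extra padding between columns is an assumption about how the original file was laid out.

```r
# Sample lines from the question; the run-of-spaces padding is assumed
raw <- c("COL1         COL2     COL3",
         "weqw         asrg     qerhqetjw",
         "weweg        ethweth  rqerhwrtjw",
         "rhqerhqerhq  qergqer  qerhqew5h",
         "qerh         qergqer  wetjwryerj")

# Same substitution as the sed step: each run of whitespace becomes one comma.
# trimws() drops leading/trailing blanks so no empty first or last field appears.
csv_lines <- gsub("\\s+", ",", trimws(raw))

# Parse the rewritten lines as CSV straight from memory
dat <- read.csv(text = csv_lines, stringsAsFactors = FALSE)
print(dat)
```

Like the sed version, this only works when no field contains whitespace or a comma of its own.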
fread does not (yet) have any capabilities for reading fixed-width files.
I, too, often come across files annoyingly stored like this. Feel free to add a feature request on the Github page.
It may not be so in your case, but your solution with sed would not work on a lot of the FWF I come across, because there's no space between columns; e.g. you'll see strings like 00010 that actually comprise 3 fields.
If that's the case, you'll need a field width dictionary, at which point you have several options:
1. read.fwf within R
2. a fwf->csv program (I use one I wrote in Python and it's pretty fast; I could share the code if you'd like). This is basically the beefed-up version of your initial approach, so that you never have to deal with the FWF again.
I personally stick with the second option most often. read.fwf is not optimized like fread, so it will probably be slow. And if you've got a lot (say 20+) of FWF to read, the 3rd option is pretty tedious.
But I agree it would be nice to have something like this built in to fread.
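To make the field width dictionary idea concrete, here is a rough sketch of the read.fwf route followed by a one-time conversion to CSV. The sample strings, the widths (2, 1, 2) and the column names f1, f2, f3 are all made up for illustration; a real file would need the widths from its own layout specification.

```r
# Hypothetical FWF content: three fields per line with no separator between them
lines <- c("00010", "12345", "ab7cd")

# Hypothetical field width dictionary: field 1 = 2 chars, field 2 = 1, field 3 = 2
widths <- c(2, 1, 2)

fwf_path <- tempfile(fileext = ".txt")
writeLines(lines, fwf_path)

# colClasses = "character" keeps leading zeros such as "00" from being lost
dat <- read.fwf(fwf_path, widths = widths,
                col.names = c("f1", "f2", "f3"),
                colClasses = "character")
print(dat)

# Write the parsed table out as CSV once, so later reads can skip the FWF step
csv_path <- tempfile(fileext = ".csv")
write.csv(dat, csv_path, row.names = FALSE)
```

When the resulting CSV is read back, the reader may need to be told to keep those columns as character, otherwise values like "00" can be converted to numbers.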
Fixed in current devel (v1.9.5) recently. Please upgrade and test (and report if any issues).
require(data.table) # v1.9.5+
fread("~/Downloads/tmp.txt")
# COL1 COL2 COL3
# 1: weqw asrg qerhqetjw
# 2: weweg ethweth rqerhwrtjw
# 3: rhqerhqerhq qergqer qerhqew5h
# 4: qerh qergqer wetjwryerj
fread() gained a strip.white argument (default TRUE), among other new arguments. Please check the README on the project page for up-to-date NEWS.