
Any way to force fread() of data.table not to stop on empty lines?

Tags: r, data.table

(This question is no longer relevant as of the data.table version released 25-NOV-2016; see the accepted answer below.)

I have a table with some empty lines in the middle. When I try to read it with fread, it stops with the warning "Stopped reading at empty line 10006, but text exists afterwards (discarded)". Is there any way to avoid this without changing the data file?

Vasily A asked Nov 10 '13 20:11


3 Answers

Version 1.9.8 of data.table, released 25-NOV-2016, has a new blank.lines.skip option to skip blank lines.

text <- "1,a\n\n2,b\n3,c\n4,a\n\n5,b\n\n6,c"

library(data.table)
fread(text)
##    V1 V2
## 1:  2  b
## 2:  3  c
## 3:  4  a
## Warning message:
## In fread("1,a\n\n2,b\n3,c\n4,a\n\n5,b\n\n6,c") :
##   Stopped reading at empty line 6 but text exists afterwards (discarded): 5,b

fread(text, blank.lines.skip=TRUE)
##    V1 V2
## 1:  1  a
## 2:  2  b
## 3:  3  c
## 4:  4  a
## 5:  5  b
## 6:  6  c
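
The same option applies to a file on disk, which is the case described in the question. A minimal sketch, assuming data.table 1.9.8 or later; the temporary file and its contents are made up for illustration:

```r
library(data.table)

# blank.lines.skip needs data.table >= 1.9.8 (the release noted above).
stopifnot(packageVersion("data.table") >= "1.9.8")

# Hypothetical sample file with blank lines in the middle.
path <- tempfile(fileext = ".csv")
writeLines(c("1,a", "", "2,b", "3,c", "", "4,a"), path)

# header = FALSE because the sample has no header row.
dt <- fread(path, header = FALSE, blank.lines.skip = TRUE)
print(dt)
```

The data file itself is left untouched; only the parsing behaviour changes.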
dnlbrky answered Nov 15 '22 13:11


You can use the Windows findstr command to get rid of empty lines.

Example file "Data.txt":

1,a

2,b
3,c
4,a


5,b

6,c

Reading it with fread reproduces your warning:

> dt <- fread("Data.txt")
Warning message:
In fread("Data.txt") :
Stopped reading at empty line 6 of file, but text exists afterwards (discarded): 5,b

But it works when the Windows findstr command is passed directly to fread, so that only non-empty lines reach the parser:

> require(data.table)
> dt <- fread('findstr "." Data.txt')

> dt
   V1 V2
1:  1  a
2:  2  b
3:  3  c
4:  4  a
5:  5  b
6:  6  c
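
On systems without findstr, the same filtering can be done in R before parsing. This is a rough sketch rather than part of the original answer: it reads the file with readLines, drops zero-length lines, and passes the remaining text to fread as a single string. The temporary file stands in for "Data.txt" and is an assumption for illustration.

```r
library(data.table)

# Hypothetical stand-in for Data.txt, with the same blank-line pattern.
path <- tempfile(fileext = ".txt")
writeLines(c("1,a", "", "2,b", "3,c", "4,a", "", "", "5,b", "", "6,c"), path)

# Read all lines, then keep only those with at least one character.
lines <- readLines(path)
lines <- lines[nzchar(lines)]

# fread treats an input string containing "\n" as the data itself.
dt <- fread(paste(lines, collapse = "\n"), header = FALSE)
print(dt)
```

This loads the whole file into memory as a character vector first, so for very large files the shell-command approach may be the better fit.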
Bram Visser answered Nov 15 '22 15:11


If anyone else is having a similar problem: I've noticed that data.table 1.10.4 (the 2017 release I'm using) seems to produce empty-line warnings with some files unless you explicitly set:

strip.white = FALSE

I was importing ~350 files that had obvious line errors. Some lines were broken across two rows in the originals and, since the two parts contained different kinds of information, fread warned about class coercion for some of the columns. At the same time I was getting 'empty line' warnings for almost every file, each on a different line. I checked those lines manually in Notepad++, many times: none of them were empty, and plenty of lines followed them. Working through the import arguments, I found that disabling strip.white specifically removed the empty-line warnings.
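
A small sketch of how one might check for this instead of inspecting files by hand. It flags lines that are empty or whitespace-only and then imports with strip.white = FALSE. The idea that whitespace-only lines are what triggers the warning is an assumption, not something this answer confirms; the sample lines are made up.

```r
library(data.table)

# Hypothetical file contents: the third line holds only spaces.
raw_lines <- c("1,a", "2,b", "   ", "3,c")

# Flag lines that are empty or contain nothing but whitespace.
is_blank_like <- grepl("^[[:space:]]*$", raw_lines)
blank_like <- which(is_blank_like)
print(blank_like)

# Write the remaining lines out and import them with whitespace
# stripping disabled, as the answer above suggests.
path <- tempfile(fileext = ".csv")
writeLines(raw_lines[!is_blank_like], path)
dt <- fread(path, header = FALSE, sep = ",", strip.white = FALSE)
print(dt)
```

If the flagged line numbers match the ones named in the warnings, whitespace-only lines are a likely cause; if they do not, the cause lies elsewhere.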

bg49ag answered Nov 15 '22 14:11