When I use fread() from R's data.table package to read a 3 GB .dat file, I get the following message:
Stopped early on line 3169933. Expected 136 fields but found 138. Consider fill=TRUE and comment.char=. First discarded non-empty line:

My code:
library(data.table)
file_path = 'data.dat' # 3GB
fread(file_path,fill=TRUE)
My file has about 5 million rows, but fread() only reads it up to row 3169933 because of this error. Setting fill=TRUE did not help in this case. Could anyone help me?
R version: 3.6.3; data.table version: 1.13.2
Note about fill=TRUE in this case:
[Case 1, not my case] If the first part of my file (50% of the rows) has 138 columns and the second part has 136 columns, then fill=TRUE helps: it fills the two missing columns in the second part with NA.
[Case 2, my case] If the first part of my file (50% of the rows) has 136 columns and the second part has 138 columns, then fill=TRUE does not help.
I'm not sure why you still have the problem even with fill=TRUE. But if nothing else helps, you can try something like this:
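tryCatch(
  expr = {dt1 <<- fread(file_path)},
  warning = function(w) {
    cat('Warning: ', w$message, '\n\n')
    # Pull the line number out of "Stopped early on line N. ..."
    n_line <- as.numeric(gsub('Stopped early on line (\\d+)\\..*', '\\1', w$message))
    if (!is.na(n_line)) {
      cat('Found ', n_line, '\n')
      # Read the file in two pieces around the reported line, then stack them.
      # The offsets may need adjusting by one, depending on whether the file has a header row.
      dt1_part1 <- fread(file_path, nrows = n_line)
      dt1_part2 <- fread(file_path, skip = n_line)
      dt1 <<- rbind(dt1_part1, dt1_part2, fill = TRUE)
    }
  },
  finally = cat("\nFinished. \n")
)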
The tryCatch() construct catches the warning message, so you can extract the line number from it and process the file accordingly.
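Before splitting the file, it may help to confirm where the number of fields actually changes. Here is a minimal sketch using count.fields() from base R (utils). The small temporary file and the comma separator are assumptions for illustration only; substitute your own path and delimiter. On a 3 GB file this scan will likely be slow.

```r
# Hypothetical stand-in for data.dat: three 3-field lines, then two 5-field lines.
tmp <- tempfile(fileext = ".dat")
writeLines(c("a,b,c", "1,2,3", "4,5,6", "7,8,9,10,11", "12,13,14,15,16"), tmp)

# Count the fields on every line; quoting and comment handling are switched off
# so each physical line is counted as-is.
n_fields <- count.fields(tmp, sep = ",", quote = "", comment.char = "")

# How many lines have each width.
field_counts <- table(n_fields)
print(field_counts)

# First line that is wider than the first line of the file.
first_wide <- which(n_fields > n_fields[1])[1]

# Largest number of fields seen anywhere in the file.
max_fields <- max(n_fields, na.rm = TRUE)
```

If first_wide matches the line number in the fread() warning, the file is in Case 2 from the question, and max_fields tells you how many columns the combined table needs.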