
'Embedded nul in string' error when importing csv with fread

I have a large file (3.5 GB) that I'm trying to import using data.table::fread.

It was originally created from an rpt file that was opened as text and saved as a CSV.

This has worked fine with smaller files of the same type of data (same columns and all); this one just covers a longer timeframe and a wider reach.

When I run

mydata <- fread("mycsv.csv")

I get the error:

Error in fread("mycsv.csv") : embedded nul in string: 'y\0e\0a\0r\0'

What does this mean?

datahappy asked Mar 25 '14 18:03


5 Answers

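We can remove the NUL bytes on the command line using something like the following (GNU sed syntax). Write to a new file, because redirecting the output onto the input file truncates it before sed reads it:

sed 's/\x00//g' mycsv.csv > mycsv_clean.csv

Or, as suggested by @marbel, fread can run the sed call itself. In recent versions of data.table the command goes in the cmd argument:

fread(cmd = "sed 's/\\x00//g' mycsv.csv")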
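
If sed is not available (for example on Windows), the same byte-stripping idea can be sketched in base R. This is only a rough sketch: `strip_nul_bytes` and its `chunk_size` argument are made-up names, not part of data.table, and it assumes the text is plain ASCII stored with interleaved NUL bytes, since dropping zero bytes from other UTF-16 text would corrupt it.

```r
# Hypothetical helper: copy `infile` to `outfile`, dropping every 0x00 byte.
# Reads in chunks so a multi-gigabyte file never has to fit in memory.
strip_nul_bytes <- function(infile, outfile, chunk_size = 1e6) {
  src <- file(infile, "rb")
  on.exit(close(src), add = TRUE)
  dst <- file(outfile, "wb")
  on.exit(close(dst), add = TRUE)
  repeat {
    chunk <- readBin(src, what = "raw", n = chunk_size)
    if (length(chunk) == 0) break           # end of file
    writeBin(chunk[chunk != as.raw(0)], dst) # keep only non-NUL bytes
  }
  invisible(outfile)
}

# Small demo: the bytes 'y\0e\0a\0r\0\n\0' from the error message.
tmp_in  <- tempfile(fileext = ".csv")
tmp_out <- tempfile(fileext = ".csv")
writeBin(as.raw(c(0x79, 0x00, 0x65, 0x00, 0x61, 0x00, 0x72, 0x00, 0x0a, 0x00)), tmp_in)
strip_nul_bytes(tmp_in, tmp_out)
cleaned <- readLines(tmp_out)  # "year"
```

For the real file the call would be something like strip_nul_bytes("mycsv.csv", "mycsv_clean.csv"), followed by fread on the cleaned copy.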

Robert Krzyzanowski answered Oct 22 '22 13:10


The NUL after every character in the error message ('y\0e\0a\0r\0') suggests the file was saved as UTF-16LE. In this case, you can use read.csv with fileEncoding set to UTF-16LE rather than fread.

read.csv("mycsv.csv", fileEncoding = "UTF-16LE")

Given the size of your data, read.csv will take a couple of minutes, but I don't think that is a big deal.
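
Before committing to a slow read, it can help to peek at the first few bytes and check whether the file really looks like UTF-16LE. The sketch below is a heuristic only: `looks_like_utf16le` is a made-up helper name, and the test (a FF FE byte-order mark, or mostly zero bytes in every second position) assumes the header is mostly ASCII text.

```r
# Hypothetical helper: guess whether a file is UTF-16LE from its first bytes.
looks_like_utf16le <- function(path, n = 64) {
  bytes <- readBin(path, what = "raw", n = n)
  if (length(bytes) < 2) return(FALSE)
  # A UTF-16LE byte-order mark is the two bytes FF FE.
  has_bom <- identical(bytes[1:2], as.raw(c(0xff, 0xfe)))
  # For ASCII text in UTF-16LE, every second byte is 0x00.
  high_bytes <- bytes[seq(2, length(bytes), by = 2)]
  has_bom || mean(high_bytes == as.raw(0)) > 0.5
}

# Small demo: write a tiny UTF-16LE CSV, check it, then read it back.
demo_path <- tempfile(fileext = ".csv")
writeBin(
  iconv("year,value\n2014,1\n", from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]],
  demo_path
)
is_utf16 <- looks_like_utf16le(demo_path)
demo_data <- read.csv(demo_path, fileEncoding = "UTF-16LE")
```

If the check comes back FALSE, the NUL bytes probably have another cause and the encoding fix may not apply.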

Fan Wang answered Oct 22 '22 14:10


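You can try this small function, which rereads the file while skipping NUL bytes and writes the cleaned lines to a new file:

cleanFiles <- function(file, newfile) {
  # skipNul = TRUE drops embedded NUL bytes while reading
  writeLines(iconv(readLines(file, skipNul = TRUE)), newfile)
}

It works for me.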

xrsousa answered Oct 22 '22 14:10


A non-technical way to solve this:

  1. Open the problematic .csv

  2. Ctrl+A (Select all)

  3. Open a new Excel sheet

  4. Right click and choose 'Paste as values'

  5. Save, and use this file in place of the original one.

This worked for me and doesn't take much time.

Pree answered Oct 22 '22 13:10


If you are seeing NUL (\x00) characters in an ASCII file, you can do this:

data.table::fread(text = readLines(pathIn, skipNul = TRUE), ...)
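
To see what skipNul actually does, here is a small base-R sketch on a throwaway file. The file contents and variable names are invented for illustration, and read.csv(text = ...) stands in for fread(text = ...) so the example runs without data.table.

```r
# Write a two-line CSV ("a,b" / "1,2") with a NUL byte after every character.
nul_path <- tempfile(fileext = ".csv")
writeBin(
  as.raw(c(0x61, 0x00, 0x2c, 0x00, 0x62, 0x00, 0x0a, 0x00,   # a , b \n
           0x31, 0x00, 0x2c, 0x00, 0x32, 0x00, 0x0a, 0x00)), # 1 , 2 \n
  nul_path
)

# skipNul = TRUE silently drops the NUL bytes while reading each line.
clean_lines <- readLines(nul_path, skipNul = TRUE)

# The cleaned character vector can be parsed directly from memory.
parsed <- read.csv(text = clean_lines)
```

Note that readLines holds the whole file in memory as a character vector, which may be a concern for a multi-gigabyte file.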

Jim Cutler answered Oct 22 '22 14:10