I have been trying to read a csv file into R, but it keeps cutting off. I think it might be due to the file encoding, but I'm not sure.
Here is the code I ran:
read.csv('crunchbase_companies_2.csv', fileEncoding="UTF-8", quote="")
I then get a warning message: In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, : invalid input found on input connection.
R reads the data, but only up to the first special character, and then stops, so I end up with only partial data in R. I pasted the data I get here: http://pastebin.com/EQLnXz2W. Note, though, that it cuts off when it hits characters like 'Ì', so those characters are not in the sample data.
I have also checked the encoding in the terminal using the file command. It returns: Non-ISO extended-ASCII English text, with CR line terminators.
What do I need to do to read the entire dataset?
I ran into a similar problem today and spent hours on it. I tried changing encoding/fileEncoding, the locale, and a couple of other things suggested here, but none of them worked for me.
Eventually I found a non-English post (those people probably have more experience with this) with this trick: change the open mode from "r" to "rb".
In my case, I use readLines, so it's
fileIn=file("userinfo.csv",open="rb",encoding="UTF-8")
lines = readLines(fileIn, n = rowPerRead, warn = FALSE)
I don't fully understand why. My guess is that the special character is stored as raw bytes, so if the file is not read in byte (binary) mode, that character blocks the scan.
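One likely explanation, going by the R documentation for connections, is that re-encoding only happens in text mode, so a connection opened with "rb" hands the bytes back untouched and never reaches the "invalid input" error. Here is a minimal sketch of that idea. The temporary file, its contents, and the assumption that the bytes are Latin-1 are all made up for illustration; the conversion to UTF-8 is done afterwards with iconv.

```r
# Hypothetical sample file: 0xCC is the Latin-1 byte for "Ì", which is not valid UTF-8 on its own.
path <- tempfile(fileext = ".csv")
writeBin(c(charToRaw("id,name\n1,"), as.raw(0xCC), charToRaw("sola\n2,Oslo\n")), path)

# Open in binary mode: no re-encoding is attempted while reading.
con <- file(path, open = "rb")
lines <- readLines(con, warn = FALSE)
close(con)

# Convert afterwards, assuming the source bytes are Latin-1.
lines <- iconv(lines, from = "latin1", to = "UTF-8")
print(length(lines))
```

If the real file is in some other encoding, the from argument would need to change accordingly.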
After hours struggling with a csv like this, experimenting with arguments to read.csv like fileEncoding and quote, I finally used read_csv in the readr package, simply with the default arguments, and it loaded everything perfectly straight away!
An unimaginative answer but worth trying before you attempt to reverse engineer the whole file yourself...
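For anyone who wants to try this, a rough sketch is below. It assumes the readr package is installed, and the sample file and its Latin-1 encoding are invented for the example. The defaults were enough in my case; the guess_encoding call and the locale argument are optional extras for when you want the non-ASCII text decoded properly rather than just loaded.

```r
library(readr)

# Hypothetical sample file containing a Latin-1 byte (0xCC, "Ì").
path <- tempfile(fileext = ".csv")
writeBin(c(charToRaw("name,city\nAcme,"), as.raw(0xCC), charToRaw("sola\nBeta,Oslo\n")), path)

# Ask readr for its best guesses at the encoding (a heuristic, not a guarantee).
print(guess_encoding(path))

# Read the file, stating the assumed encoding explicitly.
companies <- read_csv(path, locale = locale(encoding = "latin1"))
print(nrow(companies))
```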
So while I don't quite know why, what ended up working is changing fileEncoding to latin1 when calling the read.csv function.
This was mentioned in a different answer here. Somehow that's one thing I hadn't tried...
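In case it helps, here is a small self-contained sketch of that call. The sample file is made up, and it assumes the session runs in a UTF-8 locale so the converted characters can be represented; the filename in the question would take the place of path.

```r
# Hypothetical sample file whose bytes are Latin-1, not UTF-8 (0xCC is "Ì" in Latin-1).
path <- tempfile(fileext = ".csv")
writeBin(c(charToRaw("name,city\nAcme,"), as.raw(0xCC), charToRaw("sola\nBeta,Oslo\n")), path)

# Declaring the file's actual encoding lets R convert the bytes instead of stopping at them.
companies <- read.csv(path, fileEncoding = "latin1", stringsAsFactors = FALSE)
print(nrow(companies))
```

A likely reason this works where UTF-8 did not: every byte is a valid Latin-1 character, so the conversion cannot hit invalid input, although the result is only meaningful if the file really is Latin-1 (or close to it).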