I have a comma-separated dataset of around 10,000 rows. When I read it with read.csv, R created a data frame with fewer rows than the original file: it excluded/rejected about 200 rows. When I open the CSV file in Excel, it looks fine; both the line delimiters and the field delimiters are well formed (as far as Excel's parsing goes).
I have identified the row numbers in my file that are getting rejected, but I can't identify the cause by glancing over them.
Is there any way to look at logs, or anything similar, that records why R rejected these rows?
The OP indicates that the problem is caused by quotes in the CSV file.
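One way to locate the offending lines without any log is to count the fields R sees on each line with base R's count.fields(); with the default quote handling, lines swallowed by an unclosed quote come back as NA (filename here is the path to your CSV):
n_fields <- count.fields(filename, sep = ",")
table(n_fields, useNA = "ifany")        # most lines should share one field count
which(is.na(n_fields) | n_fields != median(n_fields, na.rm = TRUE))   # suspect line numbers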
When the records in the CSV file are themselves not quoted, but a few of them contain quote characters, the file can be read using the quote="" option of read.csv, which disables quote handling:
data <- read.csv(filename, quote="")
Another solution is to remove all quotes from the file, but this also modifies the data (your strings no longer contain any quotes) and will cause problems if your fields contain commas.
lines <- readLines(filename)
lines <- gsub('"', '', lines, fixed=TRUE)
data <- read.csv(textConnection(lines))
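As a quick sanity check (assuming a single header row and no blank lines), the number of parsed rows should now match the number of lines in the file:
length(lines) - 1 == nrow(data)   # TRUE if no rows were dropped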
A slightly safer solution is to delete quotes only when they are not directly before or after a comma, so that quotes delimiting whole fields are kept:
lines <- readLines(filename)
lines <- gsub('([^,])"([^,])', '\\1\\2', lines)
data <- read.csv(textConnection(lines))
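For example, on a made-up line with a stray inch mark inside an unquoted field, the interior quote is dropped while the quotes around the comma-containing field survive:
gsub('([^,])"([^,])', '\\1\\2', 'John,5 ft 10" tall,"New York, NY"')
# [1] "John,5 ft 10 tall,\"New York, NY\""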
I had the same issue: the number of rows present in the CSV file and the number of rows read by read.csv() differed significantly. I used fread() from the data.table package in place of read.csv and it solved the problem.
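For reference, a minimal sketch of that approach (data.table must be installed; filename is the path to your CSV as above):
library(data.table)
data <- fread(filename)       # fread detects the separator and tends to be more tolerant of irregular quoting
data <- as.data.frame(data)   # optional: convert back to a plain data frame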