R read.csv "More columns than column names" error

Tags: import, r
I have a problem when importing a .csv file into R. With my code:

t <- read.csv("C:\\N0_07312014.CSV", na.strings = c("", "null", "NaN", "X"),
              header = TRUE, stringsAsFactors = FALSE, check.names = FALSE)

R reports an error and does not do what I want:

Error in read.table(file = file, header = header, sep = sep, quote = quote,  : 
  more columns than column names

I guess the problem is that my data is not well formatted. I only need the first 32 columns ([,1:32]); all the others should be dropped.

Data can be downloaded from: https://drive.google.com/file/d/0B86_a8ltyoL3VXJYM3NVdmNPMUU/edit?usp=sharing

Thanks so much!


asked Sep 10 '14 16:09

Vicki1227

1 Answer

That's one wonky CSV file. Multiple headers are tossed about (try pasting it into CSV Fingerprint to see what I mean).
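To see the same thing from within R, a minimal sketch along these lines may help. It uses a small made-up sample (`sample_lines` is hypothetical, not the real file) in which the header has fewer names than the data rows have fields and the header line repeats mid-file. As far as I understand it, that mismatch is what makes `read.csv` stop with "more columns than column names".

```r
# Hypothetical miniature of the problem file: a 3-name header, data rows
# with 5 fields, and the header repeated in the middle of the data.
sample_lines <- c(
  "Node Name,RTC_date,RTC_time",
  "Node 0,07/31/2014,08:58:18,,0",
  "Node 0,07/31/2014,08:59:22,,0",
  "Node Name,RTC_date,RTC_time",
  "Node 0,07/31/2014,08:59:37,,0"
)

# write the sample to a temporary file so count.fields() can read it
tmp <- tempfile(fileext = ".csv")
writeLines(sample_lines, tmp)

# count.fields() reports how many comma-separated fields each line has
n_fields <- count.fields(tmp, sep = ",")
table(n_fields)

# line numbers of the repeated header rows
header_rows <- grep("^Node Name,RTC_date", sample_lines)
header_rows

unlink(tmp)
```

On the real file you would point `count.fields()` at the CSV path directly; lines whose field count differs from the header's are the ones to inspect.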

Since I don't know the data, it's impossible to be sure the following produces accurate results for you, but it involves using readLines and other R functions to pre-process the text:

# use readLines to get the data
dat <- readLines("N0_07312014.CSV")

# I had to do this to fix grep errors
Sys.setlocale('LC_ALL','C')

# filter out the repeated, wonky header lines
dat_2 <- grep("Node Name,RTC_date", dat, invert=TRUE, value=TRUE)

# turn that vector into a text connection for read.csv
dat_3 <- read.csv(textConnection(paste0(dat_2, collapse="\n")),
                  header=FALSE, stringsAsFactors=FALSE)

str(dat_3)
## 'data.frame':    308 obs. of  37 variables:
##  $ V1 : chr  "Node 0" "Node 0" "Node 0" "Node 0" ...
##  $ V2 : chr  "07/31/2014" "07/31/2014" "07/31/2014" "07/31/2014" ...
##  $ V3 : chr  "08:58:18" "08:59:22" "08:59:37" "09:00:06" ...
##  $ V4 : chr  "" "" "" "" ...
## .. more
##  $ V36: chr  "" "" "" "" ...
##  $ V37: chr  "0" "0" "0" "0" ...

# grab the headers
headers <- strsplit(dat[1], ",")[[1]]

# how many of them are there?
length(headers)
## [1] 32

# limit it to the 32 columns you want (which matches the number of headers)
dat_4 <- dat_3[,1:32]

# and add the headers
colnames(dat_4) <- headers

str(dat_4)
## 'data.frame':    308 obs. of  32 variables:
##  $ Node Name         : chr  "Node 0" "Node 0" "Node 0" "Node 0" ...
##  $ RTC_date          : chr  "07/31/2014" "07/31/2014" "07/31/2014" "07/31/2014" ...
##  $ RTC_time          : chr  "08:58:18" "08:59:22" "08:59:37" "09:00:06" ...
##  $ N1 Bat (VDC)      : chr  "" "" "" "" ...
##  $ N1 Shinyei (ug/m3): chr  "" "" "0.23" "null" ...
##  $ N1 CC (ppb)       : chr  "" "" "null" "null" ...
##  $ N1 Aeroq (ppm)    : chr  "" "" "null" "null" ...
## ... continues
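
Every column above is still character, and strings such as "" and "null" are still present, because the na.strings from the question were not applied. One possible follow-up, shown here on a small made-up stand-in for `dat_4` (the values and the names `na_markers` and `dat_5` are my own, for illustration only), is to blank out those markers and let `type.convert` guess the column types:

```r
# Hypothetical stand-in for dat_4: every column was read as character
dat_4 <- data.frame(
  "Node Name"          = c("Node 0", "Node 0", "Node 0"),
  "N1 Bat (VDC)"       = c("", "3.9", "4.0"),
  "N1 Shinyei (ug/m3)" = c("", "0.23", "null"),
  check.names = FALSE, stringsAsFactors = FALSE
)

# the same markers the question passed as na.strings
na_markers <- c("", "null", "NaN", "X")

dat_5 <- dat_4
dat_5[] <- lapply(dat_4, function(col) {
  col[col %in% na_markers] <- NA    # turn the markers into real NAs
  type.convert(col, as.is = TRUE)   # numeric where possible, else character
})

str(dat_5)
```

Whether "X" and "NaN" should really count as missing depends on the data, so check a few columns by hand before trusting the converted types.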


answered Oct 10 '22 05:10

hrbrmstr
