
Reading in a file - Warning Message

I have a file with 22268 rows and 2521 columns. I try to read it in with this line of code:

file <- read.table(textfile, skip=2, header=TRUE, sep="\t", fill=TRUE, blank.lines.skip=FALSE)

However, only 13024 rows by 2521 columns are read in, and I get the following warning:

Warning message:
In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
  number of items read is not a multiple of the number of columns

I also used these commands to see which rows had an incorrect number of columns:

x <- count.fields(textfile, sep="\t", skip=2)
incorrect <- which(x != 2521)

and got back a list of about 20 rows that were incorrect.

Is there a way to fill these rows with NA values?

I thought that was what the "fill" parameter of the read.table function does, but that doesn't appear to be the case.

OR

Is there a way to ignore these rows that are identified in the "incorrect" variable?

Sheila asked Dec 03 '12 23:12


1 Answer

You can use readLines() to read in the raw lines, then find the offending rows.

    con <- file("path/to/file.csv", "rb")   # open a connection to the file
    rawContent <- readLines(con)   # one character string per line of the file
    close(con)  # close the connection to the file, to keep things tidy

Then take a look at rawContent.

To find the rows with an incorrect number of columns, for example:

    expectedColumns <- 2521
    delim <- "\t"

    # a line with n columns contains n - 1 delimiters
    indxToOffenders <-
    sapply(rawContent, function(x) {   # for each line in rawContent
        nDelims <- sum(gregexpr(delim, x, fixed = TRUE)[[1]] > 0)   # count the delims (gregexpr returns -1 when there are none)
        nDelims != expectedColumns - 1   # TRUE for lines with the wrong number of columns
    }, USE.NAMES = FALSE)
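
Counting delimiters needs some care: a line with n fields contains n - 1 delimiters, and gregexpr() reports "no match" as -1, which still has length 1. A minimal sketch on made-up lines (toyLines and the three-column layout are illustrative, not the real file), using regmatches() to get the counts:

```r
# Made-up lines: three complete rows, one short row, one with no delimiter at all.
toyLines <- c("a\tb\tc", "1\t2\t3", "4\t5", "6\t7\t8", "oops")
delim <- "\t"
expectedColumns <- 3

# gregexpr() signals "no match" with -1, so the length is 1 even with zero delimiters.
length(gregexpr(delim, "oops", fixed = TRUE)[[1]])   # 1, not 0

# regmatches() returns character(0) for a line without matches,
# so lengths() gives the true number of delimiters per line.
nDelims <- lengths(regmatches(toyLines, gregexpr(delim, toyLines, fixed = TRUE)))
nFields <- nDelims + 1   # fields = delimiters + 1

offenders <- nFields != expectedColumns
which(offenders)   # 3 5
```

This counts raw delimiters only, so it assumes no tab ever appears inside a quoted field.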

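Then to read in your data, leaving out the offending lines:

    # indxToOffenders is logical, so negate it with `!` (a minus sign would not drop the offenders);
    # `text=` makes read.csv parse the strings themselves instead of treating them as a file name.
    # header=TRUE matches the question; drop any leading lines to skip (skip=2 there) from rawContent first.
    myDataFrame <- read.csv(text = rawContent[!indxToOffenders], header=TRUE, sep=delim)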
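
Both options raised in the question (ignoring the malformed rows, or filling them with NA) can be sketched on a small stand-in file. This is only an illustration in base R: the four-line temp file, the three-column layout, and the names nFields, keep, and nMissing are assumptions, and the padding step only handles lines with too few fields.

```r
# Made-up stand-in for the real file: a header and three rows, one of them short.
textfile <- tempfile(fileext = ".txt")
writeLines(c("id\tx\ty", "1\t10\t100", "2\t20", "3\t30\t300"), textfile)

delim <- "\t"
expectedColumns <- 3   # 2521 for the file in the question

rawContent <- readLines(textfile)
# blank.lines.skip = FALSE keeps the counts aligned line by line with rawContent.
nFields <- count.fields(textfile, sep = delim, blank.lines.skip = FALSE)

# Option 1: ignore the malformed lines.
keep <- which(nFields == expectedColumns)
dropped <- read.table(text = rawContent[keep], header = TRUE, sep = delim)

# Option 2: pad short lines with trailing delimiters so the missing fields read as NA.
nMissing <- pmax(expectedColumns - nFields, 0)
nMissing[is.na(nMissing)] <- 0   # leave lines that count.fields could not count untouched
padded <- paste0(rawContent, strrep(delim, nMissing))
filled <- read.table(text = padded, header = TRUE, sep = delim)
```

Here dropped keeps only the two complete rows, while filled keeps all three and has NA in the y column of the short row.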
Ricardo Saporta answered Nov 04 '22 10:11