read.csv vs. read.table

In several cases I have seen read.table() fail to read a tab-delimited file (for example, the annotation table of a microarray), returning the following error:

Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  : 
line xxx did not have yyy elements

read.csv() reads the same file with no errors. I also think read.csv() is faster than read.table().

Even stranger, read.table() behaves oddly on one of my files. It reports the error at line 100, but when I copy lines 90 to 110 and paste them just after the header of the same file, it still reports the error at line 100+21 (shifted by the 21 lines pasted at the beginning). If there is a problem with that line, why isn't the error reported for the pasted copy near the beginning? I confirm that read.csv() reads the same file with no error.

Do you have any idea why read.table() cannot read files that read.csv() handles without trouble? Also, is there any reason to use read.table() at all?

908

asked Oct 10 '12 21:10

Ali

2 Answers

read.csv is a fairly thin wrapper around read.table; I would be quite surprised if you couldn't exactly replicate the behaviour of read.csv by supplying the correct arguments to read.table. However, some of those arguments (such as the way that quotation marks or comment characters are handled) could well change the speed and behaviour of the function.

In particular, this is the full definition of read.csv:

function (file, header = TRUE, sep = ",", quote = "\"", dec = ".",
    fill = TRUE, comment.char = "", ...) {
    read.table(file = file, header = header, sep = sep, quote = quote,
        dec = dec, fill = fill, comment.char = comment.char, ...)
}

so as stated it's just read.table with a particular set of options.

As @Chase states in the comments below, the help page for read.table() says as much under Details:

read.csv and read.csv2 are identical to read.table except for the defaults. They are intended for reading ‘comma separated value’ files (‘.csv’) or (read.csv2) the variant used in countries that use a comma as decimal point and a semicolon as field separator.
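
As a minimal sketch of what "identical except for the defaults" means in practice (not part of the original answer), the snippet below reads one small file both ways. The file contents and the variable names (tmp, via_csv, via_table) are made up for illustration; the read.table() arguments are the read.csv() defaults listed in the definition above.

```r
# Write a small comma-separated file to a temporary location (illustrative data).
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,name,score",
             "1,alpha,0.5",
             "2,beta,1.5"), tmp)

# Read it with read.csv and its defaults.
via_csv <- read.csv(tmp)

# Read it with read.table, passing read.csv's defaults explicitly.
via_table <- read.table(tmp, header = TRUE, sep = ",", quote = "\"",
                        dec = ".", fill = TRUE, comment.char = "")

# Compare the two data frames.
same <- identical(via_csv, via_table)
print(same)

unlink(tmp)
```

The comparison should report that the two data frames match, since read.csv() only forwards these arguments to read.table().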

100

answered Sep 16 '22 14:09

Ben Bolker

Don't use read.table to read tab-delimited files; use read.delim. (It is just a thin wrapper around read.table, but it sets the options to appropriate values.)
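
A hedged sketch of why those defaults matter (not part of the original answer): the made-up annotation file below has a "#" inside a field and a final row with a missing trailing field. read.delim() sets sep = "\t", quote = "\"", fill = TRUE and comment.char = "", whereas read.table() defaults to fill = FALSE and comment.char = "#". The data and variable names are hypothetical.

```r
# A small tab-delimited file with two common trouble spots (illustrative data):
# a '#' inside a field and a last row that is one field short.
tmp <- tempfile(fileext = ".txt")
writeLines(c("probe\tgene\tdescription",
             "p1\tABC1\tkinase",
             "p2\tDEF2\tlocus #7",
             "p3\tGHI3"), tmp)

# read.table with only the separator set: the short row is not padded
# (fill = FALSE), so the read stops with a "did not have ... elements" error.
res_table <- try(read.table(tmp, header = TRUE, sep = "\t"), silent = TRUE)
table_failed <- inherits(res_table, "try-error")

# read.delim pads the short row (fill = TRUE) and does not treat '#'
# as the start of a comment (comment.char = "").
res_delim <- read.delim(tmp)

print(table_failed)
print(res_delim)

unlink(tmp)
```

Passing the same four arguments to read.table() directly would be expected to give the same result as read.delim(); the wrapper just saves having to remember them.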

answered Sep 17 '22 14:09

hadley

Tags: file-io, r, read.table, read.csv