
read.csv vs. read.table

I have seen several cases where read.table() is not able to read a tab-delimited file (for example, the annotation table of a microarray) and returns the following error:

Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  : 
line xxx did not have yyy elements

Yet read.csv() reads the same file perfectly, with no errors. I also think read.csv() is faster than read.table().

Even stranger: read.table() behaves very oddly on one of my files. It reports this error while reading line 100, but when I copy lines 90 to 110 and paste them just after the header of the same file, it still reports the error at line 121 (line 100 plus the 21 lines pasted at the beginning). If there is a problem with that line, why doesn't it report the error while reading the pasted copy near the beginning? I can confirm that read.csv() reads the same file with no error.

Do you have any idea why read.table() is unable to read files that read.csv() handles without trouble? Also, is there any reason to use read.table() at all?

Ali asked Oct 10 '12 21:10




2 Answers

read.csv is a fairly thin wrapper around read.table; I would be quite surprised if you couldn't exactly replicate the behaviour of read.csv by supplying the correct arguments to read.table. However, some of those arguments (such as the way that quotation marks or comment characters are handled) could well change the speed and behaviour of the function.
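As a minimal sketch of how one such argument changes behaviour (the tab-delimited text below is made up for illustration), a `#` inside a field is treated as the start of a comment under read.table's default `comment.char = "#"`, whereas turning comment handling off, as read.csv and read.delim do, keeps the field intact:

```r
# Hypothetical tab-delimited data with a '#' inside a text field
txt <- "id\tdesc\n1\tgene #7\n2\tkinase\n"

# read.table defaults: comment.char = "#", so the rest of a line after '#' is dropped
t_default <- read.table(text = txt, header = TRUE, sep = "\t")

# Same call with comment handling disabled (read.csv/read.delim use comment.char = "")
t_nocomment <- read.table(text = txt, header = TRUE, sep = "\t", comment.char = "")

print(t_default$desc)    # first value is cut off before the '#'
print(t_nocomment$desc)  # first value is "gene #7"
```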

In particular, this is the full definition of read.csv:

function (file, header = TRUE, sep = ",", quote = "\"", dec = ".",
    fill = TRUE, comment.char = "", ...) {
    read.table(file = file, header = header, sep = sep, quote = quote,
        dec = dec, fill = fill, comment.char = comment.char, ...)
}

so as stated it's just read.table with a particular set of options.
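A quick way to check this for yourself is a sketch like the following (the CSV text is made up): calling read.table with the same defaults that the definition above passes should give an identical data frame to read.csv.

```r
# Hypothetical CSV text: a quoted field containing a comma, plus an apostrophe
csv <- "name,note\n\"Smith, J\",O'Brien's\n\"Lee, K\",none\n"

via_csv <- read.csv(text = csv)

# read.table with the argument values that read.csv supplies
via_table <- read.table(text = csv, header = TRUE, sep = ",", quote = "\"",
                        dec = ".", fill = TRUE, comment.char = "")

print(identical(via_csv, via_table))  # TRUE
```

Because read.csv only recognises `"` as a quote character, the apostrophe in `O'Brien's` is read literally.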

As @Chase states in the comments below, the help page for read.table() says just as much under Details:

read.csv and read.csv2 are identical to read.table except for the defaults. They are intended for reading ‘comma separated value’ files (‘.csv’) or (read.csv2) the variant used in countries that use a comma as decimal point and a semicolon as field separator.
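For the read.csv2 variant mentioned in that passage, a small sketch with made-up semicolon-separated data that uses a comma as the decimal mark:

```r
# Hypothetical data: ';' separates fields, ',' is the decimal point
txt2 <- "a;b\n1,5;2,25\n3,0;4,75\n"

# read.csv2 defaults include sep = ";" and dec = ","
d <- read.csv2(text = txt2)
str(d)  # both columns are read as numeric
```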

Ben Bolker answered Sep 16 '22 14:09


Don't use read.table to read tab-delimited files; use read.delim. (It is just a thin wrapper around read.table, but it sets the options to appropriate values.)
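A minimal sketch with made-up annotation-style text: read.delim keeps an apostrophe (as in "5' UTR") and a `#` inside fields, and it gives the same result as read.table with the tab-delimited settings spelled out. These values are read.delim's defaults as listed on the read.table help page; in contrast, read.table's own defaults (`quote = "\"'"`, `comment.char = "#"`) would treat that apostrophe as opening a quoted field and the `#` as starting a comment.

```r
# Hypothetical tab-delimited annotation-style text
tsv <- "probe\tdesc\tscore\np1\t5' UTR region\t1.5\np2\tgene #7\t2.0\np3\tkinase\t3.1\n"

ann <- read.delim(text = tsv)

# read.delim is read.table with these tab-appropriate defaults
ann2 <- read.table(text = tsv, header = TRUE, sep = "\t", quote = "\"",
                   dec = ".", fill = TRUE, comment.char = "")

print(ann$desc)  # "5' UTR region" "gene #7" "kinase"
```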

hadley answered Sep 17 '22 14:09