
Importing "csv" file with multiple-character separator to R?

Tags:

r

csv

read.table

I have a "csv" text file where each field is separated by \t&%$# which I'm now trying to import into R.

The sep= argument of read.table() insists on a single character. Is there a quick way to import this file directly?

Some of the data fields are user-submitted text which contain tabs, quotes, and other messy stuff, so changing the delimiter to something simpler seems like it could create other problems.

Bryan asked Aug 12 '13 11:08


2 Answers

The following function splits each line on a regular expression, so it can handle a multi-character separator or several alternative separators:

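# fileName:   file name with fully qualified path
# separators: regular expression matching the delimiter; separate alternative
#             delimiters with '|' and escape regex metacharacters such as '$'

read <- function(fileName, separators) {
    # readLines() opens and closes the connection itself when given a path
    data <- readLines(fileName)
    # strsplit() is vectorised: one character vector of fields per line
    records <- strsplit(data, split = separators)
    # Bind the records row-wise; assumes every line has the same number of fields
    dataFrame <- as.data.frame(do.call(rbind, records), stringsAsFactors = FALSE)
    return(dataFrame)
}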
Mafruz Zaman answered Nov 18 '22 23:11


As explained in this post, it is not possible in R without resorting to string parsing. You can pre-parse your file in another language (Awk, Perl, Python etc.) or read it line-by-line and parse the resulting strings in R.
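
The line-by-line route can be sketched in base R roughly as follows. This is a minimal sketch, not a definitive implementation: the helper name read_multichar_sep, its header argument, and the temporary demo file are assumptions made for illustration. It treats the separator as a literal string via fixed = TRUE (so the $ in \t&%$# is not read as a regex anchor) and assumes every line contains the same number of fields.

```r
# Hypothetical helper (name and arguments are assumptions for this sketch).
read_multichar_sep <- function(path, sep = "\t&%$#", header = FALSE) {
    lines <- readLines(path, warn = FALSE)
    # fixed = TRUE matches sep literally instead of as a regular expression
    fields <- strsplit(lines, sep, fixed = TRUE)
    # Stop early rather than silently recycling values on ragged lines
    stopifnot(length(unique(lengths(fields))) == 1)
    mat <- do.call(rbind, fields)
    if (header) {
        # Use the first record as column names and drop it from the data
        col_names <- mat[1, ]
        mat <- mat[-1, , drop = FALSE]
    }
    df <- as.data.frame(mat, stringsAsFactors = FALSE)
    if (header) names(df) <- col_names
    df
}

# Usage with a small temporary file whose second field contains quotes and a tab
tmp <- tempfile(fileext = ".txt")
writeLines(c("id\t&%$#comment", "1\t&%$#has \"quotes\" and a\ttab"), tmp)
result <- read_multichar_sep(tmp, header = TRUE)
```

All columns come back as character; utils::type.convert() can be applied afterwards if numeric columns are wanted. Note that strsplit() drops a trailing empty field, so lines that end with the separator would need extra handling.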

Doctor Dan answered Nov 18 '22 23:11