I have a "csv" text file where each field is separated by \t&%$#
which I'm now trying to import into R.
The sep= argument of read.table() insists on a single character. Is there a quick way to directly import this file?
Some of the data fields are user-submitted text that contains tabs, quotes, and other messy stuff, so changing the delimiter to something simpler seems like it could create other problems.
The following function can handle multiple separator characters:
# fileName   : file name with fully qualified path
# separators : regular expression of separators, e.g. "\t|,|;"
read <- function(fileName, separators) {
  con <- file(fileName)
  data <- readLines(con)
  close(con)
  # Split each line on the separator pattern (treated as a regular expression)
  records <- strsplit(data, split = separators)
  # Bind the fields of each record into a character matrix, then a data frame
  dataFrame <- data.frame(do.call(rbind, records), stringsAsFactors = FALSE)
  rownames(dataFrame) <- seq_len(nrow(dataFrame))
  dataFrame
}
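A hypothetical call for the delimiter in the question; since strsplit() treats the pattern as a regular expression, the $ has to be escaped ("data.txt" is a placeholder file name):

# Placeholder file name; escape $ because the split pattern is a regex.
df <- read("data.txt", "\t&%\\$#")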
As explained in this post, read.table() cannot handle a multi-character separator directly; you have to resort to string parsing. You can pre-parse your file in another language (Awk, Perl, Python, etc.) or read it line by line and parse the resulting strings in R, for example as sketched below.
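A minimal sketch of doing the pre-parsing entirely in R: replace the literal multi-character delimiter with a single rare character and pass the result to read.table() via its text= argument. The file name "data.txt" and the choice of "\x01" as the stand-in separator are assumptions, not part of the question.

# Read the raw lines, then swap the literal delimiter for a control
# character that should never appear in user-submitted text.
raw   <- readLines("data.txt")
clean <- gsub("\t&%$#", "\x01", raw, fixed = TRUE)
# quote = "" keeps embedded quotes in the user-submitted text intact.
df <- read.table(text = clean, sep = "\x01", quote = "",
                 stringsAsFactors = FALSE)

Because the replacement targets the full delimiter string rather than the tab alone, the embedded tabs and quotes in the data fields are left untouched.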