I'm trying to read a table using fread. The text file looks like this:
"No","Comment","Type"
"0","he said:"wonderful|"","A"
"1","Pr/ "d/s". "a", n) ","B"
The R code I'm using is: dataset0 <- fread("data/test.txt", stringsAsFactors = FALSE)
with the development version of the data.table R package.
I expect to see a dataset with three columns; however, I get:
Error in fread(input = "data/stackoverflow.txt", stringsAsFactors = FALSE) :
Line 3 starting <<"1","Pr/ ">> has more than the expected 3 fields.
Separator 3 occurs at position 26 which is character 6 of the last field: << n) ","B">>.
Consider setting 'comment.char=' if there is a trailing comment to be ignored.
How can I solve this?
The development version of data.table handles files like this where the embedded quotes have not been escaped. See point 10 on the wiki page.
I just tested it on your input and it works.
$ more unescaped.txt
"No","Comment","Type"
"0","he said:"wonderful."","A"
"1","The problem is: reading table, and also "a problem, yes." keep going on.","A"
> DT = fread("unescaped.txt")
> DT
No Comment Type
1: 0 he said:"wonderful." A
2: 1 The problem is: reading table, and also "a problem, yes." keep going on. A
> ncol(DT)
[1] 3
Use readLines to read the file line by line, then replace the delimiter and parse the result with read.table:
# read with no sep
x <- readLines("test.txt")
# introduce new sep - "|"
x <- gsub("\",\"", "\"|\"", x)
# read with new sep
read.table(text = x, sep = "|", header = TRUE)
# No Comment Type
# 1 0 he said:"wonderful." A
# 2 1 The problem is: reading table, and also "a problem, yes." keep going on. A
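One caveat with swapping in "|" as the separator: it assumes "|" never occurs in the data, and the sample in the question does contain one (he said:"wonderful|"). A minimal base R sketch that avoids choosing a substitute separator is to split each line on the literal three-character sequence "," instead. The helper name parse_quoted_lines is hypothetical, and the sketch assumes every field is wrapped in double quotes and that no field is empty or itself contains ",":

```r
# Hypothetical helper (not part of data.table or base R).
# Assumes: every field is wrapped in double quotes, fields are separated by
# the exact sequence "," and no field is empty or contains that sequence.
parse_quoted_lines <- function(lines) {
  # Drop the opening and closing quote of each line
  inner <- substr(lines, 2, nchar(lines) - 1)
  # Split on the literal sequence "," (quote, comma, quote), not a regex
  parts <- strsplit(inner, '","', fixed = TRUE)
  header <- parts[[1]]
  rows <- parts[-1]
  # Fail early if any row does not match the header width
  stopifnot(all(lengths(rows) == length(header)))
  df <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
  names(df) <- header
  df
}

# The lines from the question, inlined here instead of readLines("test.txt")
x <- c(
  '"No","Comment","Type"',
  '"0","he said:"wonderful|"","A"',
  '"1","Pr/ "d/s". "a", n) ","B"'
)
parsed <- parse_quoted_lines(x)
print(ncol(parsed))
```

All columns come back as character; type.convert(parsed, as.is = TRUE) can be applied afterwards if numeric columns are wanted.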