 

fread - multiple separators in a string

I'm trying to read a table using fread. The text file looks like this:

"No","Comment","Type"
"0","he said:"wonderful|"","A"
"1","Pr/ "d/s". "a", n) ","B"

The R code I'm using, with the development version of the data.table package, is:

dataset0 <- fread("data/test.txt", stringsAsFactors = F)

I expect a dataset with three columns, but I get this error instead:

Error in fread(input = "data/stackoverflow.txt", stringsAsFactors = FALSE) : 
Line 3 starting <<"1","Pr/ ">> has more than the expected 3 fields.
Separator 3 occurs at position 26 which is character 6 of the last field: << n) ","B">>. 
Consider setting 'comment.char=' if there is a trailing comment to be ignored.

How can I solve this?

A.Yin asked Mar 21 '17 23:03



2 Answers

The development version of data.table handles files like this, where the embedded quotes have not been escaped. See point 10 on the wiki page.

I just tested it on your input and it works.

$ more unescaped.txt
"No","Comment","Type"
"0","he said:"wonderful."","A"
"1","The problem is: reading table, and also "a problem, yes." keep going on.","A"

> DT = fread("unescaped.txt")
> DT
   No                                                                  Comment Type
1:  0                                                     he said:"wonderful."    A
2:  1 The problem is: reading table, and also "a problem, yes." keep going on.    A
> ncol(DT)
[1] 3
Matt Dowle answered Nov 20 '22 04:11
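
For context on what "escaped" means above: in conventional CSV, a double quote inside a quoted field is written as two double quotes. The sketch below is only an illustration in base R, not part of the answer. The data frame df and the temporary file are made up for the example, with field values taken from the question. It relies on write.csv doubling embedded quotes by default (qmethod = "double").

```r
# Hypothetical data frame holding the values the question's file is meant to contain
df <- data.frame(
  No = 0:1,
  Comment = c('he said:"wonderful|"', 'Pr/ "d/s". "a", n) '),
  Type = c("A", "B"),
  stringsAsFactors = FALSE
)

# write.csv doubles embedded quotes, so the file is properly escaped
tmp <- tempfile(fileext = ".csv")
write.csv(df, tmp, row.names = FALSE)

# Raw lines as written; embedded quotes appear as ""
escaped <- readLines(tmp)
print(escaped)

# A properly escaped file reads back unchanged with base R
back <- read.csv(tmp, stringsAsFactors = FALSE)
print(back)
```

If whatever produces the file can be changed to escape quotes this way, any standard CSV reader should parse it without special handling.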



Use readLines to read the file line by line, replace the delimiter, and then parse the result with read.table:

# read with no sep
x <- readLines("test.txt")

# introduce new sep - "|" (assumes "|" does not occur anywhere in the data)
x <- gsub("\",\"", "\"|\"", x)

# read with new sep
read.table(text = x, sep = "|", header = TRUE)

#   No                                                                  Comment Type
# 1  0                                                     he said:"wonderful."    A
# 2  1 The problem is: reading table, and also "a problem, yes." keep going on.    A
zx8754 answered Nov 20 '22 04:11
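
A variation on the readLines idea, offered as a rough sketch rather than a tested general solution: the sample in the question has a "|" inside a field, so swapping in "|" as the separator may not be safe there. Instead, the lines can be split directly on the literal "," sequence. This assumes every field is wrapped in double quotes, no field is empty at the end of a line, and no field itself contains the three characters ",". The lines are given inline here so the example is self-contained; in practice they would come from readLines("data/test.txt").

```r
# Sample lines copied from the question, including the "|" inside a field
lines <- c(
  '"No","Comment","Type"',
  '"0","he said:"wonderful|"","A"',
  '"1","Pr/ "d/s". "a", n) ","B"'
)

# Drop the outer quote at each end of every line
stripped <- substr(lines, 2, nchar(lines) - 1)

# Split on the literal delimiter "," (quote, comma, quote)
fields <- strsplit(stripped, '","', fixed = TRUE)

# Every line must yield the same number of fields
stopifnot(length(unique(lengths(fields))) == 1)

# First element is the header, the rest are data rows
dataset0 <- as.data.frame(do.call(rbind, fields[-1]), stringsAsFactors = FALSE)
names(dataset0) <- fields[[1]]
dataset0$No <- as.integer(dataset0$No)

print(dataset0)
```

All columns start out as character, so numeric columns such as No need an explicit conversion, as in the last step.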
