I'm getting an error with read.table():
data <- read.table(file, header=T, stringsAsFactors=F, sep="@")
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
line 160 did not have 28 elements
I checked line 160, and it did have 28 elements (it had 27 @ symbols).
I also counted across the whole file: its 30242 lines contain 816534 @ symbols in total, which is exactly 27 per line, so I'm pretty sure every single line has 28 elements. I also checked the file to confirm that @ never appears anywhere other than as a separator.
Does anyone have an idea of what's going on here?
edit: Line 160 of file
158@Mental state: 1. Overall clinical symptoms@MD@S@2002@CMP-005@[email protected]@23.58@Clozapine versus typical neuroleptic medication for schizophrenia@[email protected]@02@SENSITIVITY ANALYSIS - CHINESE TRIALS@[email protected]@Fixed@16@5@2@45@Chinese trials@YES@Xia 2002 (CPZ)@STD-Xia-2002-_x0028_CPZ_x0029_@579@566@40
edit2: Line 161 of file
159@Length of surgery (minutes)@MD@Y@1995@CMP-001@[email protected]@47.0@Gamma and other cephalocondylic intramedullary nails versus extramedullary implants for extracapsular hip fractures in adults@[email protected]@01@Summary: Femoral nail (all types) versus sliding hip screw (SHS)@[email protected]@Random@12@1@1@53@Gamma nail@YES@O'Brien 1995@STD-O_x0027_Brien-1995@958@941@49
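I think the problem is with the quote argument. By default, read.table() uses quote = "\"'", so the apostrophe in O'Brien on line 161 is read as an opening quote and the parser runs past the end of that line. (With header=T, line 161 of the file is data line 160, which matches the error.) Let's have a look.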
txt <- c(
"158@Mental state: 1. Overall clinical symptoms@MD@S@2002@CMP-005@[email protected]@23.58@Clozapine versus typical neuroleptic medication for schizophrenia@[email protected]@02@SENSITIVITY ANALYSIS - CHINESE TRIALS@[email protected]@Fixed@16@5@2@45@Chinese trials@YES@Xia 2002 (CPZ)@STD-Xia-2002-_x0028_CPZ_x0029_@579@566@40",
"159@Length of surgery (minutes)@MD@Y@1995@CMP-001@[email protected]@47.0@Gamma and other cephalocondylic intramedullary nails versus extramedullary implants for extracapsular hip fractures in adults@[email protected]@01@Summary: Femoral nail (all types) versus sliding hip screw (SHS)@[email protected]@Random@12@1@1@53@Gamma nail@YES@O'Brien 1995@STD-O_x0027_Brien-1995@958@941@49"
)
We can use count.fields() to preview the field lengths in the file. With a normal sep = "@" and nothing else, we get an NA in between the lines, and incorrect counts:
count.fields(textConnection(txt), sep = "@")
# [1] 28 NA 24
But when we override the default quote characters with quote = "\n" (so the apostrophe in O'Brien is no longer treated as a quote), it returns the correct lengths:
count.fields(textConnection(txt), sep = "@", quote = "\n")
# [1] 28 28
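To find which lines are likely to trip the parser in a large file, one option is to count fields with quoting switched off and, separately, list the lines that contain a quote character. The sketch below uses a made-up three-line sample (sample_lines is hypothetical, not the asker's data); with a real file you would pass the file path to count.fields() and readLines() instead.

```r
# Hypothetical sample: line 2 contains an apostrophe, like O'Brien in the question.
sample_lines <- c(
  "1@alpha@x",
  "2@O'Brien@y",
  "3@gamma@z"
)

# Field counts with quoting disabled: only the separator is considered,
# so every well-formed line should report the same number.
raw_counts <- count.fields(textConnection(sample_lines), sep = "@", quote = "")
print(raw_counts)

# Lines that contain a single or double quote are the usual suspects
# when read.table() runs with its default quote characters.
suspects <- grep("['\"]", sample_lines)
print(suspects)
```

If raw_counts is constant but read.table() still complains, the lines listed in suspects are a good place to start looking.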
So, I recommend you add quote = "\n" to your read.table call and see if that solves it. It did for me:
read.table(text = txt, sep = "@")
# [1] V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20 V21 V22 V23 V24 V25 V26 V27 V28
# <0 rows> (or 0-length row.names)
df <- read.table(text = txt, sep = "@", quote = "\n")
dim(df)
# [1] 2 28
anyNA(df)
# [1] FALSE
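As a side note, the help page for read.table() says that quoting can be disabled altogether with quote = "". That may be a more direct way to get the same effect when the file contains no genuinely quoted fields. A minimal sketch on a made-up sample (sample_lines and df_noquote are hypothetical names, not from the question):

```r
# Hypothetical sample with an apostrophe inside a field.
sample_lines <- c(
  "1@alpha@x",
  "2@O'Brien@y",
  "3@gamma@z"
)

# quote = "" turns quote handling off, so the apostrophe is ordinary text.
df_noquote <- read.table(
  text = sample_lines,
  sep = "@",
  quote = "",
  stringsAsFactors = FALSE
)

print(dim(df_noquote))
print(df_noquote$V2)
```

This only suits files where no field relies on quotes to protect an embedded separator; if some fields do, the quote characters have to stay enabled.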
I had this same issue. This answer helped, but quote="\n" only worked up to a point. One element in my file contained a " character, so I had to go back to the default for quote. I also had a # in one of the elements, so I had to use comment.char="".

The help for read.table() refers to scan() in a couple of spots, so I checked it out and found the allowEscapes argument, which defaults to FALSE. I added it to my read.table() call and set it to TRUE. Here is the full command that worked for me:
read.table(file="filename.csv", header=T, sep=",", comment.char="", allowEscapes=T)
I hope this helps someone.
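For illustration, here is a small sketch of the comment.char part of that fix on made-up CSV lines (csv_lines and df_csv are hypothetical). It keeps the default quote characters, so a double-quoted field can contain a comma, and sets comment.char = "" so that # is treated as data. It does not exercise allowEscapes.

```r
# Hypothetical CSV: one field is double-quoted and contains a comma,
# another contains a '#', which read.table() treats as a comment by default.
csv_lines <- c(
  "id,label,note",
  "1,\"Smith, J.\",plain",
  "2,item #7,ok"
)

# Default quoting keeps "Smith, J." together; comment.char = "" keeps "#7".
df_csv <- read.table(
  text = csv_lines,
  header = TRUE,
  sep = ",",
  comment.char = "",
  stringsAsFactors = FALSE
)

print(df_csv)
```

Without comment.char = "", everything from the # to the end of that line would be discarded as a comment, leaving the row short of fields.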