
read.fwf and the number sign

Tags: r, read.table

I am trying to read this file (3.8 MB) using its fixed-width structure, as described in the following link.

This command:

a <- read.fwf('~/ccsl.txt',c(2,30,6,2,30,8,10,11,6,8))

Produces an error:

line 37 did not have 10 elements

After reproducing the issue with different values of the skip option, I found that the lines causing the problem all contain the "#" symbol.

Is there any way to get around it?

Alex asked Dec 26 '11 09:12


2 Answers

As @jverzani already commented, the problem is probably that # is commonly used as the character that starts a comment. Setting the comment.char argument of read.fwf to something other than # could fix the problem. I'll leave my answer below as a more general approach that you can use for any character that causes problems (e.g. the apostrophe in the Dutch city name 's Gravenhage).
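A minimal sketch of the comment.char suggestion. The two sample records, the widths c(4, 10) and the temporary file are made up for illustration and are not the asker's ccsl.txt; strip.white and colClasses are only there to make the result easy to inspect.

```r
# Hypothetical sample data: two fixed-width records (widths 4 and 10),
# with a "#" inside the second field of the first record.
sample_lines <- c("0001Apt #12   ",
                  "0002Main St   ")
widths <- c(4, 10)

tmp <- tempfile(fileext = ".txt")
writeLines(sample_lines, tmp)

# comment.char = "" is handed on to read.table, which turns off comment
# handling so that "#" is kept as ordinary data.
a <- read.fwf(tmp, widths,
              comment.char = "",
              colClasses = "character",  # keep "0001" as text
              strip.white = TRUE)        # trim the padding spaces
unlink(tmp)

print(a)
```

For the call in the question, the same idea would presumably amount to adding comment.char = "" after the widths vector.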

I've had this problem with other symbols. The approach I took was simply to replace the offending character with either nothing or a character that does not trigger the error. In my case replacing the character was no problem, but this might not be possible in your case.

So my approach would be to delete the symbol that causes the error, or to replace it with another character. This can be done in a text editor (find and replace), in an R script, or with Linux command-line tools such as grep (to find the affected lines) and sed (to replace the character). If you want to do it in an R script, use scan or readLines to read the lines. Once the text is in memory, you can use sub to replace the character (or gsub to replace every occurrence on a line).
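The R-script variant could look roughly like this. It is a sketch on made-up data (the sample lines, widths and temporary file are assumptions, not the file from the question). It replaces # with a space instead of deleting it, because deleting a character from a fixed-width record would shift every column after it.

```r
# Hypothetical sample file with a "#" in the second field.
sample_lines <- c("0001Apt #12   ",
                  "0002Main St   ")
tmp <- tempfile(fileext = ".txt")
writeLines(sample_lines, tmp)

# Read the raw lines into memory first.
raw <- readLines(tmp)
unlink(tmp)

# fixed = TRUE matches "#" literally; gsub replaces every occurrence.
# A space is used as the replacement so the column positions stay intact.
cleaned <- gsub("#", " ", raw, fixed = TRUE)

# read.fwf also accepts a connection, so the cleaned text can be parsed
# without writing it back to disk.
con <- textConnection(cleaned)
b <- read.fwf(con, widths = c(4, 10),
              colClasses = "character", strip.white = TRUE)
close(con)

print(b)
```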

If you cannot permanently replace the character, I would try the following: replace it with a placeholder character that does not cause an error, read the file into R using read.fwf, and finally replace the placeholder with # again.
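A sketch of that round trip, again on made-up data. The placeholder "~" is an assumption: any single character works as long as it never occurs in the real file, which the guard below checks before anything is replaced.

```r
# Hypothetical sample records (widths 4 and 10).
sample_lines <- c("0001Apt #12   ",
                  "0002Main St   ")
placeholder <- "~"  # assumed to be absent from the data

# Guard: the round trip is only safe if the placeholder is not already used.
stopifnot(!any(grepl(placeholder, sample_lines, fixed = TRUE)))

# Step 1: mask "#" with the placeholder (same length, so widths still fit).
masked <- gsub("#", placeholder, sample_lines, fixed = TRUE)

# Step 2: parse the masked text as fixed-width data.
con <- textConnection(masked)
d <- read.fwf(con, widths = c(4, 10),
              colClasses = "character", strip.white = TRUE)
close(con)

# Step 3: put "#" back. All columns are character here because of
# colClasses, so the substitution can be applied to every column.
d[] <- lapply(d, function(col) gsub(placeholder, "#", col, fixed = TRUE))

print(d)
```

If some columns are read as numbers, the last step would need to be restricted to the character columns.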

Paul Hiemstra answered Nov 16 '22 08:11


Following up on the answer above: to have every character read literally, pass both comment.char="" and quote="" in the call to read.fwf. The latter takes care of @PaulHiemstra's problem with single quotes in Dutch proper nouns. Both arguments are documented in ?read.table.

Ben Bolker answered Nov 16 '22 10:11