read.table creates too few rows, but readLines has the right number

Tags:

r

I am trying to import a tab separated list into R.

The file is 81704 rows long, but read.table creates only 31376 rows. Here is my code:

population <- read.table('population.txt', header=TRUE,sep='\t',na.strings = 'NA',blank.lines.skip = FALSE)

There are no # characters commenting anything out.

Here are the first few lines:

[1] "NAME\tSTATENAME\tPOP_2009"      "Alabama\tAlabama\t4708708"      "Abbeville city\tAlabama\t2934"  "Adamsville city\tAlabama\t4782"
[5] "Addison town\tAlabama\t711"

When I read the file raw with readLines, I get the right number of lines.

Any ideas are much appreciated!

asked May 02 '11 06:05 by evt


2 Answers

Difficult to diagnose without seeing the input file, but the usual suspects are quotes and comment characters (even if you think there are none of the latter). You can try:

quote = "", comment.char = ""

as arguments to read.table() and see if that helps.
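
As a rough illustration of why these two arguments matter, here is a minimal sketch on a made-up file (the place names, numbers, and the variable names path, broken and fixed are hypothetical, not taken from population.txt). By default read.table treats both " and ' as quote characters, so an apostrophe inside a name can open a quoted string that swallows the following lines up to the next apostrophe.

```r
# Hypothetical sample data: two names contain an apostrophe.
path <- tempfile(fileext = ".txt")
writeLines(c(
  "NAME\tSTATENAME\tPOP_2009",
  "Coeur d'Alene city\tIdaho\t100",
  "Boise City city\tIdaho\t200",
  "O'Fallon city\tIllinois\t300",
  "Chicago city\tIllinois\t400"
), path)

# Default quote = "\"'": the apostrophes are treated as quote characters.
broken <- read.table(path, header = TRUE, sep = "\t")

# Quoting and comment handling switched off: every line becomes one row.
fixed <- read.table(path, header = TRUE, sep = "\t",
                    quote = "", comment.char = "")

nrow(broken)                  # fewer rows than the file has data lines
nrow(fixed)                   # one row per data line
length(readLines(path)) - 1   # number of data lines, for comparison
```

If the row count with quote = "" and comment.char = "" matches readLines minus the header, stray quotes or # characters were very likely the cause.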

answered Nov 07 '22 14:11 by neilfws


Use count.fields to check what is in the file:

n <- count.fields('population.txt', sep='\t', blank.lines.skip=FALSE)

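Then you can check:

length(n) # should be 81705 (the header is counted, so rows + 1); if so, then:
table(n) # shows which field counts occur, so you can see what is wrong

Then read the file with readLines and look at the rows with the wrong number of fields, e.g. x <- readLines('population.txt'); head(x[is.na(n) | n != 3]), where 3 is the number of columns in the header shown in the question.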
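
The check described above might look like this on a small made-up file (the "Broken row" line and the variable names path, expected and bad are hypothetical). Note one assumption that differs from the call above: this sketch passes quote = "" and comment.char = "" to count.fields so that the counts reflect the raw tab-separated fields.

```r
# Hypothetical sample data: the last line has one field too many.
path <- tempfile(fileext = ".txt")
writeLines(c(
  "NAME\tSTATENAME\tPOP_2009",
  "Abbeville city\tAlabama\t2934",
  "Addison town\tAlabama\t711",
  "Broken row\tAlabama\t10\textra"
), path)

# Count fields per physical line, with quoting and comments disabled.
n <- count.fields(path, sep = "\t", quote = "", comment.char = "",
                  blank.lines.skip = FALSE)
x <- readLines(path)

length(n) == length(x)   # one count per line, header included
table(n)                 # distribution of field counts
expected <- n[1]         # field count of the header line
bad <- which(is.na(n) | n != expected)
x[bad]                   # the offending lines
```

Running count.fields a second time with the default quote setting and comparing the two results is one way to see whether quote characters are merging lines.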

answered Nov 07 '22 13:11 by Marek