I am trying to import a tab-separated file into R.
It is 81704 rows long. However, read.table only creates 31376 rows. Here is my code:
population <- read.table('population.txt', header=TRUE,sep='\t',na.strings = 'NA',blank.lines.skip = FALSE)
There are no # characters commenting anything out.
Here are the first few lines:
[1] "NAME\tSTATENAME\tPOP_2009" "Alabama\tAlabama\t4708708" "Abbeville city\tAlabama\t2934" "Adamsville city\tAlabama\t4782"
[5] "Addison town\tAlabama\t711"
When I read the file raw with readLines, I get the right number of lines.
Any ideas are much appreciated!
Difficult to diagnose without seeing the input file, but the usual suspects are quotes and comment characters (even if you think there are none of the latter). You can try:
quote = "", comment.char = ""
as arguments to read.table() and see if that helps.
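To illustrate why quotes are a usual suspect, here is a minimal sketch on a small temporary file. The place names and population figures are made up for the example and are not taken from the asker's file; the only assumption is that some names contain an apostrophe. With the default quote = "\"'", an apostrophe opens a quoted string that can run across line breaks, so several lines collapse into one row.

```r
# Hypothetical sample data: apostrophes inside place names, no "#" anywhere.
lines <- c(
  "NAME\tSTATENAME\tPOP_2009",
  "Coeur d'Alene city\tIdaho\t100",
  "Boise city\tIdaho\t200",
  "O'Brien town\tFlorida\t300",
  "Miami city\tFlorida\t400"
)
path <- tempfile(fileext = ".txt")
writeLines(lines, path)

# Default quoting: the apostrophe in d'Alene opens a quoted string
# that runs on until the apostrophe in O'Brien, swallowing the lines between.
default_read <- suppressWarnings(read.table(path, header = TRUE, sep = "\t"))

# Quoting and comment handling switched off: every data line becomes one row.
fixed <- read.table(path, header = TRUE, sep = "\t",
                    quote = "", comment.char = "")

print(nrow(default_read))  # fewer rows than there are data lines
print(nrow(fixed))         # one row per data line
```

If the row count with quote = "" matches what readLines reports (minus the header), stray quote characters were most likely the cause.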
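Check what's in the file with count.fields:
n <- count.fields('population.txt', sep='\t', blank.lines.skip=FALSE)
Then you can check:
length(n) # should be 81705 (the header is counted too, so rows + 1); if so, then:
table(n, useNA='ifany') # shows which field counts occur and how often
Then read the file with readLines and look at the lines with the wrong number of fields. Your header has 3 fields, so for example:
x <- readLines('population.txt'); head(x[is.na(n) | n != 3])
(count.fields reports NA for a line on which a quoted string starts but does not end, so those lines are worth checking as well.)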
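The check above can be tried end to end on a small temporary file. This is only a sketch: the sample lines are invented, and the expected field count of 3 is an assumption based on the header shown in the question. It compares count.fields with its default quoting against the same call with quoting and comments disabled.

```r
# Hypothetical sample data with apostrophes in two of the names.
lines <- c(
  "NAME\tSTATENAME\tPOP_2009",
  "Coeur d'Alene city\tIdaho\t100",
  "Boise city\tIdaho\t200",
  "O'Brien town\tFlorida\t300",
  "Miami city\tFlorida\t400"
)
path <- tempfile(fileext = ".txt")
writeLines(lines, path)

expected <- 3  # number of columns in the header

# Field counts as read.table would see them by default ...
n_default <- count.fields(path, sep = "\t", blank.lines.skip = FALSE)
# ... and with quoting and comment characters disabled.
n_noquote <- count.fields(path, sep = "\t", quote = "", comment.char = "",
                          blank.lines.skip = FALSE)

# Show the raw lines whose default field count is missing or unexpected.
x <- readLines(path)
suspect <- x[is.na(n_default) | n_default != expected]
print(suspect)
print(table(n_noquote))
```

If the counts only become uniform once quote = "" and comment.char = "" are set, passing the same two arguments to read.table should restore the missing rows.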