Let's say we have a file named test.txt
that contains an unknown number of columns:
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5 6 7 8
1 2 3 4 5
1 2 3 4 5 6
1 2 3 4 5 6
1 2 3 4 5 6
fill=T fails when line 8 has more than 5 columns:
read.table('test.txt', header=F, sep='\t', fill=T)
results:
V1 V2 V3 V4 V5
1 1 2 3 4 5
2 1 2 3 4 5
3 1 2 3 4 5
4 1 2 3 4 5
5 1 2 3 4 5
6 1 2 3 4 5
7 1 2 3 4 5
8 1 2 3 4 5
9 6 7 8 NA NA
10 1 2 3 4 5
11 1 2 3 4 5
12 6 NA NA NA NA
13 1 2 3 4 5
14 6 NA NA NA NA
15 1 2 3 4 5
16 6 NA NA NA NA
But with skip=3, everything works fine:
read.table('test.txt', header=F, sep='\t', fill=T, skip=3)
We got what we expected:
V1 V2 V3 V4 V5 V6 V7 V8
1 1 2 3 4 5 NA NA NA
2 1 2 3 4 5 NA NA NA
3 1 2 3 4 5 NA NA NA
4 1 2 3 4 5 NA NA NA
5 1 2 3 4 5 6 7 8
6 1 2 3 4 5 NA NA NA
7 1 2 3 4 5 6 NA NA
8 1 2 3 4 5 6 NA NA
9 1 2 3 4 5 6 NA NA
Why does this happen? Is it because fill=T only checks the first 5 rows? Is there any way to work around this?
I've found the answer in the Examples section of the read.table documentation:
ncol <- max(count.fields('test.txt', sep = "\t"))
read.table('test.txt', header=F, sep='\t', fill=T, col.names=paste0('V', seq_len(ncol)))
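Yes. This happens because read.table determines the number of columns from the first five lines of input, so with fill=T any later line that is wider gets wrapped onto extra rows. The solution is to specify col.names.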
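To check whether a given file is affected, one option is to compare the widest of the first five lines with the widest line overall. The sketch below is only an illustration: it recreates a tab-separated file shaped like test.txt in a temporary location, and the variable names (path, fields, guessed_width, actual_width, too_wide) are made up for this example.

```r
# Recreate a tab-separated file shaped like test.txt in a temporary location.
path <- tempfile(fileext = ".txt")
rows <- c(rep("1\t2\t3\t4\t5", 7),
          "1\t2\t3\t4\t5\t6\t7\t8",
          "1\t2\t3\t4\t5",
          rep("1\t2\t3\t4\t5\t6", 3))
writeLines(rows, path)

# Number of fields on every line of the file.
fields <- count.fields(path, sep = "\t")

# Width implied by the first five lines versus the true maximum width.
guessed_width <- max(fields[1:5])
actual_width  <- max(fields)

# Lines wider than the guess are the ones that get wrapped onto extra rows.
too_wide <- which(fields > guessed_width)
```

Here guessed_width is 5 while actual_width is 8, and too_wide lists lines 8, 10, 11 and 12, which are the lines that spill over in the output shown in the question.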
Use col.names = paste0("V", seq_len(N)) within read.table, where N is the maximum number of columns.
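If this comes up often, the two steps can be wrapped in a small function. This is a minimal sketch rather than a built-in: read_ragged is a hypothetical name, and the demo file is written to a temporary path just so the example runs on its own.

```r
# Hypothetical helper: read a ragged delimited file without wrapped rows.
read_ragged <- function(file, sep = "\t") {
  # Widest line in the whole file, not just the first five lines.
  n <- max(count.fields(file, sep = sep))
  read.table(file, header = FALSE, sep = sep, fill = TRUE,
             col.names = paste0("V", seq_len(n)))
}

# Small tab-separated demo file whose widest line comes after line 5.
demo <- tempfile(fileext = ".txt")
writeLines(c(rep("1\t2\t3", 6), "1\t2\t3\t4\t5"), demo)

df <- read_ragged(demo)
dim(df)  # 7 rows, 5 columns
```

The short rows are padded with NA in V4 and V5, and the wide last line stays on a single row. Note that count.fields scans the whole file once before read.table reads it, so the file is read twice.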