Why does fill=TRUE fail in read.table when rows with a different number of columns occur after the first 5 rows? [duplicate]

Tags: r, read.table

Let's say we have a file named test.txt that contains an unknown number of columns:

1   2   3   4   5
1   2   3   4   5
1   2   3   4   5
1   2   3   4   5
1   2   3   4   5
1   2   3   4   5
1   2   3   4   5
1   2   3   4   5   6   7   8
1   2   3   4   5
1   2   3   4   5   6
1   2   3   4   5   6
1   2   3   4   5   6

fill=TRUE gives the wrong result because line 8 has more than 5 columns:

read.table('test.txt', header=FALSE, sep='\t', fill=TRUE)

Result:

   V1 V2 V3 V4 V5
1   1  2  3  4  5
2   1  2  3  4  5
3   1  2  3  4  5
4   1  2  3  4  5
5   1  2  3  4  5
6   1  2  3  4  5
7   1  2  3  4  5
8   1  2  3  4  5
9   6  7  8 NA NA
10  1  2  3  4  5
11  1  2  3  4  5
12  6 NA NA NA NA
13  1  2  3  4  5
14  6 NA NA NA NA
15  1  2  3  4  5
16  6 NA NA NA NA

But with skip=3, everything works fine:

read.table('test.txt', header=FALSE, sep='\t', fill=TRUE, skip=3)

We get what we expected:

  V1 V2 V3 V4 V5 V6 V7 V8
1  1  2  3  4  5 NA NA NA
2  1  2  3  4  5 NA NA NA
3  1  2  3  4  5 NA NA NA
4  1  2  3  4  5 NA NA NA
5  1  2  3  4  5  6  7  8
6  1  2  3  4  5 NA NA NA
7  1  2  3  4  5  6 NA NA
8  1  2  3  4  5  6 NA NA
9  1  2  3  4  5  6 NA NA

Why does this happen? Is it because fill=TRUE only checks the first 5 rows? Is there any way to work around this?

Gahoo asked Aug 18 '15 07:08


2 Answers

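I found the answer in the Examples section of the read.table documentation.

# Find the largest number of fields on any line of the file
ncol <- max(count.fields('test.txt', sep = "\t"))
# Name that many columns up front so every line fits on one row
read.table('test.txt', header=FALSE, sep='\t', fill=TRUE, col.names=paste0('V', seq_len(ncol)))

This happens because, when col.names is not supplied, read.table determines the number of columns from the first five lines of input only. Any later line with more fields is wrapped onto an extra row. The solution is to specify col.names with as many names as the widest line has fields.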
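
To see the behaviour without needing the original file, here is a minimal, self-contained sketch. It assumes the file in the question is tab-separated and rebuilds it in a temporary file; the variable names (path, rows, max_cols, wrapped, fixed) are only illustrative.

```r
# Recreate the example file from the question in a temporary location.
path <- tempfile(fileext = ".txt")
rows <- c(rep("1\t2\t3\t4\t5", 7),
          "1\t2\t3\t4\t5\t6\t7\t8",
          "1\t2\t3\t4\t5",
          rep("1\t2\t3\t4\t5\t6", 3))
writeLines(rows, path)

# Count the fields on every line; the maximum is the width we need.
n_fields <- count.fields(path, sep = "\t")
max_cols <- max(n_fields)

# Without col.names, the width is taken from the first five lines only,
# so the result has just 5 columns and longer lines spill onto extra rows.
wrapped <- read.table(path, header = FALSE, sep = "\t", fill = TRUE)

# With col.names sized to the widest line, each line stays on one row.
fixed <- read.table(path, header = FALSE, sep = "\t", fill = TRUE,
                    col.names = paste0("V", seq_len(max_cols)))

print(n_fields)
print(dim(wrapped))
print(dim(fixed))
```

Comparing dim(wrapped) with dim(fixed) should show the difference: the first has only five columns, while the second has one row per input line and eight columns.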

Gahoo answered Nov 15 '22 05:11


Use col.names = paste0("V", seq_len(N)) within read.table, where N is the maximum number of columns.
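
If this comes up often, the two steps can be wrapped in a small function. The sketch below is one possible way to do it; read_ragged is a hypothetical helper name, not part of base R, and it assumes the file has no header row.

```r
# Hypothetical helper (not part of base R): read a delimited file whose
# lines have varying numbers of fields, assuming there is no header row.
read_ragged <- function(file, sep = "\t", ...) {
  # N is the widest line in the file. na.rm = TRUE guards against the NA
  # counts that count.fields reports for lines that start a quoted field
  # spanning several lines.
  n <- max(count.fields(file, sep = sep), na.rm = TRUE)
  read.table(file, header = FALSE, sep = sep, fill = TRUE,
             col.names = paste0("V", seq_len(n)), ...)
}

# Usage on a small temporary file with 2, 4 and 3 fields per line.
demo_file <- tempfile(fileext = ".txt")
writeLines(c("1\t2", "1\t2\t3\t4", "1\t2\t3"), demo_file)
demo <- read_ragged(demo_file)
print(demo)
```

Extra arguments such as skip or na.strings are passed through to read.table via the dots.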

drmariod answered Nov 15 '22 05:11