
Import data into R with an unknown number of columns?

Tags: import, r

I'm trying to read a text file with different row lengths:

1
1   2
1   2   3
1   2   3   4
1   2   3   4   5
1   2   3   4   5   6
1   2   3   4   5   6   7
1   2   3   4   5   6   7   8

To overcome this problem, I'm using the argument fill=TRUE in read.table, so:

data<-read.table("test",sep="\t",fill=TRUE) 

Unfortunately, to assess the maximum row length, read.table reads only the first 5 lines of the file, and generates an object looking like this:

data
   V1 V2 V3 V4 V5
1   1 NA NA NA NA
2   1  2 NA NA NA
3   1  2  3 NA NA
4   1  2  3  4 NA
5   1  2  3  4  5
6   1  2  3  4  5
7   6 NA NA NA NA
8   1  2  3  4  5
9   6  7 NA NA NA
10  1  2  3  4  5
11  6  7  8 NA NA
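The wrapping can be reproduced with a small self-contained sketch. It assumes the sample rows above are tab-separated and writes them to a temporary file; the names `rows`, `tmp` and `wrapped` are illustrative and not from the original post.

```r
# Build the ragged sample: row i holds the integers 1..i, tab-separated.
rows <- vapply(1:8, function(i) paste(seq_len(i), collapse = "\t"), character(1))
tmp <- tempfile()
writeLines(rows, tmp)

# With fill = TRUE and no col.names, read.table sizes the table from the
# first five lines only, so fields beyond column 5 spill onto extra rows.
wrapped <- read.table(tmp, sep = "\t", fill = TRUE)
dim(wrapped)
```

If the file really is tab-separated, `dim(wrapped)` should agree with the 11-row, 5-column object printed above.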

Is there a way to force read.table to scan the whole file to determine the maximum row length? I know a possible solution would be to supply the column names explicitly, like:

data<-read.table("test",sep="\t",fill=TRUE,col.names=c(1:8)) 

But since I have a lot of files, I wanted to assess this automatically within R. Any suggestion? :-)


EDIT: the original file doesn't contain progressive numbers, so this is not a solution:

data1<-read.table("test",sep="\t",fill=TRUE)
data2<-read.table("test",sep="\t",fill=TRUE,col.names=c(1:max(data1)))

Federico Giorgi asked Dec 09 '09 14:12




1 Answer

There is a nice function, count.fields (see ?count.fields), which counts the number of columns in each row:

count.fields("test", sep = "\t")
#[1] 1 2 3 4 5 6 7 8

So, using your second solution:

no_col <- max(count.fields("test", sep = "\t"))
data <- read.table("test",sep="\t",fill=TRUE,col.names=1:no_col)
data
#   X1 X2 X3 X4 X5 X6 X7 X8
# 1  1 NA NA NA NA NA NA NA
# 2  1  2 NA NA NA NA NA NA
# 3  1  2  3 NA NA NA NA NA
# 4  1  2  3  4 NA NA NA NA
# 5  1  2  3  4  5 NA NA NA
# 6  1  2  3  4  5  6 NA NA
# 7  1  2  3  4  5  6  7 NA
# 8  1  2  3  4  5  6  7  8
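Since the question mentions many files, the same idea could be wrapped in a small function and applied to each file in turn. This is only a sketch: `read_ragged` and `make_file` are hypothetical names, and the demo files are temporary stand-ins for real data.

```r
# Hypothetical helper: size the table from the longest row, then read it.
read_ragged <- function(path, sep = "\t") {
  # count.fields() can return NA for lines inside multi-line quoted strings,
  # hence na.rm = TRUE.
  n_col <- max(count.fields(path, sep = sep), na.rm = TRUE)
  read.table(path, sep = sep, fill = TRUE,
             col.names = paste0("V", seq_len(n_col)))
}

# Demo input: a temporary file whose row i holds the integers 1..i.
make_file <- function(n) {
  path <- tempfile()
  lines <- vapply(seq_len(n), function(i) paste(seq_len(i), collapse = "\t"),
                  character(1))
  writeLines(lines, path)
  path
}

files <- c(make_file(8), make_file(3))
tables <- lapply(files, read_ragged)
sapply(tables, ncol)
```

With real data, `files` would more likely come from something like `list.files()`. If fields can contain quote characters or `#`, the `quote` and `comment.char` arguments probably need to be set identically in both `count.fields` and `read.table`, otherwise the two may disagree on the number of fields.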

Marek answered Sep 28 '22 09:09