
rbind text files with different numbers of rows

Tags:

r

I'm trying to rbind two text files that have different numbers of rows. For example, I use this code:

a<-matrix(1:12,4,3)
b<-matrix(21:41,7,3)

setwd("test/")
write.table(a, file="a.txt",quote=FALSE,  row.names=FALSE,col.names=FALSE)
write.table(b, file="b.txt",quote=FALSE, row.names=FALSE, col.names=FALSE)
file_list <- list.files()
g<- do.call(rbind,lapply(file_list,FUN=function(files){scan(files,what = character())}))

I have this warning message:

"In (function (..., deparse.level = 1) : number of columns of result is not a multiple of vector length (arg 1)"

I want g to look like this:

##       [,1] [,2] [,3]
##  [1,]    1    5    9
##  [2,]    2    6   10
##  [3,]    3    7   11
##  [4,]    4    8   12
##  [5,]   21   28   35
##  [6,]   22   29   36
##  [7,]   23   30   37
##  [8,]   24   31   38
##  [9,]   25   32   39
## [10,]   26   33   40
## [11,]   27   34   41

Is there a solution for this? I'm new to R. Thanks a lot.

pshls asked Oct 19 '22 16:10


1 Answer

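Unless you tell it otherwise, scan() reads the entire file as a single atomic vector. Here that gives one vector of length 12 and one of length 21, so rbind() treats each as a single row and recycles the shorter one, which is what the warning is about. You could pass a list to the what argument, but it's much easier and safer to use a function that reads structured data. Also, you don't want to use what = character() because you're reading numeric values.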
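To make the flat-vector behaviour concrete, here is a minimal sketch that recreates the two files from the question in a temporary directory (the names tmp, fa, fb, va, vb and m are only illustrative) and checks what scan() and rbind() produce.

```r
# Recreate the two files from the question in a temporary directory
tmp <- tempfile("rbind_demo_")
dir.create(tmp)
fa <- file.path(tmp, "a.txt")
fb <- file.path(tmp, "b.txt")
write.table(matrix(1:12, 4, 3), fa, quote = FALSE, row.names = FALSE, col.names = FALSE)
write.table(matrix(21:41, 7, 3), fb, quote = FALSE, row.names = FALSE, col.names = FALSE)

# scan() drops the row/column layout and returns one flat vector per file
va <- scan(fa, quiet = TRUE)
vb <- scan(fb, quiet = TRUE)
length(va)  # 12
length(vb)  # 21

# rbind() treats each vector as one row and recycles the shorter one;
# the warning is suppressed here only to keep the demo quiet
g <- suppressWarnings(rbind(va, vb))
dim(g)      # 2 21

# The layout can be restored by hand, but only if you already know
# the number of columns (3 is assumed here)
m <- matrix(va, ncol = 3, byrow = TRUE)
dim(m)      # 4 3
```

The last step works only because the column count is known in advance, which is the bookkeeping a structured reader does for you.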

read.table() in base R and fread() from the data.table package can both do this fairly easily.

files <- c("a.txt", "b.txt")

## read.table()
data.matrix(do.call(rbind, lapply(files, read.table)), rownames.force = FALSE)

## fread()
library(data.table)
data.matrix(rbindlist(lapply(files, fread)))

Both of these return the matrix

#       V1 V2 V3
#  [1,]  1  5  9
#  [2,]  2  6 10
#  [3,]  3  7 11
#  [4,]  4  8 12
#  [5,] 21 28 35
#  [6,] 22 29 36
#  [7,] 23 30 37
#  [8,] 24 31 38
#  [9,] 25 32 39
# [10,] 26 33 40
# [11,] 27 34 41

If you really wanted to use scan(), you could pass a list to the what argument to tell it the number of columns.

## get number of columns
nc <- max(unlist(lapply(files, count.fields)))
## read as a list, then bind together
do.call(rbind, lapply(files, function(x) {
    do.call(cbind, scan(x, what = as.list(double(nc)), quiet = TRUE))
}))
#       [,1] [,2] [,3]
#  [1,]    1    5    9
#  [2,]    2    6   10 
#  [3,]    3    7   11
#  [4,]    4    8   12
#  [5,]   21   28   35
#  [6,]   22   29   36
#  [7,]   23   30   37
#  [8,]   24   31   38
#  [9,]   25   32   39
# [10,]   26   33   40
# [11,]   27   34   41

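But this is just count.fields() and then scan(), which is basically what read.table() does in one step. Plus, this can be risky if there are missing values in the data: when what is a list, scan() by default lets a record continue onto the next line, so a short row can shift the values that follow it.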
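As a rough illustration of why read.table() is the safer choice with incomplete rows, the sketch below uses a hypothetical three-line file (not part of the question) whose second row is missing its last value. By default read.table() refuses to guess, and with fill = TRUE it pads the short row with NA.

```r
# Hypothetical file whose second row is missing its last value
f <- tempfile(fileext = ".txt")
writeLines(c("1 2 3", "4 5", "7 8 9"), f)

# By default read.table() stops with an error rather than misaligning values
res <- tryCatch(read.table(f), error = function(e) e)
inherits(res, "error")  # TRUE

# fill = TRUE pads short rows with NA and leaves the other rows untouched
d <- read.table(f, fill = TRUE)
m <- data.matrix(d)
m[2, ]                  # 4, 5, NA
```

Whether an error or an NA is preferable depends on the data; either way the problem is visible instead of silently shifting values into the wrong column.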

Rich Scriven answered Oct 27 '22 00:10