In R, is there an efficient way to read a transposed .csv file?
For example, consider the following text file:
Name,Peter,Paul,Mary
Age,40,5,38
This could be read into a data.table with useful column classes using:
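library(data.table)
file <- tempfile(fileext = ".txt")
writeLines("Name,Peter,Paul,Mary\nAge,40,5,38\n", file)
lines <- readLines(file)
lines <- lines[nzchar(lines)]  # drop the empty line(s) left by the trailing "\n"
# each row of the file is one column: put every field on its own line
lines <- gsub(",", "\n", lines, fixed = TRUE)
# fread each row as a one-column table; its first field becomes the header
cols <- lapply(lines, function(x) fread(text = x))
do.call(cbind, cols)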
#> Name Age
#> 1: Peter 40
#> 2: Paul 5
#> 3: Mary 38
Is there a simpler way to achieve this? Is there a more efficient version (my file is 1 GB)?
Note that such column-major storage should, in principle, be easier to read into a column-wise structure such as a data.table.
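For comparison, here is a minimal sketch of the question's line-by-line idea that skips the per-line fread() call: split each line on commas, use its first field as the column name, and let base R's type.convert() choose the class. It assumes no field contains a quoted comma, and because readLines() and strsplit() keep every field in memory as text, it is not claimed to be faster on a 1 GB file. The input file is hypothetical and mirrors the example above.

```r
library(data.table)

# hypothetical small input with the same layout as the question
file <- tempfile(fileext = ".txt")
writeLines(c("Name,Peter,Paul,Mary", "Age,40,5,38"), file)

lines  <- readLines(file)
lines  <- lines[nzchar(lines)]                # ignore blank lines
fields <- strsplit(lines, ",", fixed = TRUE)  # one character vector per line

# first field = column name, remaining fields = values;
# as.is = TRUE keeps text as character instead of converting to factor
cols <- lapply(fields, function(f) type.convert(f[-1], as.is = TRUE))
names(cols) <- vapply(fields, function(f) f[1], character(1))

DT <- setDT(cols)  # turn the named list into a data.table by reference
DT
```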
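Transpose the raw table, paste each transposed row back into one line of text, and let read.table() re-detect the column classes:
DT <- setDT(read.table(text = do.call(paste, transpose(fread(file, header = FALSE))),
                       header = TRUE, stringsAsFactors = FALSE))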
DT
Name Age
1: Peter 40
2: Paul 5
3: Mary 38
sapply(DT,class)
Name Age
"character" "integer"
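One caveat with the read.table() step: it splits the pasted lines on whitespace, so a value containing a space (for example a name like "Mary Ann") would break the parse. A sketch of a variant, not part of the original answer, pastes with commas instead and re-reads the text with fread(); the input below is hypothetical and chosen to include such a value.

```r
library(data.table)

# hypothetical input: one value contains a space
file <- tempfile(fileext = ".csv")
writeLines(c("Name,Peter,Paul,Mary Ann", "Age,40,5,38"), file)

# read every field as text so nothing is mis-typed before transposing
raw <- fread(file, sep = ",", header = FALSE, colClasses = "character")

# transpose: each original row becomes a column; then paste each new row
# back into one comma-separated line, e.g. "Peter,40"
rows <- do.call(paste, c(as.list(transpose(raw)), sep = ","))

# fread detects the header line and the column classes from the text
DT <- fread(text = rows, sep = ",")
DT
```

Because the separator is a comma rather than whitespace, spaces inside values survive the round trip.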
The following implements the approach @Dirk Eddelbuettel suggested in the comments. It reads every field as character, transposes the table, and then re-detects the column classes.
> library(data.table)
> aTbl = fread("file.csv", colClasses="character", header=F)
> aTbl
V1 V2 V3 V4
1: Name Peter Paul Mary
2: Age 40 5 38
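> bTbl <- transpose(aTbl)                           # each original row becomes a column
> setnames(bTbl, unlist(bTbl[1], use.names = FALSE)) # the first row holds the column names
> bTbl <- bTbl[-1]                                  # drop that row from the data
> # round-trip through text so that fread re-detects the column classes
> bTbl <- fread(text = paste(capture.output(write.csv(bTbl, row.names = FALSE, quote = FALSE)), collapse = "\n"))
> bTbl
Name Age
1: Peter 40
2: Paul 5
3: Mary 38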
> lapply(bTbl, class)
$Name
[1] "character"
$Age
[1] "integer"
>
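If the text round trip is a concern for a large file, a possible sketch (not benchmarked here) keeps the same read-as-character-then-transpose idea but lets transpose() take the new column names from the first input column through its make.names argument, and re-types the columns in place with type.convert(). It assumes a data.table version recent enough to provide make.names; the example file is hypothetical.

```r
library(data.table)

# hypothetical small input with the same layout as the question
file <- tempfile(fileext = ".csv")
writeLines(c("Name,Peter,Paul,Mary", "Age,40,5,38"), file)

# read everything as character, as in the answer above
aTbl <- fread(file, sep = ",", header = FALSE, colClasses = "character")

# make.names = "V1": the values of column V1 (Name, Age) become the column
# names of the result, and V1 itself is not transposed into the data
bTbl <- transpose(aTbl, make.names = "V1")

# re-type each column in place instead of writing the table back out to text
cols <- names(bTbl)
bTbl[, (cols) := lapply(.SD, type.convert, as.is = TRUE)]
bTbl
```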