
How to read multiple lines of a file into one row of a dataframe

Tags: file, r

I have a data file where individual samples are separated by a blank line and each field is on its own line:

age 20
weight 185
height 72

age 87
weight 109
height 60

age 15
weight 109
height 58

...

How can I read this file into a dataframe such that each row represents a sample with columns of age, weight, height?

    age  weight  height
1    20     185      72
2    87     109      60
3    15     109      58
...
turtle asked Jan 15 '23 09:01


2 Answers

@user1317221_G showed the approach I would take, but resorted to loading an extra package and explicitly generating the groups. The groups (the ID variable) are the key to getting any reshape-type answer to work. The matrix answers don't have that limitation.

Here's a closely related approach in base R:

mydf <- read.table(header = FALSE, stringsAsFactors=FALSE, 
                   text = "age 20
                   weight 185
                   height 72

                   age 87
                   weight 109
                   height 60

                   age 15
                   weight 109
                   height 58
                   ")

# Create your id variable: the k-th occurrence of each
# field name (age, weight, height) belongs to record k
mydf <- within(mydf, {
  id <- ave(V1, V1, FUN = seq_along)
})

With an id variable, your transformation is easy:

reshape(mydf, direction = "wide", 
        idvar = "id", timevar="V1")
#   id V2.age V2.weight V2.height
# 1  1     20       185        72
# 4  2     87       109        60
# 7  3     15       109        58

Or:

# Your ids become the "rownames" with this approach
as.data.frame.matrix(xtabs(V2 ~ id + V1, mydf))
#   age height weight
# 1  20     72    185
# 2  87     60    109
# 3  15     58    109
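
For comparison, the matrix route mentioned at the top can be sketched roughly as below. This is only a sketch, and it assumes every record has the same fields, in the same order, with none missing. The names n_fields, m and wide are made up for illustration.

```r
# Same sample data as above, read into two columns: V1 (field name), V2 (value)
mydf <- read.table(header = FALSE, stringsAsFactors = FALSE,
                   text = "age 20
weight 185
height 72

age 87
weight 109
height 60

age 15
weight 109
height 58")

# Assumption: each record holds the same fields in the same order
n_fields <- length(unique(mydf$V1))

# Fill a matrix row by row, so that each row is one record;
# the first n_fields field names become the column names
m <- matrix(mydf$V2, ncol = n_fields, byrow = TRUE,
            dimnames = list(NULL, mydf$V1[seq_len(n_fields)]))

wide <- as.data.frame(m)
```

No id variable is needed here because the position in the matrix does the grouping, but a single missing or reordered line would silently shift values into the wrong columns.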
A5C1D2H2I1M1N2O1R2T1 answered Jan 16 '23 21:01


To expand on @BlueMagister's answer, you can use scan with some options to read this directly into a list, then convert the list to a data frame:

tmp <- scan(text = "
age     20
weight  185
height  72

age     87
weight  109
height  60

age     15
weight  109
height  58", multi.line=TRUE, 
  what=list('',0,'',0,'',0), 
  blank.lines.skip=TRUE)

mydf <- as.data.frame( tmp[ c(FALSE,TRUE) ] )
names(mydf) <- sapply( tmp[ c(TRUE,FALSE) ], '[', 1 )

This assumes that the variables within a record are always in the same order.
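
If the order can vary between records, one option is to take the record id from the blank lines instead of from the field order. What follows is a rough sketch rather than a tested general solution. The sample text (with the second record deliberately shuffled) and the names txt, rec_id, long and wide are made up for illustration; with a real file you would use readLines() on your own path in place of the strsplit() call.

```r
# Illustrative input; the second record lists its fields in a different order
txt <- "age 20
weight 185
height 72

height 60
age 87
weight 109

age 15
weight 109
height 58"

lines <- trimws(strsplit(txt, "\n", fixed = TRUE)[[1]])
blank <- lines == ""

# Every blank line bumps the counter, so all lines between two
# blank lines share one record id
rec_id <- cumsum(blank) + 1L

# Split each non-blank line into field name and value
parts <- strsplit(lines[!blank], "[[:space:]]+")
long <- data.frame(id    = rec_id[!blank],
                   key   = sapply(parts, `[`, 1),
                   value = as.numeric(sapply(parts, `[`, 2)),
                   stringsAsFactors = FALSE)

# One row per record, one column per field
wide <- reshape(long, direction = "wide", idvar = "id", timevar = "key")
names(wide) <- sub("^value\\.", "", names(wide))
```

Several blank lines in a row would leave gaps in the id numbering, but the grouping itself still holds.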

Greg Snow answered Jan 16 '23 22:01