I have a data file where individual samples are separated by a blank line and each field is on its own line:
age 20
weight 185
height 72

age 87
weight 109
height 60

age 15
weight 109
height 58
...
How can I read this file into a dataframe such that each row represents a sample with columns of age, weight, height?
    age    weight    height
1   20      185        72  
2   87      109        60
3   15      109        58
...
@user1317221_G showed the approach I would take, but resorted to loading an extra package and explicitly generating the groups. The group (ID) variable is the key to getting any reshape-type answer to work. The matrix answers don't have that limitation.
Here's a closely related approach in base R:
mydf <- read.table(header = FALSE, stringsAsFactors=FALSE, 
                   text = "age 20
                   weight 185
                   height 72
                   age 87
                   weight 109
                   height 60
                   age 15
                   weight 109
                   height 58
                   ")
# Create your id variable
mydf <- within(mydf, {
  id <- ave(V1, V1, FUN = seq_along)
})
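If the raw file really does separate samples with blank lines, as the question describes, the id can also be taken from those blank lines instead of from counting field names. A minimal sketch, assuming the file is read with readLines() (the file name samples.txt is hypothetical, and a literal vector stands in for it so the block runs on its own): a running count of blank lines numbers the samples, so a sample with a missing field would not shift the ids of the samples after it. The result has the same V1/V2/id layout as mydf.

```r
# Stand-in for readLines("samples.txt"); the file name is hypothetical
lines <- c("age 20", "weight 185", "height 72", "",
           "age 87", "weight 109", "height 60", "",
           "age 15", "weight 109", "height 58")
# Treat whitespace-only lines as separators too
blank <- trimws(lines) == ""
# Every blank line starts a new sample, so a running count of blanks (plus 1) is the id
id <- cumsum(blank) + 1
# Parse only the non-blank "name value" lines into columns V1 and V2
fields <- read.table(text = lines[!blank], stringsAsFactors = FALSE)
fields$id <- id[!blank]
fields
```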
With an id variable, your transformation is easy:
reshape(mydf, direction = "wide", 
        idvar = "id", timevar="V1")
#   id V2.age V2.weight V2.height
# 1  1     20       185        72
# 4  2     87       109        60
# 7  3     15       109        58
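reshape() prefixes each new column with the name of the value column (V2) and keeps the row names of the first row of each sample. A small sketch for tidying that up, rebuilding mydf so the block runs on its own (wide is just an illustrative name):

```r
# Rebuild the long data and the id exactly as above
mydf <- read.table(header = FALSE, stringsAsFactors = FALSE,
                   text = "age 20\nweight 185\nheight 72\nage 87\nweight 109\nheight 60\nage 15\nweight 109\nheight 58")
mydf$id <- ave(mydf$V1, mydf$V1, FUN = seq_along)
wide <- reshape(mydf, direction = "wide", idvar = "id", timevar = "V1")
# Drop the "V2." prefix that reshape() adds to the new columns
names(wide) <- sub("^V2\\.", "", names(wide))
# Replace the leftover row names (1, 4, 7) with 1, 2, 3
rownames(wide) <- NULL
wide
```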
Or:
# Your ids become the "rownames" with this approach
as.data.frame.matrix(xtabs(V2 ~ id + V1, mydf))
#   age height weight
# 1  20     72    185
# 2  87     60    109
# 3  15     58    109
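For comparison, the matrix answers need no id at all: they pour the values into a matrix row by row. A rough sketch of that idea, assuming every sample lists the same fields in the same order (n_fields and wide_m are illustrative names):

```r
# Rebuild the long data from the question (values are read as integers)
mydf <- read.table(header = FALSE, stringsAsFactors = FALSE,
                   text = "age 20\nweight 185\nheight 72\nage 87\nweight 109\nheight 60\nage 15\nweight 109\nheight 58")
# Number of distinct fields per sample; assumes none are missing
n_fields <- length(unique(mydf$V1))
# Fill row by row so that each matrix row is one sample,
# taking the column names from the first sample's field order
m <- matrix(mydf$V2, ncol = n_fields, byrow = TRUE,
            dimnames = list(NULL, mydf$V1[seq_len(n_fields)]))
wide_m <- as.data.frame(m)
wide_m
```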
To expand on @BlueMagister's answer, you can use scan with a few options to read the file directly into a list, then convert the list to a data frame:
tmp <- scan(text = "
age     20
weight  185
height  72
age     87
weight  109
height  60
age     15
weight  109
height  58", multi.line=TRUE, 
  what=list('',0,'',0,'',0), 
  blank.lines.skip=TRUE)
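# Even-numbered list elements hold the values, odd-numbered ones the field names
mydf <- as.data.frame( tmp[ c(FALSE,TRUE) ] )
# Use the first label in each name element as the column name
names(mydf) <- sapply( tmp[ c(TRUE,FALSE) ], '[', 1 )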
This assumes that the variables within a record are always in the same order.
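Because a violated order would silently put values in the wrong columns, it may be worth checking the assumption before building the data frame. A sketch that reruns the scan call above (adding quiet = TRUE to suppress the record count) and stops if any field position holds more than one label; name_cols and same_order are illustrative names:

```r
tmp <- scan(text = "
age     20
weight  185
height  72
age     87
weight  109
height  60
age     15
weight  109
height  58", multi.line = TRUE,
  what = list('', 0, '', 0, '', 0),
  blank.lines.skip = TRUE, quiet = TRUE)
# Odd-numbered elements hold the field names; each should repeat a single label
name_cols <- tmp[c(TRUE, FALSE)]
same_order <- all(vapply(name_cols, function(v) length(unique(v)) == 1L, logical(1)))
if (!same_order) stop("fields are not in the same order in every record")
# Safe to pair positions with names now
mydf <- as.data.frame(tmp[c(FALSE, TRUE)])
names(mydf) <- vapply(name_cols, `[`, character(1), 1)
mydf
```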