I'm new to R and have a simple question, as I'm still learning the style of R data manipulation/management.
I have a dataset of observations of basic clinical features (blood pressure, cholesterol, etc.) over a period of time. Each observation has a patient ID and a date, but each one is entered as a separate line item. Something like this:
Patient ID Date Blood Pressure
1 21/1/14 120
1 19/3/14 134
1 3/5/14 127
I want to transform the data such that for a given variable (e.g. blood pressure), I have a data frame with one line per patient and all of the blood pressure values observed throughout time in chronological order. Something like this:
Patient ID BP1 BP2 BP3
1 120 134 127
I want to do this because I want to be able to write code to select the mean of the first three observed blood pressures, for example.
Any advice or reading recommendations would be greatly appreciated.
You can get the desired layout by reshaping your data, for example with the reshape() function in base R or dcast() in the reshape2 package, but it may be easier to get to your answer directly with a form of aggregation. Here's one method using ddply() from the plyr package:
library(plyr)
df <- read.table(text="id date bp
1 21/1/14 120
1 19/3/14 134
1 3/5/14 127",header=TRUE)
# Mean of the first three readings per patient (rows must already be in date order)
df1 <- ddply(df, .(id), summarize, mean.bp = mean(bp[1:3]))
df1
# id mean.bp
# 1 1 127
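The aggregation above takes the first three rows per patient as they appear in the data, so it relies on the rows already being in date order. Here is a minimal base R sketch that sorts first. It assumes the dates are day/month/two-digit-year strings as in the sample, and the names first3 and mean.bp are only illustrative. It also uses head() so that a patient with fewer than three readings does not produce NA:

```r
df <- read.table(text = "id date bp
1 21/1/14 120
1 19/3/14 134
1 3/5/14 127", header = TRUE, stringsAsFactors = FALSE)

# Assumption: dates are day/month/two-digit year, as in the sample data
df$date <- as.Date(df$date, format = "%d/%m/%y")

# Sort by patient, then chronologically within each patient
df <- df[order(df$id, df$date), ]

# Mean of the first (up to) three readings per patient
first3 <- aggregate(bp ~ id, data = df, FUN = function(x) mean(head(x, 3)))
names(first3)[2] <- "mean.bp"
first3
```

If your real dates use a different layout, adjust the format string accordingly.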
Of course, if you really just want to do what you asked about, you can do the following:
library(reshape2)
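# Label each patient's readings BP1, BP2, ... in row order
df$bp.id <- ave(df$id, df$id, FUN=function(x) paste0("BP", seq_along(x)))
# Keep the first three readings (the trailing comma filters rows, not columns), then cast to wide format
df2 <- dcast(df[df$bp.id %in% paste0("BP",1:3), ], id~bp.id, value.var="bp")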
df2
# id BP1 BP2 BP3
# 1 1 120 134 127