My data is set up as so:
site date amb ppm1 ppm2 ppm3 time0 time1 time2 time3
A 5/6/12 350 370 380 385 0 3 6 9
I need it in a format with 2 columns (one being concentration and the other time)
conc time
350 0
370 3
380 6
385 9
So that I can run a regression on it. Or help with how to run a regression on the original set up would be great.
Using your sample data, and assuming your data.frame is called "mydf", you can use stack for each "set" of columns to get the output you show:
setNames(data.frame(stack(mydf[, grep("^ppm|^amb", names(mydf))])[-2],
stack(mydf[, grep("^time", names(mydf))])[-2]),
c("conc", "time"))
# conc time
# 1 350 0
# 2 370 3
# 3 380 6
# 4 385 9
grep was used, just as an example for if you have many similarly named columns and don't want to count to identify their column indexes. If this is truly representative of your data, stack could also just be stack(mydf[, 3:6])[-2] and stack(mydf[, 7:10]).setNames is just a convenience function to rename the column names in the output. [-2] just removes the second column from each stack command (which is a column of the column names from which the values were taken).Another option, if you don't mind changing the variable name of "abm" to "ppm0" would be to use reshape:
names(mydf)[3] <- "ppm0"
reshape(mydf, direction = "long", idvar = 1:2,
timevar = "measure", varying = 3:ncol(mydf), sep = "")
# site date measure ppm time
# A.5/6/12.0 A 5/6/12 0 350 0
# A.5/6/12.1 A 5/6/12 1 370 3
# A.5/6/12.2 A 5/6/12 2 380 6
# A.5/6/12.3 A 5/6/12 3 385 9
You can, of course, drop the first three columns very easily.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With