Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

creating a dataframe

My data is set up as so:

site    date      amb   ppm1   ppm2   ppm3   time0   time1   time2   time3
A       5/6/12     350   370    380   385     0       3        6       9

I need it in a format with 2 columns (one being concentration and the other time)

conc   time
350      0
370      3
380      6
385      9

So that I can run a regression on it. Or help with how to run a regression on the original set up would be great.

like image 298
user1612622 Avatar asked Dec 19 '25 03:12

user1612622


1 Answers

Using your sample data, and assuming your data.frame is called "mydf", you can use stack for each "set" of columns to get the output you show:

setNames(data.frame(stack(mydf[, grep("^ppm|^amb", names(mydf))])[-2], 
                    stack(mydf[, grep("^time", names(mydf))])[-2]), 
         c("conc", "time"))
#   conc time
# 1  350    0
# 2  370    3
# 3  380    6
# 4  385    9
  • grep was used, just as an example for if you have many similarly named columns and don't want to count to identify their column indexes. If this is truly representative of your data, stack could also just be stack(mydf[, 3:6])[-2] and stack(mydf[, 7:10]).
  • setNames is just a convenience function to rename the column names in the output.
  • The [-2] just removes the second column from each stack command (which is a column of the column names from which the values were taken).

Another option, if you don't mind changing the variable name of "abm" to "ppm0" would be to use reshape:

names(mydf)[3] <- "ppm0"
reshape(mydf, direction = "long", idvar = 1:2, 
        timevar = "measure", varying = 3:ncol(mydf), sep = "")
#            site   date measure ppm time
# A.5/6/12.0    A 5/6/12       0 350    0
# A.5/6/12.1    A 5/6/12       1 370    3
# A.5/6/12.2    A 5/6/12       2 380    6
# A.5/6/12.3    A 5/6/12       3 385    9

You can, of course, drop the first three columns very easily.

like image 68
A5C1D2H2I1M1N2O1R2T1 Avatar answered Dec 20 '25 16:12

A5C1D2H2I1M1N2O1R2T1