Mean of columns with almost the same name

Tags:

r

I have a data frame containing only one row with named columns. The data frame looks somewhat like this:

  poms_tat1 poms_tat2 poms_tat3      tens1      tens2      tens3 ...
1 0.3708821 0.4915922 0.3958195 -0.1139606 -0.1462545 -0.4411494 ...

I need to calculate the mean of all columns whose names are identical apart from a trailing number. The result should look somewhat like this:


  poms_tat    tens      ...
1 0.4194313  -0.2337881667 ...

My first approach was to use a for loop and a nested while loop to find the indices of the relevant columns and then average them, but unfortunately I couldn't get it to work.
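For reference, the "find the indices, then average" idea can work with a single loop if each column's group key is derived exactly, by stripping the trailing digits with the same sub() call the answer uses. The function name mean_by_prefix_loop and the small df_small data frame are hypothetical, made up for illustration; this is a sketch, not the asker's original code.

```r
# Hypothetical helper (not from the question or answer): averages the columns
# of a one-row data frame that share the same name once trailing digits are removed.
mean_by_prefix_loop <- function(df) {
  # Group key per column, e.g. "poms_tat1" -> "poms_tat", "sleepqual" -> "sleepqual"
  keys <- sub("\\d+$", "", names(df))
  prefixes <- unique(keys)  # keeps the order in which groups first appear
  out <- setNames(numeric(length(prefixes)), prefixes)
  for (p in prefixes) {
    idx <- which(keys == p)  # exact comparison, so "reat" never picks up "threat"
    out[[p]] <- mean(unlist(df[1, idx]), na.rm = TRUE)
  }
  out
}

# Small example using the rounded values shown in the question
df_small <- data.frame(poms_tat1 = 0.3708821, poms_tat2 = 0.4915922,
                       poms_tat3 = 0.3958195, tens1 = -0.1139606,
                       tens2 = -0.1462545, tens3 = -0.4411494)
print(mean_by_prefix_loop(df_small))
```

Because the key is computed per column, groups of any size (including single columns such as sleepqual) are handled without special cases.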

I also found this Stack Overflow post, which seemed promising, but the agrep function matches columns in my data frame that should not be matched, and I wasn't able to fix that with the max.distance parameter. For example, it matches "threat1-3" with "reat1-3". I know those variable names are terrible, but unfortunately that's what I have to work with. What makes this even more complicated is that the number of columns in each category isn't always 3.
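A short illustration of why agrep behaves this way and why exact prefix keys avoid it: agrep looks for approximate matches of the pattern anywhere inside each string, so "reat" is found inside "threat1" even when no edits at all are allowed, which is why tuning max.distance cannot separate the two groups. Stripping only the trailing digits gives exact keys instead. The vector nms is a hand-picked subset of the column names from the data further down.

```r
# A subset of the column names from the reproducible data in the question
nms <- c("threat1", "threat2", "threat3", "reat1", "reat2", "reat5",
         "bat_ratio1", "sleepqual")

# Substring-based fuzzy matching: "reat" occurs inside every "threat" name,
# so even max.distance = 0 returns the threat columns as well
print(agrep("reat", nms, max.distance = 0, value = TRUE))

# Remove only the trailing run of digits; the rest of the name is the group key
keys <- sub("\\d+$", "", nms)
print(keys)

# Grouping on exact keys keeps "threat" and "reat" apart
print(split(nms, keys))
```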

I hope I was able to articulate my problem well enough. Thank you.

Edit: Here is a reproducible piece of data:

structure(list(poms_tat1 = 0.370882118644872, poms_tat2 = 0.491592168116328, 
    poms_tat3 = 0.395819547420188, tens1 = -0.113960576459638, 
    tens2 = -0.146254484825426, tens3 = -0.44114940169153, bat_ratio1 = 1, 
    isi1 = 0.0944068640061701, isi2 = 0.597785124823513, isi3 = 0.676617801589949, 
    isi4 = 0.143940321201716, sleepqual = 0.378902118888194, 
    se1 = 0.393610946830482, se2 = 0.0991899501072693, se3 = 0.501745206004254, 
    challenge1 = 0.417855447018672, challenge2 = 0.393610946830482, 
    challenge3 = 0.417855447018672, threat1 = -0.13014390184863, 
    threat2 = -0.34027852368936, threat3 = -0.269679944985297, 
    reat1 = 0.565825152115738, reat2 = 0.571605347479646, reat3 = 0.497468338163091, 
    reat4 = 0.484881137876427, reat5 = 0.494727444918154, selfman1 = 0.389249472080761, 
    selfman2 = 0.40609787800914, selfman3 = 0.418121005003545, 
    selfman4 = 0.467099366496914, selfman5 = 0.205356548067582, 
    selfman6 = 0.464385939554693, selfman7 = 0.379071252751718, 
    eli1 = 0.250872603002127, eli2 = 0, eli3 = 0.265908011739155), row.names = 1L, class = "data.frame")
asked Dec 09 '22 at 23:12 by Leo


1 Answer

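We could use split.default to split the columns into a list, grouped by the column names with their trailing digits removed, and then loop over the list with sapply to get the rowMeans of each group, all in base R (df1 is the data frame from the question). Because the grouping compares the stripped names exactly, threat1-3 and reat1-5 end up in separate groups, and each group can have any number of columns: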

sapply(split.default(df1, sub("\\d+$", "", names(df1))), rowMeans, na.rm = TRUE)
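The one-liner can be unpacked step by step as below. This is a sketch that assumes df1 holds the question's one-row data frame; a shortened copy is built here so the snippet runs on its own. The final reordering line is an optional addition, not part of the answer: split.default names its groups in sorted order, so the result is reordered to follow the original column order.

```r
# Shortened copy of the question's data (assumed to be df1 in the answer)
df1 <- data.frame(poms_tat1 = 0.370882118644872, poms_tat2 = 0.491592168116328,
                  poms_tat3 = 0.395819547420188, tens1 = -0.113960576459638,
                  tens2 = -0.146254484825426, tens3 = -0.44114940169153,
                  bat_ratio1 = 1, sleepqual = 0.378902118888194,
                  threat1 = -0.13014390184863, threat2 = -0.34027852368936,
                  threat3 = -0.269679944985297, reat1 = 0.565825152115738,
                  reat2 = 0.571605347479646)

# Step 1: group key per column (trailing digits removed)
keys <- sub("\\d+$", "", names(df1))

# Step 2: split.default splits the data frame column-wise into a list of
# smaller data frames, one per key; the list names come out sorted
groups <- split.default(df1, keys)

# Step 3: rowMeans on each piece; sapply simplifies the one-value results
# into a named numeric vector
res <- sapply(groups, rowMeans, na.rm = TRUE)

# Optional: put the groups back in the order they first appear in df1
res <- res[unique(keys)]
print(res)
```

Single-column groups such as bat_ratio and sleepqual simply return their own value, so no special handling is needed for them.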
answered Dec 30 '22 at 03:12 by akrun