I have a dataframe that looks like this:
df <- data.frame("CB_1.1"=c(0,5,6,2), "CB_1.16"=c(1,5,3,6), "HC_2.11"=c(3,3,4,5), "HC_1.12"=c(2,3,4,5), "HC_1.13"=c(1,0,0,5))
> df
CB_1.1 CB_1.16 HC_2.11 HC_1.12 HC_1.13
1 0 1 3 2 1
2 5 5 3 3 0
3 6 3 4 4 0
4 2 6 5 5 5
I would like to take the mean of rows that share substring of the column name, before the ".". Resulting in a dataframe like this:
CB_1 HC_2 HC_1
1 0.5 3 1.5
2 5.0 3 1.5
3 4.5 4 2.0
4 4.0 5 5.0
You'll notice that the column HC_2.11 values remain the same, because no other column has HC_2 in this dataframe.
Any help would be appreciated!
1) apply/tapply For each row use tapply on it using an INDEX of the name prefixes and a function mean. Transpose the result. No packages are used.
prefix <- sub("\\..*", "", names(df))
t(apply(df, 1, tapply, prefix, mean))
giving this matrix (wrap it in data.frame(...) if you need a data frame result):
CB_1 HC_1 HC_2
[1,] 0.5 1.5 3
[2,] 5.0 1.5 3
[3,] 4.5 2.0 4
[4,] 4.0 5.0 5
2) lm Run the indicated regression. The +0 in the formula means don't add on an intercept. The transpose of the coefficients will be the required matrix, m. In the next line make the names nicer. prefix is from (1). No packages are used.
m <- t(coef(lm(t(df) ~ prefix + 0)))
colnames(m) <- sub("prefix", "", colnames(m))
m
giving this matrix
CB_1 HC_1 HC_2
[1,] 0.5 1.5 3
[2,] 5.0 1.5 3
[3,] 4.5 2.0 4
[4,] 4.0 5.0 5
This follows from the facts that (1) the model matrix X contains only ones and zeros and (2) distinct columns of it are orthogonal. The model matrix is shown here:
X <- model.matrix(~ prefix + 0) # model matrix
X
giving:
prefixCB_1 prefixHC_1 prefixHC_2
1 1 0 0
2 1 0 0
3 0 0 1
4 0 1 0
5 0 1 0
attr(,"assign")
[1] 1 1 1
attr(,"contrasts")
attr(,"contrasts")$prefix
[1] "contr.treatment"
Because the columns of the model matrix X are orthogonal the coefficient corresponding to any column for a particular row, y, of df (column of t(df)) is just sum(x * y) / sum(x * x) and since x is a 0/1 vector that equals the mean of the values of y corresponding to the 1's in x.
3) stack/tapply Convert to long form inserting an id column at the same time. Then use tapply to convert back to wide form tapply-ing mean. No packages are used.
long <- transform(stack(df), ind = sub("\\..*", "", ind), id = c(row(df)))
with(long, tapply(values, long[c("id", "ind")], mean))
giving this table. Wrap it in as.data.frame.matrix if you want a data.frame.
ind
id CB_1 HC_1 HC_2
1 0.5 1.5 3
2 5.0 1.5 3
3 4.5 2.0 4
4 4.0 5.0 5
rowMeans + split.default, i.e.,dfout <- as.data.frame(Map(rowMeans, split.default(df,factor(s <- gsub("\\..*$","",names(df)), levels = unique(s)))))
such that
> dfout
CB_1 HC_2 HC_1
1 0.5 3 1.5
2 5.0 3 1.5
3 4.5 4 2.0
4 4.0 5 5.0
dfout <- as.data.frame(Map(rowMeans,split.default(df,gsub("\\..*$","",names(df)))))
such that
> dfout
CB_1 HC_1 HC_2
1 0.5 1.5 3
2 5.0 1.5 3
3 4.5 2.0 4
4 4.0 5.0 5
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With