
Average every duplicated column for each row?

I have the following dataframe.

temp = structure(list(A = c(0, 0, 0, 3.72900887033786, 1.94860084749336, 
0), C = c(0, 0, 0, 3.44095219802964, 2.35049724708413, 0.0285691521967709
), A = c(0, 0, 0, 3.29572302453997, 0.933572638261024, 0), D = c(0, 
0, 0, 2.4905701304462, 1.54101915313356, 0), E = c(0, 0, 0, 4.23189316164533, 
1.7311832415722, 0), E = c(0, 0, 0, 4.37851162325373, 2.50080205305716, 
0), D = c(0, 0, 0, 3.68929916053589, 2.4905701304462, 0.189033824390017
), F = c(0, 2.27500704749987, 0, 3.68032435684402, 1.77820857639809, 
0), A = c(0, 0, 0, 3.5668151540109, 1.72683121703249, 0.0285691521967709
), G = c(0, 0, 0, 5.6450098843911, 3.09929520433778, 0)), row.names = c("5_8S_rRNA", 
"5S_rRNA", "7SK", "A1BG", "A1BG-AS1", "A1CF"), class = "data.frame")

It looks like this.

                 A          C         A        D        E        E         D        F          A        G
5_8S_rRNA 0.000000 0.00000000 0.0000000 0.000000 0.000000 0.000000 0.0000000 0.000000 0.00000000 0.000000
5S_rRNA   0.000000 0.00000000 0.0000000 0.000000 0.000000 0.000000 0.0000000 2.275007 0.00000000 0.000000
7SK       0.000000 0.00000000 0.0000000 0.000000 0.000000 0.000000 0.0000000 0.000000 0.00000000 0.000000
A1BG      3.729009 3.44095220 3.2957230 2.490570 4.231893 4.378512 3.6892992 3.680324 3.56681515 5.645010
A1BG-AS1  1.948601 2.35049725 0.9335726 1.541019 1.731183 2.500802 2.4905701 1.778209 1.72683122 3.099295
A1CF      0.000000 0.02856915 0.0000000 0.000000 0.000000 0.000000 0.1890338 0.000000 0.02856915 0.000000

What I'd like to do is collapse each set of duplicated columns into a single column by averaging them, row by row.

The ideal dataframe would have the same number of rows, but only the columns A, C, D, E, F, G. Is this possible?

Ahdee asked Jun 04 '26 17:06


2 Answers

We could use split.default to split the data frame by column name, then loop over the resulting list and apply rowMeans to each group of columns:

 sapply(split.default(temp, names(temp)), rowMeans)
                    A          C          D        E        F        G
5_8S_rRNA 0.000000000 0.00000000 0.00000000 0.000000 0.000000 0.000000
5S_rRNA   0.000000000 0.00000000 0.00000000 0.000000 2.275007 0.000000
7SK       0.000000000 0.00000000 0.00000000 0.000000 0.000000 0.000000
A1BG      3.530515683 3.44095220 3.08993465 4.305202 3.680324 5.645010
A1BG-AS1  1.536334901 2.35049725 2.01579464 2.115993 1.778209 3.099295
A1CF      0.009523051 0.02856915 0.09451691 0.000000 0.000000 0.000000
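Note that sapply returns a matrix here rather than a data frame. As a minimal sketch of the same idea broken into steps, assuming a small made-up data frame `df` (not the data from the question) and assuming a data frame is wanted as the final result:

```r
# Hypothetical toy data with a duplicated column name "A" (values are made up).
df <- data.frame(A = c(1, 2), B = c(10, 20), A = c(3, 6), check.names = FALSE)

# Group the columns by name; each list element holds the same-named columns.
groups <- split.default(df, names(df))

# Average each group row by row; sapply() binds the results into a matrix.
avg <- sapply(groups, rowMeans)

# Convert back to a data frame if downstream code expects one.
res <- as.data.frame(avg)
```

If the data may contain missing values, passing na.rm = TRUE through sapply, as in sapply(groups, rowMeans, na.rm = TRUE), should average over the non-missing duplicates only.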
akrun answered Jun 07 '26 07:06


Another base R solution with rowsum:

t(rowsum(t(temp), names(temp)) / c(table(names(temp))))

                    A          C          D        E        F        G
5_8S_rRNA 0.000000000 0.00000000 0.00000000 0.000000 0.000000 0.000000
5S_rRNA   0.000000000 0.00000000 0.00000000 0.000000 2.275007 0.000000
7SK       0.000000000 0.00000000 0.00000000 0.000000 0.000000 0.000000
A1BG      3.530515683 3.44095220 3.08993465 4.305202 3.680324 5.645010
A1BG-AS1  1.536334901 2.35049725 2.01579464 2.115993 1.778209 3.099295
A1CF      0.009523051 0.02856915 0.09451691 0.000000 0.000000 0.000000
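The one-liner works because rowsum and table both return their groups in sorted order, so the sums and the counts line up. A step-by-step sketch of the same computation, assuming a small made-up data frame `df` (not the data from the question) and indexing the counts by name to make that alignment explicit:

```r
# Hypothetical toy data with a duplicated column name "A" (values are made up).
df <- data.frame(A = c(1, 2), B = c(10, 20), A = c(3, 6), check.names = FALSE)

# Transpose so that each original column becomes a row of a matrix.
m <- t(df)

# Sum the rows that share a column name; the result has one row per unique name.
sums <- rowsum(m, names(df))

# Count how many columns carry each name (a named integer vector).
counts <- c(table(names(df)))

# Divide each row of sums by its count, then transpose back to the original layout.
res2 <- t(sums / counts[rownames(sums)])
```

The division recycles the counts down the rows of sums, which is why the matrix has to be in the transposed layout at that point.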
Maël answered Jun 07 '26 07:06


