Sum duplicated columns in dataframe in R

Hello, I have the following dataframe:

colnames(tv_viewing_time) <- c("channel_1", "channel_2", "channel_1", "channel_2")

Each row gives the viewing time for one individual on channel 1 and channel 2. For instance, for individual 1 I get:

tv_viewing_time[1,] <- c(1,2,4,5)

What I would like is a dataframe that sums the values of columns with duplicated names, i.e. I would get:

colnames(tv_viewing_time) <- c("channel_1", "channel_2")

where, for instance, for individual 1 I would get:

tv_viewing_time[1,] <- c(5,7)

That is, entries in the same row are summed whenever their columns share a name.

I have looked for an answer, but none of the suggestions in other threads worked for my dataframe. Note that there are many more duplicated columns, so I am looking for a solution that can be applied efficiently to all my duplicates.

Lola1993 asked Nov 30 '25 18:11



1 Answer

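We could use split.default with rowSums, grouping the columns by name after stripping the numeric suffix (.1, .2, ...) that data.frame() appends to duplicated names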

sapply(split.default(tv_viewing_time, 
       sub("\\.\\d+$", "", names(tv_viewing_time))), rowSums)

-output

# channel_1 channel_2 
#       5         7 
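
With several individuals, a result that stays a data frame is often more convenient than the vector or matrix that sapply returns. A minimal sketch, assuming a hypothetical second individual (the second value in each column is invented for illustration) and swapping sapply for lapply plus as.data.frame:

```r
# Hypothetical two-individual data; the second row's values are made up.
tv <- data.frame(channel_1 = c(1, 10), channel_2 = c(2, 20),
                 channel_1 = c(4, 40), channel_2 = c(5, 50))

# Group key per column: the name without the ".1", ".2", ... suffix
key <- sub("\\.\\d+$", "", names(tv))

# Split the columns by key, sum each group row-wise, and rebuild a data frame
summed <- as.data.frame(lapply(split.default(tv, key), rowSums))
summed
```

Because lapply always returns a list of per-group vectors, this form should keep the data frame shape even when there is only one row.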

Or using tidyverse

library(dplyr)
library(tidyr)
library(stringr)
tv_viewing_time %>% 
  pivot_longer(cols = everything()) %>%
  group_by(name = str_remove(name, "\\.\\d+$")) %>% 
  summarise(value = sum(value)) %>% 
  pivot_wider(names_from = name, values_from = value)
# A tibble: 1 x 2
#  channel_1 channel_2
#      <dbl>     <dbl>
#1         5         7
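
Note that the pipeline above has no row identifier, so with more than one row it would add all individuals together. A sketch that keeps one row per individual, assuming a hypothetical helper column id and made-up values for a second individual:

```r
library(dplyr)
library(tidyr)
library(stringr)

# Hypothetical two-individual data; the second row's values are made up.
tv <- data.frame(channel_1 = c(1, 10), channel_2 = c(2, 20),
                 channel_1 = c(4, 40), channel_2 = c(5, 50))

result <- tv %>%
  mutate(id = row_number()) %>%                        # remember which row each value came from
  pivot_longer(cols = -id) %>%                         # one line per (id, column) pair
  group_by(id, name = str_remove(name, "\\.\\d+$")) %>%
  summarise(value = sum(value), .groups = "drop") %>%  # sum duplicates within each individual
  pivot_wider(names_from = name, values_from = value)
result
```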

data

tv_viewing_time <- data.frame(channel_1 = 1, channel_2 = 2, 
        channel_1 = 4, channel_2 = 5)
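
data.frame() renames repeated columns to channel_1.1 and channel_2.1 by default, which is what the regex above strips. If the dataframe really carries identical names, as in the question, no regex is needed. One possible sketch using base R's rowsum, with check.names = FALSE assumed here only to reproduce that situation:

```r
# check.names = FALSE keeps the duplicated names exactly as typed
tv_dup <- data.frame(channel_1 = 1, channel_2 = 2,
                     channel_1 = 4, channel_2 = 5,
                     check.names = FALSE)

# rowsum() sums matrix rows by group, so transpose to put channels in rows,
# group by column name, then transpose back to one row per individual
m <- as.matrix(tv_dup)
summed_dup <- as.data.frame(t(rowsum(t(m), group = colnames(m))))
summed_dup
```

This works on a matrix, so it assumes all columns are numeric.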

akrun answered Dec 02 '25 10:12
