Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to sum rows by rows?

Im kinda newbie in R right now... So im doing a census research as an university project. for illustration that is part of my data.frame

             MUN          X1990  X1991  X1992 X1993
1     Angra dos Reis (RJ)    11    10    10    10
2            Aperibé (RJ)    NA    NA    NA    NA
3           Araruama (RJ)  12040 14589 14231 14231
4              Areal (RJ)    NA    NA    NA     3
5 Armação dos Búzios (RJ)    NA    NA    NA    NA

My problem is that i need to sum some municipalities rows whose name i know/will specify, (because i dont know the order it will apear, or if they ever will apear, in all of my tables), and the result should be displayed in a row.

as an example, i would like to sum the row "Areal" with the row "Angra dos Reis", with the result stored in another created row( let's call the result row: X) so the result should be:

             MUN          X1990  X1991  X1992 X1993
1     Angra dos Reis (RJ)    11    10    10    10
2            Aperibé (RJ)    NA    NA    NA    NA
3           Araruama (RJ)  12040 14589 14231 14231
4              Areal (RJ)    NA    NA    NA     3
5 Armação dos Búzios (RJ)    NA    NA    NA    NA
6          X                 11    10    10    13

I have tried to create a for loop and an if loop, but i can't get to do that right.

like image 946
tnorio Avatar asked Jan 30 '23 08:01

tnorio


2 Answers

This is pretty similar to Jaap's comment, but a little more spelled out and uses the row names explicitly:

mat = as.matrix(dat[, 2:5])
row.names(mat) = dat$MUN
mat = rbind(mat, colSums(mat[c("Angra dos Reis (RJ)", "Areal (RJ)"), ], na.rm = T))
row.names(mat)[nrow(mat)] = "X"
mat
#                         X1990 X1991 X1992 X1993
# Angra dos Reis (RJ)        11    10    10    10
# Aperibé (RJ)               NA    NA    NA    NA
# Araruama (RJ)           12040 14589 14231 14231
# Areal (RJ)                 NA    NA    NA     3
# Armação dos Búzios (RJ)    NA    NA    NA    NA
# X                          11    10    10    13

The result is a matrix, you can convert it back to a data frame if needed:

dat_result = data.frame(MUN = row.names(mat), mat, row.names = NULL)

I dislike the format of your data as a data frame. I would either convert it to a matrix (as above) or convert it to long format with, e.g., tidyr::gather(dat, key = year, value = value, -MUN) and work with it "by group" using data.table or dplyr.


Using this data:

dat = read.table(text = "             MUN          X1990  X1991  X1992 X1993
1     'Angra dos Reis (RJ)'    11    10    10    10
2            'Aperibé (RJ)'    NA    NA    NA    NA
3           'Araruama (RJ)'  12040 14589 14231 14231
4              'Areal (RJ)'    NA    NA    NA     3
5 'Armação dos Búzios (RJ)'    NA    NA    NA    NA", header= T)
like image 95
Gregor Thomas Avatar answered Feb 01 '23 08:02

Gregor Thomas


A solution can be using sqldf package. If the name of the data frame is df, you can do it likes the following:

library(sqldf)
result <- sqldf("SELECT * FROM df UNION 
       SELECT 'X', SUM(X1990), SUM(X1991), SUM(X1992), SUM(X1993) FROM df
       WHERE MUN IN ('Angra dos Reis (RJ)', 'Areal (RJ)')")
like image 43
OmG Avatar answered Feb 01 '23 08:02

OmG