Im kinda newbie in R right now... So im doing a census research as an university project. for illustration that is part of my data.frame
MUN X1990 X1991 X1992 X1993
1 Angra dos Reis (RJ) 11 10 10 10
2 Aperibé (RJ) NA NA NA NA
3 Araruama (RJ) 12040 14589 14231 14231
4 Areal (RJ) NA NA NA 3
5 Armação dos Búzios (RJ) NA NA NA NA
My problem is that i need to sum some municipalities rows whose name i know/will specify, (because i dont know the order it will apear, or if they ever will apear, in all of my tables), and the result should be displayed in a row.
as an example, i would like to sum the row "Areal" with the row "Angra dos Reis", with the result stored in another created row( let's call the result row: X) so the result should be:
MUN X1990 X1991 X1992 X1993
1 Angra dos Reis (RJ) 11 10 10 10
2 Aperibé (RJ) NA NA NA NA
3 Araruama (RJ) 12040 14589 14231 14231
4 Areal (RJ) NA NA NA 3
5 Armação dos Búzios (RJ) NA NA NA NA
6 X 11 10 10 13
I have tried to create a for loop and an if loop, but i can't get to do that right.
This is pretty similar to Jaap's comment, but a little more spelled out and uses the row names explicitly:
mat = as.matrix(dat[, 2:5])
row.names(mat) = dat$MUN
mat = rbind(mat, colSums(mat[c("Angra dos Reis (RJ)", "Areal (RJ)"), ], na.rm = T))
row.names(mat)[nrow(mat)] = "X"
mat
# X1990 X1991 X1992 X1993
# Angra dos Reis (RJ) 11 10 10 10
# Aperibé (RJ) NA NA NA NA
# Araruama (RJ) 12040 14589 14231 14231
# Areal (RJ) NA NA NA 3
# Armação dos Búzios (RJ) NA NA NA NA
# X 11 10 10 13
The result is a matrix
, you can convert it back to a data frame if needed:
dat_result = data.frame(MUN = row.names(mat), mat, row.names = NULL)
I dislike the format of your data as a data frame. I would either convert it to a matrix (as above) or convert it to long format with, e.g., tidyr::gather(dat, key = year, value = value, -MUN)
and work with it "by group" using data.table
or dplyr
.
Using this data:
dat = read.table(text = " MUN X1990 X1991 X1992 X1993
1 'Angra dos Reis (RJ)' 11 10 10 10
2 'Aperibé (RJ)' NA NA NA NA
3 'Araruama (RJ)' 12040 14589 14231 14231
4 'Areal (RJ)' NA NA NA 3
5 'Armação dos Búzios (RJ)' NA NA NA NA", header= T)
A solution can be using sqldf package. If the name of the data frame is df
, you can do it likes the following:
library(sqldf)
result <- sqldf("SELECT * FROM df UNION
SELECT 'X', SUM(X1990), SUM(X1991), SUM(X1992), SUM(X1993) FROM df
WHERE MUN IN ('Angra dos Reis (RJ)', 'Areal (RJ)')")
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With