
R: Subtract values from rows based on another column

Tags:

r

I have a dataset as follows:

Group    Type   Income
 A        X       1000
 A        Y       500
 B        Y       2000
 B        X       1500
 C        X       700
 D        Y       600

I need the output as follows:

Group    Diff
  A       500
  B      -500
  C       700
  D      -600

One approach I can think of is to split the data by Type into X and Y, add an income of 0 for any group where X or Y is not present, and merge the two so that each group has an IncomeX column and an IncomeY column. Subtracting IncomeY from IncomeX then gives Diff.
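For reference, the split-and-merge approach described above might look roughly like this in base R. This is only a sketch: the names `x`, `y`, `wide` and `result` are illustrative, and the data frame is rebuilt by hand from the table in the question.

```r
# Data from the question, typed in by hand
df <- data.frame(
  Group  = c("A", "A", "B", "B", "C", "D"),
  Type   = c("X", "Y", "Y", "X", "X", "Y"),
  Income = c(1000, 500, 2000, 1500, 700, 600)
)

# Split by Type, keeping only the columns needed for the merge
x <- df[df$Type == "X", c("Group", "Income")]
y <- df[df$Type == "Y", c("Group", "Income")]

# Full outer join on Group; a group missing one type gets NA for it.
# The suffixes turn the two Income columns into IncomeX and IncomeY.
wide <- merge(x, y, by = "Group", all = TRUE, suffixes = c("X", "Y"))

# Treat a missing income as 0
wide[is.na(wide)] <- 0

# Subtract the two columns and keep only Group and Diff
wide$Diff <- wide$IncomeX - wide$IncomeY
result <- wide[, c("Group", "Diff")]
print(result)
```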

Is there an easier way to do this?

Nadeem Hussain asked Jan 05 '23 04:01


2 Answers

I would do it like this, using the dplyr and reshape2 packages:

library("dplyr")
library("reshape2")

t <- read.table(text = "Group    Type   Income
A        X       1000
A        Y       500
B        Y       2000
B        X       1500
C        X       700
D        Y       600", header = TRUE)

t %>% 
    dcast(Group ~ Type, value.var = "Income", fill = 0) %>% 
    mutate(Diff = X - Y) %>% 
    select(Group, Diff)

# Group Diff
# 1     A  500
# 2     B -500
# 3     C  700
# 4     D -600

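dcast reshapes the table from long to wide format, giving one column per Type (X and Y) and filling any missing Group/Type combination with 0. mutate then creates the new Diff column, and select keeps only Group and Diff.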
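As far as I know, reshape2 has since been retired in favour of tidyr, so the same long-to-wide reshape can also be sketched with `pivot_wider`. This assumes tidyr 1.1.0 or later (for a scalar `values_fill`); the data frame `df` is rebuilt by hand from the question and `result` is just an illustrative name.

```r
library(dplyr)
library(tidyr)

# Data from the question, typed in by hand
df <- data.frame(
  Group  = c("A", "A", "B", "B", "C", "D"),
  Type   = c("X", "Y", "Y", "X", "X", "Y"),
  Income = c(1000, 500, 2000, 1500, 700, 600)
)

result <- df %>%
  # One column per Type; missing Group/Type combinations become 0
  pivot_wider(names_from = Type, values_from = Income, values_fill = 0) %>%
  mutate(Diff = X - Y) %>%
  select(Group, Diff)

print(result)
```

The result is a tibble rather than a plain data frame; wrap it in `as.data.frame()` if that matters.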

Marta answered Jan 13 '23 09:01


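Try this in base R. It gives incomes of Type X a positive sign and all others a negative sign, then sums the signed incomes within each Group: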

aggregate(Diff~Group, 
          with(df, data.frame(Group=Group, Diff=ifelse(Type=="X", 1, -1)*Income)), sum)

# Group Diff
#1     A    500
#2     B   -500
#3     C    700
#4     D   -600

data

df <- structure(list(Group = structure(c(1L, 1L, 2L, 2L, 3L, 4L), .Label = c("A", 
"B", "C", "D"), class = "factor"), Type = structure(c(1L, 2L, 
2L, 1L, 1L, 2L), .Label = c("X", "Y"), class = "factor"), Income = c(1000L, 
500L, 2000L, 1500L, 700L, 600L)), .Names = c("Group", "Type", 
"Income"), class = "data.frame", row.names = c(NA, -6L))

989 answered Jan 13 '23 08:01