
Aggregate data.frame for each day

Tags: r, aggregate

I have a data.frame dat recording the car sales (Buy=0 in the data frame) and purchases (Buy=1 in the data frame) of a used car seller.

  Date       Buy   Price
29-06-2015    1    5000
29-06-2015    0    8000
29-06-2015    1    10000
30-06-2015    0    3500
30-06-2015    0    12000 
...          ...  ...
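
For anyone who wants to reproduce this, here is a minimal sketch that builds only the five rows shown above (the elided rows are left out). Storing Date as a character column is an assumption; the outputs in the answers show it as a factor.

```r
# Only the five visible rows; the elided rows are not reconstructed.
# Assumption: Date is kept as plain text in dd-mm-yyyy form.
dat <- data.frame(
  Date  = c("29-06-2015", "29-06-2015", "29-06-2015",
            "30-06-2015", "30-06-2015"),
  Buy   = c(1, 0, 1, 0, 0),
  Price = c(5000, 8000, 10000, 3500, 12000)
)
str(dat)
```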

What I need is a new, aggregated data.frame that gives me the number of buys and the number of sells per day together with the summed prices of all the buys and sells for that day:

  Date      Buys   Sells   Price_Buys  Price_Sells
29-06-2015    2    1         15000        8000
30-06-2015    0    2           0          15500
...          ...  ...

I tried to use aggregate(dat$Buy, by=list(Date=dat$Date), FUN=sum), which gives me the number of buys per day. However, I am still struggling with how to aggregate the sells as well.

jeffrey asked Jan 28 '16 21:01


4 Answers

This can be done pretty cleanly in dplyr, grouping by date using group_by and then summarizing with summarize:

library(dplyr)
(out <- dat %>%
  group_by(Date) %>%
  summarize(Buys=sum(Buy == 1), Sells=sum(Buy == 0),
            Price_Buys=sum(Price[Buy == 1]), Price_Sells=sum(Price[Buy == 0])))
#         Date  Buys Sells Price_Buys Price_Sells
#       (fctr) (int) (int)      (int)       (int)
# 1 29-06-2015     2     1      15000        8000
# 2 30-06-2015     0     2          0       15500

You can now manipulate this object as you would a normal data frame, e.g. with something like:

out$newvar <- with(out, Sells*Price_Sells - Buys*Price_Buys)
out
# Source: local data frame [2 x 6]
#         Date  Buys Sells Price_Buys Price_Sells newvar
#       (fctr) (int) (int)      (int)       (int)  (int)
# 1 29-06-2015     2     1      15000        8000 -22000
# 2 30-06-2015     0     2          0       15500  31000
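
One detail of the summarize() call worth spelling out: on a day with no buys, Price[Buy == 1] is an empty vector, and sum() of an empty vector is 0, which is why Price_Buys shows 0 for 30-06-2015 rather than NA. A small base R sketch using the two rows of that day:

```r
# The two rows for 30-06-2015 from the question: both are sells.
Price <- c(3500, 12000)
Buy   <- c(0, 0)

buy_prices <- Price[Buy == 1]    # no element matches, so this is numeric(0)
total_buys <- sum(buy_prices)    # sum of an empty numeric vector is 0
total_sells <- sum(Price[Buy == 0])

length(buy_prices)  # 0
total_buys          # 0
total_sells         # 15500
```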

josliber answered Oct 04 '22 13:10


Using data.table v1.9.6+ you can provide a list of functions to the fun.aggregate argument of dcast (abbreviated to fun below), so we can easily solve this without specifying any conditions by hand:

library(data.table) # V1.9.6+
dcast(setDT(dat), Date ~ Buy , value.var = "Price", fun = list(length, sum))
#          Date Price_length_0 Price_length_1 Price_sum_0 Price_sum_1
# 1: 29-06-2015              1              2        8000       15000
# 2: 30-06-2015              2              0       15500           0
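
To get the column names and order asked for in the question, the result can be renamed afterwards. This is a sketch only: it assumes the generated column names are exactly those shown in the output above, and it rebuilds the five sample rows so that it runs on its own.

```r
library(data.table)

# The five sample rows from the question.
dat <- data.table(
  Date  = c("29-06-2015", "29-06-2015", "29-06-2015",
            "30-06-2015", "30-06-2015"),
  Buy   = c(1, 0, 1, 0, 0),
  Price = c(5000, 8000, 10000, 3500, 12000)
)

res <- dcast(dat, Date ~ Buy, value.var = "Price",
             fun.aggregate = list(length, sum))

# Assumption: dcast names the columns as in the output shown above.
setnames(res,
         old = c("Price_length_0", "Price_length_1",
                 "Price_sum_0", "Price_sum_1"),
         new = c("Sells", "Buys", "Price_Sells", "Price_Buys"))
setcolorder(res, c("Date", "Buys", "Sells", "Price_Buys", "Price_Sells"))
res
```

Renaming by name rather than by position means the call fails loudly if the generated names differ, instead of silently mislabelling columns.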

Or, if we want to try dplyr, a robust way of solving this (again, without specifying any conditions) could be

library(dplyr)
dat %>%
  group_by(Date, Buy) %>%
  summarise_each(funs(sum, length), Price)

# Source: local data frame [3 x 4]
# Groups: Date [?]
# 
#         Date   Buy   sum length
#       (fctr) (int) (int)  (int)
# 1 29-06-2015     0  8000      1
# 2 29-06-2015     1 15000      2
# 3 30-06-2015     0 15500      2
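
Note that summarise_each() and funs() are deprecated in later dplyr releases. A rough equivalent, assuming dplyr 1.0 or later (for the .groups argument), might look like this; the sample rows are rebuilt so the sketch is self-contained.

```r
library(dplyr)

# The five sample rows from the question.
dat <- data.frame(
  Date  = c("29-06-2015", "29-06-2015", "29-06-2015",
            "30-06-2015", "30-06-2015"),
  Buy   = c(1, 0, 1, 0, 0),
  Price = c(5000, 8000, 10000, 3500, 12000)
)

# One row per (Date, Buy) combination that actually occurs in the data.
res <- dat %>%
  group_by(Date, Buy) %>%
  summarise(sum = sum(Price), length = n(), .groups = "drop")
res
```

As in the output above, the result is in long form and has no row for combinations that never occur (here 30-06-2015 with Buy == 1).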

David Arenburg answered Oct 04 '22 13:10


You can use the dplyr package to do this:

dat %>%
  group_by(Date) %>%
  summarise(buys = sum(Buy == 1), sells = sum(Buy == 0),
            Price_Buys = sum(Price[Buy == 1]),
            Price_Sells = sum(Price[Buy == 0]))
# Source: local data frame [2 x 5]
#
#         Date  buys sells Price_Buys Price_Sells
#       (fctr) (int) (int)      (int)       (int)
# 1 29-06-2015     2     1      15000        8000
# 2 30-06-2015     0     2          0       15500

Gopala answered Oct 04 '22 11:10


I would use one of the dplyr solutions myself, but I think it is still worth noting that it can also be done with aggregate(), since this is how you started out:

aggregate(cbind(Buys = Buy, Sells = !Buy,
                Price_Buys = Price * Buy, Price_Sells = Price * !Buy) ~ Date,
          data = dat, sum)
##         Date Buys Sells Price_Buys Price_Sells
## 1 29-06-2015    2     1      15000        8000
## 2 30-06-2015    0     2          0       15500

The idea here is to get the sells as !Buy. The NOT operator (!) first coerces Buy to a logical (0 => FALSE, 1 => TRUE) and then negates it, so 0 becomes TRUE and 1 becomes FALSE. Inside cbind() and in the multiplication, TRUE and FALSE are coerced back to 1 and 0, so in effect 0 is converted to 1 and 1 is converted to 0. The same trick is used when calculating the price.
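
The coercion can be checked in isolation with plain vectors. This is a small base R sketch using the five sample rows from the question:

```r
Buy   <- c(1, 0, 1, 0, 0)
Price <- c(5000, 8000, 10000, 3500, 12000)

is_sell <- !Buy                  # FALSE TRUE FALSE TRUE TRUE
n_sells <- sum(is_sell)          # TRUE counts as 1, so this is 3
sell_prices <- Price * !Buy      # 0 8000 0 3500 12000
total_sells <- sum(sell_prices)  # 23500 over both days

is_sell
n_sells
sell_prices
total_sells
```

Note that the order matters: Price * !Buy negates only Buy, whereas !Buy * Price would be parsed as !(Buy * Price), because ! binds less tightly than arithmetic operators in R.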

Comparing this solution to the others should also show you that dplyr produces much more readable code.

Stibu answered Oct 04 '22 13:10