I have a data.frame dat with the car sales (Buy=0 in the data frame) and purchases (Buy=1 in the data frame) of a used-car seller.
  Date       Buy   Price
29-06-2015    1    5000
29-06-2015    0    8000
29-06-2015    1    10000
30-06-2015    0    3500
30-06-2015    0    12000 
...          ...  ...
What I need is a new, aggregated data.frame that gives me the number of buys and the number of sells per day together with the summed prices of all the buys and sells for that day:
  Date      Buys   Sells   Price_Buys  Price_Sells
29-06-2015    2    1         15000        8000
30-06-2015    0    2           0          15500
...          ...  ...
I tried to use aggregate(dat$Buy, by=list(Date=dat$Date), FUN=sum). However, I am still struggling with how to aggregate the sells as well.
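For readers who want to run the answers below, here is a minimal sketch that rebuilds the five visible sample rows as dat (the elided ... rows are left out, and Date is kept as plain text rather than a real date). It then runs the buy count with FUN passed as its own argument to aggregate():

```r
# The five sample rows from the question; the "..." rows are unknown
dat <- data.frame(
  Date  = c("29-06-2015", "29-06-2015", "29-06-2015", "30-06-2015", "30-06-2015"),
  Buy   = c(1, 0, 1, 0, 0),
  Price = c(5000, 8000, 10000, 3500, 12000),
  stringsAsFactors = FALSE
)

# Number of buys per day: `by` holds only the grouping list, FUN is separate
buys <- aggregate(dat$Buy, by = list(Date = dat$Date), FUN = sum)
buys
```

Because a vector is passed as the first argument, the summed column is named x in the result.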
This can be done pretty cleanly in dplyr, grouping by date using group_by and then summarizing with summarize:
library(dplyr)
(out <- dat %>%
  group_by(Date) %>%
  summarize(Buys=sum(Buy == 1), Sells=sum(Buy == 0),
            Price_Buys=sum(Price[Buy == 1]), Price_Sells=sum(Price[Buy == 0])))
#         Date  Buys Sells Price_Buys Price_Sells
#       (fctr) (int) (int)      (int)       (int)
# 1 29-06-2015     2     1      15000        8000
# 2 30-06-2015     0     2          0       15500
You can now manipulate this object as you would a normal data frame, e.g. with something like:
out$newvar <- with(out, Sells*Price_Sells - Buys*Price_Buys)
out
# Source: local data frame [2 x 6]
#         Date  Buys Sells Price_Buys Price_Sells newvar
#       (fctr) (int) (int)      (int)       (int)  (int)
# 1 29-06-2015     2     1      15000        8000 -22000
# 2 30-06-2015     0     2          0       15500  31000
Using data.table v1.9.6+, you can now provide a list of functions to the fun.aggregate argument (abbreviated here as fun), so we can easily solve this with dcast (without specifying any conditions by hand):
library(data.table) # V1.9.6+
dcast(setDT(dat), Date ~ Buy , value.var = "Price", fun = list(length, sum))
#          Date Price_length_0 Price_length_1 Price_sum_0 Price_sum_1
# 1: 29-06-2015              1              2        8000       15000
# 2: 30-06-2015              2              0       15500           0
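If the exact column names from the question are wanted, the dcast() result can be renamed afterwards. This is a sketch that assumes the generated names follow the Price_<function>_<Buy> pattern shown in the output above. It works on a copy via as.data.table(), so dat is not converted in place the way setDT() does it:

```r
library(data.table)

# The five sample rows from the question
dat <- data.frame(
  Date  = c("29-06-2015", "29-06-2015", "29-06-2015", "30-06-2015", "30-06-2015"),
  Buy   = c(1, 0, 1, 0, 0),
  Price = c(5000, 8000, 10000, 3500, 12000),
  stringsAsFactors = FALSE
)

# Count (length) and total (sum) of Price for each Date / Buy combination;
# missing combinations are filled by applying each function to an empty vector (0)
res <- dcast(as.data.table(dat), Date ~ Buy, value.var = "Price",
             fun.aggregate = list(length, sum))

# Rename the generated columns (Buy = 1 is a buy, Buy = 0 is a sell)
setnames(res,
         old = c("Price_length_1", "Price_length_0", "Price_sum_1", "Price_sum_0"),
         new = c("Buys", "Sells", "Price_Buys", "Price_Sells"))
setcolorder(res, c("Date", "Buys", "Sells", "Price_Buys", "Price_Sells"))
res
```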
Or, if we want to try dplyr, a robust way of solving this (again, without specifying any conditions) could be:
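library(dplyr)
# summarise_each() and funs() are deprecated in current dplyr;
# summarise(sum = sum(Price), length = n()) produces the same columns
dat %>%
  group_by(Date, Buy) %>%
  summarise_each(funs(sum, length), Price)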
# Source: local data frame [3 x 4]
# Groups: Date [?]
# 
#         Date   Buy   sum length
#       (fctr) (int) (int)  (int)
# 1 29-06-2015     0  8000      1
# 2 29-06-2015     1 15000      2
# 3 30-06-2015     0 15500      2
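Note that the long result above has no row for buys on 30-06-2015, because there were none. One way to get the wide shape from the question, with missing combinations filled with 0, is to reshape the summary with tidyr's pivot_wider(). This step is not part of the original answer. The sketch assumes dplyr >= 1.0 and tidyr >= 1.1, and it uses summarise()/n() in place of the deprecated summarise_each()/funs():

```r
library(dplyr)
library(tidyr)

# The five sample rows from the question
dat <- data.frame(
  Date  = c("29-06-2015", "29-06-2015", "29-06-2015", "30-06-2015", "30-06-2015"),
  Buy   = c(1, 0, 1, 0, 0),
  Price = c(5000, 8000, 10000, 3500, 12000),
  stringsAsFactors = FALSE
)

# Long summary: one row per Date / Buy combination that actually occurs
long <- dat %>%
  group_by(Date, Buy) %>%
  summarise(sum = sum(Price), length = n(), .groups = "drop")

# Wide result: one row per Date; combinations with no rows become 0
wide <- long %>%
  pivot_wider(names_from = Buy, values_from = c(length, sum), values_fill = 0)
wide
```

The columns come out as length_0, length_1, sum_0 and sum_1 (0 = sells, 1 = buys) and can be renamed with rename() if needed.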
You can use the dplyr library to do this:
dat %>%
  group_by(Date) %>%
  summarise(buys = sum(Buy == 1), sells = sum(Buy == 0),
            Price_Buys = sum(Price[Buy == 1]), Price_Sells = sum(Price[Buy == 0]))
Source: local data frame [2 x 5]
        Date  buys sells Price_Buys Price_Sells
      (fctr) (int) (int)      (int)       (int)
1 29-06-2015     2     1      15000        8000
2 30-06-2015     0     2          0       15500
I would use one of the dplyr solutions myself, but I think it is still noteworthy that this can also be done with aggregate(), since that is how you started out:
aggregate(cbind(Buys = Buy, Sells = !Buy,
                Price_Buys = Price * Buy, Price_Sells = Price * !Buy) ~ Date,
          data = dat, sum)
##         Date Buys Sells Price_Buys Price_Sells
## 1 29-06-2015    2     1      15000        8000
## 2 30-06-2015    0     2          0       15500
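The idea here is to get the sales as !Buy. The NOT operator (!) first converts the numeric Buy to a logical (0 => FALSE, 1 => TRUE) and then negates it, so a sale (0) becomes TRUE and a buy (1) becomes FALSE. When summed or multiplied, TRUE counts as 1 and FALSE as 0, so in effect 0 is converted to 1 and 1 to 0. The same trick is used for the prices: Price * !Buy keeps the price of each sale and zeroes out the buys.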
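The coercion described above can be seen on plain vectors. The two vectors below are the Buy and Price columns of the sample rows, with the dates ignored, so the sums run over all five rows rather than per day:

```r
buy   <- c(1, 0, 1, 0, 0)
price <- c(5000, 8000, 10000, 3500, 12000)

as.logical(buy)    # 1 -> TRUE, 0 -> FALSE
!buy               # negated: a sale (0) becomes TRUE, a buy (1) becomes FALSE
sum(!buy)          # TRUE counts as 1, so this is the number of sales: 3
sum(price * !buy)  # prices of sales only: 8000 + 3500 + 12000 = 23500
```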
Comparing this solution with the others also shows, however, that dplyr produces much more readable code.