I have a data.frame dat recording the sells (Buy=0 in the data frame) and buys (Buy=1 in the data frame) of a used-car dealer.
Date Buy Price
29-06-2015 1 5000
29-06-2015 0 8000
29-06-2015 1 10000
30-06-2015 0 3500
30-06-2015 0 12000
... ... ...
What I need is a new, aggregated data.frame that gives me the number of buys and the number of sells per day together with the summed prices of all the buys and sells for that day:
Date Buys Sells Price_Buys Price_Sells
29-06-2015 2 1 15000 8000
30-06-2015 0 2 0 15500
... ... ... ... ...
I tried aggregate(dat$Buy, by=list(Date=dat$Date), FUN=sum), but I am still struggling with how to aggregate the sells as well.
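For reference, here is a minimal reproducible version of dat (a sketch built from the rows shown above; stringsAsFactors = TRUE keeps Date as a factor, matching the (fctr) type printed in the answers below):
dat <- data.frame(
  Date  = c("29-06-2015", "29-06-2015", "29-06-2015", "30-06-2015", "30-06-2015"),
  Buy   = c(1, 0, 1, 0, 0),
  Price = c(5000, 8000, 10000, 3500, 12000),
  stringsAsFactors = TRUE  # keep Date as a factor, as in the answers' printed output
)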
This can be done pretty cleanly in dplyr: group by date with group_by() and then summarize with summarize():
library(dplyr)
(out <- dat %>%
group_by(Date) %>%
summarize(Buys=sum(Buy == 1), Sells=sum(Buy == 0),
Price_Buys=sum(Price[Buy == 1]), Price_Sells=sum(Price[Buy == 0])))
# Date Buys Sells Price_Buys Price_Sells
# (fctr) (int) (int) (int) (int)
# 1 29-06-2015 2 1 15000 8000
# 2 30-06-2015 0 2 0 15500
You can now manipulate this object as you would a normal data frame, e.g. with something like:
out$newvar <- with(out, Sells*Price_Sells - Buys*Price_Buys)
out
# Source: local data frame [2 x 6]
# Date Buys Sells Price_Buys Price_Sells newvar
# (fctr) (int) (int) (int) (int) (int)
# 1 29-06-2015 2 1 15000 8000 -22000
# 2 30-06-2015 0 2 0 15500 31000
Using data.table v1.9.6+, you can provide a list of functions to dcast()'s fun.aggregate argument (abbreviated to fun below), so we can easily solve this without specifying any conditions by hand:
library(data.table) # V1.9.6+
dcast(setDT(dat), Date ~ Buy, value.var = "Price", fun = list(length, sum))
# Date Price_length_0 Price_length_1 Price_sum_0 Price_sum_1
# 1: 29-06-2015 1 2 8000 15000
# 2: 30-06-2015 2 0 15500 0
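The generated column names follow dcast()'s value.var/function/level pattern. If you prefer the names from the desired output, one way to rename them (a sketch using data.table's setnames(); the mapping assumes Buy == 0 are sells and Buy == 1 are buys, as in the question) is:
res <- dcast(setDT(dat), Date ~ Buy, value.var = "Price", fun = list(length, sum))
setnames(res,
         c("Price_length_0", "Price_length_1", "Price_sum_0", "Price_sum_1"),
         c("Sells", "Buys", "Price_Sells", "Price_Buys"))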
Or if we want to try dplyr, a robust way of solving this (again, without specifying any conditions by hand) could be:
library(dplyr)
dat %>%
group_by(Date, Buy) %>%
summarise_each(funs(sum, length), Price)
# Source: local data frame [3 x 4]
# Groups: Date [?]
#
# Date Buy sum length
# (fctr) (int) (int) (int)
# 1 29-06-2015 0 8000 1
# 2 29-06-2015 1 15000 2
# 3 30-06-2015 0 15500 2
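Note that this leaves the result in long format, one row per Date/Buy combination. If you need the wide layout from the question, one option (a sketch assuming tidyr >= 1.1.0 is available for pivot_wider() with a scalar values_fill) is:
library(dplyr)
library(tidyr)
dat %>%
  group_by(Date, Buy) %>%
  summarise(sum = sum(Price), n = n()) %>%
  pivot_wider(names_from = Buy, values_from = c(sum, n),
              values_fill = 0)  # fill the missing 30-06-2015 buy combination with 0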
You can use the dplyr library to do this:
dat %>%
  group_by(Date) %>%
  summarise(buys = sum(Buy == 1), sells = sum(Buy == 0),
            Price_Buys = sum(Price[Buy == 1]), Price_Sells = sum(Price[Buy == 0]))
Source: local data frame [2 x 5]
Date buys sells Price_Buys Price_Sells
(fctr) (int) (int) (int) (int)
1 29-06-2015 2 1 15000 8000
2 30-06-2015 0 2 0 15500
I would use one of the dplyr solutions myself, but I think it is still noteworthy that this can also be done with aggregate(), since that is how you started out:
aggregate(cbind(Buys = Buy, Sells = !Buy,
Price_Buys = Price * Buy, Price_Sells = Price * !Buy) ~ Date,
data = dat, sum)
## Date Buys Sells Price_Buys Price_Sells
## 1 29-06-2015 2 1 15000 8000
## 2 30-06-2015 0 2 0 15500
The idea here is to get the sells as !Buy. Coercing Buy to logical gives 0 => FALSE and 1 => TRUE; the NOT operator (!) then flips this, so 0 ends up as TRUE and 1 as FALSE. When these logicals are summed (or multiplied with Price), TRUE counts as 1 and FALSE as 0, so 0 is effectively converted to 1 and 1 to 0. The same trick is used when calculating the price.
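A quick console check of the trick on the first three rows of the example data (output shown as comments):
!c(1, 0, 1)
# [1] FALSE  TRUE FALSE
sum(!c(1, 0, 1))                    # counts the sells; logicals sum as 0/1
# [1] 1
c(5000, 8000, 10000) * !c(1, 0, 1)  # keeps only the sell prices
# [1]    0 8000    0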
Comparing this solution to the others should also show you that dplyr produces much more readable code.