Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is groupby and rolling not working together?

I have a df that I scraped from coinmarketcap. I am trying to calculate volitlity metrics for the close_price column but when I use a groupby I'm getting an error message:

final_coin_data['vol'] = final_coin_data.groupby('coin_name')['close_price'].rolling(window=30).std()
TypeError: incompatible index of inserted column with frame index

df structure (the 'Unnamed:0' came after I loaded my CSV):

    Unnamed: 0  close_price coin_name   date            high_price  low_price    market_cap         open_price  volume
0   1           9578.63     Bitcoin     Mar 11, 2018    9711.89     8607.12      149,716,000,000    8852.78     6,296,370,000
1   2           8866.00     Bitcoin     Mar 10, 2018    9531.32     8828.47      158,119,000,000    9350.59     5,386,320,000
2   3           9337.55     Bitcoin     Mar 09, 2018    9466.35     8513.03      159,185,000,000    9414.69     8,704,190,000
3   1           9578.63     Monero      Mar 11, 2018    9711.89     8607.12      149,716,000,000    8852.78     6,296,370,000
4   2           8866.00     Monero      Mar 10, 2018    9531.32     8828.47      158,119,000,000    9350.59     5,386,320,000
5   3           9337.55     Monero      Mar 09, 2018    9466.35     8513.03      159,185,000,000    9414.69     8,704,190,000

(ignore the incorrect prices, this is the basics of the df)

When using the following code:

final_coin_data1['vol'] = final_coin_data.groupby('coin_name')['close_price'].rolling(window=30).std().reset_index(0,drop=True)

I got a MemoryError. I thought I was using groupby correctly. If I take out the final_coin_data1['vol'] = then I get a series which appears correct, but it won't let me insert back into the df.

When I first started this project. I had just 1 coin and used the code below and it calculated volatility no problem.

 final_coin_data1['vol'] = final_coin_data['close_price'].rolling(window=30).std()
like image 747
Martin Bobak Avatar asked Mar 12 '18 23:03

Martin Bobak


People also ask

Is GroupBy faster on index pandas?

Although Groupby is much faster than Pandas GroupBy. apply and GroupBy. transform with user-defined functions, Pandas is much faster with common functions like mean and sum because they are implemented in Cython.

What are the three phases of the pandas GroupBy () function?

The “group by” process: split-apply-combine (1) Splitting the data into groups. (2). Applying a function to each group independently, (3) Combining the results into a data structure.

Does pandas GroupBy keep order?

Note this does not influence the order of observations within each group. Groupby preserves the order of rows within each group.

Can you use GroupBy with multiple columns in pandas?

groupby() can take the list of columns to group by multiple columns and use the aggregate functions to apply single or multiple aggregations at the same time.


1 Answers

When I ran this,

final_coin_data['close_price'].rolling(window=30).std()

an index column and result column are generated. When I tried to merge back to the original df as a new column final_coin_data1['vol'] I was getting an error TypeError: incompatible index of inserted column with frame index so to correct this, I reset_index(drop=True) then this eliminated the index which allowed the result to be joined on the final_coin_data1['vol'].

The final functioning code looks like this:

final_coin_data1['vol'] = final_coin_data.groupby('coin_name')['close_price'].rolling(window=30).std().reset_index(0,drop=True)
like image 73
Martin Bobak Avatar answered Oct 03 '22 17:10

Martin Bobak