I have a df that I scraped from coinmarketcap. I am trying to calculate volitlity metrics for the close_price column but when I use a groupby I'm getting an error message:
final_coin_data['vol'] = final_coin_data.groupby('coin_name')['close_price'].rolling(window=30).std()
TypeError: incompatible index of inserted column with frame index
df structure (the 'Unnamed:0' came after I loaded my CSV):
Unnamed: 0 close_price coin_name date high_price low_price market_cap open_price volume
0 1 9578.63 Bitcoin Mar 11, 2018 9711.89 8607.12 149,716,000,000 8852.78 6,296,370,000
1 2 8866.00 Bitcoin Mar 10, 2018 9531.32 8828.47 158,119,000,000 9350.59 5,386,320,000
2 3 9337.55 Bitcoin Mar 09, 2018 9466.35 8513.03 159,185,000,000 9414.69 8,704,190,000
3 1 9578.63 Monero Mar 11, 2018 9711.89 8607.12 149,716,000,000 8852.78 6,296,370,000
4 2 8866.00 Monero Mar 10, 2018 9531.32 8828.47 158,119,000,000 9350.59 5,386,320,000
5 3 9337.55 Monero Mar 09, 2018 9466.35 8513.03 159,185,000,000 9414.69 8,704,190,000
(ignore the incorrect prices, this is the basics of the df)
When using the following code:
final_coin_data1['vol'] = final_coin_data.groupby('coin_name')['close_price'].rolling(window=30).std().reset_index(0,drop=True)
I got a MemoryError. I thought I was using groupby correctly. If I take out the final_coin_data1['vol'] =
then I get a series which appears correct, but it won't let me insert back into the df.
When I first started this project. I had just 1 coin and used the code below and it calculated volatility no problem.
final_coin_data1['vol'] = final_coin_data['close_price'].rolling(window=30).std()
Although Groupby is much faster than Pandas GroupBy. apply and GroupBy. transform with user-defined functions, Pandas is much faster with common functions like mean and sum because they are implemented in Cython.
The “group by” process: split-apply-combine (1) Splitting the data into groups. (2). Applying a function to each group independently, (3) Combining the results into a data structure.
Note this does not influence the order of observations within each group. Groupby preserves the order of rows within each group.
groupby() can take the list of columns to group by multiple columns and use the aggregate functions to apply single or multiple aggregations at the same time.
When I ran this,
final_coin_data['close_price'].rolling(window=30).std()
an index column and result column are generated. When I tried to merge back to the original df as a new column final_coin_data1['vol']
I was getting an error TypeError: incompatible index of inserted column with frame index
so to correct this, I reset_index(drop=True)
then this eliminated the index which allowed the result to be joined on the final_coin_data1['vol']
.
The final functioning code looks like this:
final_coin_data1['vol'] = final_coin_data.groupby('coin_name')['close_price'].rolling(window=30).std().reset_index(0,drop=True)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With