I have a DataFrame df with the following structure:
     val      newidx  Code
Idx
0    1.0  1220121127   706
1    1.0  1220121030   706
2    1.0  1620120122   565
It has 1,000,000 rows. In total there are 600 unique Code values and 200,000 unique newidx values.
If I perform the following operation
df.pivot_table(values='val', index='newidx', columns='Code', aggfunc='max')
I get a MemoryError. This seems strange, since the size of the resulting DataFrame should be manageable: 200,000 x 600.
How much memory does such an operation require? Is there a way to fix this MemoryError?
Try to see if this fits in your memory:
df.groupby(['newidx', 'Code'])['val'].max().unstack()
pivot_table is unfortunately very memory intensive, as it may make multiple copies of the data.
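For scale, the values of the final 200,000 x 600 table alone come to roughly 1 GB if stored as float64 (the dtype is an assumption), and pivot_table's intermediate copies can multiply that. A quick back-of-the-envelope check:

import numpy as np

# Values-only footprint of a 200,000 x 600 float64 result
# (index and column overhead ignored).
n_rows, n_cols = 200_000, 600
print(n_rows * n_cols * np.dtype('float64').itemsize / 1024**3)  # ~0.89 GiB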
If the groupby does not work, you will have to split your DataFrame into smaller pieces. Try not to assign multiple times. For example, if reading from a CSV:
df = pd.read_csv('file.csv').groupby(['newidx', 'Code'])['val'].max().unstack()
avoids multiple assignments.
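If you do end up having to split the data, one possible pattern (just a sketch; 'file.csv' and the chunk size are placeholders) is to take per-chunk maxima and then reduce them, which works because max is associative:

import pandas as pd

# Compute partial maxima per chunk, then combine and unstack once at the end.
partials = []
for chunk in pd.read_csv('file.csv', chunksize=1_000_000):
    partials.append(chunk.groupby(['newidx', 'Code'])['val'].max())

result = (
    pd.concat(partials)
      .groupby(level=['newidx', 'Code'])
      .max()
      .unstack()
)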
I've had a very similar problem when carrying out a merge between 4 dataframes recently.
What worked for me was disabling the index during the groupby, then merging.
If @Kartik's answer doesn't work, try this before chunking the DataFrame:
df.groupby(['newidx', 'Code'], as_index=False)['val'].max().pivot(index='newidx', columns='Code', values='val')
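For the 4-dataframe merge case mentioned above, the same idea looks roughly like this (a sketch with made-up frames that share a 'key' column; the names and columns are assumptions):

import functools
import pandas as pd

# Hypothetical stand-ins for the four dataframes being merged.
frames = [
    pd.DataFrame({'key': [1, 2, 2], f'val{i}': [1.0, 2.0, 3.0]})
    for i in range(4)
]

# Aggregate each frame with a flat (as_index=False) groupby first,
# then merge the small aggregated frames instead of the raw ones.
aggs = [f.groupby('key', as_index=False).max() for f in frames]
merged = functools.reduce(lambda left, right: left.merge(right, on='key'), aggs)
print(merged)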