I cannot merge dataframes and cannot understand why:
Simple dataframe
df1 = pd.DataFrame({'id': np.random.randint(1,5,100),
'c': np.random.random(100),
's': np.random.random(100)})
grouped to 3 groupes
grouped = pd.qcut(df1.c, 3)
df_grouped = df1.groupby([grouped, 'id'])
df_cross = df_grouped['s'].sum()
df_unstacked = df_cross.unstack(level=0)
df_unstacked
Out:
c [0.018, 0.372] (0.372, 0.771] (0.771, 0.995]
id
1 3.081537 6.329819 3.386422
2 4.270542 2.553301 3.778536
3 3.125476 2.525016 3.013912
4 5.762223 3.763183 7.953551
Second simple dataframe:
df2 = pd.DataFrame({'one': range(5),
'two': np.random.randint(1,5,5),
'three': ['a', 'a', 'a', 'b', 'b']})
one three two
0 0 a 4
1 1 a 2
2 2 a 1
3 3 b 2
4 4 b 2
Trying to merge both:
pd.merge(df_unstacked, df2, left_index=True, right_on='one')
I would expect:
c [0.018, 0.372] (0.372, 0.771] (0.771, 0.995] one three two
id
1 3.081537 6.329819 3.386422 1 a 2
2 4.270542 2.553301 3.778536 2 a 1
3 3.125476 2.525016 3.013912 3 b 2
4 5.762223 3.763183 7.953551 4 b 2
But I get TypeError:
TypeError: cannot append a non-category item to a CategoricalIndex
Also, trying to reset_index() on df_unstacked, gives TypeError:
TypeError: cannot insert an item into a CategoricalIndex that is not already an existing category
Making a .copy() does not help :) what to do?
p.s. pandas 0.17.1
One way to make this work is to switch the order of the left and right tables. Pandas allows you to join the Categorical columns to non-Categorical ones, but not the other way around.
pd.merge(df2,df_unstacked, right_index=True, left_on='one')
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With