I'm having a hard time reorganizing this DataFrame. I think I'm supposed to use pd.pivot_table or pd.crosstab, but I'm not sure how to get the job done.
Here is my DataFrame:
import pandas as pd
vicro = pd.read_csv(vicroURL)
# .ix was removed in pandas 1.0; .loc selects these columns by label
vicro_subset = vicro.loc[:, ['P1', 'P10', 'P30', 'P71', 'P82', 'P90']]
In [6]: vicro_subset.head()
Out[6]:
P1 P10 P30 P71 P82 P90
0 - I - - - M
1 - I - V T M
2 - I - V A M
3 - I - T - M
4 - - - - A -
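vicroURL isn't shown, so for anyone who wants to follow along, here is a hypothetical stand-in built from the five rows displayed above. The name sample is mine, not from the original data:

```python
import pandas as pd

# Hypothetical stand-in for vicro_subset.head(); the real rows come from
# the CSV at vicroURL, which is not shown. Each string is one row, and
# list() splits it into one character per column.
sample = pd.DataFrame(
    [list('-I---M'), list('-I-VTM'), list('-I-VAM'), list('-I-T-M'), list('----A-')],
    columns=['P1', 'P10', 'P30', 'P71', 'P82', 'P90'],
)
print(sample)
```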
What I want to do is take every distinct value in this DataFrame and make it a row, with the cells holding how many times that value occurs in each column. Something that would look like:
Out[6]:
P1 P10 P30 P71 P82 P90
I 0 4 0 0 0 0
V 0 0 0 2 0 0
A 0 0 0 0 2 0
M 0 0 0 0 0 4
T 0 0 0 1 1 0
Any help would be greatly appreciated! Thank you.
Edit: Elaborating on the answer using melt. Both answers helped me understand pandas a bit more, but the melt answer had more unknowns for me, so here it is step by step:
In [8]: melted_df = pd.melt(vicro_subset)
In [9]: melted_df.head()
Out[9]:
variable value
0 P1 -
1 P1 -
2 P1 -
3 P1 -
4 P1 -
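All five rows shown come from P1 because melt walks the columns in order, emitting one row per original cell. A minimal sketch on a hypothetical five-row sample (the rows from the question, not the real CSV) makes the shape explicit:

```python
import pandas as pd

# Hypothetical five-row stand-in for vicro_subset (not the real CSV data).
sample = pd.DataFrame(
    [list('-I---M'), list('-I-VTM'), list('-I-VAM'), list('-I-T-M'), list('----A-')],
    columns=['P1', 'P10', 'P30', 'P71', 'P82', 'P90'],
)

# With no id_vars, melt stacks every column into two columns:
# 'variable' holds the original column name, 'value' holds the cell.
melted = pd.melt(sample)
print(melted.head())
print(len(melted))  # one row per original cell: 5 rows x 6 columns
```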
In [13]: grouped_melt = melted_df.groupby(['variable','value'])['value'].count()
In [14]: grouped_melt.head()
Out[14]:
variable value
P1 - 797
. 269
P10 - 339
. 1
F 132
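grouped_melt is a Series indexed by (variable, value) pairs, so each number is how often a value appears in a given column. The sketch below, on the same hypothetical five-row sample, uses size() instead of selecting 'value' and calling count(): size() counts the rows in each group directly, and on this data it gives the same numbers.

```python
import pandas as pd

# Hypothetical five-row stand-in for vicro_subset (not the real CSV data).
sample = pd.DataFrame(
    [list('-I---M'), list('-I-VTM'), list('-I-VAM'), list('-I-T-M'), list('----A-')],
    columns=['P1', 'P10', 'P30', 'P71', 'P82', 'P90'],
)
melted = pd.melt(sample)

# One group per (column name, cell value) pair; size() counts the rows
# in each group, giving a Series with a two-level index.
pair_counts = melted.groupby(['variable', 'value']).size()
print(pair_counts)
```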
In [15]: unstacked_group = grouped_melt.unstack()
In [16]: unstacked_group.head()
Out[16]:
<class 'pandas.core.frame.DataFrame'>
Index: 5 entries, P1 to P82
Data columns:
- 5 non-null values
. 2 non-null values
A 1 non-null values
AITV 1 non-null values
AT 2 non-null values
In [17]: transpose_unstack = unstacked_group.T
In [18]: transpose_unstack.head()
Out[18]:
variable P1 P10 P30 P71 P82 P90
value
- 797 339 1005 452 604 634
. 269 1 NaN NaN NaN NaN
A NaN NaN NaN NaN 282 NaN
AITV NaN NaN NaN NaN 1 NaN
AT NaN NaN NaN 1 2 NaN
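The unstack followed by .T can be collapsed into one step. unstack() accepts the level to move into the columns and a fill_value for missing pairs, which avoids the NaNs (and the float upcast they cause). A sketch on the hypothetical five-row sample:

```python
import pandas as pd

# Hypothetical five-row stand-in for vicro_subset (not the real CSV data).
sample = pd.DataFrame(
    [list('-I---M'), list('-I-VTM'), list('-I-VAM'), list('-I-T-M'), list('----A-')],
    columns=['P1', 'P10', 'P30', 'P71', 'P82', 'P90'],
)

counts = (
    pd.melt(sample)
    .groupby(['variable', 'value'])
    .size()
    # Move the 'variable' level (P1, P10, ...) into the columns directly,
    # so no transpose is needed; absent pairs become 0 instead of NaN.
    .unstack('variable', fill_value=0)
)
print(counts)
```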
Alternatively, something like this should work:
In [1]: import numpy as np
In [2]: import pandas as pd
In [3]: df = pd.DataFrame(np.random.randint(0,5,12).reshape(3,4),
columns=list('abcd'))
In [4]: print(df)
a b c d
0 2 2 3 1
1 0 1 0 2
2 1 3 0 4
In [5]: new = pd.concat([df[col].value_counts() for col in df.columns], axis=1)
In [6]: new.columns = df.columns
In [7]: print(new)
a b c d
0 1 NaN 2 NaN
1 1 1 NaN 1
2 1 1 NaN 1
3 NaN 1 1 NaN
4 NaN NaN NaN 1
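The same per-column value_counts idea can also be written with DataFrame.apply, which aligns the per-column results on the union of all values. fillna(0).astype(int) then turns the gaps into integer zeros. A sketch on the hypothetical five-row sample from the question:

```python
import pandas as pd

# Hypothetical five-row stand-in for vicro_subset (not the real CSV data).
sample = pd.DataFrame(
    [list('-I---M'), list('-I-VTM'), list('-I-VAM'), list('-I-T-M'), list('----A-')],
    columns=['P1', 'P10', 'P30', 'P71', 'P82', 'P90'],
)

# apply() calls value_counts on each column; the resulting Series are
# aligned on the union of their indexes, with NaN where a value is absent.
counts = sample.apply(pd.Series.value_counts).fillna(0).astype(int)

# Drop the '-' placeholder to match the layout the question asks for.
print(counts.drop('-'))
```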
I figured the key is to use melt, and a bit of acrobatics afterwards. So here's your DataFrame:
In [21]: df
Out[21]:
P1 P10 P30 P71 P82 P90
0 - I - - - M
1 - I - V T M
2 - I - V A M
3 - I - T - M
4 - - - - A -
Now if you do the following (you might want to step it through in IPython to see the intermediate results):
In [22]: pd.melt(df).groupby(['variable', 'value'])['value'].count().unstack().T.fillna(0)
Out[22]:
variable P1 P10 P30 P71 P82 P90
value
- 5 1 5 2 2 1
A 0 0 0 0 2 0
I 0 4 0 0 0 0
M 0 0 0 0 0 4
T 0 0 0 1 1 0
V 0 0 0 2 0 0
If you save the result in df2, you can then remove the '-' row:
In [25]: df2.drop('-')
Out[25]:
variable P1 P10 P30 P71 P82 P90
value
A 0 0 0 0 2 0
I 0 4 0 0 0 0
M 0 0 0 0 0 4
T 0 0 0 1 1 0
V 0 0 0 2 0 0
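Since the question mentions pd.crosstab: applied to the melted frame, it does the group/unstack/fillna steps in a single call. A sketch on the hypothetical five-row sample (not the real CSV data):

```python
import pandas as pd

# Hypothetical five-row stand-in for vicro_subset (not the real CSV data).
sample = pd.DataFrame(
    [list('-I---M'), list('-I-VTM'), list('-I-VAM'), list('-I-T-M'), list('----A-')],
    columns=['P1', 'P10', 'P30', 'P71', 'P82', 'P90'],
)
melted = pd.melt(sample)

# crosstab counts how often each (value, variable) pair occurs: rows are
# the cell values, columns the original column names, absent pairs are 0.
table = pd.crosstab(melted['value'], melted['variable'])

# Remove the '-' placeholder row, as in the answer above.
table = table.drop('-')
print(table)
```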