How can I add to my crosstab an additional row and an additional column for the totals?
import numpy as np
import pandas as pd
df = pd.DataFrame({"A": np.random.randint(0,2,100), "B" : np.random.randint(0,2,100)})
ct = pd.crosstab(df.A, df.B)
ct
I thought I would add the new column (obtained by summing over the rows) by
ct["Total"] = ct.0 + ct.1
but this does not work.
In fact, pandas.crosstab already provides an option, margins, which does exactly what you want.
> df = pd.DataFrame({"A": np.random.randint(0,2,100), "B" : np.random.randint(0,2,100)})
> pd.crosstab(df.A, df.B, margins=True)
B 0 1 All
A
0 26 21 47
1 25 28 53
All 51 49 100
Setting margins=True adds an "All" column and an "All" row to the resulting frequency table; they hold the row totals, the column totals, and the grand total.
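If the margins should be labelled "Total" instead of "All", crosstab also accepts a margins_name argument. A minimal sketch, assuming a seeded random generator (not part of the original question) so that the result is reproducible:

```python
import numpy as np
import pandas as pd

# Seeded generator: an assumption made here for reproducibility only.
rng = np.random.default_rng(0)
df = pd.DataFrame({"A": rng.integers(0, 2, 100), "B": rng.integers(0, 2, 100)})

# margins=True appends a totals row and a totals column;
# margins_name sets the label used for both (the default is "All").
ct = pd.crosstab(df["A"], df["B"], margins=True, margins_name="Total")
print(ct)
```

The bottom-right cell holds the grand total, which equals the number of rows in df.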
The attempt ct.0 + ct.1 fails because 'attribute-like' column access does not work with integer column names (ct.0 is not valid Python syntax). Use the standard indexing instead:
In [122]: ct["Total"] = ct[0] + ct[1]
In [123]: ct
Out[123]:
B 0 1 Total
A
0 26 24 50
1 30 20 50
See the warnings at the end of this section in the docs: http://pandas.pydata.org/pandas-docs/stable/indexing.html#attribute-access
When you want to work with the rows, you can use .loc:
In [126]: ct.loc["Total"] = ct.loc[0] + ct.loc[1]
In this case ct.loc["Total"] is equivalent to ct.loc["Total", :].
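The manual approach also works without naming each column or row, by using DataFrame.sum. A minimal sketch, again assuming a seeded generator for reproducibility:

```python
import numpy as np
import pandas as pd

# Seeded generator: an assumption made here for reproducibility only.
rng = np.random.default_rng(0)
df = pd.DataFrame({"A": rng.integers(0, 2, 100), "B": rng.integers(0, 2, 100)})
ct = pd.crosstab(df["A"], df["B"])

# Column of row totals: sum across the columns of each row.
ct["Total"] = ct.sum(axis=1)

# Row of column totals: sum down each column. The "Total" column already
# exists at this point, so its sum becomes the grand total.
ct.loc["Total"] = ct.sum(axis=0)
print(ct)
```

The order matters here: adding the column first means the new row also gets a grand total in the bottom-right cell.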