 

Missing data in pandas.crosstab

Tags: python, pandas

I'm making some crosstabs with pandas:

import numpy as np
import pandas as pd

a = np.array(['foo', 'foo', 'foo', 'bar', 'bar', 'foo', 'foo'], dtype=object)
b = np.array(['one', 'one', 'two', 'one', 'two', 'two', 'two'], dtype=object)
c = np.array(['dull', 'dull', 'dull', 'dull', 'dull', 'shiny', 'shiny'], dtype=object)

pd.crosstab(a, [b, c], rownames=['a'], colnames=['b', 'c'])

b     one   two       
c    dull  dull  shiny
a                     
bar     1     1      0
foo     2     1      2

But what I actually want is the following:

b     one          two       
c    dull  shiny  dull  shiny
a                            
bar     1      0     1      0
foo     2      0     1      2

I found a workaround: adding a new column and setting its levels as a new MultiIndex. But that seems more complicated than it should be.

Is there any way to pass a MultiIndex to the crosstab function to predefine the output columns?

asked Jun 08 '13 19:06 by norecces




1 Answer

The crosstab function has a parameter called dropna, which is set to True by default. It controls whether columns with no observations (such as the one/shiny column) are dropped from the result. Passing dropna=False keeps them, filled with 0.

I tried calling the function like this:

pd.crosstab(a, [b, c], rownames=['a'], colnames=['b', 'c'], dropna=False)

and this is what I got:

b     one          two       
c    dull  shiny  dull  shiny
a                            
bar     1      0     1      0
foo     2      0     1      2
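
On the question of predefining the output columns with a MultiIndex: as far as I know, crosstab has no argument for that, but the result is an ordinary DataFrame, so it can be reindexed afterwards. The following is a minimal sketch of that idea, not something from the original post; the names `full_columns` and `table_full` are made up for illustration, and the level values are written out by hand on the assumption that you know them in advance.

```python
import numpy as np
import pandas as pd

a = np.array(['foo', 'foo', 'foo', 'bar', 'bar', 'foo', 'foo'], dtype=object)
b = np.array(['one', 'one', 'two', 'one', 'two', 'two', 'two'], dtype=object)
c = np.array(['dull', 'dull', 'dull', 'dull', 'dull', 'shiny', 'shiny'], dtype=object)

# Default crosstab: the unobserved (one, shiny) column is missing.
table = pd.crosstab(a, [b, c], rownames=['a'], colnames=['b', 'c'])

# Assumed column layout: every combination of the two levels.
full_columns = pd.MultiIndex.from_product(
    [['one', 'two'], ['dull', 'shiny']], names=['b', 'c']
)

# Reindex onto the predefined columns; combinations that never occurred get 0.
table_full = table.reindex(columns=full_columns, fill_value=0)
print(table_full)
```

Compared with dropna=False, this also lets you fix the column order or include a category that never appears in the data at all.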

Hope that was still helpful.

answered Oct 05 '22 06:10 by Pintas