
Which columns are binary in a Pandas DataFrame?

I have a pandas dataframe with a large number of columns and I need to find which columns are binary (with values 0 or 1 only) without looking at the data. Which function should be used?

na899 asked Oct 07 '15 00:10




1 Answer

To my knowledge, there is no direct function to test for this. Rather, you need to build something based on how the data was encoded (e.g. 1/0, T/F, True/False, etc.). In addition, if an integer column contains a missing value, pandas stores the entire column as float instead of int by default, because NaN is a float.
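The float upcasting described above can be checked by comparing dtypes. This is a minimal sketch that assumes default pandas dtypes (no nullable integer dtype); the column names `complete` and `with_missing` are made up for illustration.

```python
import pandas as pd

# Two 0/1 columns that differ only in whether a value is missing.
demo = pd.DataFrame({'complete': [1, 0, 1, 0],
                     'with_missing': [1, 0, 1, None]})

# The complete column keeps an integer dtype; the None becomes NaN,
# which forces the other column to a float dtype.
print(demo.dtypes)
```

This is why a test for binary columns should compare values rather than rely on the column dtype.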

In the example below, I test whether all unique non-null values in a column are either 1 or 0. It returns a list of all such columns.

import numpy as np
import pandas as pd

df = pd.DataFrame({'bool': [1, 0, 1, None],
                   'floats': [1.2, 3.1, 4.4, 5.5],
                   'ints': [1, 2, 3, 4],
                   'str': ['a', 'b', 'c', 'd']})

# 2019-09-10 EDIT (per Hardik Gupta)
# The earlier version called df[[col]].dropna().unique(); df[[col]] is a
# DataFrame, which has no .unique() method, so select a Series with df[col].
# Keep the columns whose unique non-null values are all 0 or 1.
bool_cols = [col for col in df
             if np.isin(df[col].dropna().unique(), [0, 1]).all()]

>>> bool_cols
['bool']

>>> df[bool_cols]
   bool
0   1.0
1   0.0
2   1.0
3   NaN
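
The same check can be wrapped in a helper that takes the encoding as a parameter. This is a sketch, not a pandas API: the function name `binary_columns`, its `allowed` parameter, and the sample data are all hypothetical. Unlike the list comprehension above, it skips columns that contain only missing values, which would otherwise pass the test vacuously.

```python
import pandas as pd

def binary_columns(frame, allowed=(0, 1)):
    """Return names of columns whose non-null values all lie in `allowed`."""
    result = []
    for col in frame.columns:
        non_null = frame[col].dropna()
        # An all-missing column has no values to test, so skip it
        # instead of reporting it as binary.
        if non_null.empty:
            continue
        if non_null.isin(allowed).all():
            result.append(col)
    return result

# Hypothetical sample data covering two encodings and an empty column.
sample = pd.DataFrame({'bool': [1, 0, 1, None],
                       'yes_no': ['T', 'F', 'T', 'F'],
                       'empty': [None, None, None, None],
                       'ints': [1, 2, 3, 4]})

zero_one_cols = binary_columns(sample)
t_f_cols = binary_columns(sample, allowed=('T', 'F'))
print(zero_one_cols)
print(t_f_cols)
```

Passing a different `allowed` tuple covers encodings such as T/F without changing the function.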

Alexander answered Sep 28 '22 05:09