I have a pandas DataFrame with a large number of columns, and I need to find which columns are binary (containing only the values 0 or 1) without manually inspecting the data. Which function should I use?
In mathematics, a binary (or dyadic) operator is a function that combines two values to produce a new value; the operation could be addition, subtraction, and so on. pandas.DataFrame defines several such binary operator functions (for example add and sub) for combining two DataFrames.
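As a minimal sketch of one such binary operator function (the frames and column name here are illustrative, not part of the question), DataFrame.add aligns two frames on their labels and adds them element-wise:

import pandas as pd

a = pd.DataFrame({'x': [1, 2, 3]})
b = pd.DataFrame({'x': [10, 20, 30]})

# DataFrame.add is a binary operator function: it aligns the two
# frames on row and column labels and adds them element-wise.
result = a.add(b)

>>> result
    x
0  11
1  22
2  33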
A pandas DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). Arithmetic operations align on both row and column labels. It can be thought of as a dict-like container for Series objects.
To check the data types in a pandas DataFrame we can use the "dtypes" attribute. It returns a Series with the data type of each column: the column names of the DataFrame form the index of the resulting Series, and the corresponding data types are its values. Note, though, that dtypes alone cannot tell you whether an int64 or float64 column holds only 0s and 1s.
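For example (a small illustrative frame, not taken from the question):

import pandas as pd

df = pd.DataFrame({'flag': [1, 0, 1],
                   'price': [9.5, 3.0, 7.2],
                   'name': ['a', 'b', 'c']})

>>> df.dtypes
flag       int64
price    float64
name      object
dtype: object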
To my knowledge, there is no direct function to test for this. Rather, you need to build something based on how the data was encoded (e.g. 1/0, T/F, True/False, etc.). In addition, if your column has a missing value, the entire column will be encoded as a float instead of an int.
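For instance, a quick illustration of that float promotion on a made-up column:

import pandas as pd

# Without missing values the column stays integer...
>>> pd.Series([1, 0, 1]).dtype
dtype('int64')

# ...but a single None/NaN promotes the whole column to float.
>>> pd.Series([1, 0, 1, None]).dtype
dtype('float64')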
In the example below, I test whether all unique non-null values are either 0 or 1. It returns a list of all such columns.
import numpy as np
import pandas as pd

df = pd.DataFrame({'bool': [1, 0, 1, None],
                   'floats': [1.2, 3.1, 4.4, 5.5],
                   'ints': [1, 2, 3, 4],
                   'str': ['a', 'b', 'c', 'd']})

# Keep a column if every non-null value it contains is 0 or 1.
bool_cols = [col for col in df
             if df[col].dropna().isin([0, 1]).all()]

# 2019-09-10 EDIT (per Hardik Gupta): the same test applied with
# NumPy to the unique values of each column.
bool_cols = [col for col in df
             if np.isin(df[col].dropna().unique(), [0, 1]).all()]
>>> bool_cols
['bool']

>>> df[bool_cols]
   bool
0   1.0
1   0.0
2   1.0
3   NaN
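If you prefer plain Python sets, an equivalent way to express the same check (not from the original answer, just another way to write it) is:

# A column qualifies if its set of non-null unique values
# is a subset of {0, 1}.
bool_cols = [col for col in df
             if set(df[col].dropna().unique()) <= {0, 1}]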