I want to get the names of the columns that hold the same value in every row.
My data:
   A   B  C  D
0  1  hi  2  a
1  3  hi  2  b
2  4  hi  2  c
Desired output:
['B', 'C']
Code:
import pandas as pd
d = {'A': [1,3,4], 'B': ['hi','hi','hi'], 'C': [2,2,2], 'D': ['a','b','c']}
df = pd.DataFrame(data=d)
I've been playing around with df.columns and .any(), but can't figure out how to do this.
Use the lesser-known pandas built-in nunique():
df.columns[df.nunique() <= 1]
Index(['B', 'C'], dtype='object')
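As a minimal sketch of how nunique() treats missing values: the column E below is hypothetical (it is not in the question's data) and is added only to show the effect of the dropna argument.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 3, 4],
    'B': ['hi', 'hi', 'hi'],
    'C': [2, 2, 2],
    'D': ['a', 'b', 'c'],
    'E': [5.0, np.nan, 5.0],  # hypothetical column, not part of the question
})

# Default (dropna=True): NaN is ignored, so E counts as one distinct value.
constant_default = df.columns[df.nunique() <= 1].tolist()

# dropna=False: NaN counts as its own value, so E has two distinct values.
constant_strict = df.columns[df.nunique(dropna=False) <= 1].tolist()

print(constant_default)
print(constant_strict)
```

With the default, E is reported as constant alongside B and C; with dropna=False it is not.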
Notes:
Pass nunique(dropna=False) if you want NaNs counted as a separate value.
Solution 1:
c = [c for c in df.columns if len(set(df[c])) == 1]
print (c)
['B', 'C']
Solution 2:
c = df.columns[df.eq(df.iloc[0]).all()].tolist()
print (c)
['B', 'C']
Explanation for Solution 2:
First compare all rows to the first row with DataFrame.eq ...
print (df.eq(df.iloc[0]))
       A     B     C      D
0   True  True  True   True
1  False  True  True  False
2  False  True  True  False
... then check that each column contains only True values with DataFrame.all ...
print (df.eq(df.iloc[0]).all())
A    False
B     True
C     True
D    False
dtype: bool
... finally filter columns' names for which result is True:
print (df.columns[df.eq(df.iloc[0]).all()])
Index(['B', 'C'], dtype='object')
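The three steps above can be wrapped in a small helper. This is only a sketch: the name constant_columns and the choice to return an empty list for an empty frame are my own assumptions, not part of the original answer. Note that NaN never compares equal to NaN, so a column made up entirely of NaN is not reported by this approach.

```python
import numpy as np
import pandas as pd


def constant_columns(frame):
    """Return the names of columns whose values all equal the first row's value.

    Hypothetical helper; an empty frame yields an empty list (an assumption).
    """
    if frame.empty:
        # frame.iloc[0] would raise IndexError on a frame with no rows.
        return []
    # Compare every row to the first row, then keep columns that are all True.
    return frame.columns[frame.eq(frame.iloc[0]).all()].tolist()


d = {'A': [1, 3, 4], 'B': ['hi', 'hi', 'hi'], 'C': [2, 2, 2], 'D': ['a', 'b', 'c']}
df = pd.DataFrame(data=d)
print(constant_columns(df))

# An all-NaN column is not detected, because NaN != NaN.
all_nan = pd.DataFrame({'X': [np.nan, np.nan]})
print(constant_columns(all_nan))
```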
Timings:
import numpy as np
np.random.seed(100)
df = pd.DataFrame(np.random.randint(10, size=(1000,100)))
df[np.random.randint(100, size=20)] = 100
print (df)
# Solution 1 (second-fastest):
In [243]: %timeit ([c for c in df.columns if len(set(df[c])) == 1])
3.59 ms ± 43.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# Solution 2 (fastest):
In [244]: %timeit df.columns[df.eq(df.iloc[0]).all()].tolist()
1.62 ms ± 13.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
#Mohamed Thasin ah solution
In [245]: %timeit ([col for col in df.columns if len(df[col].unique())==1])
6.8 ms ± 352 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
#jpp solution
In [246]: %%timeit
...: vals = df.apply(set, axis=0)
...: res = vals[vals.map(len) == 1].index
...:
5.59 ms ± 64.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
#smci solution 1
In [275]: %timeit df.columns[ df.nunique()==1 ]
11 ms ± 105 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
#smci solution 2
In [276]: %timeit [col for col in df.columns if not df[col].is_unique]
9.25 ms ± 80 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
#smci solution 3
In [277]: %timeit df.columns[ df.apply(lambda col: not col.is_unique) ]
11.1 ms ± 511 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
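Timings only mean something if the candidates return the same columns, so it is worth checking that first. A minimal sketch on a frame built like the benchmark one; assigning through iloc with deduplicated column positions is my own simplification, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

np.random.seed(100)
df = pd.DataFrame(np.random.randint(10, size=(1000, 100)))

# Pick some column positions (deduplicated) and make those columns constant.
cols = np.unique(np.random.randint(100, size=20))
df.iloc[:, cols] = 100

by_set = [c for c in df.columns if len(set(df[c])) == 1]
by_eq = df.columns[df.eq(df.iloc[0]).all()].tolist()
by_nunique = df.columns[df.nunique() == 1].tolist()

# These three agree: they return exactly the columns that were overwritten.
assert by_set == by_eq == by_nunique == cols.tolist()

# `not is_unique` answers a different question: it is True for any column
# that contains at least one repeated value, not only for constant columns.
by_is_unique = [c for c in df.columns if not df[c].is_unique]
print(len(by_set), len(by_is_unique))
```

With 1000 rows drawn from only ten distinct values, every column contains repeats, so the is_unique variants select all 100 columns here. Their timings are therefore not comparable with the others.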