Filtering a dataframe by column name based on multiple conditions

I have a pandas DataFrame with a number of columns, and I want to filter it by column name using two different criteria. I tried to use df.filter with both items and regex specified, but that is not allowed.

Suppose the column names are "User name", "XYZ 1001", "XYZ 1002", "XYY 1001", "XYY 1002", "XZZ 1001" and "XZZ 1002". I want to filter the DataFrame so that it only includes columns whose name is exactly "User name" OR contains the substring "XYZ".
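A small sketch of why the single combined call fails, assuming a reasonably recent pandas version: DataFrame.filter treats its items, like and regex keywords as mutually exclusive, so passing more than one of them raises a TypeError. The variable name combined_call_failed is only for illustration.

```python
import pandas as pd

# Column names from the question; the frame is empty, only the labels matter.
cols = ["User name", "XYZ 1001", "XYZ 1002", "XYY 1001",
        "XYY 1002", "XZZ 1001", "XZZ 1002"]
df = pd.DataFrame(columns=cols)

try:
    # items= and regex= cannot be combined in one filter() call.
    df.filter(items=["User name"], regex="XYZ")
    combined_call_failed = False
except TypeError as exc:
    combined_call_failed = True
    print(exc)  # pandas explains that the keywords are mutually exclusive
```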

notquitethere04 asked Oct 26 '25 23:10



1 Answer

Use DataFrame.filter with the regex parameter and a pattern that joins both conditions with | (regex alternation):

import pandas as pd

c = ["User name", "XYZ 1001", "XYZ 1002", "XYY 1001", "XYY 1002", "XZZ 1001"]
df = pd.DataFrame(columns=c)
print(df)
Empty DataFrame
Columns: [User name, XYZ 1001, XYZ 1002, XYY 1001, XYY 1002, XZZ 1001]
Index: []

df = df.filter(regex='User name|XYZ')
print(df)
Empty DataFrame
Columns: [User name, XYZ 1001, XYZ 1002]
Index: []

If "User name" must match exactly, anchor it with ^ (start of string) and $ (end of string). Without the anchors, the pattern also keeps any column whose name merely contains "User name":

df = df.filter(regex='^User name$|XYZ')
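The sketch below checks the anchoring point on a hypothetical extra column, "User name 2" (not in the question), and also shows an alternative that is not part of the answer above: a boolean mask over df.columns that spells out "equals OR contains" directly. It assumes a pandas version where Index.str.contains accepts regex=False.

```python
import pandas as pd

# Question's columns plus a hypothetical "User name 2" to show the anchor effect.
cols = ["User name", "User name 2", "XYZ 1001", "XYZ 1002",
        "XYY 1001", "XYY 1002", "XZZ 1001", "XZZ 1002"]
df = pd.DataFrame(columns=cols)

# Unanchored: the regex is searched anywhere, so "User name 2" is kept too.
loose = df.filter(regex="User name|XYZ")

# Anchored: "User name" must be the entire column name.
strict = df.filter(regex="^User name$|XYZ")

# Alternative: exact equality OR literal substring test, combined with |.
mask = (df.columns == "User name") | df.columns.str.contains("XYZ", regex=False)
by_mask = df.loc[:, mask]

print(list(loose.columns))
print(list(strict.columns))
print(list(by_mask.columns))
```

The mask version avoids regex escaping issues if the exact name ever contains characters such as ( or ., while the anchored regex keeps everything in a single filter call.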
jezrael answered Oct 29 '25 14:10



