I have a pandas dataframe with a number of columns, and I want to filter it by column name using two different criteria. I tried to use df.filter with both items and regex specified, but that is not allowed.
Suppose the column names are "User name", "XYZ 1001", "XYZ 1002", "XYY 1001", "XYY 1002", "XZZ 1001" and "XZZ 1002". I want to filter the dataframe to include only the columns whose name is equal to "User name" OR contains the substring XYZ.
Use DataFrame.filter with the regex parameter, joining the two patterns with | (the regex OR):
import pandas as pd

c = ["User name", "XYZ 1001", "XYZ 1002", "XYY 1001", "XYY 1002", "XZZ 1001"]
df = pd.DataFrame(columns=c)
print(df)
Empty DataFrame
Columns: [User name, XYZ 1001, XYZ 1002, XYY 1001, XYY 1002, XZZ 1001]
Index: []
df = df.filter(regex='User name|XYZ')
print(df)
Empty DataFrame
Columns: [User name, XYZ 1001, XYZ 1002]
Index: []
If you need an exact match for User name, add ^ for the start and $ for the end of the column name. Without these anchors, the pattern also matches any column that merely contains User name:
df = df.filter(regex='^User name$|XYZ')
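To make the effect of the anchors visible, here is a small sketch. The extra column "User name 2" is hypothetical and was added only for illustration; it is not in the original question. The last part shows one possible regex-free alternative using a boolean mask over the column names, which should select the same columns as the anchored pattern:

```python
import pandas as pd

# "User name 2" is a hypothetical column, added only to show what anchoring changes.
cols = ["User name", "User name 2", "XYZ 1001", "XYZ 1002", "XYY 1001", "XZZ 1001"]
df = pd.DataFrame(columns=cols)

# Unanchored: keeps every column that contains "User name" or "XYZ".
loose = df.filter(regex="User name|XYZ").columns.tolist()

# Anchored: "User name" must be the whole column name; "XYZ" may appear anywhere.
strict = df.filter(regex="^User name$|XYZ").columns.tolist()

# Alternative without regex: combine an equality test and a substring test.
mask = (df.columns == "User name") | df.columns.str.contains("XYZ", regex=False)
masked = df.loc[:, mask].columns.tolist()

print(loose)
print(strict)
print(masked)
```

The mask version can be easier to read when the names contain characters that would otherwise need escaping in a regular expression.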