I need to filter columns by the last character, testing against multiple characters.
import numpy as np
import pandas as pd
df = pd.read_table("F:\\bridges.txt", names = ['IDENTIF','RIVER', 'LOCATION', 'ERECTED', 'PURPOSE', 'LENGTH', 'LANES',
'CLEAR-G', 'T-OR-D', 'MATERIAL', 'SPAN', 'REL-L', 'TYPE'])
print(df.columns[df.columns.str.endswith('N' or 'H' or 's') ])
Output:
Index(['LOCATION', 'SPAN'], dtype='object')
The result does not include all of the columns ending in N, H or s.
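The expression 'N' or 'H' or 's' evaluates to just 'N', because or returns its first truthy operand, so only columns ending in N are matched. Instead, pass a tuple of suffixes to pd.Index.str.endswith and use the resulting Boolean mask to index the columns: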
L = ['IDENTIF','RIVER', 'LOCATION', 'ERECTED', 'PURPOSE', 'LENGTH',
'LANES', 'CLEAR-G', 'T-OR-D', 'MATERIAL', 'SPAN', 'REL-L', 'TYPE']
df = pd.DataFrame(columns=L)
cols = df.columns[df.columns.str.endswith(tuple('HNS'))]
print(cols)
Output:
Index(['LOCATION', 'LENGTH', 'LANES', 'SPAN'], dtype='object')
This mirrors Python's built-in str.endswith, which accepts a tuple of suffixes and returns True if the string ends with any one of them.
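As a rough illustration of the built-in behaviour described above, the sketch below uses plain Python strings rather than pandas. The column names come from the question; the variable names first_truthy, suffixes and matches are made up for this example.

```python
# Column names taken from the question.
columns = ['IDENTIF', 'RIVER', 'LOCATION', 'ERECTED', 'PURPOSE', 'LENGTH',
           'LANES', 'CLEAR-G', 'T-OR-D', 'MATERIAL', 'SPAN', 'REL-L', 'TYPE']

# `or` returns its first truthy operand, so this whole expression is just 'N'.
first_truthy = 'N' or 'H' or 's'

# The built-in str.endswith accepts a tuple of suffixes (a list raises TypeError)
# and is True if the string ends with any of them.
suffixes = ('N', 'H', 'S')
matches = [col for col in columns if col.endswith(suffixes)]

print(first_truthy)
print(matches)
```

Note that the match is case-sensitive, so a lowercase 's' suffix would not pick up LANES.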
[col for col in df.columns if col[-1] in ['N', 'H', 'S']]
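The question mixes upper- and lowercase suffixes ('N', 'H', 's'), so a case-insensitive variant of the list comprehension above may be what is wanted. This is only a sketch under that assumption: it works on a plain list of names, and the lowercase 'notes' entry is a hypothetical column added purely to show the effect.

```python
# Column names from the question, plus a hypothetical lowercase name ('notes').
columns = ['IDENTIF', 'RIVER', 'LOCATION', 'ERECTED', 'PURPOSE', 'LENGTH',
           'LANES', 'CLEAR-G', 'T-OR-D', 'MATERIAL', 'SPAN', 'REL-L', 'TYPE',
           'notes']

# Compare in upper case so that 's' and 'S' are treated alike.
wanted = {'N', 'H', 'S'}

# col[-1:] (a slice) is safe even for an empty name, unlike col[-1].
matches = [col for col in columns if col[-1:].upper() in wanted]

print(matches)
```

A set is used for the membership test only as a matter of style; with three characters a list or tuple behaves the same.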
If I remember correctly, the columns attribute of a DataFrame is not a Series, so you can't treat it as one. It's a list.
To clarify, the columns aren't technically a list. They are a pandas Index (or one of its subclasses), which for most purposes can be treated like a list. The point I'm trying to make is that they are not a Series and so don't have every Series method, although an Index does provide the .str accessor.