I have a data frame like below
A_Name B_Detail Value_B Value_C Value_D ......
0 AA X1 1.2 0.5 -1.3 ......
1 BB Y1 0.76 -0.7 0.8 ......
2 CC Z1 0.7 -1.3 2.5 ......
3 DD L1 0.9 -0.5 0.4 ......
4 EE M1 1.3 1.8 -1.3 ......
5 FF N1 0.7 -0.8 0.9 ......
6 GG K1 -2.4 -1.9 2.1 ......
This is just a sample of data frame, I can have n number of columns like (Value_A, Value_B, Value_C, ........... Value_N)
Now i want to filter all rows where absolute value of all columns (Value_A, Value_B, Value_C, ....) is less than 1.
If you have limited number of columns, you can filter the data by simply putting 'and' condition on columns in dataframe, but I am not able to figure out what to do in this case.
I don't know what would be number of such columns, the only thing I know that such columns would be prefixed with 'Value'.
In above case output should be like
A_Name B_Detail Value_B Value_C Value_D ......
1 BB Y1 0.76 -0.7 0.8 ......
3 DD L1 0.9 -0.5 0.4 ......
5 FF N1 0.7 -0.8 0.9 ......
Use filter
with abs
and all
for creating mask
and then boolean indexing
:
mask = (df.filter(like='Value').abs() < 1).all(axis=1)
print (mask)
0 False
1 True
2 False
3 True
4 False
5 True
6 False
dtype: bool
print (df[mask])
A_Name B_Detail Value_B Value_C Value_D
1 BB Y1 0.76 -0.7 0.8
3 DD L1 0.90 -0.5 0.4
5 FF N1 0.70 -0.8 0.9
All combination in timings:
#len df = 70k, 5 columns
df = pd.concat([df]*10000).reset_index(drop=True)
In [47]: %timeit (df[(df.filter(like='Value').abs() < 1).all(axis=1)])
100 loops, best of 3: 7.48 ms per loop
In [48]: %timeit (df[df.filter(regex=r'Value').abs().lt(1).all(1)])
100 loops, best of 3: 7.02 ms per loop
In [49]: %timeit (df[df.filter(like='Value').abs().lt(1).all(1)])
100 loops, best of 3: 7.02 ms per loop
In [50]: %timeit (df[(df.filter(regex=r'Value').abs() < 1).all(axis=1)])
100 loops, best of 3: 7.3 ms per loop
#len df = 70k, 5k columns
df = pd.concat([df]*10000).reset_index(drop=True)
df = pd.concat([df]*1000, axis=1)
#only for testing, create unique columns names
df.columns = df.columns.str[:-1] + [str(col) for col in list(range(df.shape[1]))]
print (df)
In [75]: %timeit ((df[(df.filter(like='Value').abs() < 1).all(axis=1)]))
1 loop, best of 3: 10.3 s per loop
In [76]: %timeit ((df[(df.filter(regex=r'Value').abs() < 1).all(axis=1)]))
1 loop, best of 3: 10.3 s per loop
In [77]: %timeit (df[df.filter(regex=r'Value').abs().lt(1).all(1)])
1 loop, best of 3: 10.4 s per loop
In [78]: %timeit (df[df.filter(like='Value').abs().lt(1).all(1)])
1 loop, best of 3: 10.1 s per loop
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With