I have a DataFrame with three columns, tel1, tel2 and tel3. I want to keep the rows that contain a specific value in one or more of those columns.
For example, I want to keep the rows where tel1, tel2 or tel3 starts with '06'.
How can I do that? Thanks.
Let's use this df as an example DataFrame:
In [54]: df = pd.DataFrame({'tel{}'.format(j):
                            ['{:02d}'.format(i+j)
                             for i in range(10)] for j in range(3)})
In [71]: df
Out[71]:
tel0 tel1 tel2
0 00 01 02
1 01 02 03
2 02 03 04
3 03 04 05
4 04 05 06
5 05 06 07
6 06 07 08
7 07 08 09
8 08 09 10
9 09 10 11
You can find which values in df['tel0'] start with '06' using
StringMethods.startswith:
In [72]: df['tel0'].str.startswith('06')
Out[72]:
0 False
1 False
2 False
3 False
4 False
5 False
6 True
7 False
8 False
9 False
Name: tel0, dtype: bool
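One thing to watch for with real phone data: if a column has missing values, str.startswith returns a missing value for those rows rather than False, and the result can no longer be used directly as a boolean mask. A minimal sketch, assuming a hypothetical Series with one missing entry, that uses the na parameter to treat missing values as non-matches:

```python
import pandas as pd

# Hypothetical column with one missing phone number.
tel = pd.Series(['0612345678', None, '0712345678'], name='tel0')

# na=False makes missing entries count as "does not start with '06'".
mask = tel.str.startswith('06', na=False)
```

The resulting mask is purely boolean, so it can be combined with | and passed to df.loc like the masks in the rest of this answer.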
To combine two boolean Series with logical-or, use |:
In [73]: df['tel0'].str.startswith('06') | df['tel1'].str.startswith('06')
Out[73]:
0 False
1 False
2 False
3 False
4 False
5 True
6 True
7 False
8 False
9 False
dtype: bool
Or, if you want to combine a list of boolean Series using logical-or, you could use reduce:
In [79]: import functools
In [80]: import numpy as np
In [80]: mask = functools.reduce(np.logical_or, [df['tel{}'.format(i)].str.startswith('06') for i in range(3)])
In [81]: mask
Out[81]:
0 False
1 False
2 False
3 False
4 True
5 True
6 True
7 False
8 False
9 False
Name: tel0, dtype: bool
Once you have the boolean mask, you can select the associated rows using df.loc:
In [75]: df.loc[mask]
Out[75]:
tel0 tel1 tel2
4 04 05 06
5 05 06 07
6 06 07 08
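If the set of columns may change, the same selection can probably be written without spelling out each column. This is only a sketch: it rebuilds the example df from above, and rows_starting_with is a hypothetical helper name, not a pandas function.

```python
import pandas as pd

# Same example DataFrame as above.
df = pd.DataFrame({'tel{}'.format(j): ['{:02d}'.format(i + j) for i in range(10)]
                   for j in range(3)})

def rows_starting_with(frame, columns, prefix):
    """Hypothetical helper: keep rows where any listed column starts with prefix."""
    # Apply startswith to each column, then OR the results across columns.
    mask = frame[columns].apply(lambda col: col.str.startswith(prefix)).any(axis=1)
    return frame.loc[mask]

selected = rows_starting_with(df, ['tel0', 'tel1', 'tel2'], '06')
```

Here selected should hold the same three rows (4, 5 and 6) as df.loc[mask] above.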
Note there are many other vectorized str methods besides startswith.
You might find str.contains useful for finding which rows contain a string. Note that str.contains interprets its argument as a regex pattern by default:
In [85]: df['tel0'].str.contains(r'6|7')
Out[85]:
0 False
1 False
2 False
3 False
4 False
5 False
6 True
7 True
8 False
9 False
Name: tel0, dtype: bool
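Because the argument is a regex by default, characters such as + need care. A small sketch, assuming a hypothetical Series of numbers written with and without an international prefix: regex=False searches for the literal text, and an anchored pattern like ^06 gives the same result as startswith.

```python
import pandas as pd

# Hypothetical numbers, some written with an international prefix.
tel = pd.Series(['+33612345678', '0612345678', '0033612345678'])

# regex=False treats '+33' as plain text; as a regex, a leading '+' is invalid.
has_plus33 = tel.str.contains('+33', regex=False)

# An anchored pattern only matches at the start of each string.
starts_06 = tel.str.contains(r'^06')
```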
I like to use DataFrame.apply in such situations:
# search a DataFrame across multiple columns
import random as r
import pandas as pd

# generate some random numbers
rand_numbers = [[r.randint(100000, 9999999) for __ in range(3)] for _ in range(20)]
df = pd.DataFrame.from_records(rand_numbers, columns=['tel1', 'tel2', 'tel3'])
df.head()

# a really simple search function
# if you need speed, use Cython here ;-)
def searchfilter(row, search='5'):
    # with axis=1, df.apply passes each row to this function as a Series
    for value in row:
        # the values are numbers here, so cast them to str first
        if str(value).startswith(search):
            return True
    # only return False once every column in the row has been checked
    return False

# apply the search function to each row
result_bool_array = df.apply(searchfilter, axis=1)  # axis=1 runs it row-wise
df[result_bool_array]

# another search, using a lambda to pass a different prefix
result_bool_array = df.apply(lambda row: searchfilter(row, search='6'), axis=1)
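Row-wise apply calls the Python function once per row, which can get slow on large frames. A possible alternative for numeric columns like these, sketched with made-up sample numbers rather than the random data above, is to cast the whole frame to str and use the vectorized string methods:

```python
import pandas as pd

# Hypothetical sample data: integer phone-like numbers, as in the answer above.
df = pd.DataFrame({
    'tel1': [612345, 512345, 712345],
    'tel2': [123456, 698765, 812345],
    'tel3': [234567, 345678, 912345],
})

# Cast every column to str, test each column's prefix, then OR across columns.
mask = df.astype(str).apply(lambda col: col.str.startswith('6')).any(axis=1)
filtered = df[mask]
```

In this sample, the first two rows each have one number starting with 6, so only those two rows are kept.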