
Find string in multiple columns?

Tags: pandas

I have a DataFrame with 3 columns, tel1, tel2 and tel3. I want to keep the rows that contain a specific value in one or more of these columns.

For example, I want to keep the rows where tel1, tel2 or tel3 starts with '06'.

How can I do that? Thanks

steboc asked Apr 02 '26 18:04


2 Answers

Let's use this df as an example DataFrame:

In [54]: df = pd.DataFrame({'tel{}'.format(j): 
                            ['{:02d}'.format(i+j) 
                             for i in range(10)] for j in range(3)})

In [71]: df
Out[71]: 
  tel0 tel1 tel2
0   00   01   02
1   01   02   03
2   02   03   04
3   03   04   05
4   04   05   06
5   05   06   07
6   06   07   08
7   07   08   09
8   08   09   10
9   09   10   11

You can find which values in df['tel0'] start with '06' using StringMethods.startswith:

In [72]: df['tel0'].str.startswith('06')
Out[72]: 
0    False
1    False
2    False
3    False
4    False
5    False
6     True
7    False
8    False
9    False
Name: tel0, dtype: bool
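One thing to watch for with real phone columns is missing values. The sketch below uses a hypothetical Series with one missing entry (not part of the example df above) to show how the na argument of str.startswith can turn missing entries into False, so the result can be used directly as a boolean mask:

```python
import pandas as pd

# Hypothetical column with one missing phone number (illustration only)
tel = pd.Series(['0612', None, '0799', '0655'], name='tel1')

# Without na=..., the missing entry stays missing in the result
raw = tel.str.startswith('06')

# na=False treats missing entries as "no match"
mask = tel.str.startswith('06', na=False)
print(mask.tolist())  # [True, False, False, True]
```

If the columns never contain missing values, the plain call shown above is enough.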

To combine two boolean Series with logical-or, use |:

In [73]: df['tel0'].str.startswith('06') | df['tel1'].str.startswith('06')
Out[73]: 
0    False
1    False
2    False
3    False
4    False
5     True
6     True
7    False
8    False
9    False
dtype: bool

Or, if you want to combine a list of boolean Series using logical-or, you could use reduce:

In [79]: import functools
In [80]: import numpy as np
In [80]: mask = functools.reduce(np.logical_or, [df['tel{}'.format(i)].str.startswith('06') for i in range(3)])

In [81]: mask
Out[81]: 
0    False
1    False
2    False
3    False
4     True
5     True
6     True
7    False
8    False
9    False
Name: tel0, dtype: bool

Once you have the boolean mask, you can select the associated rows using df.loc:

In [75]: df.loc[mask]
Out[75]: 
  tel0 tel1 tel2
4   04   05   06
5   05   06   07
6   06   07   08
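As an alternative to building the list of Series by hand, the same mask can probably be obtained by testing every column with DataFrame.apply and then collapsing the result row-wise with any(axis=1). This is only a sketch on the example df from this answer; the names matches and result are made up for the illustration:

```python
import pandas as pd

# Same example frame as above: columns tel0, tel1, tel2
df = pd.DataFrame({'tel{}'.format(j): ['{:02d}'.format(i + j) for i in range(10)]
                   for j in range(3)})

# Test every column, giving a boolean DataFrame with the same shape as df
matches = df.apply(lambda col: col.str.startswith('06'))

# Keep a row if at least one of its columns matched
mask = matches.any(axis=1)
result = df.loc[mask]
print(result.index.tolist())  # [4, 5, 6]
```

This scales to any number of columns without changing the code; select a subset first (for example df[['tel0', 'tel1']]) if only some columns should be searched.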

Note there are many other vectorized str methods besides startswith. You might find str.contains useful for finding which rows contain a string. Note that str.contains interprets its argument as a regex pattern by default:

In [85]: df['tel0'].str.contains(r'6|7')
Out[85]: 
0    False
1    False
2    False
3    False
4    False
5    False
6     True
7     True
8    False
9    False
Name: tel0, dtype: bool
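
Because of the regex default, it can matter whether the pattern is anchored and whether characters like '.' are meant literally. The following sketch uses a small made-up Series (not the df above) to compare an unanchored pattern, a pattern anchored with ^, and a literal search with regex=False:

```python
import pandas as pd

# Made-up values for illustration
s = pd.Series(['0612', '1065', '06.1', '0601', '0712'])

# Unanchored: matches '06' anywhere in the string, including '1065'
anywhere = s.str.contains('06')

# Anchored with ^: only strings that begin with '06'
anchored = s.str.contains(r'^06')

# As a regex, '.' matches any character, so '0601' also matches '6.1'
regex_dot = s.str.contains('6.1')

# regex=False searches for the literal text '6.1'
literal_dot = s.str.contains('6.1', regex=False)

print(anchored.tolist())     # [True, False, True, True, False]
print(literal_dot.tolist())  # [False, False, True, False, False]
```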

unutbu answered Apr 11 '26 23:04


I like to use dataframe.apply in such situations:


# Search multiple DataFrame columns

import random as r
import pandas as pd

# Generate some random numbers
rand_numbers = [[r.randint(100000, 9999999) for __ in range(3)] for _ in range(20)]
df = pd.DataFrame.from_records(rand_numbers, columns=['tel1', 'tel2', 'tel3'])

df.head()

# A really simple search function
# If you need speed, use Cython here ;-)
def searchfilter(row, search='5'):
    # With axis=1, df.apply passes each row as a Series
    # The values are numbers here, so we must cast them to str
    # Return True if any value in the row starts with the search string
    return any(str(value).startswith(search) for value in row)

# Apply the search function to each row
result_bool_array = df.apply(searchfilter, axis=1)  # axis=1 runs it row-wise

df[result_bool_array]

# Another search, using a lambda in apply
result_bool_array = df.apply(lambda row: searchfilter(row, search='6'), axis=1)
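
If the row-wise apply turns out to be too slow, one possible alternative is to convert the whole table to strings once and test all cells with NumPy's np.char.startswith. This is only a sketch, using fixed made-up numbers instead of random ones so the result is predictable; the names digits and result are hypothetical:

```python
import numpy as np
import pandas as pd

# Made-up integer phone numbers (fixed values instead of random ones)
df = pd.DataFrame({'tel1': [612345, 712345, 512345],
                   'tel2': [798765, 698765, 598765],
                   'tel3': [812345, 912345, 412345]})

# Cast every cell to a string once
digits = df.to_numpy().astype(str)

# Test all cells in one call, then keep rows where any column matched
mask = np.char.startswith(digits, '6').any(axis=1)

result = df[mask]
print(result.index.tolist())  # [0, 1]
```

Row 1 is kept because of tel2, not tel1, so every column of a row is taken into account.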

PlagTag answered Apr 12 '26 00:04


