Using in operator with Pandas series [duplicate]

Tags: pandas

1 Answer

In the first case:

The in operator is interpreted as a call to df['name'].__contains__('Adam'). If you look at the implementation of __contains__ in pandas.Series, you will find that it is the following (inherited from pandas.core.generic.NDFrame):

def __contains__(self, key):
    """True if the key is in the info axis"""
    return key in self._info_axis

So your first use of in is interpreted as:

'Adam' in df['name']._info_axis

This gives False, as expected, because df['name']._info_axis holds the index of the Series (here a RangeIndex) and not the data itself:

In [37]: df['name']._info_axis 
Out[37]: RangeIndex(start=0, stop=3, step=1)

In [38]: list(df['name']._info_axis) 
Out[38]: [0, 1, 2]
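
To make the index-versus-values point concrete, here is a minimal sketch. It assumes a frame with the same three names as above (the question's actual df is not shown, so this one is a stand-in) and uses the public .index attribute rather than the private _info_axis:

```python
import pandas as pd

# Stand-in for the question's frame (assumed, not taken from the original post)
df = pd.DataFrame({"name": ["Adam", "Ben", "Chris"]})

# `in` on a Series tests membership in the index labels, not in the values
label_hit = 0 in df["name"]        # 0 is one of the index labels 0, 1, 2
value_hit = "Adam" in df["name"]   # "Adam" is a value, not an index label

# If the names are used as index labels, the same test finds them
by_name = pd.Series(df["name"].to_numpy(), index=df["name"].to_numpy())
relabelled_hit = "Adam" in by_name

print(label_hit, value_hit, relabelled_hit)
```

So the result of in depends entirely on what the index happens to contain, which is why it is a poor fit for testing values.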

In the second case:

'Adam' in list(df['name'])

Calling list converts the pandas.Series to a plain list of its values, so the actual operation is this:

In [42]: list(df['name'])
Out[42]: ['Adam', 'Ben', 'Chris']

In [43]: 'Adam' in ['Adam', 'Ben', 'Chris']
Out[43]: True
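
If you prefer to stay with the in operator, the same idea works with other containers that hold the values. A small sketch, again on an assumed stand-in frame, using Series.tolist() and Series.to_numpy():

```python
import pandas as pd

# Stand-in for the question's frame (assumed)
df = pd.DataFrame({"name": ["Adam", "Ben", "Chris"]})

# Both containers hold the values, so `in` now tests the data
in_list = "Adam" in df["name"].tolist()     # plain Python list
in_array = "Adam" in df["name"].to_numpy()  # NumPy array of the values
missing = "Zed" in df["name"].to_numpy()    # a name that is not present

print(in_list, in_array, missing)
```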

Here are a few more idiomatic ways to do what you want (with their timings):

In [56]: df.name.str.contains('Adam').any()
Out[56]: True

In [57]: timeit df.name.str.contains('Adam').any()
The slowest run took 6.25 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 144 µs per loop

In [58]: df.name.isin(['Adam']).any()
Out[58]: True

In [59]: timeit df.name.isin(['Adam']).any()
The slowest run took 5.13 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 191 µs per loop

In [60]: df.name.eq('Adam').any()
Out[60]: True

In [61]: timeit df.name.eq('Adam').any()
10000 loops, best of 3: 178 µs per loop

Note: the last approach was also suggested by @Wen in a comment above.
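
One caveat worth keeping in mind: the three options are not interchangeable. str.contains looks for a substring (and treats the pattern as a regular expression by default), while isin and eq compare whole values. A minimal sketch on a made-up Series (the name "Adamson" is only for illustration):

```python
import pandas as pd

# Illustrative data: no element is exactly "Adam"
names = pd.Series(["Adamson", "Ben", "Chris"])

# Substring match: "Adamson" contains "Adam"
substring_hit = bool(names.str.contains("Adam").any())

# Whole-value comparisons: nothing equals "Adam"
eq_hit = bool(names.eq("Adam").any())
isin_hit = bool(names.isin(["Adam"]).any())

# The pattern is a regex by default; regex=False makes it a literal string
regex_hit = bool(names.str.contains("A.am").any())
literal_hit = bool(names.str.contains("A.am", regex=False).any())

print(substring_hit, eq_hit, isin_hit, regex_hit, literal_hit)
```

For an exact membership test, eq or isin is therefore the safer choice; str.contains is the one to use when a partial match is what you actually want.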

134

answered Nov 13 '22 06:11

Mohamed Ali JAMAOUI
