How to select rows that do not start with some str in pandas?

Tags:

python

pandas

numpy

I want to select rows whose values do not start with certain strings. For example, I have a pandas df, and I want to select the rows whose values do not start with t or c. In this sample, the output should be mext1 and okl1.

import pandas as pd

df=pd.DataFrame({'col':['text1','mext1','cext1','okl1']})
df

    col
0   text1
1   mext1
2   cext1
3   okl1

I want this:

    col
0   mext1
1   okl1

asked Jan 17 '17 05:01

running man

3 Answers

You can use the str accessor to apply string methods to each value in the column. Its get method returns the character at a given index of each string.

df[~df.col.str.get(0).isin(['t', 'c'])]

     col
1  mext1
3   okl1

You can also use startswith, passing a tuple (not a list) of the prefixes you want to exclude.

df[~df.col.str.startswith(('t', 'c'))]
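To compare the two approaches above, here is a minimal sketch using the question's sample data. The `prefixes` list and the longer prefixes `'te'` and `'ok'` are assumptions added for illustration and are not part of the original question. It shows that a list of prefixes needs to be converted with `tuple()` before being passed to `startswith`, and that the first-character test only applies when every prefix is a single character.

```python
import pandas as pd

df = pd.DataFrame({'col': ['text1', 'mext1', 'cext1', 'okl1']})

# Hypothetical prefix list; in practice it might come from configuration.
prefixes = ['t', 'c']

# First-character test: valid only because every prefix is one character long.
by_first_char = df[~df['col'].str.get(0).isin(prefixes)]

# startswith accepts a str or a tuple of str, so convert the list first.
by_startswith = df[~df['col'].str.startswith(tuple(prefixes))]

# Longer prefixes need startswith; a single character never equals 'te'.
longer = df[~df['col'].str.startswith(('te', 'ok'))]

print(by_startswith)
```

Both selections should return the same rows here; `longer` keeps the rows that start with neither `te` nor `ok`.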

answered Oct 22 '22 03:10

Ted Petrou

option 1
use str.match and negative look ahead

df[df.col.str.match('^(?![tc])')]

option 2
within query

df.query('col.str[0] not in list("tc")')

option 3
numpy broadcasting

df[~(df.col.str[0].to_numpy()[:, None] == ['t', 'c']).any(axis=1)]

         col
1  mext1
3   okl1
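As a rough check that options 1 and 3 select the same rows, the sketch below runs both on the question's sample frame. It assumes a recent pandas, where a Series has to be converted with `to_numpy()` before adding an axis with `[:, None]`, and it negates the broadcast comparison so that rows starting with t or c are dropped rather than kept. The variable names are illustrative only, and option 2 is left out because `query` behaviour with the str accessor may depend on the pandas version and engine.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col': ['text1', 'mext1', 'cext1', 'okl1']})

# Option 1: the negative lookahead matches only when the first
# character is neither 't' nor 'c'.
by_regex = df[df['col'].str.match(r'^(?![tc])')]

# Option 3: compare every first character against both prefixes at once.
first = df['col'].str[0].to_numpy(dtype=object)         # shape (4,)
excluded = np.array(['t', 'c'], dtype=object)           # shape (2,)
starts_with_tc = (first[:, None] == excluded).any(axis=1)  # (4, 2) -> (4,)
by_numpy = df[~starts_with_tc]

print(by_numpy)
```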

time testing

def ted(df):
    return df[~df.col.str.get(0).isin(['t', 'c'])]

def adele(df):
    return df[~df['col'].str.startswith(('t','c'))]

def yohanes(df):
    return df[df.col.str.contains('^[^tc]')]

def pir1(df):
    return df[df.col.str.match('^(?![tc])')]

def pir2(df):
    return df.query('col.str[0] not in list("tc")')

def pir3(df):
    return df[~(df.col.str[0].to_numpy()[:, None] == ['t', 'c']).any(axis=1)]

functions = pd.Index(['ted', 'adele', 'yohanes', 'pir1', 'pir2', 'pir3'], name='Method')
lengths = pd.Index([10, 100, 1000, 5000, 10000], name='Length')
results = pd.DataFrame(index=lengths, columns=functions)

from string import ascii_lowercase
from timeit import timeit

import matplotlib.pyplot as plt
import numpy as np

for i in lengths:
    a = np.random.choice(list(ascii_lowercase), i)
    df = pd.DataFrame(dict(col=a))
    for j in functions:
        # DataFrame.set_value was removed in pandas 1.0; .at sets a single cell
        results.at[i, j] = timeit(
            '{}(df)'.format(j),
            'from __main__ import df, {}'.format(j),
            number=1000
        )

results = results.astype(float)  # timings were stored as objects; plotting needs numbers
fig, axes = plt.subplots(3, 1, figsize=(8, 12))
results.plot(ax=axes[0], title='All Methods')
results.drop(columns='pir2').plot(ax=axes[1], title='Drop `pir2`')
results[['ted', 'adele', 'pir3']].plot(ax=axes[2], title='Just the fast ones')
fig.tight_layout()

[Image: three timing plots titled 'All Methods', 'Drop `pir2`', and 'Just the fast ones']

answered Oct 22 '22 03:10

piRSquared

You can use str.startswith and negate it.

df[~df['col'].str.startswith('t') &
   ~df['col'].str.startswith('c')]

    col
1   mext1
3   okl1

Or the better option, with multiple characters in a tuple as per @Ted Petrou:

df[~df['col'].str.startswith(('t','c'))]

    col
1   mext1
3   okl1
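One case the sample data does not cover is missing values. The sketch below adds a `None` entry to the question's frame (an assumption for illustration only) and passes `na=False` so that the mask stays strictly boolean before it is negated. Whether a missing value should count as "does not start with t or c" is a judgment call, so the last step shows how to drop it as well.

```python
import pandas as pd

# Sample data from the question plus one missing value (added for illustration).
df = pd.DataFrame({'col': ['text1', 'mext1', 'cext1', 'okl1', None]})

# na=False treats missing values as "no match", so the mask contains only
# True/False and can be negated safely with ~.
mask = df['col'].str.startswith(('t', 'c'), na=False)
kept = df[~mask]

# The missing row survives the negation; drop it if it should not be selected.
kept_non_null = kept.dropna(subset=['col'])

print(kept_non_null)
```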

answered Oct 22 '22 02:10

nipy
