Conditional data selection with text string data in pandas dataframe

Question

I've looked but seem to be coming up dry for an answer to the following question.

I have a pandas dataframe analogous to this (call it 'df'):

        Type              Set
    1   theGreen          Z
    2   andGreen          Z           
    3   yellowRed         X
    4   roadRed           Y

I want to add another column to the dataframe (or generate a series) of the same length as the dataframe (= equal number of records/rows) which assigns a numerical coding variable (1) if the Type contains the string "Green", (0) otherwise.

Essentially, I'm trying to find a way of doing this:

   df['color'] = np.where(df['Type'] == 'Green', 1, 0)

Except instead of the usual numpy operators (<,>,==,!=, etc.) I need a way of saying "in" or "contains". Is this possible? Any and all help appreciated!

jezrael · Accepted Answer

Use str.contains:

df['color'] = np.where(df['Type'].str.contains('Green'), 1, 0)
print (df)
        Type Set  color
1   theGreen   Z      1
2   andGreen   Z      1
3  yellowRed   X      0
4    roadRed   Y      0

Another solution with apply:

df['color'] = np.where(df['Type'].apply(lambda x: 'Green' in x), 1, 0)
print (df)
        Type Set  color
1   theGreen   Z      1
2   andGreen   Z      1
3  yellowRed   X      0
4    roadRed   Y      0

Second solution is faster, but doesn't work with NaN in column Type, then return error:

TypeError: argument of type 'float' is not iterable

Timings:

#[400000 rows x 4 columns]
df = pd.concat([df]*100000).reset_index(drop=True)  

In [276]: %timeit df['color'] = np.where(df['Type'].apply(lambda x: 'Green' in x), 1, 0)
10 loops, best of 3: 94.1 ms per loop

In [277]: %timeit df['color1'] = np.where(df['Type'].str.contains('Green'), 1, 0)
1 loop, best of 3: 256 ms per loop

Conditional data selection with text string data in pandas dataframe

Tags:

python

string

pandas

dataframe

numpy

Cole Robertson

1 Answers

jezrael

Recent Activity

Donate For Us

Conditional data selection with text string data in pandas dataframe

Tags:

python

string

pandas

dataframe

numpy

Cole Robertson

1 Answers

jezrael

Related questions

Recent Activity

Donate For Us