
Get substring from pandas dataframe while filtering

Tags: python, pandas

Say I have a dataframe with the following information:

Name    Points          String
John        24     FTS8500001A
Richard     35     FTS6700001B
John        29     FTS2500001A
Richard     35     FTS3800001B
John        34     FTS4500001A

The following code builds a DataFrame with the sample data above:

import pandas as pd
keys = ('Name', 'Points', 'String')
names = pd.Series(('John', 'Richard', 'John', 'Richard', 'John'))
points = pd.Series((24, 35, 29, 35, 34))
strings = pd.Series(('FTS8500001A', 'FTS6700001B', 'FTS2500001A', 'FTS3800001B', 'FTS4500001A'))
df = pd.concat((names, points, strings), axis=1, keys=keys)

I want to select every row that meets the following criteria: Name is Richard and Points is 35. For those rows, I want to read the 4th and 5th characters of the String column (the two digits right after FTS).

The output I want is the numbers 67 and 38.

I've tried several approaches without success. Can you please help?

Thank you very much.
Eduardo

asked Jun 11 '15 12:06 by Eduardo


2 Answers

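Use a boolean mask to filter the DataFrame, then use the .str accessor to slice the string. The slice 3:5 is zero-based and excludes the end position, so it covers the 4th and 5th characters: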

In [77]:
df.loc[(df['Name'] == 'Richard') & (df['Points']==35),'String'].str[3:5]

Out[77]:
1    67
3    38
Name: String, dtype: object
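
For readers who want to run this end to end, here is a minimal self-contained sketch of the same mask-and-slice approach. It rebuilds the sample data with a dict-based constructor, and the conversion to a list of integers at the end is an addition that assumes the sliced characters are always digits; the names mask, codes, and numbers are illustrative only.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Name": ["John", "Richard", "John", "Richard", "John"],
    "Points": [24, 35, 29, 35, 34],
    "String": ["FTS8500001A", "FTS6700001B", "FTS2500001A",
               "FTS3800001B", "FTS4500001A"],
})

# Boolean mask: each comparison needs its own parentheses because
# & binds more tightly than ==
mask = (df["Name"] == "Richard") & (df["Points"] == 35)

# Select the String column for matching rows, then slice each value;
# positions 3 and 4 (zero-based) are the 4th and 5th characters
codes = df.loc[mask, "String"].str[3:5]

# Assumption: the slice always contains digits, so it can be cast to int
numbers = codes.astype(int).tolist()
print(numbers)  # [67, 38]
```

The slice result keeps the original row labels (1 and 3), which can be handy if the values need to be joined back to the DataFrame later.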
answered Nov 17 '22 01:11 by EdChum


Pandas string methods

You can build a boolean mask for each criterion, combine them, and then apply the pandas string methods to the filtered rows:

mask_richard = df.Name == 'Richard'
mask_points = df.Points == 35
df[mask_richard & mask_points].String.str[3:5]

1    67
3    38
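
If the digits should be located by pattern rather than by fixed position, a regular expression with Series.str.extract is one possible alternative. This is a different technique from the slicing shown above, and the sketch assumes every value starts with the literal prefix FTS followed by at least two digits; the pattern and variable names are illustrative.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Name": ["John", "Richard", "John", "Richard", "John"],
    "Points": [24, 35, 29, 35, 34],
    "String": ["FTS8500001A", "FTS6700001B", "FTS2500001A",
               "FTS3800001B", "FTS4500001A"],
})

mask_richard = df["Name"] == "Richard"
mask_points = df["Points"] == 35

# Capture the two digits right after the assumed "FTS" prefix;
# expand=False returns a Series because there is a single capture group
digits = df.loc[mask_richard & mask_points, "String"].str.extract(
    r"^FTS(\d{2})", expand=False
)
print(digits.tolist())  # ['67', '38']
```

Values that do not match the pattern come back as NaN instead of a wrong substring, which makes malformed rows easier to spot than with a positional slice.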
answered Nov 17 '22 01:11 by firelynx