Extracting particular characters/text from a DataFrame column

I am trying to get the email provider from the mail column of a DataFrame and store it in a new column named "Mail_Provider". For example, taking gmail from [email protected] and storing it in the "Mail_Provider" column. I would also like to extract the country ISD code from the Phone column and create a new column for it. Is there a more straightforward method than regex?

import pandas as pd

data = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "mail": ["[email protected]", "[email protected]", "[email protected]"],
    "Adress": ["Adress1", "Adress2", "Adress3"],
    "Phone": ["+91-1234567890", "+88-0987654321", "+27-2647589201"],
})

Table:

Name  mail               Adress   Phone
A     [email protected]  Adress1  +91-1234567890
B     [email protected]  Adress2  +88-0987654321
C     [email protected]  Adress3  +27-2647589201

Expected result:

Name  mail               Adress   Phone           Mail_Provider  ISD
A     [email protected]  Adress1  +91-1234567890  gmail          91
B     [email protected]  Adress2  +88-0987654321  yahoo          88
C     [email protected]  Adress3  +27-2647589201  gmail          27
Devesh asked Jul 30 '19 14:07


2 Answers

Regex is rather simple for these:

data['Mail_Provider'] = data['mail'].str.extract(r'@(\w+)\.', expand=False)

data['ISD'] = data['Phone'].str.extract(r'\+(\d+)-', expand=False)

If you really want to avoid regex, @Eva's answer would be the way to go.

Quang Hoang answered Sep 21 '22 07:09
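
For a regex-free route, one option is plain string splitting. The following is a minimal sketch, not necessarily what the answer referred to above does. The email addresses are hypothetical stand-ins, since the originals are obscured in the question, and the "Adress" column is left out for brevity.

```python
import pandas as pd

# Hypothetical sample addresses; the originals are obscured in the question.
data = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "mail": ["a@gmail.com", "b@yahoo.com", "c@gmail.com"],
    "Phone": ["+91-1234567890", "+88-0987654321", "+27-2647589201"],
})

# Take the text after "@", then the text before the first "." of the domain.
domain = data["mail"].str.split("@").str[1]
data["Mail_Provider"] = domain.str.split(".").str[0]

# Take the text before the first "-", then drop the leading "+".
data["ISD"] = data["Phone"].str.split("-").str[0].str.lstrip("+")

print(data[["Name", "Mail_Provider", "ISD"]])
```

Both new columns hold strings; if a numeric ISD is wanted, `data["ISD"].astype(int)` would convert it, assuming every phone number follows the "+code-number" pattern.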


Mixed approach (regex and simple slicing):

In [693]: df['Mail_Provider'] = df['mail'].str.extract('@([^.]+)')

In [694]: df['ISD'] = df['Phone'].str[1:3]

In [695]: df
Out[695]: 
  Name         mail   Adress           Phone Mail_Provider ISD
0    A  [email protected]  Adress1  +91-1234567890         gmail  91
1    B  [email protected]  Adress2  +88-0987654321         yahoo  88
2    C  [email protected]  Adress3  +27-2647589201         gmail  27
RomanPerekhrest answered Sep 23 '22 07:09
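
Note that the slice `str[1:3]` assumes every ISD code has exactly two digits, which holds for the sample data but not in general. A small sketch, using hypothetical phone numbers with one-, two- and three-digit codes, of how the slice compares with a pattern that reads up to the hyphen:

```python
import pandas as pd

# Hypothetical numbers with 1-, 2- and 3-digit country codes.
phones = pd.Series(["+1-2025550123", "+91-1234567890", "+880-1712345678"])

# Fixed-width slicing always keeps the two characters after "+".
sliced = phones.str[1:3]

# The pattern captures all digits between "+" and the first "-".
extracted = phones.str.extract(r"\+(\d+)-", expand=False)

print(pd.DataFrame({"Phone": phones, "sliced": sliced, "extracted": extracted}))
```

With these inputs the slice yields "1-" and "88" for the first and last rows, while the pattern yields "1" and "880".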