I am trying to get the email provider from the mail column of the DataFrame and create a new column named "Mail_Provider". For example, taking gmail from [email protected] and storing it in the "Mail_Provider" column. I would also like to extract the country ISD code from the Phone column and create a new column for it. Is there a more straightforward/simpler method than regex?
import pandas as pd
data = pd.DataFrame({"Name": ["A", "B", "C"],
                     "mail": ["[email protected]", "[email protected]", "[email protected]"],
                     "Adress": ["Adress1", "Adress2", "Adress3"],
                     "Phone": ["+91-1234567890", "+88-0987654321", "+27-2647589201"]})
Table
Name mail Adress Phone
A [email protected] Adress1 +91-1234567890
B [email protected] Adress2 +88-0987654321
C [email protected] Adress3 +27-2647589201
Expected result:
Name mail Adress Phone Mail_Provider ISD
A [email protected] Adress1 +91-1234567890 gmail 91
B [email protected] Adress2 +88-0987654321 yahoo 88
C [email protected] Adress3 +27-2647589201 gmail 27
You can simply select the columns like this: df2 = df[['b','c','d','e','f']]. Why are you using regex?
You can extract a column of a pandas DataFrame based on the values of another column by using the DataFrame.query() method. query() selects the rows of a DataFrame that satisfy a boolean expression written over its columns. For instance, such a query can return the Courses column for the rows where the Fee column equals 25000.
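A minimal sketch of that kind of query, assuming a hypothetical DataFrame with "Courses" and "Fee" columns (neither exists in the question's data):

```python
import pandas as pd

# Hypothetical data: the "Courses" and "Fee" columns are assumed for illustration.
courses = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Pandas"],
    "Fee": [20000, 25000, 25000],
})

# query() keeps only the rows where the boolean expression is True;
# indexing the result with ["Courses"] then returns just that column.
matching = courses.query("Fee == 25000")["Courses"]
print(matching.tolist())  # ['PySpark', 'Pandas']
```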
The regex version is fairly simple:
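# Provider: the word characters between "@" and the next "." (raw strings avoid invalid-escape warnings)
data['Mail_Provider'] = data['mail'].str.extract(r'@(\w+)\.', expand=False)
# ISD: the digits between the leading "+" and the first "-"; expand=False returns a Series
data['ISD'] = data['Phone'].str.extract(r'\+(\d+)-', expand=False)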
If you really want to avoid regex, @Eva's answer would be the way to go.
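That answer is not reproduced here, so the following is a separate, minimal regex-free sketch using pandas' vectorized string methods (`str.split` plus `.str[i]` element access). The email addresses are hypothetical stand-ins, because the originals are obfuscated above.

```python
import pandas as pd

# Hypothetical addresses: the originals in the question are obfuscated.
data = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "mail": ["abc@gmail.com", "xyz@yahoo.co.in", "pqr@gmail.com"],
    "Phone": ["+91-1234567890", "+88-0987654321", "+27-2647589201"],
})

# The domain is everything after "@"; the provider is the part of it before the first ".".
# Single-character patterns like "@" and "." are treated as literal strings by str.split.
data["Mail_Provider"] = data["mail"].str.split("@").str[1].str.split(".").str[0]

# The ISD code is everything before the first "-", with the leading "+" stripped.
data["ISD"] = data["Phone"].str.split("-").str[0].str.lstrip("+")

print(data[["Mail_Provider", "ISD"]])
```

Both new columns hold strings; `data["ISD"].astype(int)` converts the codes to integers if needed. Splitting on "-" instead of slicing a fixed range also copes with country codes of any length.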
Mixed approach (regex and simple slicing):
In [693]: df['Mail_Provider'] = df['mail'].str.extract('@([^.]+)')
In [694]: df['ISD'] = df['Phone'].str[1:3]
In [695]: df
Out[695]:
Name mail Adress Phone Mail_Provider ISD
0 A [email protected] Adress1 +91-1234567890 gmail 91
1 B [email protected] Adress2 +88-0987654321 yahoo 88
2 C [email protected] Adress3 +27-2647589201 gmail 27
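The `str[1:3]` slice works here only because every code in the sample has exactly two digits. A quick check with hypothetical numbers that have 1-, 2- and 3-digit country codes shows how the slice compares with an extraction anchored on the "-" separator:

```python
import pandas as pd

# Hypothetical phone numbers with 1-, 2- and 3-digit country codes.
phones = pd.Series(["+1-5551234567", "+91-1234567890", "+353-871234567"])

# Fixed-width slicing assumes every code is exactly two digits long.
sliced = phones.str[1:3]

# Anchoring on the leading "+" and the "-" separator works for any code length.
extracted = phones.str.extract(r"^\+(\d+)-", expand=False)

print(sliced.tolist())     # ['1-', '91', '35']
print(extracted.tolist())  # ['1', '91', '353']
```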