Good day,
I have a dataframe in which I want to isolate part of the string in each row of one column. The problem is that each row needs a substring of a different length: I want to keep the string only up to the first occurrence of "." (a period), plus the next two letters.
Example:
import pandas as pd
x = [ [ 34, 'Sydney.Au123XX'] ,
[30, 'Delhi.As1q' ] ,
[16, 'New York.US3qqa']]
x = pd.DataFrame(x)
x.columns = ["a", "b"]
#now I want to substring each row based on where "." occurs.
#I have tried the following:
y = x["b"].str.slice( stop = x["b"].str.find(".") + 2)
y = x["b"].str[0: x["b"].str.find(".")+ 2]
#desired output
desired = [[34, 'Sydney.Au'],
           [30, 'Delhi.As'],
           [16, 'New York.US']]
desired = pd.DataFrame(desired)
desired.columns = ["a", "b"]
Please see my code for the desired output.
I do not want to use a loop.
Thanks in advance.
IIUC try:
x['b'] = x['b'].str.split('.').str[0]
print(x)
Alternatively, you can use a one-liner:
print(x.assign(b=x['b'].str.split('.').str[0]))
They both output:
    a         b
0  34    Sydney
1  30     Delhi
2  16  New York
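Splitting on "." keeps only the text before the period, so it drops the ".Au", ".As" and ".US" parts the question asks for. A minimal split-based sketch that does keep the period and the next two characters, assuming every value contains at least one period (rows without one would come out as NaN): split only once with n=1, then glue the first piece back onto the first two characters of the remainder.

```python
import pandas as pd

x = pd.DataFrame(
    [[34, "Sydney.Au123XX"], [30, "Delhi.As1q"], [16, "New York.US3qqa"]],
    columns=["a", "b"],
)

# Split only at the first "." (n=1), so any later periods stay in the second piece.
parts = x["b"].str.split(".", n=1)

# Rebuild "<text before first period>.<next two characters>" without an explicit loop.
x["b"] = parts.str[0] + "." + parts.str[1].str[:2]
print(x)
```

Column b then holds Sydney.Au, Delhi.As and New York.US, matching the desired frame.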
Edit:
Do:
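# '.*?' is lazy, so the match ends at the first "."; '\..' then keeps the period plus the next two characters.
# expand=False returns a Series instead of a one-column DataFrame.
x['b'] = x['b'].str.extract(r'^(.*?\...)', expand=False)
print(x)
Or use:
print(x.assign(b=x['b'].str.extract(r'^(.*?\...)', expand=False)))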
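To check the regex approach against the desired frame from the question, here is a small self-contained sketch that compares the two with pandas.testing.assert_frame_equal. It uses a raw string and a lazy .*? so the match stops at the first period even if a value contains several, and expand=False so str.extract returns a Series. The str.replace line is an alternative added here, not part of the original answer: it keeps capture group 1 and deletes the rest of the string.

```python
import pandas as pd

x = pd.DataFrame(
    [[34, "Sydney.Au123XX"], [30, "Delhi.As1q"], [16, "New York.US3qqa"]],
    columns=["a", "b"],
)
desired = pd.DataFrame(
    [[34, "Sydney.Au"], [30, "Delhi.As"], [16, "New York.US"]],
    columns=["a", "b"],
)

# Capture from the start of the string up to the first "." plus the two characters after it.
pattern = r"^(.*?\...)"
extracted = x.assign(b=x["b"].str.extract(pattern, expand=False))

# Alternative (not from the original answer): replace the whole string with group 1.
replaced = x.assign(b=x["b"].str.replace(r"^(.*?\...).*$", r"\1", regex=True))

# Both raise AssertionError if they differ from the desired output.
pd.testing.assert_frame_equal(extracted, desired)
pd.testing.assert_frame_equal(replaced, desired)
print(extracted)
```

Both variants are vectorized string operations, so no explicit Python loop is needed.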