Pandas - substring each row with a different length

Good day,

I have a dataframe where I want to isolate part of the string in one column, row by row. The problem is that each row needs a substring of a different length: I want to keep the string only up to the first occurrence of "." (a period) plus the next two letters.

Example:

import pandas as pd

x = [[34, 'Sydney.Au123XX'],
     [30, 'Delhi.As1q'],
     [16, 'New York.US3qqa']]
x = pd.DataFrame(x)
x.columns = ["a", "b"]

# Now I want to substring each row based on where "." occurs.
# I have tried the following:
y = x["b"].str.slice(stop=x["b"].str.find(".") + 2)
y = x["b"].str[0:x["b"].str.find(".") + 2]

# Desired output
desired = [[34, 'Sydney.Au'],
           [30, 'Delhi.As'],
           [16, 'New York.US']]
desired = pd.DataFrame(desired)
desired.columns = ["a", "b"]

Please see my code for the desired output.

I do not want to use a loop.

Thanks in advance.

rich asked Jul 26 '19 07:07



1 Answer

If I understand correctly, try:

x['b'] = x['b'].str.split('.').str[0]
print(x)

You can also do it as a one-liner:

print(x.assign(b=x['b'].str.split('.').str[0]))

They both output:

    a         b
0  34    Sydney
1  30     Delhi
2  16  New York
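
The split-based result above drops the two letters after the period. As a rough sketch, assuming every value contains at least one period, splitting only once (`n=1`) with `expand=True` gives a prefix column and a remainder column, and the first two characters of the remainder can be glued back on:

```python
import pandas as pd

x = pd.DataFrame(
    [[34, "Sydney.Au123XX"], [30, "Delhi.As1q"], [16, "New York.US3qqa"]],
    columns=["a", "b"],
)

# Split on the first period only: column 0 is the prefix, column 1 the rest.
parts = x["b"].str.split(".", n=1, expand=True)

# Prefix + "." + first two characters of the remainder.
result = x.assign(b=parts[0] + "." + parts[1].str[:2])
print(result)
```

This stays vectorised (no explicit loop) and avoids a regex, at the cost of building an intermediate two-column frame.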

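Edit:

To keep the two letters after the first period as well, use str.extract with a regex (a raw string, with a lazy .*? so the match stops at the first period):

x['b'] = x['b'].str.extract(r'(.*?\...)', expand=False)
print(x)

Or, as a one-liner:

print(x.assign(b=x['b'].str.extract(r'(.*?\...)', expand=False)))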
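
A self-contained check of the extract approach, as a sketch rather than a definitive version. The pattern `^([^.]*\...)` is a variant assumed here for illustration: `[^.]*` cannot cross a period, so the match always ends at the first one. The `Perth.WA.2019` value is made up (it is not in the question) to show how a greedy `.*` behaves when a string contains more than one period.

```python
import pandas as pd

x = pd.DataFrame(
    [[34, "Sydney.Au123XX"], [30, "Delhi.As1q"], [16, "New York.US3qqa"]],
    columns=["a", "b"],
)

# [^.]* matches a run of non-period characters, so the match ends at the
# FIRST period; the trailing ".." keeps the next two characters.
first_period = r"^([^.]*\...)"

# expand=False makes str.extract return a Series for a single group.
result = x.assign(b=x["b"].str.extract(first_period, expand=False))
print(result)

# Hypothetical value with two periods, to compare against a greedy pattern.
s = pd.Series(["Perth.WA.2019"])
at_first = s.str.extract(first_period, expand=False).tolist()
greedy = s.str.extract(r"(.*\...)", expand=False).tolist()
print(at_first, greedy)
```

If the data never has more than one period per value, the greedy and non-greedy patterns give the same result.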
U12-Forward answered Oct 21 '22 18:10