
Pandas add new columns based on splitting another column

I have a pandas dataframe like the following:

A              B
US,65,AMAZON   2016
US,65,EBAY     2016

My goal is to get it to look like this:

A              B      country    code    com
US.65.AMAZON   2016   US         65      AMAZON
US.65.EBAY     2016   US         65      EBAY

I know this question has been asked before (here and here), but none of those answers work for me. I have tried:

df['country','code','com'] = df.Field.str.split('.')

and

df2 = pd.DataFrame(df.Field.str.split('.').tolist(),columns = ['country','code','com','A','B'])

Am I missing something? Any help is much appreciated.

dagg3r asked Aug 15 '16 14:08



1 Answer

You can use str.split with the parameter expand=True, and wrap the new column names in an extra pair of [] on the left side:

df[['country','code','com']] = df.A.str.split(',', expand=True)

Then replace , with . in column A:

df.A = df.A.str.replace(',','.')

print (df)
              A     B country code     com
0  US.65.AMAZON  2016      US   65  AMAZON
1    US.65.EBAY  2016      US   65    EBAY
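
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The sample frame is rebuilt from the question; regex=False is an addition of mine to make the replacement literal, not part of the original snippet.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({"A": ["US,65,AMAZON", "US,65,EBAY"], "B": [2016, 2016]})

# expand=True returns a DataFrame with one column per piece,
# so it can be assigned to a list of three new column names.
df[["country", "code", "com"]] = df["A"].str.split(",", expand=True)

# Replace the separator literally (regex=False is an added assumption).
df["A"] = df["A"].str.replace(",", ".", regex=False)

print(df)
```

Note that the new columns hold strings, so code is '65' rather than the integer 65.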

Another solution uses the DataFrame constructor; it works only if column A contains no NaN values:

df[['country','code','com']] = pd.DataFrame([ x.split(',') for x in df['A'].tolist() ])
df.A = df.A.str.replace(',','.')
print (df)
              A     B country code     com
0  US.65.AMAZON  2016      US   65  AMAZON
1    US.65.EBAY  2016      US   65    EBAY
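
The NaN restriction matters because a missing value in A is a float, which has no split method, so the list comprehension raises AttributeError, while the .str accessor passes missing values through. A small sketch with made-up sample data (the missing row and the names failed and parts are mine, not from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical sample with one missing value in column A.
df = pd.DataFrame({"A": ["US,65,AMAZON", np.nan], "B": [2016, 2016]})

# The list-comprehension approach fails: NaN is a float and has no .split().
try:
    pd.DataFrame([x.split(",") for x in df["A"].tolist()])
    failed = False
except AttributeError:
    failed = True

# The .str accessor skips the missing value and leaves that row empty.
parts = df["A"].str.split(",", expand=True)
parts.columns = ["country", "code", "com"]

print(failed)
print(parts)
```

If the column may contain missing values, the str.split version is the safer choice.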

You can also pass the column names to the constructor, but then concat is needed to join the result to df:

df1=pd.DataFrame([x.split(',') for x in df['A'].tolist()],columns= ['country','code','com'])
df.A = df.A.str.replace(',','.')
df = pd.concat([df, df1], axis=1)
print (df)
              A     B country code     com
0  US.65.AMAZON  2016      US   65  AMAZON
1    US.65.EBAY  2016      US   65    EBAY
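
One caveat with the constructor approach: concat with axis=1 aligns rows on the index, and a freshly built DataFrame gets a default 0..n-1 index. If df has a different index (for example after filtering), the rows may not line up. A sketch of one way around this, assuming a hypothetical index of [10, 20], is to pass index=df.index to the constructor:

```python
import pandas as pd

# Hypothetical frame whose index is not 0..n-1 (for example after filtering).
df = pd.DataFrame(
    {"A": ["US,65,AMAZON", "US,65,EBAY"], "B": [2016, 2016]},
    index=[10, 20],
)

# Reusing df.index keeps the new rows aligned with the original ones.
df1 = pd.DataFrame(
    [x.split(",") for x in df["A"].tolist()],
    columns=["country", "code", "com"],
    index=df.index,
)
df["A"] = df["A"].str.replace(",", ".", regex=False)
df = pd.concat([df, df1], axis=1)

print(df)
```

Without index=df.index, the concat here would produce four rows with NaN gaps instead of two complete ones.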
jezrael answered Nov 10 '22 12:11
