
Using Pandas, how do I split based on the first space?

I have a column of codes in a file, "dataset.csv":

0020-004241 purple
00532 - Blue
00121 - Yellow
055 - Greem
0025-097 - Orange

Desired Output:

code            name_of_code
0020-004241     purple
00532           blue

I want the codes and the words for the codes to be split into two different columns.

I tried:

df =pandas.read_csv(dataset.txt)

df = pandas.concat([df, df.columnname.str.split('/s', expand=True)], 1)
df = pandas.concat([df, df.columnname.str.split('-', expand=True)], 1)

It gave the unexpected output of: purple none blue none yellow none green none orange none

How should I split this data correctly?

Jessica Warren, asked Jul 11 '18 16:07


2 Answers

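Use str.split(" ", n=1, expand=True) to split on the first space only, then strip the leftover "- " separator from the name.

Ex:

import pandas as pd

# The file has no header row, so name the single column explicitly.
df = pd.read_csv("dataset.csv", names=["code"])
# n=1 limits the split to the first space; newer pandas releases accept n only as a keyword.
df[["code", "name_of_code"]] = df["code"].str.split(" ", n=1, expand=True)
# Remove the "- " left over at the start of rows such as "00532 - Blue".
df["name_of_code"] = df["name_of_code"].str.strip("- ")
print(df)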

Output:

          code name_of_code
0  0020-004241       purple
1        00532         Blue
2        00121       Yellow
3          055        Greem
4     0025-097       Orange
Rakesh, answered Sep 18 '22 06:09


You can process this via a couple of split calls:

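import pandas as pd

df = pd.DataFrame({'col': ['0020-004241 purple', '00532 - Blue',
                           '00121 - Yellow', '055 - Greem',
                           '0025-097 - Orange']})

# Split on the first run of whitespace only (n=1): code on the left, the rest on the right.
df[['col1', 'col2']] = df['col'].str.split(n=1, expand=True)
# Keep the last whitespace-separated token of the rest, which drops the "-" separator.
df['col2'] = df['col2'].str.split().str[-1]

print(df)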

                  col         col1    col2
0  0020-004241 purple  0020-004241  purple
1        00532 - Blue        00532    Blue
2      00121 - Yellow        00121  Yellow
3         055 - Greem          055   Greem
4   0025-097 - Orange     0025-097  Orange
jpp, answered Sep 22 '22 06:09
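
As a possible alternative to the two split-based answers, the same result can probably be reached in one pass with str.extract and a regular expression. This is only a sketch: the pattern and the column name 'col' are assumptions based on the five sample rows in the question, and it assumes every row starts with a code that contains no spaces, optionally followed by a "-" separator.

```python
import pandas as pd

df = pd.DataFrame({"col": ["0020-004241 purple", "00532 - Blue",
                           "00121 - Yellow", "055 - Greem",
                           "0025-097 - Orange"]})

# code:         the leading run of non-space characters
# name_of_code: everything after the first whitespace, skipping an optional "-" separator
pattern = r"^(?P<code>\S+)\s+(?:-\s*)?(?P<name_of_code>.+)$"

# Named groups become the column names of the resulting DataFrame.
result = df["col"].str.extract(pattern)
# Drop any trailing whitespace that the greedy ".+" may have captured.
result["name_of_code"] = result["name_of_code"].str.strip()

print(result)
```

Rows that do not match the pattern come back as NaN in both columns, so it is worth checking for missing values before relying on the output. Unlike the second answer, this keeps multi-word names intact instead of taking only the last word.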