How to split the column values of dataframe into multiple columns

Question

Below is my dataframe having a column that are merged together,

   PLUGS
DESIGN
GEAR
0  700
Daewoo 8000  Gearless   
1  300
Hyundai 4400  Gearless   
2  600
STX 2600  Gearless   
3  200
B170 
Geared   
4  362 Wenchong 1700 Mk II 
Geared   
5  252
RichMax 1550  Gearless   
6  220
CV 1100 Plus 
Geared   
7  232
Orskov Mk VII  Gearless   
8  119
Kouan 1000  Gearless   
9  100
Hanjin 700  Gearless

I want to split the columns into three different columns namely PLUGS, DESIGN, GEAR. Is there any way to do this?

Below is the code which i tried:

new_df[['PLUGS', 'DESIGN', 'GEAR']] = new_df['PLUGS
DESIGN
GEAR'].str.split(' ')
                print(new_df)

expected output:

   PLUGS  DESIGN               GEAR
0  700    Daewoo 8000          Gearless   
1  300    Hyundai 4400         Gearless   
2  600    STX 2600             Gearless   
3  200    B170                 Geared   
4  362    Wenchong 1700 Mk II  Geared   
5  252    RichMax 1550         Gearless   
6  220    CV 1100 Plus         Geared   
7  232    Orskov Mk VII        Gearless   
8  119    Kouan 1000           Gearless   
9  100    Hanjin 700           Gearless

Karn Kumar · Accepted Answer

As Suggested in the comment section, the regex should work pretty well here,

DataFrame Sample:

>>> df
                   PLUGS
DESIGN
GEAR
0        700
Daewoo 8000  Gearless
1       300
Hyundai 4400  Gearless
2           600
STX 2600  Gearless
3                200
B170 
Geared
4  362 Wenchong 1700 Mk II 
Geared
5       252
RichMax 1550  Gearless
6        220
CV 1100 Plus 
Geared
7      232
Orskov Mk VII  Gearless
8         119
Kouan 1000  Gearless
9            100
Hanjin 700  Gearless

Just removing the newline char from the column name to make readability easy for use as well.

>>> df.columns = df.columns.str.replace(r"\n", " ", regex=True)

Now, Column name does not have any special cars:

>>> df
                     PLUGS DESIGN GEAR
0        700
Daewoo 8000  Gearless
1       300
Hyundai 4400  Gearless
2           600
STX 2600  Gearless
3                200
B170 
Geared
4  362 Wenchong 1700 Mk II 
Geared
5       252
RichMax 1550  Gearless
6        220
CV 1100 Plus 
Geared
7      232
Orskov Mk VII  Gearless
8         119
Kouan 1000  Gearless
9            100
Hanjin 700  Gearless

Now, we can use pandas.Series.str.extract. While using regex method, all the Named groups () will become column names in the result.

As, the named group will become columns with predefined names like 0,1,2 thus we can rename them altogether with desired names to get the desired result as follows:

>>> df = df['PLUGS DESIGN GEAR'].str.extract(r"^(\d+)[\n\s]+([^\]+)[\n\s]+([\n|^Gear][a-z]+)").rename(columns={0: 'PLUGS', 1: 'DESIGN', 2: 'GEAR'})

Result:

>>> print(df)
  PLUGS                DESIGN      GEAR
0   700          Daewoo 8000   Gearless
1   300         Hyundai 4400   Gearless
2   600             STX 2600   Gearless
3   200                 B170     Geared
4   362  Wenchong 1700 Mk II     Geared
5   252         RichMax 1550   Gearless
6   220         CV 1100 Plus     Geared
7   232        Orskov Mk VII   Gearless
8   119           Kouan 1000   Gearless
9   100           Hanjin 700   Gearless

regex Explanation:

You can check at regex101.com

(\d+)[\n\s]+([^\]+)[\n\s]+([\|^Gear][a-z]+)

1st Capturing Group (\d+)

    \d matches a digit (equivalent to [0-9])
    + matches the previous token between one and unlimited times, as many times as possible, giving back as needed (greedy)
    Match a single character present in the list below [\n\s]
    + matches the previous token between one and unlimited times, as many times as possible, giving back as needed (greedy)
    \ matches the character \ literally (case sensitive)
    n matches the character n literally (case sensitive)
    \s matches any whitespace character (equivalent to [
	\f\v ])

2nd Capturing Group ([^\]+)

    Match a single character not present in the list below [^\]
    + matches the previous token between one and unlimited times, as many times as possible, giving back as needed (greedy)
    \ matches the character \ literally (case sensitive)
    Match a single character present in the list below [\n\s]
    + matches the previous token between one and unlimited times, as many times as possible, giving back as needed (greedy)
    \ matches the character \ literally (case sensitive)
    n matches the character n literally (case sensitive)
    \s matches any whitespace character (equivalent to [
	\f\v ])

3rd Capturing Group ([|^Gear][a-z]+)

Match a single character present in the list below [\|^Gear]
\| matches the character | literally (case sensitive)
^Gear matches a single character in the list ^Gear (case sensitive)
Match a single character present in the list below [a-z]
+ matches the previous token between one and unlimited times, as many times as possible, giving back as needed (greedy)
a-z matches a single character in the range between a (index 97) and z (index 122) (case sensitive)
Global pattern flags
g modifier: global. All matches (don't return after first match)
m modifier: multi line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string)

tlentali · Answer

Starting from your Dataframe :

>>> import pandas as pd

>>> df = pd.DataFrame({'PLUGS
DESIGN
GEAR': ['700
Daewoo 8000  Gearless', '300
Hyundai 4400  Gearless', '600
STX 2600  Gearless', '200
B170 
Geared', '362 Wenchong 1700 Mk II 
Geared', '252
RichMax 1550  Gearless'], }, 
...                   index = [0, 1, 2, 3, 4, 5]) 
>>> df
    PLUGS
DESIGN
GEAR
0   700
Daewoo 8000 Gearless
1   300
Hyundai 4400 Gearless
2   600
STX 2600 Gearless
3   200
B170 
Geared
4   362 Wenchong 1700 Mk II 
Geared
5   252
RichMax 1550 Gearless

You can indeed use the split method on several separators, here and space:

>>> df = pd.DataFrame(df['PLUGS
DESIGN
GEAR'].str.split('
| '))
    PLUGS
DESIGN
GEAR
0   [700, Daewoo, 8000, , Gearless]
1   [300, Hyundai, 4400, , Gearless]
2   [600, STX, 2600, , Gearless]
3   [200, B170, , Geared]
4   [362, Wenchong, 1700, Mk, II, , Geared]
5   [252, RichMax, 1550, , Gearless]

Then, you can assign the first and last element to the correct column, and the rest to the DESIGN column :

>>> df['PLUGS'] = df['PLUGS
DESIGN
GEAR'].str[0]
>>> df['DESIGN'] = df['PLUGS
DESIGN
GEAR'].str[1:-1]
>>> df['GEAR'] = df['PLUGS
DESIGN
GEAR'].str[-1]
>>> df
    PLUGS
DESIGN
GEAR                         PLUGS   DESIGN                      GEAR
0   [700, Daewoo, 8000, , Gearless]             700     [Daewoo, 8000, ]            Gearless
1   [300, Hyundai, 4400, , Gearless]            300     [Hyundai, 4400, ]           Gearless
2   [600, STX, 2600, , Gearless]                600     [STX, 2600, ]               Gearless
3   [200, B170, , Geared]                       200     [B170, ]                    Geared
4   [362, Wenchong, 1700, Mk, II, , Geared]     362     [Wenchong, 1700, Mk, II, ]  Geared
5   [252, RichMax, 1550, , Gearless]            252     [RichMax, 1550, ]           Gearless

The last thing to do is to improve the DESIGN column to map it as a string instead of a list using the join method, and drop the PLUGS DESIGN GEAR column like so :

>>> df['DESIGN'] = df['DESIGN'].apply(lambda x: ' '.join(map(str, x)))
>>> df.drop(['PLUGS
DESIGN
GEAR'], axis=1)
    PLUGS   DESIGN               GEAR
0   700     Daewoo 8000          Gearless
1   300     Hyundai 4400         Gearless
2   600     STX 2600             Gearless
3   200     B170                 Geared
4   362     Wenchong 1700 Mk II  Geared
5   252     RichMax 1550         Gearless

How to split the column values of dataframe into multiple columns

Tags:

python

split

pandas

dataframe

Stark

2 Answers

DataFrame Sample:

Result:

Karn Kumar

tlentali

Recent Activity

Donate For Us

How to split the column values of dataframe into multiple columns

Tags:

python

split

pandas

dataframe

Stark

2 Answers

DataFrame Sample:

Result:

Karn Kumar

tlentali

Related questions

Recent Activity

Donate For Us