Split one column into multiple columns in Python

I have a pandas DataFrame with a single column, like this:

index  Train_station

0      Adenauerplatz 52° 29′ 59″ N, 13° 18′ 26″ O
1      Afrikanische Straße 52° 33′ 38″ N, 13° 20′ 3″ O
2      Alexanderplatz 52° 31′ 17″ N, 13° 24′ 48″ O

And I want to split it into 3 columns: Train station, Latitude, Longitude. The dataframe should look like this:

index  Train_station         Latitude       Longitude

0      Adenauerplatz         52° 29′ 59″ N  13° 18′ 26″ O
1      Afrikanische Straße   52° 33′ 38″ N  13° 20′ 3″ O
2      Alexanderplatz        52° 31′ 17″ N  13° 24′ 48″ O

I've tried using df[['Latitude', 'Longitude']] = df.Train_station.str.split(',', expand=True) but it only split between latitude and longitude coordinates. How can I split a column with more than one condition that I define?

I've thought about scanning each string from the left and splitting it where the first integer (or some other defined string) appears, but so far I haven't found a way to do that.

Asked by Minh Mai on Jun 20 '20 at 23:06


3 Answers

df = df.Train_station.str.split(r'(.*?)(\d+°[^,]+),(.*)', expand=True)
print(df.loc[:, 1:3].rename(columns={1:'Train_station', 2:'Latitude', 3:'Longitude'}) )

Prints:

          Train_station       Latitude       Longitude
0        Adenauerplatz   52° 29′ 59″ N   13° 18′ 26″ O
1  Afrikanische Straße   52° 33′ 38″ N    13° 20′ 3″ O
2       Alexanderplatz   52° 31′ 17″ N   13° 24′ 48″ O
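
Why columns 1 to 3? When the pattern contains capture groups, a regex split keeps the captured text in the result, and because this pattern matches the whole string, the pieces before and after the match are empty. A minimal sketch with the standard-library re module, on the assumption that str.split applies the same rule to each row:

```python
import re

pattern = r'(.*?)(\d+°[^,]+),(.*)'
row = 'Adenauerplatz 52° 29′ 59″ N, 13° 18′ 26″ O'

# The whole row is consumed by one match, so the first and last pieces are
# empty strings and the three captured groups sit in positions 1 to 3.
parts = re.split(pattern, row)
print(parts)
```

Note that the station name keeps its trailing space and the longitude keeps its leading space, so a .str.strip() on those columns may be worth adding.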

EDIT: Thanks @ALollz, you can use str.extract():

df = df.Train_station.str.extract(r'(?P<Train_station>.*?)(?P<Latitude>\d+°[^,]+),(?P<Longitude>.*)', expand=True)
print(df)
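
For anyone who wants to run this without the original data, here is a self-contained sketch. The sample frame is rebuilt from the three rows in the question, and the \s+ and \s* between the groups are my own addition (not part of the pattern above) so that the captured values come out without surrounding spaces:

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({'Train_station': [
    'Adenauerplatz 52° 29′ 59″ N, 13° 18′ 26″ O',
    'Afrikanische Straße 52° 33′ 38″ N, 13° 20′ 3″ O',
    'Alexanderplatz 52° 31′ 17″ N, 13° 24′ 48″ O',
]})

# Named groups become column names; the whitespace between the parts is
# matched outside the groups so it is not captured.
pattern = (
    r'(?P<Train_station>.*?)\s+'
    r'(?P<Latitude>\d+°[^,]+),\s*'
    r'(?P<Longitude>.*)'
)

out = df['Train_station'].str.extract(pattern)
print(out)
```

The lazy .*? expands until the first run of digits followed by °, which is essentially the "split when it meets an integer" idea from the question.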
Answered by Andrej Kesely on Nov 15 '22 at 00:11


You can utilize the .split() method for separating the values in the strings.

Use .apply() to create new data-frame columns for each desired column name.

import pandas as pd

data = ["Adenauerplatz 52° 29′ 59″ N, 13° 18′ 26″ O",
        "Afrikanische Straße 52° 33′ 38″ N, 13° 20′ 3″ O",
        "Alexanderplatz 52° 31′ 17″ N, 13° 24′ 48″ O"]

df = pd.DataFrame(data, columns=['Train_station'])


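def split_station(x):
    # Cut the longitude off at the comma, then peel the four latitude tokens
    # (degrees, minutes, seconds, hemisphere) off the right-hand side, so that
    # station names containing spaces stay in one piece.
    left, lon = x.split(', ', 1)
    name, *lat = left.rsplit(' ', 4)
    return name, ' '.join(lat), lon


def train_station(x):
    return split_station(x)[0]


def latitude(x):
    return split_station(x)[1]


def longitude(x):
    return split_station(x)[2]


df['Latitude'] = df['Train_station'].apply(latitude)
df['Longitude'] = df['Train_station'].apply(longitude)
df['Train_station'] = df['Train_station'].apply(train_station)

print(df)

The code above recreates your original data-frame and then modifies it with .split() and .apply(). Splitting at the first space would cut "Afrikanische Straße" in two, so the latitude is taken from the right with .rsplit() instead; this assumes the latitude always consists of four space-separated tokens.

Output:

         Train_station       Latitude      Longitude
0        Adenauerplatz  52° 29′ 59″ N  13° 18′ 26″ O
1  Afrikanische Straße  52° 33′ 38″ N   13° 20′ 3″ O
2       Alexanderplatz  52° 31′ 17″ N  13° 24′ 48″ O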
Answered by Tyler Russin on Nov 14 '22 at 22:11


You can try something like this:

def is_coord(tok):
    # A token belongs to the coordinates if it carries a °, ′ or ″ mark
    # or is a bare hemisphere letter (N or O, possibly followed by a comma).
    return any(c in '°′″' for c in tok) or tok.rstrip(',') in ('N', 'O')

coords = df['Train_station'].apply(lambda x: ' '.join(t for t in x.split(' ') if is_coord(t)).split(', '))
df['Latitude'] = coords.str[0]
df['Longitude'] = coords.str[1]
df['Train_station'] = df['Train_station'].apply(lambda x: ' '.join(t for t in x.split(' ') if not is_coord(t)))

Output:

         Train_station       Latitude      Longitude
0        Adenauerplatz  52° 29′ 59″ N  13° 18′ 26″ O
1  Afrikanische Straße  52° 33′ 38″ N   13° 20′ 3″ O
2       Alexanderplatz  52° 31′ 17″ N  13° 24′ 48″ O
Answered by MrNobody33 on Nov 14 '22 at 23:11