I have a pandas DataFrame like this with one column:
index  Train_station
0      Adenauerplatz 52° 29′ 59″ N, 13° 18′ 26″ O
1      Afrikanische Straße 52° 33′ 38″ N, 13° 20′ 3″ O
2      Alexanderplatz 52° 31′ 17″ N, 13° 24′ 48″ O
And I want to split it into 3 columns: Train station, Latitude, Longitude. The dataframe should look like this:
index  Train_station        Latitude       Longitude
0      Adenauerplatz        52° 29′ 59″ N  13° 18′ 26″ O
1      Afrikanische Straße  52° 33′ 38″ N  13° 20′ 3″ O
2      Alexanderplatz       52° 31′ 17″ N  13° 24′ 48″ O
I've tried using df[['Latitude', 'Longitude']] = df.Train_station.str.split(',', expand=True), but it only splits between the latitude and longitude coordinates. How can I split a column on more than one condition that I define?
I've thought about a method that checks the string from the left and splits it when it meets an integer or a defined string, but I've found no answer for this approach so far.
We can use str.split() to split one column into multiple columns by specifying the expand=True option. We can use str.extract() to extract multiple columns using a regular expression in which multiple capturing groups are defined.
We can use the pandas Series.str.split() function to break up strings into multiple columns around a given separator or delimiter. It's similar to the Python string split() method but applies to every value in the column.
Pandas provides the split() method to split a string around a passed separator/delimiter. The result can be stored as a list in a Series, or it can be used to create multiple DataFrame columns from a single separated string.
You can use the following basic syntax to split a string column in a pandas DataFrame into multiple columns. To split column A into two columns, A and B: df[['A', 'B']] = df['A'].str.split(',', n=1, expand=True)
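As a minimal sketch of that syntax, assuming a recent pandas version in which the split limit is passed as the keyword n, and using a small hypothetical frame with a single column A:

```python
import pandas as pd

# Hypothetical sample data: a key, then a comma, then the remaining text.
df = pd.DataFrame({"A": ["x,1,2", "y,3,4"]})

# n=1 limits the split to the first comma, so exactly two columns come back.
df[["A", "B"]] = df["A"].str.split(",", n=1, expand=True)

print(df)
```

Limiting the split with n keeps any later commas inside the second column instead of producing a variable number of columns.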
Method #1: Using the Series.str.split() function. Split the Name column into two different columns. By default, str.split() splits on whitespace.
Split Column by delimiter in Pandas. Now let's say that instead of storing lists like ['C+', 'C+'] you have only the values separated by a delimiter, in this case a comma, as in C+, C+. We would like to split the column skills by delimiter into multiple columns. This time the number of elements is not fixed!
df = df.Train_station.str.split(r'(.*?)(\d+°[^,]+),(.*)', expand=True)
print(df.loc[:, 1:3].rename(columns={1:'Train_station', 2:'Latitude', 3:'Longitude'}) )
Prints:
   Train_station        Latitude       Longitude
0  Adenauerplatz        52° 29′ 59″ N  13° 18′ 26″ O
1  Afrikanische Straße  52° 33′ 38″ N  13° 20′ 3″ O
2  Alexanderplatz       52° 31′ 17″ N  13° 24′ 48″ O
EDIT: Thanks @ALollz, you can use str.extract():
df = df.Train_station.str.extract(r'(?P<Train_station>.*?)(?P<Latitude>\d+°[^,]+),(?P<Longitude>.*)', expand=True)
print(df)
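For reference, here is a self-contained sketch of the str.extract() approach on the sample data from the question. The extra \s+ and \s* in the pattern are an addition of this sketch (not part of the answer above) so that the captured values carry no leading or trailing spaces; the names pattern and result are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"Train_station": [
    "Adenauerplatz 52° 29′ 59″ N, 13° 18′ 26″ O",
    "Afrikanische Straße 52° 33′ 38″ N, 13° 20′ 3″ O",
    "Alexanderplatz 52° 31′ 17″ N, 13° 24′ 48″ O",
]})

pattern = (
    r"(?P<Train_station>.*?)\s+"    # station name, matched lazily up to the first degree value
    r"(?P<Latitude>\d+°[^,]+),\s*"  # latitude: digits plus degree sign, up to the comma
    r"(?P<Longitude>.*)"            # longitude: everything after the comma
)

# Named groups become the column names of the resulting DataFrame.
result = df["Train_station"].str.extract(pattern)
print(result)
```

The lazy first group is what lets multi-word names such as "Afrikanische Straße" stay together: it keeps growing until the text that follows looks like a degree value.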
You can utilize the .split() method for separating the values in the strings. Use .apply() to create new data-frame columns for each desired column name.
import pandas as pd

data = ["Adenauerplatz 52° 29′ 59″ N, 13° 18′ 26″ O",
        "Afrikanische Straße 52° 33′ 38″ N, 13° 20′ 3″ O",
        "Alexanderplatz 52° 31′ 17″ N, 13° 24′ 48″ O"]
df = pd.DataFrame(data, columns=['Train_station'])

# The latitude is always the last four space-separated tokens before the comma,
# so split from the right to keep multi-word station names intact.
def train_station(x):
    left = x.split(', ', 1)[0]
    return left.rsplit(' ', 4)[0]

def latitude(x):
    left = x.split(', ', 1)[0]
    return ' '.join(left.rsplit(' ', 4)[1:])

# The longitude is everything after the comma.
def longitude(x):
    return x.split(', ', 1)[1]

# Derive the new columns first, then overwrite the original column last.
df['Latitude'] = df['Train_station'].apply(latitude)
df['Longitude'] = df['Train_station'].apply(longitude)
df['Train_station'] = df['Train_station'].apply(train_station)
print(df)
What you see above is a recreation of your original data-frame, which is then modified with .split() and .apply().
Output:
   Train_station        Latitude       Longitude
0  Adenauerplatz        52° 29′ 59″ N  13° 18′ 26″ O
1  Afrikanische Straße  52° 33′ 38″ N  13° 20′ 3″ O
2  Alexanderplatz       52° 31′ 17″ N  13° 24′ 48″ O
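A split-based approach can also be written with the vectorized .str accessor instead of .apply(). This is only a sketch, and it assumes that the latitude always consists of exactly four space-separated tokens (degrees, minutes, seconds, hemisphere); the names parts, left and out are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"Train_station": [
    "Adenauerplatz 52° 29′ 59″ N, 13° 18′ 26″ O",
    "Afrikanische Straße 52° 33′ 38″ N, 13° 20′ 3″ O",
    "Alexanderplatz 52° 31′ 17″ N, 13° 24′ 48″ O",
]})

# Split once at the comma: column 0 is "name + latitude", column 1 is the longitude.
parts = df["Train_station"].str.split(", ", n=1, expand=True)

# Assumption: the latitude is the last four tokens of the left part,
# so splitting four times from the right isolates the station name in column 0.
left = parts[0].str.rsplit(" ", n=4, expand=True)

out = pd.DataFrame({
    "Train_station": left[0],
    "Latitude": left[1] + " " + left[2] + " " + left[3] + " " + left[4],
    "Longitude": parts[1],
})
print(out)
```

Splitting from the right is what keeps a multi-word name such as "Afrikanische Straße" in one piece, since the number of words in the name varies but the number of coordinate tokens does not.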
You can try something like this:
df['Latitude']=df['Train_station'].apply(lambda x: ' '.join([i for i in x.split(' ') if any((lett.replace(',','') in '°′″') for lett in i)]).split(',')[0])
df['Longitude']=df['Train_station'].apply(lambda x: ' '.join([i for i in x.split(' ') if any((lett.replace(',','') in '°′″O') for lett in i)]).split(',')[1].strip())
df['Train_station']=df['Train_station'].apply(lambda x: ' '.join([i for i in x.split(' ') if i != 'O' and not any((lett.replace(',','') in '°′″') for lett in i)]))
Output:
   Train_station        Latitude       Longitude
0  Adenauerplatz        52° 29′ 59″ N  13° 18′ 26″ O
1  Afrikanische Straße  52° 33′ 38″ N  13° 20′ 3″ O
2  Alexanderplatz       52° 31′ 17″ N  13° 24′ 48″ O