Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python: columns must be same length as key when splitting a column

Tags:

python

pandas

I have two address columns and I want to extract the last word from the first column and the first word from the second column. In the provided example there aren't two words in column 'Address2', but I want to build the code in such a way that it will work regardless of how the dataset will look like. Sometimes the address2 can be one word, something it will have 2, etc..

data = {
    'Address1': ['3 Steel Street', '1 Arnprior Crescent', '40 Bargeddie Street Blackhill'],
    'Address2': ['Saltmarket', 'Castlemilk', 'Blackhill']
}

df = pd.DataFrame(data)

I have no problem with column 'Address1':

df[['StringStart', 'LastWord']] = df['Address1'].str.rsplit(' ', n=1, expand=True)

The problem comes with column 'Address2' where if I apply the above code I an error: Columns must be same length as key

I understand where the problem is coming from - I am trying to split one column which has one element into two columns. I am sure there is a way in which this can be handled to allow the split anyway and return Null if there isn't a word and a value if there is.

like image 359
MariaT Avatar asked Sep 17 '25 22:09

MariaT


1 Answers

Using str.extract() might be better for several reasons: it handles all cases, offers precision with regular expressions, and eliminates the risk of value errors.

import pandas as pd

data = {
    'Address1': ['3 Steel Street', '1 Arnprior Crescent', '40 Bargeddie Street Blackhill'],
    'Address2': ['Saltmarket', 'Castlemilk East', 'Blackhill']
}
df = pd.DataFrame(data)

df[['StringStart', 'LastWord']] = df['Address1'].str.rsplit(' ', n=1, expand=True)

df[['FirstWord_Address2', 'Remaining_Address2']] = (
    df['Address2'].str.extract(r'^(\S+)\s*(.*)$')
)

print(df)

Or:

df[['Address1_Prefix', 'Address1_LastWord']] = df['Address1'].str.extract(r'^(.*\b)\s+(\S+)$')

df[['Address2_FirstWord', 'Address2_Remaining']] = df['Address2'].str.extract(r'^(\S+)\s*(.*)$')

Output:

                        Address1         Address2          StringStart   LastWord FirstWord_Address2 Remaining_Address2
0                 3 Steel Street       Saltmarket              3 Steel     Street         Saltmarket
1            1 Arnprior Crescent  Castlemilk East           1 Arnprior   Crescent         Castlemilk               East
2  40 Bargeddie Street Blackhill        Blackhill  40 Bargeddie Street  Blackhill          Blackhill
like image 105
Subir Chowdhury Avatar answered Sep 20 '25 11:09

Subir Chowdhury