I am trying to parse out a column by the comma (also stripping the white space) and then pivoting all of the origin/destination combinations into new rows. Here is a sample of the data:
Origin Destination Weight
PVG AMS, FRA 10,000
CAN, XMN LAX, ORD 25,000
I am having trouble reproducing the dataframe above using pd.read_clipboard, so here is the dataframe code:
df = pd.DataFrame({'Origin': ['PVG', 'CAN, XMN'],
'Destination': ['AMS, FRA', 'LAX, ORD'],
'Weight': [10000, 25000]})
The desired output would be:
Origin Destination Weight
PVG AMS 10,000
PVG FRA 10,000
CAN LAX 25,000
CAN ORD 25,000
XMN LAX 25,000
XMN ORD 25,000
I have been trying to use:
df['Origin'].str.split(',', expand = True)
I had tried doing this for both the origin and destination, which works for parsing the strings into separate columns. I am struggling to then create all of the possible combinations into separate rows (I have tried using pivot_table and melt with no luck).
Use itertools.product
with list comprehenion
, only first create lists with applymap
:
from itertools import product
df1 = df.applymap(lambda x: x.split(', ') if isinstance (x, str) else [x])
df2 = pd.DataFrame([j for i in df1.values for j in product(*i)], columns=df.columns)
print (df2)
Origin Destination Weight
0 PVG AMS 10000
1 PVG FRA 10000
2 CAN LAX 25000
3 CAN ORD 25000
4 XMN LAX 25000
5 XMN ORD 25000
You should be using itertools.product
here.
from itertools import product
df_dict = {
x[-1]: pd.DataFrame(list(
product(x[0].split(', '), x[1].split(', ')))
) for x in df.values.tolist()
}
df = pd.concat(df_dict).reset_index(level=-1, drop=True).reset_index()
df.columns = ['Weight', 'Destination', 'Origin']
df
Weight Destination Origin
0 10000 AMS PVG
1 10000 FRA PVG
2 25000 LAX CAN
3 25000 LAX XMN
4 25000 ORD CAN
5 25000 ORD XMN
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With