I'm having an issue where I have nested lists in multiple columns of my dataframe. Image below for reference -
image of dataframe
df_final['Product Name'] = ('dr. jart+ mask heroes face savers',
'moon juice beauty shroom exfoliating acid
potion',
'laneige kiss and make up set')
df_final['Subreddit'] = (None, ['scacjdiscussion'], ['AsianBeauty',
'AsianBeautyAdvice','SkincareAddiction',
'abdiscussion'])
df_final['Times Mentioned'] = (None, [1], [4,1,1,1])
I want the cells in the 'Subreddit' column to become the new columns, and the cells in the 'Times Mentioned' column to fill in the new cells. The 'Product Name' column will be the new index.
I tried using pivot_table -
from pandas import pivot_table
table = pivot_table(df_final, values='Times Mentioned', index=['Product
Name'], columns=['Subreddit'], aggfunc='size')
This successfully turned all the nested lists of the 'Subreddit' column into new columns, but the 'Times Mentioned' just repeats the first number for every column (example below)
wrong cell fillers
This should be 4, 1, 1, 1 like in the original image. Does anyone know how to fix this?
Thank you in advance!
There are some cells in the DF that contain a list
['AsianBeauty', 'AsianBeautyAdvice','SkincareAddiction', 'abdiscussion']
and this is a single cell which needs to be exploded into separate rows of the same column (Product Name). However, this has to be done while preserving the association between the Product Name column and each of the other 2 columns (which contain rows that must be exploded). I used this SO post to do exactly this, while keeping the association intact. Here is the approach I used, with comments in the code and top-level explanation shown separately
Raw DF from OP
import pandas as pd
df_final = pd.DataFrame()
df_final['Product Name'] = ('dr. jart+ mask heroes face savers',
'moon juice beauty shroom exfoliating acid potion',
'laneige kiss and make up set')
df_final['Subreddit'] = (None, ['scacjdiscussion'], ['AsianBeauty',
'AsianBeautyAdvice','SkincareAddiction',
'abdiscussion'])
df_final['Times Mentioned'] = (None, [1], [4,1,1,1])
print(df_final)
Raw data (df_final)
Product Name Subreddit Times Mentioned
0 dr. jart+ mask heroes face savers None None
1 moon juice beauty shroom exfoliating acid potion [scacjdiscussion] [1]
2 laneige kiss and make up set [AsianBeauty, AsianBeautyAdvice, SkincareAddiction, abdiscussion] [4, 1, 1, 1]
Raw data column dtypes
print(df_final.dtypes)
Product Name object
Subreddit object
Times Mentioned object
dtype: object
Code to explode rows and create final DF
exploded_dfs = []
for _, row in df_final.iterrows():
if all(row): # if a row does contain all non-None values
# Put 1st pair of columns into single DF, exploding single
# cell into multiple rows as needed
df1 = pd.concat([pd.Series(row['Product Name'], row['Subreddit'][:])])\
.reset_index()
# Assign column names
df1.columns = ['Subreddit', 'Product Name']
# Put 2nd pair of columns into single DF, exploding single
# cell into multiple rows as needed
df2 = pd.concat([pd.Series(row['Product Name'], row['Times Mentioned'][:])])\
.reset_index()
# Assign column names
df2.columns = ['Times Mentioned', 'Product Name']
# Perform INNER JOIN on DFs with exploded row contents
# & drop duplicated column
merged = pd.concat([df1, df2], axis=1)
merged = merged.loc[:,~merged.columns.duplicated()]
# Swap 1st and 2nd columns
cols = list(merged)
cols.insert(0, cols.pop(cols.index('Product Name')))
merged = merged.loc[:, cols]
else: # if a row does not contain all non-None values
# Create single row DF with no changes
merged = pd.DataFrame(columns=['Product Name', 'Subreddit',
'Times Mentioned'])
# Append row to DF
merged.loc[0] = row
exploded_dfs.append(merged)
# Vertically concatenate DFs in list
print(pd.concat(exploded_dfs, axis=0).reset_index(drop=True))
Here is the output
Product Name Subreddit Times Mentioned
0 dr. jart+ mask heroes face savers None None
1 moon juice beauty shroom exfoliating acid potion scacjdiscussion 1
2 laneige kiss and make up set AsianBeauty 4
3 laneige kiss and make up set AsianBeautyAdvice 1
4 laneige kiss and make up set SkincareAddiction 1
5 laneige kiss and make up set abdiscussion 1
Brief explanation of steps
None values, it will just be taken as is, so it is assumed that this row does not need cleaning: this row will just be appended to a single row DFNone
Subreddit) into column, if necessary (explained in this question)Product Name); this gives cleaned DF df1Times Mentioned); this gives cleaned DF df1mergedPackage info
pandas==0.23.4
Python version
Python 2.7.15rc1
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With