I have a dataframe:
Name Sub_Marks
0 Tom {'Maths': 30, 'English': 40, 'Science': 35}
1 Harry {'Maths': 35, 'English': 30, 'Science': 25}
2 Nick {'Maths': 32, 'English': 23, 'Science': 20}
I need to explode the dictionary into multiple rows. E.g.
Name Subject Marks
0 Tom Maths 30
1 Tom English 40
2 Tom Science 35
3 Harry Maths 35
4 Harry English 30
5 Harry Science 25
6 Nick Maths 32
7 Nick English 23
8 Nick Science 20
I know we can explode a list in a dataframe. Is there any way to do it with a dictionary?
The explode() function transforms each element of a list-like into a row, replicating the index values. It returns the exploded lists as rows of the subset columns; the index is duplicated for these rows. It raises ValueError if the columns of the frame are not unique.
A pandas DataFrame can be converted into a Python dictionary using the DataFrame instance method to_dict(). The orientation of the output is chosen with the orient parameter. In the 'dict' orientation, each column of the DataFrame maps to a dictionary of row label to column value.
The DataFrame.explode() method transforms each element of a list-like into a row, replicating the index values. It returns a DataFrame with the exploded lists as rows of the subset columns; the index is duplicated for these rows. Its column parameter accepts a str or tuple.
To split a column of dictionaries into separate columns in a pandas DataFrame, use apply(pd.Series). For a column A that holds dictionaries, first fetch column A as a Series, then call apply(pd.Series) on it. This returns a DataFrame whose column labels are the keys of the dictionaries.
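As a minimal sketch of the apply(pd.Series) step described above: the frame below and its column name A are made up for illustration and are not taken from the question.

```python
import pandas as pd

# Hypothetical frame: column "A" holds one dictionary per row.
df = pd.DataFrame({
    "Name": ["Tom", "Harry"],
    "A": [{"Maths": 30, "English": 40}, {"Maths": 35, "English": 30}],
})

# Each dictionary becomes one row; its keys become the column labels.
wide = df["A"].apply(pd.Series)
print(wide)
```

The result is still in wide format (one column per key), so a further stack() or melt() step is needed to get one row per key.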
Explode a DataFrame from list-like columns to long format. This routine will explode list-likes including lists, tuples, sets, Series, and np.ndarray. The result dtype of the subset rows will be object. Scalars will be returned unchanged, and empty list-likes will result in a np.nan for that row.
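The behaviour described above (index replication, scalars left unchanged, empty list-likes becoming NaN) can be checked with a small sketch. The column names key and vals are illustrative only.

```python
import pandas as pd

# One list, one scalar (a string is not treated as list-like), one empty list.
df = pd.DataFrame({"key": ["a", "b", "c"], "vals": [[1, 2], "x", []]})

out = df.explode("vals")
print(out)
print(out.index.tolist())  # row labels of the exploded rows repeat
```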
The explode() method converts each element of the specified column(s) into a row. The column parameter is required and specifies the column to explode; ignore_index is an optional keyword argument.
We can create a list of tuples of Name, Subject and Marks by iterating over the values of the dataframe inside a list comprehension, and then create a new dataframe from this list of tuples:
out = pd.DataFrame([(n, k, v) for (n, d) in df.values for k, v in d.items()])
out.columns = ['Name', 'Subject', 'Marks']
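For reference, here is a self-contained sketch of the same list-comprehension idea. It rebuilds the sample frame from the question and, as a variation on the code above, pairs the two columns with zip instead of unpacking df.values, so it would still work if the frame had additional columns.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "Name": ["Tom", "Harry", "Nick"],
    "Sub_Marks": [
        {"Maths": 30, "English": 40, "Science": 35},
        {"Maths": 35, "English": 30, "Science": 25},
        {"Maths": 32, "English": 23, "Science": 20},
    ],
})

# One (Name, Subject, Marks) tuple per dictionary entry.
rows = [
    (name, subject, marks)
    for name, sub_marks in zip(df["Name"], df["Sub_Marks"])
    for subject, marks in sub_marks.items()
]
out = pd.DataFrame(rows, columns=["Name", "Subject", "Marks"])
print(out)
```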
Alternative pandas-based approach:
m = pd.DataFrame([*df['Sub_Marks']], df.index).stack()\
.rename_axis([None,'Subject']).reset_index(1, name='Marks')
out = df[['Name']].join(m)
>>> out
Name Subject Marks
0 Tom Maths 30
1 Tom English 40
2 Tom Science 35
3 Harry Maths 35
4 Harry English 30
5 Harry Science 25
6 Nick Maths 32
7 Nick English 23
8 Nick Science 20
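One detail worth checking when running the stack-and-join version: the join is done on the original row labels, so the joined frame repeats each label once per subject. The sketch below rebuilds the sample data and adds a final reset_index(drop=True), which is an addition of this sketch rather than part of the answer above, to get the 0 to 8 numbering.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "Name": ["Tom", "Harry", "Nick"],
    "Sub_Marks": [
        {"Maths": 30, "English": 40, "Science": 35},
        {"Maths": 35, "English": 30, "Science": 25},
        {"Maths": 32, "English": 23, "Science": 20},
    ],
})

# Expand the dictionaries to columns, then stack them into long format.
marks = (
    pd.DataFrame(df["Sub_Marks"].tolist(), index=df.index)
    .stack()
    .rename_axis([None, "Subject"])
    .reset_index(level=1, name="Marks")
)

joined = df[["Name"]].join(marks)     # row labels repeat: 0, 0, 0, 1, ...
out = joined.reset_index(drop=True)   # renumber rows 0..8
print(out)
```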
You can expand the dictionaries into one column per subject and then stack those columns back into rows, like this:
import pandas as pd
data = {'Name' : ['Tom', "Harry", "Nick"], "Sub_Marks" : [{'Maths': 30, 'English': 40, 'Science': 35},{'Maths': 35, 'English': 42, 'Science': 31},{'Maths': 20, 'English': 14, 'Science': 65}]}
df = pd.DataFrame(data)
df[['Maths','English', 'Science']] = df['Sub_Marks'].apply(pd.Series)
df.drop(columns=['Sub_Marks'], inplace=True)
df = df.set_index('Name').stack().reset_index()
df.columns = ['Name', 'Subject', 'Marks']
Alternative method via explode:
df['Marks'] = df['Sub_Marks'].apply(lambda x: x.values())
df = df.apply(pd.Series.explode).rename(columns = {'Sub_Marks': 'Subject'})
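A variation on the explode idea, sketched here under the assumption that every entry in Sub_Marks is a real dictionary: turn each dictionary into a list of (key, value) pairs, explode that single column, and then split the pairs into two columns. This is not the code from the answer above, only one way to reach the same shape with a single explode() call.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "Name": ["Tom", "Harry", "Nick"],
    "Sub_Marks": [
        {"Maths": 30, "English": 40, "Science": 35},
        {"Maths": 35, "English": 30, "Science": 25},
        {"Maths": 32, "English": 23, "Science": 20},
    ],
})

# Replace each dictionary with a list of (subject, marks) tuples.
pairs = df.assign(Sub_Marks=df["Sub_Marks"].apply(lambda d: list(d.items())))

# One row per tuple; renumber the rows so the join below lines up.
exploded = pairs.explode("Sub_Marks").reset_index(drop=True)

# Split the tuples into two named columns and attach them to Name.
out = exploded[["Name"]].join(
    pd.DataFrame(exploded["Sub_Marks"].tolist(), columns=["Subject", "Marks"])
)
print(out)
```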
You might need to apply ast.literal_eval first (if the above method doesn't work because the dictionaries are stored as strings):
import ast
df['Sub_Marks'] = df['Sub_Marks'].apply(ast.literal_eval)
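A short sketch of when this conversion matters: if the frame was loaded from a CSV file, for example, the Sub_Marks cells may be strings that only look like dictionaries. The helper name parse_if_str is hypothetical, and the guard for values that are already parsed is an assumption added here so the step can be run more than once.

```python
import ast

import pandas as pd

# Hypothetical case: the dictionary is stored as text, not as a dict.
df = pd.DataFrame({
    "Name": ["Tom"],
    "Sub_Marks": ["{'Maths': 30, 'English': 40, 'Science': 35}"],
})

def parse_if_str(value):
    # Only parse strings; leave values that are already dicts untouched.
    return ast.literal_eval(value) if isinstance(value, str) else value

df["Sub_Marks"] = df["Sub_Marks"].apply(parse_if_str)
print(type(df.loc[0, "Sub_Marks"]))
```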
You can use .apply() with pd.Series() to 'explode' the dictionary into columns and then use .melt() to transform the columns into Subject and Marks columns, as follows:
(df.drop(columns='Sub_Marks')
.join(df.apply(lambda x: pd.Series(x['Sub_Marks']), axis=1))
.melt(id_vars='Name', value_vars=['Maths', 'English', 'Science'], var_name='Subject', value_name='Marks')
.sort_values('Name')
).reset_index(drop=True)
You can also use pd.DataFrame() together with to_list() to 'explode' the dictionary:
(df.join(pd.DataFrame(df.pop('Sub_Marks').to_list()))
.melt(id_vars='Name', value_vars=['Maths', 'English', 'Science'], var_name='Subject', value_name='Marks')
.sort_values('Name')
).reset_index(drop=True)
Output:
Name Subject Marks
0 Harry Maths 35
1 Harry English 30
2 Harry Science 25
3 Nick Maths 32
4 Nick English 23
5 Nick Science 20
6 Tom Maths 30
7 Tom English 40
8 Tom Science 35
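A self-contained sketch of the melt approach, with two changes that are assumptions of this sketch rather than part of the answer: value_vars is omitted so the subject names do not have to be hard-coded (melt then uses every column other than Name), and the sort uses kind="mergesort" so that the Maths, English, Science order within each name is kept by a stable sort.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "Name": ["Tom", "Harry", "Nick"],
    "Sub_Marks": [
        {"Maths": 30, "English": 40, "Science": 35},
        {"Maths": 35, "English": 30, "Science": 25},
        {"Maths": 32, "English": 23, "Science": 20},
    ],
})

# Wide format: one column per subject next to Name.
wide = df[["Name"]].join(pd.DataFrame(df["Sub_Marks"].tolist(), index=df.index))

# Long format: one row per (Name, Subject), sorted by Name with a stable sort.
out = (
    wide.melt(id_vars="Name", var_name="Subject", value_name="Marks")
    .sort_values("Name", kind="mergesort")
    .reset_index(drop=True)
)
print(out)
```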