 

Pandas explode dictionary to rows

I have a dataframe:

    Name                                    Sub_Marks
0    Tom  {'Maths': 30, 'English': 40, 'Science': 35}
1  Harry  {'Maths': 35, 'English': 30, 'Science': 25}
2   Nick  {'Maths': 32, 'English': 23, 'Science': 20}

I need to explode each dictionary into multiple rows, one row per key, like this:

    Name  Subject  Marks
0    Tom    Maths     30
1    Tom  English     40
2    Tom  Science     35
3  Harry    Maths     35
4  Harry  English     30
5  Harry  Science     25
6   Nick    Maths     32
7   Nick  English     23
8   Nick  Science     20

I know we can explode a list in a DataFrame. Is there any way to do the same with a dictionary?
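To try the answers below, you can rebuild the question's sample frame as follows. This is a minimal sketch. The answers assume the frame is called `df` and that pandas is imported as `pd`.

```python
import pandas as pd

# The question's sample data: one dict of subject -> mark per student
df = pd.DataFrame({
    'Name': ['Tom', 'Harry', 'Nick'],
    'Sub_Marks': [
        {'Maths': 30, 'English': 40, 'Science': 35},
        {'Maths': 35, 'English': 30, 'Science': 25},
        {'Maths': 32, 'English': 23, 'Science': 20},
    ],
})
print(df)
```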

Asked Apr 30 '21 15:04 by Asif Iqbal



4 Answers

We can build a list of (Name, Subject, Marks) tuples by iterating over the DataFrame's values in a list comprehension, then create a new DataFrame from that list of tuples:

out = pd.DataFrame([(n, k, v) for (n, d) in df.values for k, v in d.items()])
out.columns = ['Name', 'Subject', 'Marks']
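The comprehension above unpacks each row of `df.values` into exactly two names, so it fails if the frame has a third column. The sketch below pairs the two relevant columns explicitly with `zip` instead. The helper name `explode_dict_column` and its parameters are hypothetical and are not a pandas API. The extra `Class` column is made up only to show the difference.

```python
import pandas as pd

def explode_dict_column(df, id_col, dict_col, key_name, value_name):
    """Hypothetical helper: turn one dict-valued column into (id, key, value) rows."""
    rows = [
        (ident, key, value)
        for ident, mapping in zip(df[id_col], df[dict_col])  # one dict per input row
        for key, value in mapping.items()                   # one output row per dict entry
    ]
    return pd.DataFrame(rows, columns=[id_col, key_name, value_name])

df = pd.DataFrame({
    'Name': ['Tom', 'Harry'],
    'Class': ['A', 'B'],  # extra column that would break the df.values unpacking
    'Sub_Marks': [{'Maths': 30, 'English': 40}, {'Maths': 35, 'English': 30}],
})
out = explode_dict_column(df, 'Name', 'Sub_Marks', 'Subject', 'Marks')
print(out)
```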

Alternative pandas-based approach:

m = pd.DataFrame([*df['Sub_Marks']], df.index).stack()\
      .rename_axis([None,'Subject']).reset_index(1, name='Marks')

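# join repeats each original row label once per subject, so renumber the rows 0..n-1
out = df[['Name']].join(m).reset_index(drop=True)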

>>> out

    Name  Subject  Marks
0    Tom    Maths     30
1    Tom  English     40
2    Tom  Science     35
3  Harry    Maths     35
4  Harry  English     30
5  Harry  Science     25
6   Nick    Maths     32
7   Nick  English     23
8   Nick  Science     20

Answered Oct 21 '22 15:10 by Shubham Sharma


You can expand each dictionary into one column per subject with apply(pd.Series), then stack those columns back into rows (note that the sample marks here differ from the question's):

import pandas as pd

data = {'Name': ['Tom', 'Harry', 'Nick'],
        'Sub_Marks': [{'Maths': 30, 'English': 40, 'Science': 35},
                      {'Maths': 35, 'English': 42, 'Science': 31},
                      {'Maths': 20, 'English': 14, 'Science': 65}]}
df = pd.DataFrame(data)

df[['Maths','English', 'Science']] = df['Sub_Marks'].apply(pd.Series)
df.drop(columns=['Sub_Marks'], inplace=True)
df = df.set_index('Name').stack().reset_index()
df.columns = ['Name', 'Subject', 'Marks']
Answered Oct 21 '22 14:10 by Aditya


Alternative method via explode:

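# Parallel key/value list columns, exploded together (multi-column explode needs pandas >= 1.3)
df['Marks'] = df['Sub_Marks'].apply(lambda d: list(d.values()))
df['Sub_Marks'] = df['Sub_Marks'].apply(lambda d: list(d.keys()))
df = df.explode(['Sub_Marks', 'Marks'], ignore_index=True).rename(columns={'Sub_Marks': 'Subject'})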
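A variant of the same idea, sketched here under my own assumptions: instead of keeping keys and values in separate columns, turn each dictionary into a list of `(key, value)` tuples with `dict.items()`, explode that single column, and then split the tuples into two columns. The temporary column name `pair` is arbitrary. `explode(..., ignore_index=True)` needs pandas 1.1 or later.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Tom', 'Harry'],
    'Sub_Marks': [{'Maths': 30, 'English': 40}, {'Maths': 35, 'English': 30}],
})

# Each dict becomes a list of (subject, mark) tuples, which explode() handles like any list
exploded = (df.assign(pair=df['Sub_Marks'].map(lambda d: list(d.items())))
              .drop(columns='Sub_Marks')
              .explode('pair', ignore_index=True))

# Split the tuples into two columns; both frames share the same 0..n-1 index
pairs = pd.DataFrame(exploded['pair'].tolist(), columns=['Subject', 'Marks'])
out = exploded.drop(columns='pair').join(pairs)
print(out)
```

Because the key and value travel together in one tuple, they cannot get out of step. Only one column is exploded, so this also works on pandas versions without multi-column `explode`.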

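If the Sub_Marks values are strings that look like dictionaries rather than actual dicts (for example, after a round trip through a CSV file), parse them with ast.literal_eval first: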

import ast
df['Sub_Marks'] = df['Sub_Marks'].apply(ast.literal_eval)
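The sketch below shows the string case described above. It assumes the column went through a CSV round trip; the in-memory buffer stands in for a real file. `to_csv` writes each dict as its string representation, and `read_csv` reads it back as a plain `str`.

```python
import ast
import io

import pandas as pd

df = pd.DataFrame({'Name': ['Tom'], 'Sub_Marks': [{'Maths': 30, 'English': 40}]})

# Round-trip through CSV text: the dict comes back as its string repr
buf = io.StringIO()
df.to_csv(buf, index=False)
buf.seek(0)
loaded = pd.read_csv(buf)
print(type(loaded.loc[0, 'Sub_Marks']))  # <class 'str'>

# literal_eval parses the Python-literal string back into a dict (no arbitrary code runs)
loaded['Sub_Marks'] = loaded['Sub_Marks'].apply(ast.literal_eval)
print(type(loaded.loc[0, 'Sub_Marks']))  # <class 'dict'>
```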
Answered Oct 21 '22 13:10 by Nk03


You can use .apply() with pd.Series() to 'explode' the dictionary into columns and then use .melt() to transform the columns into Subject and Marks columns, as follows:

(df.drop(columns='Sub_Marks')
   .join(df.apply(lambda x: pd.Series(x['Sub_Marks']), axis=1))
   .melt(id_vars='Name', value_vars=['Maths', 'English', 'Science'], var_name='Subject', value_name='Marks')
   .sort_values('Name', kind='mergesort')  # stable sort keeps the subject order within each name
).reset_index(drop=True)

You can also use pd.DataFrame() together with to_list() to 'explode' the dictionary:

(df.join(pd.DataFrame(df.pop('Sub_Marks').to_list(), index=df.index))
   .melt(id_vars='Name', value_vars=['Maths', 'English', 'Science'], var_name='Subject', value_name='Marks')
   .sort_values('Name', kind='mergesort')
).reset_index(drop=True)

Output:

    Name  Subject  Marks
0  Harry    Maths     35
1  Harry  English     30
2  Harry  Science     25
3   Nick    Maths     32
4   Nick  English     23
5   Nick  Science     20
6    Tom    Maths     30
7    Tom  English     40
8    Tom  Science     35
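Two details in the snippets above can cause trouble with other data. `df.pop` removes `Sub_Marks` from the original `df`, and the subject names are typed out in `value_vars`. The sketch below leaves `df` untouched and lets `melt` pick up whatever keys the dictionaries contain. When `value_vars` is omitted, `melt` uses every column not listed in `id_vars`. A stable sort (`kind='mergesort'`) keeps the per-name subject order shown in the output.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Tom', 'Harry', 'Nick'],
    'Sub_Marks': [
        {'Maths': 30, 'English': 40, 'Science': 35},
        {'Maths': 35, 'English': 30, 'Science': 25},
        {'Maths': 32, 'English': 23, 'Science': 20},
    ],
})

# Expand the dicts to one column per key, aligned on df's own index
wide = pd.DataFrame(df['Sub_Marks'].tolist(), index=df.index)

out = (df.drop(columns='Sub_Marks')       # returns a copy; df keeps Sub_Marks
         .join(wide)
         .melt(id_vars='Name', var_name='Subject', value_name='Marks')  # melts every non-id column
         .sort_values('Name', kind='mergesort')  # stable: subject order kept within each name
         .reset_index(drop=True))
print(out)
```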
Answered Oct 21 '22 14:10 by SeaBean