I'm trying to take a nested DataFrame and convert it to a nested dictionary.
Here are the first rows of my original DataFrame:
input: df.head(5)
output:
reviewerName title reviewerRatings
0 Charles Harry Potter Book Seven News:... 3.0
1 Katherine Harry Potter Boxed Set, Books... 5.0
2 Lora Harry Potter and the Sorcerer... 5.0
3 Cait Harry Potter and the Half-Blo... 5.0
4 Diane Harry Potter and the Order of... 5.0
input: len(df['reviewerName'].unique())
output: 66130
Since each of the 66130 unique reviewers can appear in several rows (e.g. "Charles" occurs 3 times), I made the unique "reviewerName" values the outer index level of a new nested (MultiIndex) DataFrame, with "title" as the inner level and "reviewerRatings" as the value, giving a second layer of title:rating pairs.
input: df = df.set_index(['reviewerName', 'title']).sort_index()
output:
reviewerRatings
reviewerName title
Charles Harry Potter Book Seven News:... 3.0
Harry Potter and the Half-Blo... 3.5
Harry Potter and the Order of... 4.0
Katherine Harry Potter Boxed Set, Books... 5.0
Harry Potter and the Half-Blo... 2.5
Harry Potter and the Order of... 5.0
...
230898 rows x 1 columns
As a follow-up to the first question, I tried to convert the nested DataFrame to a nested dictionary.
In the nested DataFrame above, "reviewerRatings" is the only column header (first row), while "reviewerName" and "title" are the two index levels (second row). When I run df.to_dict() below, the output has the shape {columnName: {(reviewerName, title): reviewerRatings}}, i.e. the outer key is the column name "reviewerRatings" and the inner keys are (reviewerName, title) tuples:
input: df.to_dict()
output:
{'reviewerRatings':
{
('Charles', 'Harry Potter Book Seven News:...'): 3.0,
('Charles', 'Harry Potter and the Half-Blo...'): 3.5,
('Charles', 'Harry Potter and the Order of...'): 4.0,
('Katherine', 'Harry Potter Boxed Set, Books...'): 5.0,
('Katherine', 'Harry Potter and the Half-Blo...'): 2.5,
('Katherine', 'Harry Potter and the Order of...'): 5.0,
...}
}
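The tuple keys appear because to_dict() (default orient) keys the outer dict by column and the inner dict by the full index entry, which for a two-level MultiIndex is a (reviewerName, title) tuple. A minimal sketch of regrouping that tuple-keyed dict in plain Python, assuming a small hypothetical sample (the titles "Book A"/"Book B" and the ratings are made up for illustration):

```python
import pandas as pd

# Hypothetical sample with the same shape as the question's MultiIndex DataFrame.
df = pd.DataFrame({
    "reviewerName": ["Charles", "Charles", "Katherine"],
    "title": ["Book A", "Book B", "Book A"],
    "reviewerRatings": [3.0, 3.5, 5.0],
}).set_index(["reviewerName", "title"]).sort_index()

# Outer key: column name; inner keys: (reviewerName, title) tuples.
flat = df.to_dict()["reviewerRatings"]

# Split each tuple key into an outer reviewer key and an inner title key.
nested = {}
for (name, title), rating in flat.items():
    nested.setdefault(name, {})[title] = rating

print(nested)
```

This works on any two-level index, but it builds the flat dict first and then walks it again, so a pandas-side grouping may be preferable for large frames.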
However, the output I want has the shape {reviewerName: {title: reviewerRating}}, which is exactly how I sorted the nested DataFrame:
{'Charles':
{'Harry Potter Book Seven News:...': 3.0,
'Harry Potter and the Half-Blo...': 3.5,
'Harry Potter and the Order of...': 4.0},
'Katherine':
{'Harry Potter Boxed Set, Books...': 5.0,
'Harry Potter and the Half-Blo...': 2.5,
'Harry Potter and the Order of...': 5.0},
...}
Is there any way to manipulate the nested DataFrame or the nested dictionary so that df.to_dict() produces {reviewerName: {title: reviewerRating}}?
Thanks!
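One possible way to get this shape directly from a MultiIndex DataFrame like the one above is to group the rating Series by the outer index level and convert each group after dropping that level. A minimal sketch, assuming the same hypothetical sample data ("Book A"/"Book B" and the ratings are made up):

```python
import pandas as pd

# Hypothetical sample with the same shape as the question's MultiIndex DataFrame.
df = pd.DataFrame({
    "reviewerName": ["Charles", "Charles", "Katherine"],
    "title": ["Book A", "Book B", "Book A"],
    "reviewerRatings": [3.0, 3.5, 5.0],
}).set_index(["reviewerName", "title"]).sort_index()

ratings = df["reviewerRatings"]

# Group by the outer level; each group still carries both index levels,
# so drop "reviewerName" to leave a Series indexed by title only.
nested = {
    name: group.droplevel("reviewerName").to_dict()
    for name, group in ratings.groupby(level="reviewerName")
}

print(nested)
```

An alternative such as ratings.unstack() followed by to_dict(orient="index") would also nest by reviewer, but it creates a column for every title, so each reviewer's inner dict would contain NaN for titles they did not rate; grouping avoids that.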
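Starting from the original flat DataFrame (call df.reset_index() first if you have already set the MultiIndex), use groupby with a lambda function that builds a dictionary per reviewerName, then convert the resulting Series with to_dict: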
print (df)
reviewerName title reviewerRatings
0 Charles Harry Potter Book Seven News:... 3.0
1 Charles Harry Potter Boxed Set, Books... 5.0
2 Charles Harry Potter and the Sorcerer... 5.0
3 Katherine Harry Potter and the Half-Blo... 5.0
4 Katherine Harry Potter and the Order of... 5.0
# select the two columns with a list (double brackets); the tuple form
# ['title','reviewerRatings'] is rejected by pandas 2.x
d = (df.groupby('reviewerName')[['title', 'reviewerRatings']]
       .apply(lambda x: dict(x.values))  # each row of x.values is a (title, rating) pair
       .to_dict())
print (d)
{
'Charles': {
'Harry Potter Book Seven News:...': 3.0,
'Harry Potter Boxed Set, Books...': 5.0,
'Harry Potter and the Sorcerer...': 5.0
},
'Katherine': {
'Harry Potter and the Half-Blo...': 5.0,
'Harry Potter and the Order of...': 5.0
}
}
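If groupby.apply feels heavy, a plain loop over the flat columns builds the same nested dict. A minimal sketch, assuming a flat DataFrame with the question's three column names and hypothetical sample rows ("Book A"/"Book B" and the ratings are made up):

```python
from collections import defaultdict

import pandas as pd

# Hypothetical flat sample with the question's column names.
df = pd.DataFrame({
    "reviewerName": ["Charles", "Charles", "Katherine"],
    "title": ["Book A", "Book B", "Book A"],
    "reviewerRatings": [3.0, 3.5, 5.0],
})

# defaultdict(dict) creates the inner dict the first time a reviewer is seen.
nested = defaultdict(dict)
for name, title, rating in zip(df["reviewerName"], df["title"], df["reviewerRatings"]):
    nested[name][title] = rating

nested = dict(nested)  # convert back to a regular dict
print(nested)
```

Like dict(x.values) in the answer, this keeps only the last rating when the same reviewer rated the same title more than once; aggregate first (e.g. with a groupby mean) if duplicates matter.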