I have a python dictionary of user-item ratings that looks something like this:
sample={'user1': {'item1': 2.5, 'item2': 3.5, 'item3': 3.0, 'item4': 3.5, 'item5': 2.5, 'item6': 3.0},
'user2': {'item1': 2.5, 'item2': 3.0, 'item3': 3.5, 'item4': 4.0},
'user3': {'item2':4.5,'item5':1.0,'item6':4.0}}
I was looking to convert it into a pandas data frame that would be structured like
col1 col2 col3
0 user1 item1 2.5
1 user1 item2 3.5
2 user1 item3 3.0
3 user1 item4 3.5
4 user1 item5 2.5
5 user1 item6 3.0
6 user2 item1 2.5
7 user2 item2 3.0
8 user2 item3 3.5
9 user2 item4 4.0
10 user3 item2 4.5
11 user3 item5 1.0
12 user3 item6 4.0
Any ideas would be much appreciated :)
You can convert a dictionary to Pandas Dataframe using df = pd. DataFrame. from_dict(my_dict) statement.
Method 1: Create DataFrame from Dictionary using default Constructor of pandas. Dataframe class. Method 2: Create DataFrame from Dictionary with user-defined indexes. Method 3: Create DataFrame from simple dictionary i.e dictionary with key and simple value like integer or string value.
We can convert a dictionary to a pandas dataframe by using the pd. DataFrame. from_dict() class-method. Example 1: Passing the key value as a list.
A pandas DataFrame can be converted into a Python dictionary using the DataFrame instance method to_dict(). The output can be specified of various orientations using the parameter orient. In dictionary orientation, for each column of the DataFrame the column value is listed against the row label in a dictionary.
Try following code:
import pandas
sample={'user1': {'item1': 2.5, 'item2': 3.5, 'item3': 3.0, 'item4': 3.5, 'item5': 2.5, 'item6': 3.0},
'user2': {'item1': 2.5, 'item2': 3.0, 'item3': 3.5, 'item4': 4.0},
'user3': {'item2':4.5,'item5':1.0,'item6':4.0}}
df = pandas.DataFrame([
[col1,col2,col3] for col1, d in sample.items() for col2, col3 in d.items()
])
I think the operation you're after -- to unpivot a table -- is called "melting". In this case, the hard part can be done by pd.melt
, and everything else is basically renaming and reordering:
df = pd.DataFrame(sample).reset_index().rename(columns={"index": "item"})
df = pd.melt(df, "item", var_name="user").dropna()
df = df[["user", "item", "value"]].reset_index(drop=True)
Simply calling DataFrame
produces something which has the information we want but has the wrong shape:
>>> df = pd.DataFrame(sample)
>>> df
user1 user2 user3
item1 2.5 2.5 NaN
item2 3.5 3.0 4.5
item3 3.0 3.5 NaN
item4 3.5 4.0 NaN
item5 2.5 NaN 1.0
item6 3.0 NaN 4.0
So let's promote the index to a real column and improve the name:
>>> df = pd.DataFrame(sample).reset_index().rename(columns={"index": "item"})
>>> df
item user1 user2 user3
0 item1 2.5 2.5 NaN
1 item2 3.5 3.0 4.5
2 item3 3.0 3.5 NaN
3 item4 3.5 4.0 NaN
4 item5 2.5 NaN 1.0
5 item6 3.0 NaN 4.0
Then we can call pd.melt
to turn the columns. If we don't specify the variable name we want, "user", it'll give it the boring name of "variable" (just like it gives the data itself the boring name "value").
>>> df = pd.melt(df, "item", var_name="user").dropna()
>>> df
item user value
0 item1 user1 2.5
1 item2 user1 3.5
2 item3 user1 3.0
3 item4 user1 3.5
4 item5 user1 2.5
5 item6 user1 3.0
6 item1 user2 2.5
7 item2 user2 3.0
8 item3 user2 3.5
9 item4 user2 4.0
13 item2 user3 4.5
16 item5 user3 1.0
17 item6 user3 4.0
Finally, we can reorder and renumber the indices:
>>> df = df[["user", "item", "value"]].reset_index(drop=True)
>>> df
user item value
0 user1 item1 2.5
1 user1 item2 3.5
2 user1 item3 3.0
3 user1 item4 3.5
4 user1 item5 2.5
5 user1 item6 3.0
6 user2 item1 2.5
7 user2 item2 3.0
8 user2 item3 3.5
9 user2 item4 4.0
10 user3 item2 4.5
11 user3 item5 1.0
12 user3 item6 4.0
melt
is pretty useful once you get used to it. Usually, as here, you do some renaming/reordering before and after.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With