
Pandas data frame from dictionary

Tags: python, pandas

I have a Python dictionary of user-item ratings that looks something like this:

sample={'user1': {'item1': 2.5, 'item2': 3.5, 'item3': 3.0, 'item4': 3.5, 'item5': 2.5, 'item6': 3.0}, 
'user2': {'item1': 2.5, 'item2': 3.0, 'item3': 3.5, 'item4': 4.0}, 
'user3': {'item2':4.5,'item5':1.0,'item6':4.0}}

I would like to convert it into a pandas DataFrame structured like this:

     col1   col2  col3
0   user1  item1   2.5
1   user1  item2   3.5
2   user1  item3   3.0
3   user1  item4   3.5
4   user1  item5   2.5
5   user1  item6   3.0
6   user2  item1   2.5
7   user2  item2   3.0
8   user2  item3   3.5
9   user2  item4   4.0
10  user3  item2   4.5
11  user3  item5   1.0
12  user3  item6   4.0

Any ideas would be much appreciated :)

asked Aug 10 '13 12:08

Godel

2 Answers

Try the following code:

import pandas

sample={'user1': {'item1': 2.5, 'item2': 3.5, 'item3': 3.0, 'item4': 3.5, 'item5': 2.5, 'item6': 3.0},
        'user2': {'item1': 2.5, 'item2': 3.0, 'item3': 3.5, 'item4': 4.0},
        'user3': {'item2':4.5,'item5':1.0,'item6':4.0}}

df = pandas.DataFrame([
    [col1,col2,col3] for col1, d in sample.items() for col2, col3 in d.items()
])
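Note that the snippet above leaves the columns labelled 0, 1 and 2. A minimal sketch of the same flattening with the column names from the question passed in explicitly follows; the loop variable names (`user`, `item`, `rating`) and the intermediate `rows` list are only illustrative, and it assumes Python 3.7+ so that dictionary order is preserved:

```python
import pandas as pd

sample = {
    'user1': {'item1': 2.5, 'item2': 3.5, 'item3': 3.0,
              'item4': 3.5, 'item5': 2.5, 'item6': 3.0},
    'user2': {'item1': 2.5, 'item2': 3.0, 'item3': 3.5, 'item4': 4.0},
    'user3': {'item2': 4.5, 'item5': 1.0, 'item6': 4.0},
}

# One (user, item, rating) tuple per rating, in dictionary order.
rows = [
    (user, item, rating)
    for user, items in sample.items()
    for item, rating in items.items()
]

# Name the columns as in the question instead of the default 0, 1, 2.
df = pd.DataFrame(rows, columns=['col1', 'col2', 'col3'])
print(df)
```

Because only existing ratings are iterated over, no NaN rows are created and nothing has to be dropped afterwards.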

answered Sep 19 '22 03:09

falsetru

I think the operation you're after -- to unpivot a table -- is called "melting". In this case, the hard part can be done by pd.melt, and everything else is basically renaming and reordering:

import pandas as pd

df = pd.DataFrame(sample).reset_index().rename(columns={"index": "item"})
df = pd.melt(df, "item", var_name="user").dropna()
df = df[["user", "item", "value"]].reset_index(drop=True)
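If you need this more than once, the three steps could be wrapped in a small helper. This is only a sketch: the function name `ratings_to_long` and the local names `wide` and `long_df` are made up for illustration, and it assumes the input is a dict of dicts shaped like `sample`:

```python
import pandas as pd


def ratings_to_long(ratings):
    """Turn {user: {item: rating}} into a long user/item/value frame."""
    # Wide frame: one row per item, one column per user, NaN where unrated.
    wide = pd.DataFrame(ratings).reset_index().rename(columns={"index": "item"})
    # Unpivot the user columns into rows and drop the missing ratings.
    long_df = pd.melt(wide, id_vars="item", var_name="user").dropna()
    # Reorder the columns and renumber the rows from 0.
    return long_df[["user", "item", "value"]].reset_index(drop=True)


sample = {
    'user1': {'item1': 2.5, 'item2': 3.5, 'item3': 3.0,
              'item4': 3.5, 'item5': 2.5, 'item6': 3.0},
    'user2': {'item1': 2.5, 'item2': 3.0, 'item3': 3.5, 'item4': 4.0},
    'user3': {'item2': 4.5, 'item5': 1.0, 'item6': 4.0},
}

df = ratings_to_long(sample)
print(df.shape)  # (13, 3)
```

The rest of this answer walks through what each line inside the helper does.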

Simply calling DataFrame produces something which has the information we want but has the wrong shape:

>>> df = pd.DataFrame(sample)
>>> df
       user1  user2  user3
item1    2.5    2.5    NaN
item2    3.5    3.0    4.5
item3    3.0    3.5    NaN
item4    3.5    4.0    NaN
item5    2.5    NaN    1.0
item6    3.0    NaN    4.0

So let's promote the index to a real column and improve the name:

>>> df = pd.DataFrame(sample).reset_index().rename(columns={"index": "item"})
>>> df
    item  user1  user2  user3
0  item1    2.5    2.5    NaN
1  item2    3.5    3.0    4.5
2  item3    3.0    3.5    NaN
3  item4    3.5    4.0    NaN
4  item5    2.5    NaN    1.0
5  item6    3.0    NaN    4.0

Then we can call pd.melt to turn the user columns into rows. If we don't specify the variable name we want, "user", it'll give that column the boring name "variable" (just like it gives the data itself the boring name "value"). The dropna() call then discards the rows for items a user never rated.

>>> df = pd.melt(df, "item", var_name="user").dropna()
>>> df
     item   user  value
0   item1  user1    2.5
1   item2  user1    3.5
2   item3  user1    3.0
3   item4  user1    3.5
4   item5  user1    2.5
5   item6  user1    3.0
6   item1  user2    2.5
7   item2  user2    3.0
8   item3  user2    3.5
9   item4  user2    4.0
13  item2  user3    4.5
16  item5  user3    1.0
17  item6  user3    4.0
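The naming behaviour described above can be checked on a tiny frame. This is just an illustrative sketch: the two-item `wide` frame is made up, and `value_name="rating"` is an extra option I'm adding here that the steps above don't use:

```python
import pandas as pd

# Small made-up wide frame: one row per item, one column per user.
wide = pd.DataFrame({
    "item": ["item1", "item2"],
    "user1": [2.5, 3.5],
    "user2": [2.5, 3.0],
})

# Without var_name / value_name, melt falls back to its default names.
default_names = pd.melt(wide, id_vars="item")

# Both names can be overridden in the same call.
custom_names = pd.melt(wide, id_vars="item", var_name="user", value_name="rating")

print(list(default_names.columns))  # ['item', 'variable', 'value']
print(list(custom_names.columns))   # ['item', 'user', 'rating']
```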

Finally, we can reorder the columns and renumber the index, which has gaps (9, 13, 16, ...) where dropna() removed rows:

>>> df = df[["user", "item", "value"]].reset_index(drop=True)
>>> df
     user   item  value
0   user1  item1    2.5
1   user1  item2    3.5
2   user1  item3    3.0
3   user1  item4    3.5
4   user1  item5    2.5
5   user1  item6    3.0
6   user2  item1    2.5
7   user2  item2    3.0
8   user2  item3    3.5
9   user2  item4    4.0
10  user3  item2    4.5
11  user3  item5    1.0
12  user3  item6    4.0
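In case the drop=True part is unclear, here is a small sketch of the difference it makes. The three-row frame is made up to mimic the index gaps left by dropna():

```python
import pandas as pd

# Made-up frame whose index has gaps, like the melted frame after dropna().
df_gaps = pd.DataFrame({"value": [4.5, 1.0, 4.0]}, index=[13, 16, 17])

# drop=True throws the old labels away and numbers the rows 0, 1, 2, ...
renumbered = df_gaps.reset_index(drop=True)

# Without drop=True the old labels are kept as a new column named "index".
kept = df_gaps.reset_index()

print(list(renumbered.index))  # [0, 1, 2]
print(list(kept.columns))      # ['index', 'value']
```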

melt is pretty useful once you get used to it. Usually, as here, you do some renaming/reordering before and after.

answered Sep 17 '22 03:09

DSM
