Rearrange a pandas data frame to create a 2d ratings matrix

I'm trying to build an item-based recommendation system from the Yelp dataset. I have processed the data to the point where I have the ratings given by every user who reviewed a restaurant in a given state. Eventually I want a ratings matrix with restaurants on one axis, users on the other, and ratings (1-5) as the values (zero for missing reviews).

Right now the DF looks like this:

               user_id               review_id             business_id  stars
0  Xqd0DzHaiyRqVH3WRG7  15SdjuK7DmYqUAj6rjGowg  vcNAWiLM4dR7D2nwwJ7nCA      5
1  Xqd0DzHaiyRqVH3WRG7  15SdjuK7DmYqUAj6rjGowg  vcNAWiLM4dR7D2nwwJ7nCA      5
2  H1kH6QZV7Le4zqTRNxo  RF6UnRTtG7tWMcrO2GEoAg  vcNAWiLM4dR7D2nwwJ7nCA      2
3  zvJCcrpm2yOZrxKffwG  -TsVN230RCkLYKBeLsuz7A  vcNAWiLM4dR7D2nwwJ7nCA      4
4  KBLW4wJA_fwoWmMhiHR  dNocEAyUucjT371NNND41Q  vcNAWiLM4dR7D2nwwJ7nCA      4
5  zvJCcrpm2yOZrxKffwG  ebcN2aqmNUuYNoyvQErgnA  vcNAWiLM4dR7D2nwwJ7nCA      4
6  Qrs3EICADUKNFoUq2iH  _ePLBPrkrf4bhyiKWEn4Qg  vcNAWiLM4dR7D2nwwJ7nCA      1

but I would like it to look a little bit more like this:

(4 Restaurants x 5 Users)

asked Jun 01 '16 18:06

mmera

1 Answer

I think you need pivot with fillna:

print (df.pivot(index='business_id', columns='user_id', values='stars').fillna(0))

If that raises:

ValueError: Index contains duplicate entries, cannot reshape

then use pivot_table instead:

print (df.pivot_table(index='business_id', columns='user_id', values='stars').fillna(0))
user_id                 H1kH6QZV7Le4zqTRNxo  KBLW4wJA_fwoWmMhiHR  \
business_id                                                        
vcNAWiLM4dR7D2nwwJ7nCA                    2                    4   

user_id                 Qrs3EICADUKNFoUq2iH  Xqd0DzHaiyRqVH3WRG7  \
business_id                                                        
vcNAWiLM4dR7D2nwwJ7nCA                    1                    5   

user_id                 zvJCcrpm2yOZrxKffwG  
business_id                                  
vcNAWiLM4dR7D2nwwJ7nCA                    4

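Note that pivot_table aggregates duplicate entries with aggfunc; the default is aggfunc=np.mean, so duplicate ratings for the same business_id/user_id pair are averaged. A fuller explanation with a sample is here and in the docs.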
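To make the difference concrete, here is a minimal sketch of both calls on a small made-up frame. The shortened IDs (u1, b1, ...) and the star values are hypothetical stand-ins for the question's data, and fill_value=0 is used as an alternative to chaining fillna(0); treat it as an illustration rather than the exact code from the answer.

```python
import pandas as pd

# Hypothetical sample with the question's column names; IDs are shortened
# placeholders, and user u1 reviewed business b1 twice (a duplicate pair).
df = pd.DataFrame({
    "user_id":     ["u1", "u1", "u2", "u3", "u3", "u4"],
    "business_id": ["b1", "b1", "b1", "b1", "b2", "b2"],
    "stars":       [5,    3,    2,    4,    1,    4],
})

# pivot cannot reshape when a (business_id, user_id) pair occurs more than once.
try:
    df.pivot(index="business_id", columns="user_id", values="stars")
    pivot_failed = False
except ValueError:
    pivot_failed = True

# pivot_table aggregates the duplicates (here with the mean) and
# fill_value=0 puts a zero wherever a user has not reviewed a business.
ratings = df.pivot_table(
    index="business_id",
    columns="user_id",
    values="stars",
    aggfunc="mean",
    fill_value=0,
)
print(ratings)
```

If averaging is not what you want, passing a different aggfunc (for example "max" or "last") or calling drop_duplicates on the frame first are possible alternatives; which one fits depends on how repeated reviews should count.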

answered Oct 06 '22 00:10

jezrael


Tags: python, pandas, dataframe, recommendation-engine, yelp
