<p>Can we run scikit-learn models on Pandas DataFrames or do we need to convert DataFrames into NumPy arrays?</p>

<p>Pandas DataFrames are very good at acting like NumPy arrays when they need to. If in doubt, you can always use the <code>values</code> attribute to get a NumPy representation (<code>df.values</code> will give you a NumPy array of the values in DataFrame <code>df</code>).</p>

Pandas dataset into an array for modelling in Scikit-Learn

2 Answers

You can pass a pandas.DataFrame directly to sklearn estimators, for example:

import pandas as pd
from sklearn.cluster import KMeans

data = [(0.2, 10),
        (0.3, 12),
        (0.24, 14),
        (0.8, 30),
        (0.9, 32),
        (0.85, 33.3),
        (0.91, 31),
        (0.1, 15),
        (-0.23, 45)]

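# Build a DataFrame with two numeric columns
p_df = pd.DataFrame(data)
# Fit k-means directly on the DataFrame; no manual conversion to a NumPy array is needed
kmeans = KMeans(init='k-means++', n_clusters=3, n_init=10)
kmeans.fit(p_df)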

Result (the cluster numbering can differ between runs because no random_state is set):

>>> kmeans.labels_
array([0, 0, 0, 2, 2, 2, 2, 0, 1], dtype=int32)
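
To check that the DataFrame and its NumPy representation are interchangeable here, the following is a minimal sketch that fits the same model on both. The `random_state=0` argument, the column names `x` and `y`, and the helper `fit_labels` are assumptions added for illustration and are not part of the original answer.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

data = [(0.2, 10), (0.3, 12), (0.24, 14),
        (0.8, 30), (0.9, 32), (0.85, 33.3),
        (0.91, 31), (0.1, 15), (-0.23, 45)]

# Column names are illustrative only
p_df = pd.DataFrame(data, columns=["x", "y"])


def fit_labels(X):
    """Fit k-means with a fixed seed (an assumption) and return the labels."""
    km = KMeans(init="k-means++", n_clusters=3, n_init=10, random_state=0)
    return km.fit(X).labels_


# Same data, once as a DataFrame and once as a plain NumPy array
labels_from_df = fit_labels(p_df)
labels_from_array = fit_labels(p_df.to_numpy())

# With the same seed and the same values, both fits should agree
same = np.array_equal(labels_from_df, labels_from_array)
print(same)
```

Fixing the seed is what makes the two runs comparable; without it, the label numbering could differ even when the clusters themselves match.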

answered Oct 08 '22 17:10

Akavall

Pandas DataFrames are very good at acting like NumPy arrays when they need to. If in doubt, you can always use the values attribute to get a NumPy representation (df.values will give you a NumPy array of the values in DataFrame df).
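
As a small sketch of the conversion described above: `df.values` returns a NumPy array, and newer pandas versions also provide `df.to_numpy()`, which the pandas documentation recommends over `values`. The DataFrames `df` and `mixed` and their column names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical all-numeric DataFrame
df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0]})

arr_values = df.values         # attribute mentioned in the answer
arr_to_numpy = df.to_numpy()   # method form available in newer pandas

print(type(arr_values), arr_values.shape)

# With mixed column types the array falls back to a common dtype
# (typically object), which most estimators cannot use directly.
mixed = pd.DataFrame({"num": [1, 2], "txt": ["x", "y"]})
print(mixed.to_numpy().dtype)

# One option is to keep only the numeric columns before converting
numeric_only = mixed.select_dtypes(include="number").to_numpy()
```

For all-numeric data, both forms give the same array, so the choice mostly matters for readability and for following current pandas guidance.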

answered Oct 08 '22 17:10

Greg

Tags:

python

pandas

scikit-learn

user40465
