
Pandas dataset into an array for modelling in Scikit-Learn

Can we run scikit-learn models on Pandas DataFrames or do we need to convert DataFrames into NumPy arrays?

user40465 asked Mar 21 '14 14:03


People also ask

Can scikit-learn use pandas DataFrame?

Generally, scikit-learn works on any numeric data stored as numpy arrays or scipy sparse matrices. Other types that are convertible to numeric arrays such as pandas DataFrame are also acceptable.

How do you convert a dataset to an array in Python?

To convert a pandas DataFrame to a NumPy array, call DataFrame.to_numpy(). The method returns a NumPy ndarray, which is usually 2-dimensional.
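As a minimal sketch of that conversion (the column names and values here are made up for illustration, and to_numpy() assumes pandas 0.24 or newer):

```python
import numpy as np
import pandas as pd

# Hypothetical two-column numeric DataFrame, for illustration only.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0]})

# to_numpy() returns the underlying values as a NumPy ndarray.
arr = df.to_numpy()

print(type(arr))   # the ndarray type
print(arr.shape)   # (rows, columns)
```

Column labels and the index are dropped in the conversion, so keep the DataFrame around if you need them later.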


2 Answers

You can use pandas.DataFrame with sklearn, for example:

import pandas as pd
from sklearn.cluster import KMeans

data = [(0.2, 10),
        (0.3, 12),
        (0.24, 14),
        (0.8, 30),
        (0.9, 32),
        (0.85, 33.3),
        (0.91, 31),
        (0.1, 15),
        (-0.23, 45)]

p_df = pd.DataFrame(data)
kmeans = KMeans(init='k-means++', n_clusters=3, n_init=10)
kmeans.fit(p_df)

Result (the cluster numbering can differ between runs, since no random_state is set):

>>> kmeans.labels_
array([0, 0, 0, 2, 2, 2, 2, 0, 1], dtype=int32)
Akavall answered Oct 08 '22 17:10


Pandas DataFrames behave like NumPy arrays when scikit-learn needs them to. If in doubt, you can always use the values attribute to get a NumPy representation: df.values gives you a NumPy array of the values in DataFrame df. (Newer pandas versions recommend df.to_numpy() for the same purpose.)
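A small sketch to check this for yourself: fit the same estimator once on the DataFrame and once on its values, then compare. The data, the column names x1 and x2, and the choice of LinearRegression are assumptions made for the example, not part of the original question.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical features; the target is an exact linear function of them.
df = pd.DataFrame({"x1": [0.0, 1.0, 2.0, 3.0],
                   "x2": [1.0, 0.0, 1.0, 0.0]})
y = 2.0 * df["x1"] + 3.0 * df["x2"] + 1.0

# Fit once on the DataFrame/Series, once on the plain NumPy arrays.
model_df = LinearRegression().fit(df, y)
model_np = LinearRegression().fit(df.values, y.values)

# Both fits should recover the same coefficients.
same_coef = np.allclose(model_df.coef_, model_np.coef_)
print(same_coef)
```

Either input works for fitting; converting explicitly is mostly useful when some other code path insists on a plain ndarray.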

Greg answered Oct 08 '22 17:10