How to convert a Scikit-learn dataset to a Pandas dataset

People also ask

How do I import sklearn dataset to pandas?

You can convert the sklearn dataset to a pandas DataFrame by using pd.DataFrame(data=iris.data).

Does scikit-learn work with pandas?

Generally, scikit-learn works on any numeric data stored as numpy arrays or scipy sparse matrices. Other types that are convertible to numeric arrays such as pandas DataFrame are also acceptable.
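As a small illustration of the point above, here is a sketch that passes a DataFrame straight to an estimator. The choice of StandardScaler and the variable names are my own, not part of the original answer.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)

# scikit-learn accepts the DataFrame directly and converts it internally
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# with default settings the result is a plain NumPy array, not a DataFrame
print(type(X_scaled), X_scaled.shape)
```

If you need the column names afterwards, you can wrap the result again with pd.DataFrame(X_scaled, columns=X.columns).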

Manually, you can use the pd.DataFrame constructor, passing a numpy array (data) and a list of column names (columns). To have everything in one DataFrame, you can concatenate the features and the target into one numpy array with np.c_[...] (note the square brackets, not parentheses):

import numpy as np
import pandas as pd
from sklearn.datasets import load_iris

# load the iris dataset into the variable iris
# to check the dataset type use: type(iris)
# to view the list of attributes use: dir(iris)
iris = load_iris()

# np.c_ concatenates arrays column-wise (along the second axis);
# here it appends iris['target'] as an extra column to iris['data'].
# For the pandas columns argument, extend the iris['feature_names'] list
# with a one-element list naming the target column; any name works.
# The original dataset would probably call this ['Species'].
data1 = pd.DataFrame(data=np.c_[iris['data'], iris['target']],
                     columns=iris['feature_names'] + ['target'])
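To see what np.c_ does in isolation, here is a tiny sketch with made-up arrays (the values and the names features, target and combined are illustrative only). It also shows a side effect worth knowing: because a numpy array has a single dtype, the integer target is stored as float in the combined array.

```python
import numpy as np

features = np.array([[5.1, 3.5],
                     [4.9, 3.0],
                     [6.2, 2.9]])   # shape (3, 2), float
target = np.array([0, 0, 1])        # shape (3,), integer

# the 1-D target is treated as a column and appended on the right
combined = np.c_[features, target]

print(combined.shape)   # (3, 3)
print(combined.dtype)   # float64: the integer labels were upcast
```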

from sklearn.datasets import load_iris
import pandas as pd

data = load_iris()
df = pd.DataFrame(data.data, columns=data.feature_names)
df.head()

This tutorial may be of interest: http://www.neural.cz/dataset-exploration-boston-house-pricing.html

TOMDLt's solution is not generic enough for all the datasets in scikit-learn. For example, it does not work for the Boston housing dataset. I propose a different, more universal solution, which also does not need numpy.

from sklearn import datasets
import pandas as pd

# note: load_boston was removed in scikit-learn 1.2,
# so this snippet needs an older version to run as written
boston_data = datasets.load_boston()
df_boston = pd.DataFrame(boston_data.data, columns=boston_data.feature_names)
df_boston['target'] = pd.Series(boston_data.target)
df_boston.head()

As a general function:

def sklearn_to_df(sklearn_dataset):
    df = pd.DataFrame(sklearn_dataset.data, columns=sklearn_dataset.feature_names)
    df['target'] = pd.Series(sklearn_dataset.target)
    return df

df_boston = sklearn_to_df(datasets.load_boston())
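As an alternative to the helper above, newer scikit-learn releases can build the DataFrame for you. The sketch below assumes scikit-learn 0.23 or later, where the bundled loaders such as load_iris accept as_frame=True. It uses iris rather than the Boston data so that it runs on current versions.

```python
from sklearn.datasets import load_iris

# as_frame=True makes the loader return pandas objects
iris = load_iris(as_frame=True)

# .frame holds the features and the target together in one DataFrame
df = iris.frame

print(df.shape)
print(df.columns.tolist())
```

With as_frame=True, iris.data is a DataFrame of the features and iris.target is a Series, in case you want them separately.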

It took me 2 hours to figure this out:

import numpy as np
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
# iris.keys()  # uncomment to list the available fields

df = pd.DataFrame(data=np.c_[iris['data'], iris['target']],
                  columns=iris['feature_names'] + ['target'])

df['species'] = pd.Categorical.from_codes(iris.target, iris.target_names)

The last line brings the species names back into the DataFrame as a categorical column.
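For anyone unsure what pd.Categorical.from_codes does, here is a minimal sketch with a hand-written list of codes (the codes list is made up for illustration; the names match iris.target_names). Each integer code is used as an index into the list of category names.

```python
import pandas as pd

codes = [0, 2, 1, 0]                           # integer class labels
names = ["setosa", "versicolor", "virginica"]  # code i maps to names[i]

species = pd.Categorical.from_codes(codes, categories=names)

print(list(species))
```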
