
Keep pandas structure with numpy/scikit functions

I'm using the excellent read_csv() function from pandas, which gives:

In [31]: data = pandas.read_csv("lala.csv", delimiter=",")

In [32]: data
Out[32]: 
<class 'pandas.core.frame.DataFrame'>
Int64Index: 12083 entries, 0 to 12082
Columns: 569 entries, REGIONC to SCALEKER
dtypes: float64(51), int64(518)

But when I apply a function from scikit-learn, I lose the column information:

from sklearn import preprocessing
preprocessing.scale(data)

This gives a plain NumPy array.

Is there a way to apply scikit-learn or NumPy functions to DataFrames without losing this information?

Mermoz asked Feb 11 '13 13:02



1 Answer

This can be done by wrapping the returned array in a new DataFrame and passing along the original index and columns:

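import pandas as pd
from sklearn import preprocessing
# Rewrap the scaled array with the original row index and column labels
pd.DataFrame(preprocessing.scale(data), index=data.index, columns=data.columns)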
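
If you need this for more than one function, the same wrapping idea can be pulled into a small helper. The following is only a sketch: `apply_keep_labels` and `standardize` are hypothetical names, the toy data is made up, and `standardize` is a NumPy stand-in for `preprocessing.scale` so that the example runs with just NumPy and pandas. It assumes the function returns an array with the same shape as the input.

```python
import numpy as np
import pandas as pd


def apply_keep_labels(func, df):
    """Apply an array-returning function and rewrap the result as a DataFrame.

    Hypothetical helper: assumes func returns an array with the same shape as df.
    """
    result = np.asarray(func(df.to_numpy()))
    if result.shape != df.shape:
        raise ValueError("func changed the shape, so the labels no longer apply")
    return pd.DataFrame(result, index=df.index, columns=df.columns)


def standardize(values):
    # NumPy stand-in for preprocessing.scale: zero mean, unit variance per column
    return (values - values.mean(axis=0)) / values.std(axis=0)


# Toy data, made up for illustration (column names borrowed from the question)
data = pd.DataFrame(
    {"REGIONC": [1, 2, 3, 4], "SCALEKER": [10.0, 20.0, 30.0, 40.0]},
    index=["a", "b", "c", "d"],
)

scaled = apply_keep_labels(standardize, data)
print(type(scaled).__name__)    # DataFrame
print(scaled.columns.tolist())  # ['REGIONC', 'SCALEKER']
print(scaled.index.tolist())    # ['a', 'b', 'c', 'd']
```

Passing `preprocessing.scale` as `func` should behave the same way, since it also returns an array of the same shape as its input. The shape check is there because functions that drop or add columns would otherwise end up with the wrong labels.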

Mermoz answered Sep 28 '22 17:09