Python/Scikit-learn/regressions - from pandas Dataframes to Scikit prediction

Tags: python, pandas, scikit-learn, linear-regression

I have the following pandas DataFrame, called main_frame:

            target_var  input1  input2  input3  input4  input5    input6
Date
2013-09-01        13.0     NaN     NaN     NaN     NaN     NaN       NaN   
2013-10-01        13.0     NaN     NaN     NaN     NaN     NaN       NaN   
2013-11-01        12.2     NaN     NaN     NaN     NaN     NaN       NaN   
2013-12-01        10.9     NaN     NaN     NaN     NaN     NaN       NaN   
2014-01-01        11.7       0      13      42       0       0        16   
2014-02-01        12.0      13       8      58       0       0        14   
2014-03-01        12.8      13      15     100       0       0        24   
2014-04-01        13.1       0      11      50      34       0        18   
2014-05-01        12.2      12      14      56      30      71        18   
2014-06-01        11.7      13      16      43      44       0        22   
2014-07-01        11.2       0      19      45      35       0        18   
2014-08-01        11.4      12      16      37      31       0        24   
2014-09-01        10.9      14      14      47      30      56        20   
2014-10-01        10.5      15      17      54      24      56        22   
2014-11-01        10.7      12      18      60      41      63        21   
2014-12-01         9.6      12      14      42      29      53        16   
2015-01-01        10.2      10      16      37      31       0        20   
2015-02-01        10.7      11      20      39      28       0        19   
2015-03-01        10.9      10      17      75      27      87        22   
2015-04-01        10.8      14      17      73      30      43        25   
2015-05-01        10.2      10      17      55      31      52        24

I've been having trouble exploring the dataset with scikit-learn, and I'm not sure whether the problem is the pandas DataFrame itself, the dates used as the index, the NaNs/Infs/zeros (which I don't know how to handle), all of these, or something else I haven't been able to track down.

I want to build a simple regression to predict the next target_var value based on the input variables (input1, input2, input3, ...).

Note that there are a lot of zeros and NaNs in the time series, and eventually there might be Infs as well.


asked Dec 27 '15 21:12

aabujamra

1 Answer

You should first remove any rows containing Inf, -Inf or NaN values (alternatively, you can fill in the NaNs with, for example, the mean value of each feature).

import numpy as np
df = main_frame.replace(to_replace=[np.inf, -np.inf], value=np.nan)  # treat +/-Inf as missing
df = df.dropna()  # drop every row that still contains a NaN
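If dropping rows throws away too much data (with the frame above, the four 2013 rows are lost), the mean-filling alternative mentioned above can be done with pandas alone. This is a minimal sketch on a few rows copied from the question's table, with the input columns trimmed for brevity; `fill_with_column_means` is a hypothetical helper name, not a pandas function.

```python
import numpy as np
import pandas as pd


def fill_with_column_means(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: turn +/-inf into NaN, then fill each NaN with its column mean."""
    cleaned = frame.replace([np.inf, -np.inf], np.nan)
    # DataFrame.mean() skips NaN by default, so each mean uses only observed values
    return cleaned.fillna(cleaned.mean())


# A few rows from the question's main_frame (only input1 and input2 kept)
sample = pd.DataFrame(
    {
        "target_var": [10.9, 11.7, 12.0, 12.8],
        "input1": [np.nan, 0, 13, 13],
        "input2": [np.nan, 13, 8, 15],
    },
    index=pd.to_datetime(["2013-12-01", "2014-01-01", "2014-02-01", "2014-03-01"]),
)

filled = fill_with_column_means(sample)
print(filled)
```

Zeros are ordinary values here and are left untouched; only NaN and +/-inf are treated as missing.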

Now create a NumPy array of your features and a vector of your targets. Given that your target variable is in the first column, you can use integer-based indexing with iloc as follows:

X = df.iloc[:, 1:].values  # every column after the first: input1 .. input6
y = df.iloc[:, 0].values   # first column: target_var
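Because `.values` returns a plain NumPy array, the `Date` index is not passed to scikit-learn at all, so the dates as index are not a problem. The sketch below checks the shapes on three complete rows copied from the question, and also shows selection by column name, which keeps working if the column order ever changes.

```python
import pandas as pd

# Three complete rows from the question's main_frame
df = pd.DataFrame(
    {
        "target_var": [11.7, 12.0, 12.8],
        "input1": [0, 13, 13],
        "input2": [13, 8, 15],
        "input3": [42, 58, 100],
        "input4": [0, 0, 0],
        "input5": [0, 0, 0],
        "input6": [16, 14, 24],
    },
    index=pd.to_datetime(["2014-01-01", "2014-02-01", "2014-03-01"]),
)
df.index.name = "Date"

# Positional selection, as in the answer
X = df.iloc[:, 1:].values
y = df.iloc[:, 0].values

# Equivalent selection by column name, independent of column order
X_by_name = df.drop(columns="target_var").to_numpy()
y_by_name = df["target_var"].to_numpy()

print(X.shape, y.shape)  # (3, 6) (3,)
```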

Then create and fit your model:

from sklearn.linear_model import LinearRegression

model = LinearRegression()
model.fit(X=X, y=y)

Now you can inspect the fitted intercept and coefficients:

>>> model.intercept_
12.109583092421092

>>> model.coef_
array([-0.05269033, -0.17723251,  0.03627883,  0.02219596, -0.01377465,
        0.0111017 ])
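The model above regresses target_var on the inputs of the same month. If "predict the next target_var" means forecasting month t+1 from the inputs of month t (an assumption about what the question intends), one option is to shift the target back by one row with `shift(-1)` before fitting. This is a minimal sketch using the complete rows 2014-01 to 2014-08 from the question, keeping only input2 and input3 for brevity.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Complete rows 2014-01 .. 2014-08 from the question (only input2 and input3 kept)
df = pd.DataFrame(
    {
        "target_var": [11.7, 12.0, 12.8, 13.1, 12.2, 11.7, 11.2, 11.4],
        "input2": [13, 8, 15, 11, 14, 16, 19, 16],
        "input3": [42, 58, 100, 50, 56, 43, 45, 37],
    },
    index=pd.date_range("2014-01-01", periods=8, freq="MS"),
)
features = ["input2", "input3"]

# Pair the inputs of month t with the target of month t+1
next_target = df["target_var"].shift(-1)

# The last month has no "next" target yet, so leave it out of training
train_X = df[features].iloc[:-1].to_numpy()
train_y = next_target.iloc[:-1]

model = LinearRegression()
model.fit(train_X, train_y.to_numpy())

# Use the most recent month's inputs to forecast the following month
latest = df[features].iloc[[-1]].to_numpy()
forecast = model.predict(latest)[0]
print(f"Forecast for the month after {df.index[-1].strftime('%Y-%m')}: {forecast:.2f}")
```

The same `shift(-1)` step works with all six inputs and the positional `iloc` selection from the answer; with this few rows, expect the coefficients to be noisy.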

answered Oct 18 '22 23:10

Alexander

