I was fitting a logistic regression on a subset of a dataset. After splitting the data and fitting the model, I got the following warning:
/Users/Eddie/anaconda/lib/python3.4/site-packages/sklearn/utils/validation.py:526: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
y = column_or_1d(y, warn=True)
So I used target_newrdn = target_newrdn.ravel() to modify my target variable, but it gave me this:
AttributeError: 'DataFrame' object has no attribute 'ravel'
What is the problem, and how can I fix it? Can anyone help, please?
My code:
from sklearn.datasets import fetch_covtype
import numpy as np
import pandas as pd
from sklearn.utils import shuffle
from sklearn.model_selection import train_test_split
cov = fetch_covtype()
cov_data = pd.DataFrame(cov.data)
cov_target = pd.DataFrame(cov.target)
data_newrdn = cov_data.head(n=10000)
target_newrdn = cov_target.head(n=10000)
target_newrdn = target_newrdn.ravel() ## I thought this could fix it??
X_train2, X_test2, y_train2, y_test2 = train_test_split(data_newrdn,
target_newrdn, random_state=42)
scaler.fit(X_train2)
X_train_scaled2 = scaler.transform(X_train2)
# Logistic Regression
param_grid = {'C': [0.001, 0.01, 0.1, 1, 10, 100, 1000]}
print(param_grid)
grid = GridSearchCV(LogisticRegression(), param_grid, cv=kfold)
grid.fit(X_train_scaled2, y_train2)
print("Best cross-validation score w/ kfold: {:.2f}".format(grid.best_score_))
print("Best parameters: ", grid.best_params_)
On a pandas Series, ravel() returns the flattened underlying data as an ndarray. Syntax: Series.ravel(order='C'). Parameter: order. Returns: ndarray. The method is defined on Series, not on DataFrame.
A DataFrame does not have a ravel function. Try:
target_newrdn.values.ravel()
target_newrdn.values returns a NumPy ndarray, and you then call ravel on that array. Note that the result is a flattened NumPy array, not a DataFrame. scikit-learn accepts the 1-D array directly as y, but you may need to convert it back to a pandas object if later code expects one.
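As a minimal sketch of the fix, the snippet below uses a small made-up single-column DataFrame (target_df is a hypothetical stand-in for cov_target, so nothing has to be downloaded). It shows the column-vector shape that triggers the warning, the 1-D array produced by .values.ravel(), and, as an alternative not mentioned above, selecting the single column as a Series.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for cov_target: a single-column DataFrame of labels.
target_df = pd.DataFrame(np.array([1, 2, 1, 3, 2]))

# Shape (5, 1) is a column vector, which is what triggers the warning.
print(target_df.shape)

# DataFrame -> 2-D ndarray -> flattened 1-D ndarray of shape (5,).
y = target_df.values.ravel()
print(type(y).__name__, y.shape)

# Alternative: pick the single column as a Series, which is already 1-D.
y_series = target_df.iloc[:, 0]
print(y_series.shape)
```

Either y or y_series can then be passed as the target to train_test_split and fit.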
If you plan to modify the result, flatten() may be the better choice. It always returns a copy, so changing the flattened array never affects the original data. ravel returns a view whenever it can, so changes made through its result may show up in the original array.
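The difference between the two calls can be checked on a plain NumPy array. This is only an illustrative sketch; the array a and its values are made up for the demonstration.

```python
import numpy as np

# A small contiguous 2-D array: [[0, 1], [2, 3], [4, 5]].
a = np.arange(6).reshape(3, 2)

r = a.ravel()    # a view here, because the array is contiguous
f = a.flatten()  # always an independent copy

r[0] = 100       # writes through to a[0, 0]
f[1] = 200       # changes only the copy; a[0, 1] stays 1

print(a[0, 0], a[0, 1])
print(np.shares_memory(a, r), np.shares_memory(a, f))
```

For passing y to a scikit-learn estimator the distinction rarely matters, since the target array is only read there.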