I am trying to implement linear regression using Python.
I did the following steps:
import pandas as p
import numpy as n
data = p.read_csv("...path\Housing.csv", usecols=[1]) # I want the first col
data1 = p.read_csv("...path\Housing.csv", usecols=[3]) # I want the 3rd col
x = data
y = data1
Then I try to obtain the coefficients using the following:
regression_coeff = n.polyfit(x,y,1)
And then I get the following error:
raise TypeError("expected 1D vector for x")
TypeError: expected 1D vector for x
I am unable to get my head around this, as when I print x and y, I can very clearly see that they are both 1D vectors.
Can someone please help?
Dataset can be found here: DataSets
The original code is:
import pandas as p
import numpy as n
data = p.read_csv('...\housing.csv', usecols = [1])
data1 = p.read_csv('...\housing.csv', usecols = [3])
x = data
y = data1
regression = n.polyfit(x, y, 1)
This should work:
np.polyfit(data.values.flatten(), data1.values.flatten(), 1)
data is a DataFrame, so its values are 2D even though it holds a single column:
>>> data.values.shape
(546, 1)
flatten() turns it into a 1D array:
>>> data.values.flatten().shape
(546,)
which is needed for polyfit().
Simpler alternative:
df = pd.read_csv("Housing.csv")
np.polyfit(df['price'], df['bedrooms'], 1)
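To see the shape difference without the CSV file, here is a minimal sketch. The small DataFrame and its values are made up for illustration (they are not from Housing.csv); only the column names price and bedrooms are taken from the answer above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for Housing.csv; the values are invented so that
# bedrooms = 2 * price + 1 exactly.
df = pd.DataFrame({"price": [1.0, 2.0, 3.0, 4.0],
                   "bedrooms": [3.0, 5.0, 7.0, 9.0]})

# Selecting with a list of columns gives a one-column DataFrame, which is
# what read_csv(..., usecols=[1]) returns as well.
x_2d = df[["price"]]
y_2d = df[["bedrooms"]]
print(x_2d.values.shape)  # (4, 1)

# Passing the 2D objects reproduces the error from the question.
try:
    np.polyfit(x_2d, y_2d, 1)
    raised = False
except TypeError as exc:
    raised = True
    print("polyfit rejected the input:", exc)

# Flattening to 1D makes the fit work.
slope, intercept = np.polyfit(x_2d.values.flatten(), y_2d.values.flatten(), 1)
print(slope, intercept)
```

Selecting a column by name, as in df['price'], already gives a 1D Series, which is why the simpler alternative needs no flatten().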
pandas.read_csv() returns a DataFrame, which is two-dimensional even when it holds a single column, whereas np.polyfit() expects 1D vectors for both x and y in a single fit. Calling .squeeze() on the output of read_csv() converts the single-column DataFrame to a pd.Series, which matches the input format np.polyfit() expects:
data = pd.read_csv('../Housing.csv', usecols = [1]).squeeze()
data1 = pd.read_csv("...path\Housing.csv", usecols=[3]).squeeze()
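The .squeeze() approach can be tried end to end with an in-memory CSV. This is only a sketch: the CSV text below is invented for illustration and merely mimics a layout where column 1 is price and column 3 is bedrooms; it is not the real dataset.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical CSV content (not the real Housing.csv); bedrooms = price / 10.
csv_text = (
    "id,price,lotsize,bedrooms\n"
    "1,10.0,100,1\n"
    "2,20.0,200,2\n"
    "3,30.0,300,3\n"
)

# Without squeeze: a DataFrame with one column, i.e. 2D.
frame = pd.read_csv(io.StringIO(csv_text), usecols=[1])
print(frame.shape)  # (3, 1)

# With squeeze: the single column collapses to a 1D Series.
# Passing "columns" restricts the squeeze to the column axis, so a file
# with a single row would still give a Series rather than a scalar.
x = pd.read_csv(io.StringIO(csv_text), usecols=[1]).squeeze("columns")
y = pd.read_csv(io.StringIO(csv_text), usecols=[3]).squeeze("columns")
print(x.shape)  # (3,)

coeffs = np.polyfit(x, y, 1)  # [slope, intercept]
print(coeffs)
```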