This semester I started working with ML. We have only used APIs such as Microsoft's Azure and Amazon's AWS, but we have not gone in depth into how those services work. My good friend, a senior Math major, asked me to help him create a stock predictor with TensorFlow based on a .csv file he provided.
There are a few problems I have. The first one is his .csv file: it contains only dates and closing values, which are not separated, so I had to separate them manually. I've managed to do that, and now I'm having trouble with MinMaxScaler(). I was told I could pretty much disregard the dates, normalize only the closing values, and make a prediction based on them.
I keep getting this error:
ValueError: Found array with 0 sample(s) (shape=(0, 1)) while a minimum of 1 is required by MinMaxScaler()
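From what I can tell, the error means MinMaxScaler.fit() was handed zero rows. This minimal snippet (my own sketch, using nothing but numpy and scikit-learn) reproduces it, so somewhere a slice of my data must be coming out empty:

import numpy as np
from sklearn.preprocessing import MinMaxScaler

# fitting on a slice with 0 rows raises the same ValueError as above
MinMaxScaler().fit(np.empty((0, 1)))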
I honestly have never used scikit-learn or TensorFlow before, and this is my first time working on such a project. All the guides I see on the topic use pandas, but in my case the .csv file is a mess and I don't believe I can use pandas for it.
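(For reference, pandas can probably still read a file like this. A rough sketch, assuming each line is a quoted date<TAB>value pair, which is what the row[1:len(row)-1].split('\t') code below implies:

import pandas as pd

# read each quoted line as a single raw column, then split on the tab
df = pd.read_csv("s&p500closing.csv", header=0, names=["raw"])
df[["date", "close"]] = df["raw"].str.split("\t", expand=True)
df["close"] = df["close"].astype("float32")

I stuck with manual parsing below, but this may save someone the trouble.)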
I'm following this guide:
But unfortunately, due to my lack of experience, some things are not really working for me, and I would appreciate a little more clarity on how I should proceed in my case.
Attached below is my (messy) code:
import pandas as pd
import numpy as np
import tensorflow as tf
import sklearn
from sklearn.model_selection import KFold
from sklearn.preprocessing import scale
from sklearn.preprocessing import MinMaxScaler
import matplotlib
import matplotlib.pyplot as plt
from dateutil.parser import parse
from datetime import datetime, timedelta
from collections import deque
stock_data = []
stock_date = []
stock_value = []
f = open("s&p500closing.csv","r")
data = f.read()
rows = data.split("\n")
rows_noheader = rows[1:len(rows)]
#Separating values from the messy .csv, putting each value into its own list, plus a combined list of both
for row in rows_noheader:
    [date, value] = row[1:len(row)-1].split('\t')
    stock_date.append(date)
    stock_value.append(value)
    stock_data.append((date, value))
#Numpy array of all closing values converted to floats and normalized against the maximum
stock_value = np.array(stock_value, dtype=np.float32)
normvalue = [i/max(stock_value) for i in stock_value]
#Number of closing values and days. There is one closing value per day, so the counts match: 4528 of each
nclose_and_days = 0
for i in range(len(stock_data)):
    nclose_and_days += 1
train_data = stock_value[:2264]
test_data = stock_value[2264:]
scaler = MinMaxScaler()
train_data = train_data.reshape(-1,1)
test_data = test_data.reshape(-1,1)
# Train the Scaler with training data and smooth data
smoothing_window_size = 1100
for di in range(0,4400,smoothing_window_size):
    #error occurs here
    scaler.fit(train_data[di:di+smoothing_window_size,:])
    train_data[di:di+smoothing_window_size,:] = scaler.transform(train_data[di:di+smoothing_window_size,:])

# You normalize the last bit of remaining data
scaler.fit(train_data[di+smoothing_window_size:,:])
train_data[di+smoothing_window_size:,:] = scaler.transform(train_data[di+smoothing_window_size:,:])
# Reshape both train and test data
train_data = train_data.reshape(-1)
# Normalize test data
test_data = scaler.transform(test_data).reshape(-1)
# Now perform exponential moving average smoothing
# So the data will have a smoother curve than the original ragged data
EMA = 0.0
gamma = 0.1
for ti in range(1100):
    EMA = gamma*train_data[ti] + (1-gamma)*EMA
    train_data[ti] = EMA
# Used for visualization and test purposes
all_mid_data = np.concatenate([train_data,test_data],axis=0)
window_size = 100
N = train_data.size
std_avg_predictions = []
std_avg_x = []
mse_errors = []
for pred_idx in range(window_size,N):
    std_avg_predictions.append(np.mean(train_data[pred_idx-window_size:pred_idx]))
    mse_errors.append((std_avg_predictions[-1]-train_data[pred_idx])**2)
    std_avg_x.append(date)
print('MSE error for standard averaging: %.5f'%(0.5*np.mean(mse_errors)))
I know that this post is old, but as I stumbled here, others will too. After running into the same problem and googling quite a bit, I found this post: https://github.com/llSourcell/Make_Money_with_Tensorflow_2.0/issues/7
It seems that if you download too small a dataset, it will throw that error. Download a .csv going back to 1962 and it'll be big enough ;).
Now I just have to find the right parameters for my dataset, as I'm adapting this to another type of prediction. Hope it helps.
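As a quick sanity check along those lines, you could assert that the data actually covers the loop bound before scaling. A sketch, assuming the hard-coded 4400 from the question:

# fail fast if the dataset is too small for the smoothing loop
assert len(train_data) >= 4400, (
    "only %d training rows; the smoothing loop expects at least 4400" % len(train_data))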
The train_data variable has a length of 2264:

train_data = stock_value[:2264]

Then, when you go to fit the scaler, you go outside of train_data's bounds on the fourth iteration of the for loop, where di = 3300 and train_data[3300:4400, :] is an empty slice:

smoothing_window_size = 1100
for di in range(0, 4400, smoothing_window_size):

Notice the size of the data set in the tutorial: the training and testing chunks each have length 11,000, and the smoothing_window_size is 2500, so the loop never exceeds train_data's boundaries.
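One way to make this work for an arbitrary dataset size is to derive the loop bound from the data itself instead of hard-coding 4400. A minimal sketch (not the tutorial's exact code):

smoothing_window_size = 1100
for di in range(0, len(train_data), smoothing_window_size):
    chunk = train_data[di:di+smoothing_window_size, :]
    if chunk.shape[0] == 0:
        break  # nothing left to scale
    scaler.fit(chunk)
    train_data[di:di+smoothing_window_size, :] = scaler.transform(chunk)

This scales every chunk, including the short tail, without ever handing the scaler an empty slice.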
You may also have a column of all NaNs in your data. If you try to scale it, MinMaxScaler can't assign a scale and trips up. You need to filter out empty/all-NaN columns before you scale the data. Try:

stock_value = stock_value[:, ~np.all(np.isnan(stock_value), axis=0)]

to filter out the NaN columns in your data.
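Note that in the question stock_value is a 1-D array, so the equivalent filter there would drop NaN entries rather than columns:

# 1-D version: keep only the non-NaN closing values
stock_value = stock_value[~np.isnan(stock_value)]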