I have my data as a DataFrame:
dOpen dHigh dLow dClose dVolume day_of_week_0 day_of_week_1 ... month_6 month_7 month_8 month_9 month_10 month_11 month_12
639 -0.002498 -0.000278 -0.005576 -0.002228 -0.002229 0 0 ... 0 0 1 0 0 0 0
640 -0.004174 -0.005275 -0.005607 -0.005583 -0.005584 0 0 ... 0 0 1 0 0 0 0
641 -0.002235 0.003070 0.004511 0.008984 0.008984 1 0 ... 0 0 1 0 0 0 0
642 0.006161 -0.000278 -0.000281 -0.001948 -0.001948 0 1 ... 0 0 1 0 0 0 0
643 -0.002505 0.001113 0.005053 0.002788 0.002788 0 0 ... 0 0 1 0 0 0 0
644 0.004185 0.000556 -0.000559 -0.001668 -0.001668 0 0 ... 0 0 1 0 0 0 0
645 0.002779 0.003056 0.003913 0.001114 0.001114 0 0 ... 0 0 1 0 0 0 0
646 0.000277 0.004155 -0.002227 -0.002782 -0.002782 1 0 ... 0 0 1 0 0 0 0
647 -0.005540 -0.007448 -0.003348 0.001953 0.001953 0 1 ... 0 0 1 0 0 0 0
648 0.001393 -0.000278 0.001960 -0.003619 -0.003619 0 0 ... 0 0 1 0 0 0 0
My input will be 10 rows (already one-hot encoded). I want to create an n-dimensional autoencoded representation. So, as I understand it, my input and output should be the same.
I've seen some examples of how to construct this, but I am still stuck on the first step. Is my training data just a lot of those 10-row samples stacked to make a matrix? What then?
I apologize for the general nature of the question. Any questions, just ask and I will clarify in the comments.
Thank you.
It isn't quite clear from the question what you are trying to achieve. Based on what you wrote, you want to create an autoencoder with the same input and output, and that doesn't quite make sense to me when I look at your data set. In the common case, the encoder part of an autoencoder maps a large set of input features to a small output vector, and the decoder performs the inverse operation, reconstructing plausible input features from that small vector. The result of using an autoencoder is an enhanced version of the input (in some sense, e.g. with noise removed).
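To make the encoder/decoder idea concrete, here is a minimal sketch of how 10-row windows could be stacked into a training matrix and fed to a tiny linear autoencoder whose target is its own input. It uses plain NumPy rather than any particular deep learning library, and everything in it is an assumption for illustration: the random data standing in for your DataFrame, the helper names `make_windows` and `train_linear_autoencoder`, the code size of 4, and the learning rate.

```python
import numpy as np

def make_windows(values, window=10):
    """Flatten every run of `window` consecutive rows into one sample."""
    n_rows = values.shape[0]
    samples = [values[i:i + window].ravel() for i in range(n_rows - window + 1)]
    return np.asarray(samples)

def train_linear_autoencoder(X, code_size=4, lr=0.01, epochs=200, seed=0):
    """Fit encoder/decoder weight matrices by plain gradient descent.

    The training target is X itself, so the network learns to
    reconstruct its input through a `code_size`-dimensional bottleneck.
    """
    rng = np.random.default_rng(seed)
    n_inputs = X.shape[1]
    W_enc = rng.normal(scale=0.1, size=(n_inputs, code_size))
    W_dec = rng.normal(scale=0.1, size=(code_size, n_inputs))
    losses = []
    for _ in range(epochs):
        code = X @ W_enc            # encoder: many features -> small vector
        recon = code @ W_dec        # decoder: small vector -> reconstruction
        err = recon - X
        losses.append(float(np.mean(err ** 2)))
        # Gradients of the squared reconstruction error
        # (up to a constant factor absorbed into the learning rate).
        grad_dec = code.T @ err / len(X)
        grad_enc = X.T @ (err @ W_dec.T) / len(X)
        W_dec -= lr * grad_dec
        W_enc -= lr * grad_enc
    return W_enc, W_dec, losses

# Hypothetical stand-in for the DataFrame: 60 rows, 5 numeric columns.
# With a real DataFrame this would be something like df.to_numpy().
rng = np.random.default_rng(42)
data = rng.normal(size=(60, 5))

X = make_windows(data, window=10)        # one row per 10-row window
W_enc, W_dec, losses = train_linear_autoencoder(X)
codes = X @ W_enc                        # the n-dimensional representation
print(X.shape, codes.shape)
```

Each row of `codes` is the compressed representation of one 10-row window. A real model would normally add nonlinear layers and a held-out validation set; this sketch only shows the shape of the data and the "input equals target" training setup.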
You can find a few examples here; the third use case provides code for sequence data, learning a random number generation model. Here is another example, which looks closer to your application: a sequential model is constructed to encode a large data set with information loss. If that is what you are trying to achieve, you'll find the code there.
If the goal is sequence prediction (like future stock prices), this and that example seem more appropriate, as you likely only want to predict a handful of values in your data sequence (say dHigh and dLow) and you don't need to predict day_of_week_n or month_n (even though that part of an autoencoder model would probably train much more reliably, as the pattern is pretty clear). This approach will allow you to predict a single subsequent value of each output feature (tomorrow's dHigh and dLow).
If you want to predict a sequence of future outputs, you can use a sequence of outputs in your model rather than a single one.
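As a rough illustration of the prediction setup, the sketch below pairs each window of 10 past rows with the dHigh and dLow values that follow it, either for the next day only or for several days ahead. The helper name `make_supervised`, the `horizon` parameter, and the synthetic `values` array are assumptions made for the example; only the column names come from the question.

```python
import numpy as np

def make_supervised(values, target_cols, window=10, horizon=1):
    """Pair each `window`-row input with the next `horizon` target rows."""
    X, y = [], []
    last_start = len(values) - window - horizon + 1
    for start in range(last_start):
        end = start + window
        X.append(values[start:end])                      # all features, past rows
        y.append(values[end:end + horizon, target_cols]) # chosen targets, future rows
    return np.asarray(X), np.asarray(y)

# Only the continuous columns are listed here; the one-hot columns
# would simply be extra input features.
columns = ["dOpen", "dHigh", "dLow", "dClose", "dVolume"]
target_cols = [columns.index("dHigh"), columns.index("dLow")]

# Hypothetical data: 40 rows of increasing numbers so shapes are easy to check.
values = np.arange(40 * 5, dtype=float).reshape(40, 5)

# Single-step: predict tomorrow's dHigh and dLow.
X1, y1 = make_supervised(values, target_cols, window=10, horizon=1)

# Multi-step: predict dHigh and dLow for the next three days.
X3, y3 = make_supervised(values, target_cols, window=10, horizon=3)

print(X1.shape, y1.shape)
print(X3.shape, y3.shape)
```

The inputs keep every column while the targets keep only the two columns of interest, so the model is never asked to predict the calendar features.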
In general, the structure of inputs and outputs is totally up to you.