I'm working on a problem where I want to classify data samples as good or bad quality using machine learning.
The data samples are stored in a relational database. A sample has attributes such as id, name, number of up-votes (an indicator of good/bad quality), number of comments, etc. There is also a table of items, each with a foreign key pointing to a data sample id. Each item has a weight and a name. Together, the items pointing to a data sample characterize it, which should help with classification. The problem is that the number of items pointing to one sample differs from sample to sample.
I want to feed the items that point to a specific data sample into a machine learning model, e.g. a neural network. The problem is that I don't know the number of items in advance, so I don't know how many input nodes I need.
Q1) Is it possible to use neural networks when the input dimension is dynamic? If so, how?
Q2) Are there any best practices for feeding a network with a list of tuples, when the length of the list is unknown?
Q3) Are there any best practices for applying machine learning to relational databases?
A fully convolutional neural network can do that. The parameters of the conv layers are the convolutional kernels, and a convolutional kernel does not care much about the input size (though there are some constraints involving stride, padding, input size, and kernel size).
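For illustration, here is a minimal sketch of this idea in PyTorch. The channel layout, layer sizes, and the use of global average pooling to collapse the variable-length dimension are my assumptions, not details from the question:

```python
import torch
import torch.nn as nn

class FCNClassifier(nn.Module):
    """Fully convolutional classifier: only conv layers plus a global
    pooling step, so the parameter count is independent of input length."""
    def __init__(self, in_channels=2, hidden=16):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(in_channels, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        self.head = nn.Linear(hidden, 2)  # good / bad quality

    def forward(self, x):       # x: (batch, channels, variable_length)
        h = self.conv(x)        # length is preserved by padding=1
        h = h.mean(dim=-1)      # global average pooling -> (batch, hidden)
        return self.head(h)

# The same weights work for any sequence length:
model = FCNClassifier()
print(model(torch.randn(1, 2, 5)).shape)   # torch.Size([1, 2])
print(model(torch.randn(1, 2, 50)).shape)  # torch.Size([1, 2])
```

The global pooling is what makes the classifier head independent of the input length; any length-insensitive reduction (mean, max, sum) works there.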
This depends a lot on the nature of the data and the prediction you are trying to make, but as a simple rule of thumb to start with, your training data should be roughly 10x the number of your model parameters. For instance, when training a logistic regression with N features, try to start with 10N training instances.
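A hypothetical back-of-the-envelope application of that rule:

```python
# A logistic regression with N features has roughly N parameters
# (N weights, plus one intercept), so the 10x rule of thumb suggests
# aiming for about 10N training examples.
n_features = 20
min_training_instances = 10 * n_features
print(min_training_instances)  # 200
```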
Synthetic data is used mostly when there is not enough real data, or not enough real data for specific patterns you know about. It is normally generated for the training set only; oversampling the test set would distort your evaluation. Synthetic Minority Over-sampling Technique (SMOTE) and Modified SMOTE are two techniques that generate synthetic data.
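For example, a minimal SMOTE sketch with the imbalanced-learn library (the dataset here is randomly generated, purely for demonstration):

```python
from collections import Counter

from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification

# Imbalanced toy dataset: roughly 90% class 0, 10% class 1.
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
print(Counter(y))                      # e.g. Counter({0: ~900, 1: ~100})

# Generate synthetic minority-class samples until the classes balance.
X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
print(Counter(y_res))                  # both classes now equally represented
```

Again, fit the oversampler on the training split only, after the train/test split.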
There's a field of machine learning called Inductive Logic Programming that deals exclusively with relational data. In your case, if you wish to use a neural network, you would want to transform your relational dataset into a propositional one (a single table) - i.e., a table with a fixed number of attributes that can be fed into a neural network or any other propositional learner. These techniques usually construct so-called first-order features, which capture the data from secondary tables. Further, you need to do this only when inducing your learner - once you have the features and the learner, you can evaluate those features for new data points on the fly.
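As a concrete sketch of the simplest form of this - aggregating the one-to-many items table into fixed per-sample features with pandas. All table and column names here are assumed, not taken from your schema:

```python
import pandas as pd

# Hypothetical stand-ins for the two database tables.
samples = pd.DataFrame({"id": [1, 2], "upvotes": [10, 3]})
items = pd.DataFrame({
    "sample_id": [1, 1, 1, 2],          # foreign key to samples.id
    "name": ["a", "b", "a", "c"],
    "weight": [0.5, 1.0, 0.2, 0.9],
})

# Simple first-order-style aggregate features over the variable-length
# item lists: every sample gets the same fixed set of columns.
agg = items.groupby("sample_id")["weight"].agg(
    n_items="count", weight_sum="sum", weight_mean="mean", weight_max="max"
)

flat = samples.join(agg, on="id").fillna(0)
print(flat)  # one row per sample, fixed number of attributes
```

Real propositionalization systems construct much richer features than these aggregates, but the output has the same shape: one fixed-width row per sample.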
Here's an overview paper on some techniques that can be used for such a problem. If you have any further questions, ask away.