I'm working on a problem where I want to classify data samples as good or bad quality using machine learning.
The data samples are stored in a relational database. A sample has attributes such as id, name, number of up-votes (an indicator of good/bad quality), number of comments, etc. There is also a table of items, each with a foreign key pointing to a data sample id. Each item has a weight and a name. Together, the items pointing to a data sample characterize it, which should help with classification. The problem is that the number of items pointing to one sample differs from sample to sample.
I want to feed the items that point to a specific data sample into a machine learning model, e.g. a neural network. The problem is that I don't know the number of items in advance, so I don't know how many input nodes I need.
Q1) Is it possible to use neural networks when the input dimension is dynamic? If so, how?
Q2) Are there any best practices for feeding a network with a list of tuples, when the length of the list is unknown?
Q3) Are there any best practices for applying machine learning to relational databases?
A fully convolutional neural network can do that. The parameters of the conv layers are the convolutional kernels, and a convolutional kernel does not care much about the input size (though there are some constraints involving stride, padding, input size, and kernel size).
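For illustration, here is a minimal sketch of this idea in PyTorch. The channel layout, layer sizes, and the use of global average pooling to collapse the variable-length dimension are my assumptions, not details from the question:

```python
import torch
import torch.nn as nn

class FCNClassifier(nn.Module):
    """Fully convolutional classifier: only conv layers plus a global
    pooling step, so the parameter count is independent of input length."""
    def __init__(self, in_channels=2, hidden=16):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(in_channels, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        self.head = nn.Linear(hidden, 2)  # good / bad quality

    def forward(self, x):       # x: (batch, channels, variable_length)
        h = self.conv(x)        # length is preserved by padding=1
        h = h.mean(dim=-1)      # global average pooling -> (batch, hidden)
        return self.head(h)

# The same weights work for any sequence length:
model = FCNClassifier()
print(model(torch.randn(1, 2, 5)).shape)   # torch.Size([1, 2])
print(model(torch.randn(1, 2, 50)).shape)  # torch.Size([1, 2])
```

The global pooling is what makes the classifier head independent of the input length; any length-insensitive reduction (mean, max, sum) works there.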
This depends a lot on the nature of the data and the prediction you are trying to make, but as a simple rule of thumb to start with, your training data should be roughly 10x the number of your model parameters. For instance, when training a logistic regression with N features, try to start with 10N training instances.
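A hypothetical back-of-the-envelope application of that rule:

```python
# A logistic regression with N features has roughly N parameters
# (N weights, plus one intercept), so the 10x rule of thumb suggests
# aiming for about 10N training examples.
n_features = 20
min_training_instances = 10 * n_features
print(min_training_instances)  # 200
```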
Synthetic data is used mostly when there is not enough real data, or not enough real data for specific patterns you know about. It is normally generated for the training set only; oversampling the test set would distort your evaluation. Synthetic Minority Over-sampling Technique (SMOTE) and Modified SMOTE are two techniques that generate synthetic data.
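For example, a minimal SMOTE sketch with the imbalanced-learn library (the dataset here is randomly generated, purely for demonstration):

```python
from collections import Counter

from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification

# Imbalanced toy dataset: roughly 90% class 0, 10% class 1.
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
print(Counter(y))                      # e.g. Counter({0: ~900, 1: ~100})

# Generate synthetic minority-class samples until the classes balance.
X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
print(Counter(y_res))                  # both classes now equally represented
```

Again, fit the oversampler on the training split only, after the train/test split.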
There's a field of machine learning called Inductive Logic Programming that deals exclusively with relational data. In your case, if you wish to use a neural network, you would want to transform your relational dataset into a propositional one (a single table) - i.e., a table with a fixed number of attributes that can be fed into a neural network or any other propositional learner. These techniques usually construct so-called first-order features, which capture the data from secondary tables. Further, you need to do this only when inducing your learner - once you have the features and the learner, you can evaluate those features for new data points on the fly.
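As a concrete sketch of the simplest form of this - aggregating the one-to-many items table into fixed per-sample features with pandas. All table and column names here are assumed, not taken from your schema:

```python
import pandas as pd

# Hypothetical stand-ins for the two database tables.
samples = pd.DataFrame({"id": [1, 2], "upvotes": [10, 3]})
items = pd.DataFrame({
    "sample_id": [1, 1, 1, 2],          # foreign key to samples.id
    "name": ["a", "b", "a", "c"],
    "weight": [0.5, 1.0, 0.2, 0.9],
})

# Simple first-order-style aggregate features over the variable-length
# item lists: every sample gets the same fixed set of columns.
agg = items.groupby("sample_id")["weight"].agg(
    n_items="count", weight_sum="sum", weight_mean="mean", weight_max="max"
)

flat = samples.join(agg, on="id").fillna(0)
print(flat)  # one row per sample, fixed number of attributes
```

Real propositionalization systems construct much richer features than these aggregates, but the output has the same shape: one fixed-width row per sample.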
Here's an overview paper on some techniques that can be used for such a problem. If you have any further questions, ask away.