Best approach to what I think is a machine learning problem [closed]

I'm looking for some expert guidance on the best approach to a problem. I've looked into machine learning and neural networks, and I've investigated Weka, some sort of Bayesian solution, R, and several other things, but I'm not sure how to proceed. Here's my problem.

I have, or will have, a large collection of events, eventually around 100,000 or so. Each event consists of several (30-50) independent variables and one dependent variable that I care about. Some independent variables are more important than others in determining the dependent variable's value. The events are also time-sensitive: things that occur today are more important than events that occurred 10 years ago.

I'd like to be able to feed an event to some sort of learning engine and have it predict the dependent variable. Then, once I know the real value of the dependent variable for this event (and for all the events that came before it), I'd like that to train subsequent guesses.

Once I have an idea of which programming direction to go in, I can do the research and figure out how to turn my idea into code. But my background is in parallel programming, not this kind of thing, so I'd love some suggestions and guidance.

Thanks!

Edit: Here's a bit more detail about the problem I'm trying to solve. It's a pricing problem. Let's say that I want to predict prices for a random comic book. Price is the only thing I care about, but there are lots of independent variables one could come up with. Is it a Superman comic or a Hello Kitty comic? How old is it? What condition is it in? And so on. After training for a while, I want to be able to give it information about a comic book I might be considering and have it give me a reasonable expected value for that comic book. OK, so comic books might be a bogus example, but you get the general idea. So far, based on the answers, I'm doing some research on support vector machines and Naive Bayes. Thanks for all of your help so far.

651

asked Feb 07 '09 00:02

Kirby

1 Answer

Sounds like you're a candidate for Support Vector Machines.

Go get libsvm. Read "A Practical Guide to Support Vector Classification", which they distribute and which is short.

Basically, you're going to take your events, and format them like:

dv1 1:iv1_1 2:iv1_2 3:iv1_3 4:iv1_4 ...
dv2 1:iv2_1 2:iv2_2 3:iv2_3 4:iv2_4 ...
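As a minimal sketch of that layout, assuming each event is already a list of numeric independent variables plus its dependent variable, one line per event could be produced like this. The function name `to_libsvm_line` and the sample values are made up for illustration. Feature indices start at 1, and zero-valued features can be left out because the format is sparse.

```python
def to_libsvm_line(dependent, independents):
    """Format one event as 'dv 1:iv_1 2:iv_2 ...' with 1-based feature indices."""
    parts = [str(dependent)]
    for index, value in enumerate(independents, start=1):
        if value != 0:  # sparse format: zero-valued features can be omitted
            parts.append(f"{index}:{value}")
    return " ".join(parts)


# Hypothetical events: (price, [age_in_years, condition_grade, is_superman])
events = [
    (12.5, [30, 8.5, 1]),
    (3.0, [5, 6.0, 0]),
]
lines = [to_libsvm_line(dv, ivs) for dv, ivs in events]
print("\n".join(lines))
```

Categorical variables (which character the comic features, say) would need to be turned into numbers first, for example one 0/1 feature per category.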

Then run the data through their svm-scale utility, and use their grid.py script to search for appropriate kernel parameters. The learning algorithm should be able to figure out the differing importance of the variables, though you might be able to weight things as well. If you think time will be useful, just add time as another independent variable (feature) for the training algorithm to use.
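To make those steps a bit more concrete, here is a rough pure-Python sketch of what the scaling and the parameter search amount to. It does not call libsvm itself. It assumes the setup described in the libsvm guide: each feature is linearly rescaled to [-1, 1] using ranges taken from the training data, and candidate (C, gamma) pairs for the RBF kernel are tried on an exponentially spaced grid. The helper names, the `age_in_days` time feature, and the sample numbers are all hypothetical. Since price is a continuous value, the model trained on this data would presumably be libsvm's regression variant (epsilon-SVR) rather than a classifier.

```python
from itertools import product


def fit_ranges(rows):
    """Return (min, max) per feature column, computed on the training rows only."""
    columns = list(zip(*rows))
    return [(min(col), max(col)) for col in columns]


def scale_row(row, ranges, lower=-1.0, upper=1.0):
    """Linearly map each feature into [lower, upper] using the fitted ranges."""
    scaled = []
    for value, (lo, hi) in zip(row, ranges):
        if hi == lo:  # constant feature: no spread to scale, so emit 0.0
            scaled.append(0.0)
        else:
            scaled.append(lower + (upper - lower) * (value - lo) / (hi - lo))
    return scaled


# Hypothetical training rows: [age_in_days, condition_grade, is_superman].
# age_in_days is the "time as just another feature" idea.
train = [
    [10, 9.0, 1],
    [400, 6.5, 0],
    [3650, 4.0, 1],
]
ranges = fit_ranges(train)
scaled_train = [scale_row(row, ranges) for row in train]

# Exponentially spaced candidates, in the style suggested by the libsvm guide.
c_values = [2.0 ** e for e in range(-5, 16, 2)]
gamma_values = [2.0 ** e for e in range(-15, 4, 2)]
grid = list(product(c_values, gamma_values))

print(scaled_train[0])
print(len(grid), "candidate (C, gamma) pairs")
```

Each pair would then be scored with cross-validation, and the ranges fitted on the training data should be reused unchanged when scaling any new event you want a prediction for.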

If libsvm can't quite get the accuracy you'd like, consider stepping up to SVMlight. It's only slightly harder to deal with and has a lot more options.

Bishop's Pattern Recognition and Machine Learning is probably the first textbook to look to for details on what libsvm and SVMlight are actually doing with your data.

145

answered Sep 25 '22 21:09

Jay Kominek

Tags: machine-learning, neural-network, classification, modeling, regression
