
High volume SVM (machine learning) system

I am working on a possible machine learning project that would be expected to do high-speed computation using SVMs (support vector machines) and possibly some ANNs.

I'm reasonably comfortable working in MATLAB with these, but primarily on small datasets, just for experimentation. I'm wondering whether this MATLAB-based approach will scale, or whether I should be looking into something else: C++ or GPU-based computing? Wrapping the MATLAB code in Java and pushing it onto App Engine?

Incidentally, there seems to be a lot of literature on GPUs, but not much on how useful they are for machine learning applications using MATLAB. What is the cheapest CUDA-enabled GPU money can buy, and is it even worth the trouble?

malangi asked Mar 05 '10 01:03



1 Answer

I work on pattern recognition problems. Let me give you some advice, assuming you plan to work effectively on SVM/ANN problems and really don't have access to a computer cluster:

1) Don't use MATLAB. Use Python instead, with its large number of numerical libraries, for visualisation and analysis of your computations.
2) Implement the critical sections in C. You can then integrate them with your Python scripts very easily.
3) CUDA/GPU is not a solution if you mostly deal with problems of non-polynomial time complexity, which is typical in machine learning, so it brings no great speed-up. Dot/matrix products are only a tiny part of SVM calculations; you will still have to deal with feature extraction and list/object processing. Try instead to optimize your algorithms and devise efficient algorithmic methods. If you need parallelism (e.g. for ANNs), use threads or processes.
4) Use the GCC compiler to compile your C program; it produces very fast executable code. To speed up numerical computations, you can try GCC's optimization flags (e.g. -O3, or -msse2 to enable Streaming SIMD Extensions).
5) Run your program on any modern CPU under Linux.
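As a rough illustration of point 2, here is a minimal sketch of calling compiled C code from Python with the standard ctypes module. It assumes a Unix-like system; the C math library stands in for your own compiled critical section (a hypothetical shared library you would build yourself with GCC), and rbf_kernel is just an illustrative name, not something from a particular SVM package.

```python
import ctypes
import ctypes.util

# Assumption: a Unix-like system. libm stands in for your own shared library
# of critical routines (hypothetical; you would build it yourself with GCC).
libm_path = ctypes.util.find_library("m")
# If no path was found, CDLL(None) exposes symbols already loaded in the process.
libm = ctypes.CDLL(libm_path)

# Declare the C signature: double exp(double)
libm.exp.argtypes = [ctypes.c_double]
libm.exp.restype = ctypes.c_double


def rbf_kernel(x, y, gamma):
    """RBF kernel exp(-gamma * ||x - y||^2), with exp() evaluated by the C library."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return libm.exp(-gamma * squared_distance)


if __name__ == "__main__":
    print(rbf_kernel((1.0, 0.0), (0.0, 0.0), gamma=0.5))
```

In a real project the whole inner loop, not just exp(), would live in the C routine; crossing the Python/C boundary once per scalar call is too slow to be worth it.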

For really good performance, use Linux clusters.
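Short of a cluster, the "use processes" advice from point 3 can be tried on a single multi-core machine. The sketch below is only an illustration under my own assumptions: it splits an RBF kernel matrix into one task per row using the standard library's ProcessPoolExecutor, and the function names, the workers parameter and the chunk size are all made up for the example.

```python
import math
from concurrent.futures import ProcessPoolExecutor
from functools import partial


def rbf_kernel_row(i, points, gamma):
    """Row i of the kernel matrix: K[i][j] = exp(-gamma * ||x_i - x_j||^2)."""
    xi = points[i]
    return [
        math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(xi, xj)))
        for xj in points
    ]


def rbf_kernel_matrix(points, gamma, workers=1):
    """Build the full kernel matrix, optionally spreading rows over processes."""
    row = partial(rbf_kernel_row, points=points, gamma=gamma)
    indices = range(len(points))
    if workers <= 1:
        # Serial path: no process start-up or pickling overhead.
        return [row(i) for i in indices]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so rows stay in index order.
        # chunksize=16 is an arbitrary choice to cut inter-process traffic.
        return list(pool.map(row, indices, chunksize=16))
```

To use several cores, call something like rbf_kernel_matrix(points, 0.5, workers=4) from inside an `if __name__ == "__main__":` guard in your script; the guard is needed on platforms that spawn worker processes rather than fork them. For small inputs the serial path will usually be faster, since starting processes and copying the data to them has a cost.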

psihodelia answered Nov 02 '22 15:11
