
Naive Bayes Text Classification Algorithm

Tags: java, text, algorithm, mysql, dataset

Hi there! I need help implementing the Naive Bayes text classification algorithm in Java to test my data set for research purposes. It is compulsory to implement the algorithm in Java rather than use tools such as Weka or RapidMiner to get the results.

My data set has the following columns:

    Doc  Words   Category

That is, the words and the category of each training document (a string) are known in advance. Part of the data set is shown below:

             Doc      Words                                                              Category        
    Training
               1      Integration Communities Process Oriented Structures...(more string)       A
               2      Integration Communities Process Oriented Structures...(more string)       A
               3      Theory Upper Bound Routing Estimate global routing...(more string)        B
               4      Hardware Design Functional Programming Perfect Match...(more string)      C
               .
               .
               .
    Test
               5      Methodology Toolkit Integrate Technological  Organisational
               6      This test contain string naive bayes test text text test

The data set comes from a MySQL database and may contain multiple training strings as well as multiple test strings. I just need to implement the Naive Bayes text classification algorithm in Java.

The algorithm should follow the example given in Table 13.1.

Source: Read here

I can implement the algorithm in Java myself, but I would like to know whether there is a Java library, with source code and documentation available, that would let me simply test the results.

I need the results only once; this is a one-off test.

So, to come to the point: can somebody recommend a good Java library that helps me code this algorithm and process my data set to get the results, or suggest an easy way to do it?

I will be thankful for your help. Thanks in advance.


asked Jan 08 '15 15:01

Java Nerd

1 Answer

As per your requirement, you can use MLlib, Apache Spark's scalable machine learning library, which consists of common learning algorithms and utilities. There is also a Java code template for implementing the algorithm with the library.

To begin with, you can implement the Java skeleton for Naive Bayes provided on their site, given below.

    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.function.Function;
    import org.apache.spark.api.java.function.PairFunction;
    import org.apache.spark.mllib.classification.NaiveBayes;
    import org.apache.spark.mllib.classification.NaiveBayesModel;
    import org.apache.spark.mllib.regression.LabeledPoint;
    import scala.Tuple2;

    JavaRDD<LabeledPoint> training = ... // training set
    JavaRDD<LabeledPoint> test = ... // test set

    // Train the model; the second argument is the additive smoothing parameter (lambda).
    final NaiveBayesModel model = NaiveBayes.train(training.rdd(), 1.0);

    // Pair each test point's predicted label with its true label.
    JavaPairRDD<Double, Double> predictionAndLabel =
      test.mapToPair(new PairFunction<LabeledPoint, Double, Double>() {
        @Override public Tuple2<Double, Double> call(LabeledPoint p) {
          return new Tuple2<Double, Double>(model.predict(p.features()), p.label());
        }
      });

    // Accuracy is the fraction of test points whose prediction equals the true label.
    double accuracy = predictionAndLabel.filter(new Function<Tuple2<Double, Double>, Boolean>() {
        @Override public Boolean call(Tuple2<Double, Double> pl) {
          return pl._1().equals(pl._2());
        }
      }).count() / (double) test.count();
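
The `training` and `test` placeholders in the skeleton hold `LabeledPoint`s, that is, a numeric label plus a numeric feature vector, so category letters and word strings have to be encoded first. One common encoding is a hashed term-frequency vector. The following is a plain-Java sketch of that idea only; `TermFrequencyHasher`, `indexOf`, and `transform` are hypothetical names chosen for illustration and are not part of MLlib (MLlib ships its own `HashingTF` for this purpose), and whitespace tokenization is an assumption about the data.

```java
import java.util.Locale;

/** Hypothetical helper: hashes the words of a document into a fixed-length term-frequency vector. */
class TermFrequencyHasher {
    private final int numFeatures;

    TermFrequencyHasher(int numFeatures) {
        if (numFeatures <= 0) {
            throw new IllegalArgumentException("numFeatures must be positive");
        }
        this.numFeatures = numFeatures;
    }

    /** Maps a word to a bucket index in [0, numFeatures). */
    int indexOf(String word) {
        // floorMod keeps the index non-negative even when hashCode() is negative.
        return Math.floorMod(word.hashCode(), numFeatures);
    }

    /** Lower-cases the text, splits it on whitespace, and counts tokens per bucket. */
    double[] transform(String text) {
        double[] features = new double[numFeatures];
        String trimmed = text.trim().toLowerCase(Locale.ROOT);
        if (trimmed.isEmpty()) {
            return features; // no tokens: all-zero vector
        }
        for (String word : trimmed.split("\\s+")) {
            features[indexOf(word)] += 1.0;
        }
        return features;
    }
}
```

Different words can collide in the same bucket, so a larger `numFeatures` trades memory for fewer collisions. The resulting array would then be wrapped in an MLlib vector and combined with a numeric label (for example A, B, C mapped to 0.0, 1.0, 2.0) to build each `LabeledPoint`.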

For testing your data sets, I know of no better option than Spark SQL, and MLlib fits into Spark's APIs well. To start, I recommend going through the MLlib API first and implementing the algorithm according to your needs; this is fairly easy with the library. Then, to process your data sets, use Spark SQL. I recommend sticking with this combination.

I also went through multiple options before settling on this easy-to-use library, with its seamless interoperability with other technologies. I would have posted complete code tailored to your question, but I think you are good to go.
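
For a one-off test, the classifier itself is also small enough to write directly in Java without any library. Below is a minimal sketch, assuming the multinomial Naive Bayes model with add-one (Laplace) smoothing; whether that matches the linked Table 13.1 exactly is an assumption, since the table is not reproduced here. The class name `NaiveBayesSketch`, its methods, and the whitespace tokenizer are all hypothetical choices for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Sketch of multinomial Naive Bayes with add-one smoothing; not a tuned implementation. */
class NaiveBayesSketch {
    private final Map<String, Integer> docsPerClass = new HashMap<>();
    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>();
    private final Map<String, Integer> tokensPerClass = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    /** Assumed tokenization: lower-case, split on whitespace. */
    static String[] tokenize(String text) {
        String trimmed = text.trim().toLowerCase(Locale.ROOT);
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    /** Adds one labelled training document to the counts. */
    void train(String text, String category) {
        totalDocs++;
        docsPerClass.merge(category, 1, Integer::sum);
        Map<String, Integer> counts = wordCounts.computeIfAbsent(category, k -> new HashMap<>());
        for (String word : tokenize(text)) {
            counts.merge(word, 1, Integer::sum);
            tokensPerClass.merge(category, 1, Integer::sum);
            vocabulary.add(word);
        }
    }

    /**
     * Returns log P(c) plus the sum of log P(w|c) over the words of the text.
     * Words never seen in training are skipped. The category must have been trained.
     */
    double logScore(String text, String category) {
        double score = Math.log(docsPerClass.get(category) / (double) totalDocs);
        Map<String, Integer> counts = wordCounts.get(category);
        int denominator = tokensPerClass.getOrDefault(category, 0) + vocabulary.size();
        for (String word : tokenize(text)) {
            if (!vocabulary.contains(word)) {
                continue;
            }
            int count = counts.getOrDefault(word, 0);
            score += Math.log((count + 1) / (double) denominator); // add-one smoothing
        }
        return score;
    }

    /** Returns the trained category with the highest log score, or null if nothing was trained. */
    String classify(String text) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String category : docsPerClass.keySet()) {
            double score = logScore(text, category);
            if (score > bestScore) {
                bestScore = score;
                best = category;
            }
        }
        return best;
    }
}
```

For example, after calling `train` on the four sample training rows from the question (using only the words visible there, since the rows are truncated), `classify("global routing estimate")` returns `B`. Because unknown words are skipped, a test document that shares no words with the training data is decided by the class priors alone. Scores are summed in log space to avoid floating-point underflow on long documents.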


answered Sep 27 '22 21:09

Anurag
