I'm learning Mahout and reading "Mahout in Action".
When I tried to run the sample code from chapter 7, SimpleKMeansClustering.java, an exception popped up:
Exception in thread "main" java.io.IOException: wrong value class: 0.0: null is not class org.apache.mahout.clustering.WeightedPropertyVectorWritable
    at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1874)
    at SimpleKMeansClustering.main(SimpleKMeansClustering.java:95)
This code ran successfully on mahout-0.5, but on mahout-0.6 I get this exception. Even after changing the directory name from clusters-0 to clusters-0-final, I still see it.
KMeansDriver.run(conf, vectors, new Path(canopyCentroids, "clusters-0-final"), clusterOutput,
        new TanimotoDistanceMeasure(), 0.01, 20, true, false); // First, I changed this path.
SequenceFile.Reader reader = new SequenceFile.Reader(fs,
        new Path("output/clusters/clusteredPoints/part-m-00000"), conf); // I double-checked this folder and file name.
IntWritable key = new IntWritable();
WeightedVectorWritable value = new WeightedVectorWritable();
int i = 0;
while (reader.next(key, value)) {
    System.out.println(value.toString() + " belongs to cluster " + key.toString());
    i++;
}
System.out.println(i);
reader.close();
Does anyone have any idea what causes this exception? I have been trying to solve it for a long time without success, and there is very little about it on the internet.
Thanks in advance
To make this example work in Mahout 0.6, add
import org.apache.mahout.clustering.WeightedPropertyVectorWritable;
to the imports and replace the line:
WeightedVectorWritable value = new WeightedVectorWritable();
with:
WeightedPropertyVectorWritable value = new WeightedPropertyVectorWritable();
This happens because Mahout 0.6 writes the clustered-point output values as the new type WeightedPropertyVectorWritable, so the reader must expect that class instead of WeightedVectorWritable.
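For reference, a minimal, self-contained sketch of the corrected read side for Mahout 0.6, assuming the same clusteredPoints path as in the question (the class name ReadClusteredPoints is only for illustration):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.mahout.clustering.WeightedPropertyVectorWritable;

public class ReadClusteredPoints {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        SequenceFile.Reader reader = new SequenceFile.Reader(fs,
                new Path("output/clusters/clusteredPoints/part-m-00000"), conf);
        IntWritable key = new IntWritable();
        // Mahout 0.6 writes WeightedPropertyVectorWritable values, so the reader
        // must use that class; WeightedVectorWritable triggers the IOException above.
        WeightedPropertyVectorWritable value = new WeightedPropertyVectorWritable();
        while (reader.next(key, value)) {
            System.out.println(value.toString() + " belongs to cluster " + key.toString());
        }
        reader.close();
    }
}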
To whom it may concern, here is a working MiA (Mahout in Action) sample for Mahout 0.9:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.mahout.clustering.Cluster;
import org.apache.mahout.clustering.classify.WeightedPropertyVectorWritable;
import org.apache.mahout.clustering.kmeans.KMeansDriver;
import org.apache.mahout.clustering.kmeans.Kluster;
import org.apache.mahout.common.distance.EuclideanDistanceMeasure;
import org.apache.mahout.math.RandomAccessSparseVector;
import org.apache.mahout.math.Vector;
import org.apache.mahout.math.VectorWritable;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
public class SimpleKMeansClustering {

    public static final double[][] points = {
            {1, 1}, {2, 1}, {1, 2},
            {2, 2}, {3, 3}, {8, 8},
            {9, 8}, {8, 9}, {9, 9}};

    // Write the input points as a SequenceFile of VectorWritable values.
    public static void writePointsToFile(List<Vector> points,
                                         String fileName,
                                         FileSystem fs,
                                         Configuration conf) throws IOException {
        Path path = new Path(fileName);
        SequenceFile.Writer writer = new SequenceFile.Writer(fs, conf,
                path, LongWritable.class, VectorWritable.class);
        long recNum = 0;
        VectorWritable vec = new VectorWritable();
        for (Vector point : points) {
            vec.set(point);
            writer.append(new LongWritable(recNum++), vec);
        }
        writer.close();
    }

    // Turn the raw double[][] into Mahout vectors.
    public static List<Vector> getPoints(double[][] raw) {
        List<Vector> points = new ArrayList<Vector>();
        for (int i = 0; i < raw.length; i++) {
            double[] fr = raw[i];
            Vector vec = new RandomAccessSparseVector(fr.length);
            vec.assign(fr);
            points.add(vec);
        }
        return points;
    }

    public static void main(String args[]) throws Exception {
        int k = 2;
        List<Vector> vectors = getPoints(points);

        // Create the local input directories.
        File testData = new File("clustering/testdata");
        if (!testData.exists()) {
            testData.mkdir();
        }
        testData = new File("clustering/testdata/points");
        if (!testData.exists()) {
            testData.mkdir();
        }

        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        writePointsToFile(vectors, "clustering/testdata/points/file1", fs, conf);

        // Write the initial cluster centers (here simply the first k points).
        Path path = new Path("clustering/testdata/clusters/part-00000");
        SequenceFile.Writer writer = new SequenceFile.Writer(fs, conf, path, Text.class, Kluster.class);
        for (int i = 0; i < k; i++) {
            Vector vec = vectors.get(i);
            Kluster cluster = new Kluster(vec, i, new EuclideanDistanceMeasure());
            writer.append(new Text(cluster.getIdentifier()), cluster);
        }
        writer.close();

        KMeansDriver.run(conf,
                new Path("clustering/testdata/points"),   // input points
                new Path("clustering/testdata/clusters"), // initial cluster centers
                new Path("clustering/output"),            // output directory
                0.001,                                    // convergence delta
                10,                                       // maximum iterations
                true,                                     // run clustering after the centers converge
                0,                                        // cluster classification threshold
                true);                                    // run sequentially (no Hadoop cluster needed)

        // Mahout 0.9 writes the clustered points as WeightedPropertyVectorWritable values.
        SequenceFile.Reader reader = new SequenceFile.Reader(fs,
                new Path("clustering/output/" + Cluster.CLUSTERED_POINTS_DIR + "/part-m-0"), conf);
        IntWritable key = new IntWritable();
        WeightedPropertyVectorWritable value = new WeightedPropertyVectorWritable();
        while (reader.next(key, value)) {
            System.out.println(value.toString() + " belongs to cluster " + key.toString());
        }
        reader.close();
    }
}
The example in the book works fine for Mahout 0.5 with the following small changes:
(1) set the paths correctly:
KMeansDriver.run(conf, new Path("testdata/points"), new Path("testdata/clusters"), new Path("testdata/output"), new EuclideanDistanceMeasure(), 0.001, 10, true, false);
and
SequenceFile.Reader reader = new SequenceFile.Reader(fs, new Path("testdata/output/clusteredPoints/part-m-0"), conf);
(2) If you do not have Hadoop installed, you also need to change the last parameter of the KMeansDriver.run() call from 'false' to 'true' so the job runs sequentially instead of on a cluster:
KMeansDriver.run(conf, new Path("testdata/points"), new Path("testdata/clusters"), new Path("testdata/output"), new EuclideanDistanceMeasure(), 0.001, 10, true, true);
Then the example works.
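For completeness, a minimal sketch of the matching read loop on Mahout 0.5 (the book's WeightedVectorWritable value class still applies there, as the question confirms; only the path changes):

SequenceFile.Reader reader = new SequenceFile.Reader(fs,
        new Path("testdata/output/clusteredPoints/part-m-0"), conf);
IntWritable key = new IntWritable();
WeightedVectorWritable value = new WeightedVectorWritable(); // Mahout 0.5 still writes this type
while (reader.next(key, value)) {
    System.out.println(value.toString() + " belongs to cluster " + key.toString());
}
reader.close();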