I'm trying to compute the convert a R script to java using the apache.commons.math library. Can I use org.apache.commons.math.analysis.interpolation.LoessInterpolator in place of R loess ? I cannot get the same result.
EDIT.
here is a java program that creates a random array(x,y) and compute the loess with LoessInterpolator or by calling R. At the end, the results are printed.
import java.io.*;
import java.util.Random;
import org.apache.commons.math.analysis.interpolation.LoessInterpolator;
public class TestLoess
{
private String RScript="/usr/local/bin/Rscript";
private static class ConsummeInputStream
extends Thread
{
private InputStream in;
ConsummeInputStream(InputStream in)
{
this.in=in;
}
@Override
public void run()
{
try
{
int c;
while((c=this.in.read())!=-1)
System.err.print((char)c);
}
catch(IOException err)
{
err.printStackTrace();
}
}
}
TestLoess()
{
}
private void run() throws Exception
{
int num=100;
Random rand=new Random(0L);
double x[]=new double[num];
double y[]=new double[x.length];
for(int i=0;i< x.length;++i)
{
x[i]=rand.nextDouble()+(i>0?x[i-1]:0);
y[i]=Math.sin(i)*100;
}
LoessInterpolator loessInterpolator=new LoessInterpolator(
0.75,//bandwidth,
2//robustnessIters
);
double y2[]=loessInterpolator.smooth(x, y);
Process proc=Runtime.getRuntime().exec(
new String[]{RScript,"-"}
);
ConsummeInputStream errIn=new ConsummeInputStream(proc.getErrorStream());
BufferedReader stdin=new BufferedReader(new InputStreamReader(proc.getInputStream()));
PrintStream out=new PrintStream(proc.getOutputStream());
errIn.start();
out.print("T<-as.data.frame(matrix(c(");
for(int i=0;i< x.length;++i)
{
if(i>0) out.print(',');
out.print(x[i]+","+y[i]);
}
out.println("),ncol=2,byrow=TRUE))");
out.println("colnames(T)<-c('x','y')");
out.println("T2<-loess(y ~ x, T)");
out.println("write.table(residuals(T2),'',col.names= F,row.names=F,sep='\\t')");
out.flush();
out.close();
double y3[]=new double[x.length];
for(int i=0;i< y3.length;++i)
{
y3[i]=Double.parseDouble(stdin.readLine());
}
System.out.println("X\tY\tY.java\tY.R");
for(int i=0;i< y3.length;++i)
{
System.out.println(""+x[i]+"\t"+y[i]+"\t"+y2[i]+"\t"+y3[i]);
}
}
public static void main(String[] args)
throws Exception
{
new TestLoess().run();
}
}
compilation & exec:
javac -cp commons-math-2.2.jar TestLoess.java && java -cp commons-math-2.2.jar:. TestLoess
output:
X Y Y.java Y.R
0.730967787376657 0.0 6.624884763714674 -12.5936186703287
0.9715042030481429 84.14709848078965 6.5263049649584 71.9725380029913
1.6089216283982513 90.92974268256818 6.269100654071115 79.839773167581
2.159358633515885 14.112000805986721 6.051308261720918 3.9270340708818
2.756903911313087 -75.68024953079282 5.818424835586378 -84.9176311089431
3.090122310789737 -95.89242746631385 5.689740879461759 -104.617807889069
3.4753114955304554 -27.941549819892586 5.541837854229562 -36.0902352062634
4.460153035730264 65.6986598718789 5.168028655980764 58.9472823439219
5.339335553602744 98.93582466233818 4.840314399516663 93.3329030534449
6.280584733084859 41.21184852417566 4.49531113985498 36.7282165788057
6.555538699120343 -54.40211108893698 4.395343460231256 -58.5812856445538
6.68443584999412 -99.99902065507035 4.348559404444451 -104.039069260889
6.831037507640638 -53.657291800043495 4.295400167908642 -57.5419313320511
6.854275630124528 42.016703682664094 4.286978656933373 38.1564179414478
7.401015387322993 99.06073556948704 4.089252482141094 95.7504087842369
8.365502247999844 65.02878401571168 3.7422883733498726 62.5865641279576
8.469992934250815 -28.790331666506532 3.704793544880599 -31.145867173504
9.095139297716374 -96.13974918795569 3.4805388562453574 -98.0047896609079
9.505935493207435 -75.09872467716761 3.3330472034239405 -76.6664588290508
the output values for y are clearly not the same between R and Java; TheY.R column looks good (it's close to the original Y column). How should I change this in order to get Y.java ~ Y.R ?
You need to change the default values of three input parameters to make the Java and R versions identical:
The Java LoessInterpolator only does linear local polynomial regression, but R supports linear (degree=1), quadratic (degree=2), and a strange degree=0 option. So you need to specify degree=1
in R to be identical to Java.
LoessInterpolator defaults number of iterations DEFAULT_ROBUSTNESS_ITERS=2
, but R defaults iterations=4
. So you need to set control = loess.control(iterations=X)
in R (X is the number of iterations).
LoessInterpolator defaults DEFAULT_BANDWIDTH=0.3
but R defaults span=0.75
.
I can't speak for the java implementation, but lowess
has a number of parameters which control the bandwidth of the fit. Unless you're fitting with the same control parameters you should expect the results to differ. My recommendation whenever people are smoothing data is to plot the original data as well as the fit, and decide for yourself what control parameters yield your desired tradeoff between fidelity to the data and smoothing (aka noise removal).
There are two problems here. First if you plot the data you are generating it looks almost random and the fit generated by loess in R is very poor e.g.
plot(T$x, T$y)
lines(T$s, T2$fitted, col="blue", lwd=3)
Then in your R script you are writing the residuals not the predictions so in this line
out.println("write.table(residuals(T2),'',
col.names= F,row.names=F,sep='\\t')");
you need to change residuals(T2)
to predict(T2)
e.g.
out.println("write.table(predict(T2),'',
col.names= F,row.names=F,sep='\\t')");
So it was pure chance in your code example that the first couple of lines of residuals generated by R looked a good fit.
For me if I try fitting with some more appropriate data then Java and R do return similar but not identical results. Also I found the results were closer if I did not adjust the default robustnessIter settings.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With