Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Difference between R.loess and org.apache.commons.math LoessInterpolator

I'm trying to compute the convert a R script to java using the apache.commons.math library. Can I use org.apache.commons.math.analysis.interpolation.LoessInterpolator in place of R loess ? I cannot get the same result.

EDIT.

here is a java program that creates a random array(x,y) and compute the loess with LoessInterpolator or by calling R. At the end, the results are printed.

import java.io.*;
import java.util.Random;

import org.apache.commons.math.analysis.interpolation.LoessInterpolator;


public class TestLoess
    {
    private String RScript="/usr/local/bin/Rscript";
    private static class ConsummeInputStream
        extends Thread
        {
        private InputStream in;
        ConsummeInputStream(InputStream in)
            {
            this.in=in;
            }
        @Override
        public void run()
            {
            try
                {
                int c;
                while((c=this.in.read())!=-1) 
                    System.err.print((char)c);
                }
            catch(IOException err)
                {
                err.printStackTrace();
                }
            }
        }
    TestLoess()
        {

        }
    private void run() throws Exception
        {
        int num=100;
        Random rand=new Random(0L);
        double x[]=new double[num];
        double y[]=new double[x.length];
        for(int i=0;i< x.length;++i)
            {
            x[i]=rand.nextDouble()+(i>0?x[i-1]:0);
            y[i]=Math.sin(i)*100;
            }
        LoessInterpolator loessInterpolator=new LoessInterpolator(
            0.75,//bandwidth,
            2//robustnessIters

            );
        double y2[]=loessInterpolator.smooth(x, y);

        Process proc=Runtime.getRuntime().exec(
            new String[]{RScript,"-"}
            );
        ConsummeInputStream errIn=new ConsummeInputStream(proc.getErrorStream());
        BufferedReader stdin=new BufferedReader(new InputStreamReader(proc.getInputStream()));
        PrintStream out=new PrintStream(proc.getOutputStream());
        errIn.start();
        out.print("T<-as.data.frame(matrix(c(");
        for(int i=0;i< x.length;++i)
            {
            if(i>0) out.print(',');
            out.print(x[i]+","+y[i]);
            }
        out.println("),ncol=2,byrow=TRUE))");
        out.println("colnames(T)<-c('x','y')");
        out.println("T2<-loess(y ~ x, T)");
        out.println("write.table(residuals(T2),'',col.names= F,row.names=F,sep='\\t')");
        out.flush();
        out.close();
        double y3[]=new double[x.length];
        for(int i=0;i< y3.length;++i)
            {
            y3[i]=Double.parseDouble(stdin.readLine());
            }
        System.out.println("X\tY\tY.java\tY.R");
        for(int i=0;i< y3.length;++i)
            {
            System.out.println(""+x[i]+"\t"+y[i]+"\t"+y2[i]+"\t"+y3[i]);
            }
        }

    public static void main(String[] args)
        throws Exception
        {
        new TestLoess().run();
        }
    }

compilation & exec:

javac -cp commons-math-2.2.jar TestLoess.java && java -cp commons-math-2.2.jar:. TestLoess

output:

X   Y   Y.java  Y.R
0.730967787376657   0.0 6.624884763714674   -12.5936186703287
0.9715042030481429  84.14709848078965   6.5263049649584 71.9725380029913
1.6089216283982513  90.92974268256818   6.269100654071115   79.839773167581
2.159358633515885   14.112000805986721  6.051308261720918   3.9270340708818
2.756903911313087   -75.68024953079282  5.818424835586378   -84.9176311089431
3.090122310789737   -95.89242746631385  5.689740879461759   -104.617807889069
3.4753114955304554  -27.941549819892586 5.541837854229562   -36.0902352062634
4.460153035730264   65.6986598718789    5.168028655980764   58.9472823439219
5.339335553602744   98.93582466233818   4.840314399516663   93.3329030534449
6.280584733084859   41.21184852417566   4.49531113985498    36.7282165788057
6.555538699120343   -54.40211108893698  4.395343460231256   -58.5812856445538
6.68443584999412    -99.99902065507035  4.348559404444451   -104.039069260889
6.831037507640638   -53.657291800043495 4.295400167908642   -57.5419313320511
6.854275630124528   42.016703682664094  4.286978656933373   38.1564179414478
7.401015387322993   99.06073556948704   4.089252482141094   95.7504087842369
8.365502247999844   65.02878401571168   3.7422883733498726  62.5865641279576
8.469992934250815   -28.790331666506532 3.704793544880599   -31.145867173504
9.095139297716374   -96.13974918795569  3.4805388562453574  -98.0047896609079
9.505935493207435   -75.09872467716761  3.3330472034239405  -76.6664588290508

the output values for y are clearly not the same between R and Java; TheY.R column looks good (it's close to the original Y column). How should I change this in order to get Y.java ~ Y.R ?

like image 972
Pierre Avatar asked Oct 03 '12 08:10

Pierre


3 Answers

You need to change the default values of three input parameters to make the Java and R versions identical:

  1. The Java LoessInterpolator only does linear local polynomial regression, but R supports linear (degree=1), quadratic (degree=2), and a strange degree=0 option. So you need to specify degree=1 in R to be identical to Java.

  2. LoessInterpolator defaults number of iterations DEFAULT_ROBUSTNESS_ITERS=2, but R defaults iterations=4. So you need to set control = loess.control(iterations=X) in R (X is the number of iterations).

  3. LoessInterpolator defaults DEFAULT_BANDWIDTH=0.3 but R defaults span=0.75.

like image 139
markthechao Avatar answered Oct 23 '22 16:10

markthechao


I can't speak for the java implementation, but lowess has a number of parameters which control the bandwidth of the fit. Unless you're fitting with the same control parameters you should expect the results to differ. My recommendation whenever people are smoothing data is to plot the original data as well as the fit, and decide for yourself what control parameters yield your desired tradeoff between fidelity to the data and smoothing (aka noise removal).

like image 30
Carl Witthoft Avatar answered Oct 23 '22 18:10

Carl Witthoft


There are two problems here. First if you plot the data you are generating it looks almost random and the fit generated by loess in R is very poor e.g.

plot(T$x, T$y)
lines(T$s, T2$fitted, col="blue", lwd=3)

plot of the data generated by the Java code above with a loess fit generated by R

Then in your R script you are writing the residuals not the predictions so in this line

out.println("write.table(residuals(T2),'',
   col.names= F,row.names=F,sep='\\t')");

you need to change residuals(T2) to predict(T2) e.g.

out.println("write.table(predict(T2),'',
   col.names= F,row.names=F,sep='\\t')");

So it was pure chance in your code example that the first couple of lines of residuals generated by R looked a good fit.

For me if I try fitting with some more appropriate data then Java and R do return similar but not identical results. Also I found the results were closer if I did not adjust the default robustnessIter settings.

like image 27
Mark Butler Avatar answered Oct 23 '22 18:10

Mark Butler