Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Representing Ternary Plot Data For Lookups

Soil Types

I am trying to represent the data on this ternary graph for lookup. I would like to be able to feed all three values into a function and get back which area category these values fall into. The approach I am looking at is translating each area into a polygon using Cartesian coordinates. Then translating the supplied values into a point and checking if that point lies inside which polygon. I am pretty sure that will work but it just seems like an overkill for a simple lookup?

I am asking if there is a way to represent the data on this graph in an array or an object so that a simple lookup would be enough?

like image 295
Lukasz Avatar asked Sep 20 '12 19:09

Lukasz


People also ask

How to read and extract data from a ternary plot?

PlotDigitizer can read and extract data from an equilateral triangle, a right angle triangle, or any scalene triangle. You only have to upload the ternary plot in an image format, like JPG, PNG, GIF, and then the following simple steps:

What is a ternary plot?

Ternary plots are a projection of that triangle into 2D space. Let’s look at that same chart projected (courtesy of ggtern ). But how do we read ternary plots of these plots?

What is a ternary diagram?

Ternary diagrams or ternary plots are very common in chemical engineering, physical chemistry, metallurgy, genetics (de Finetti diagram), and other physical sciences. They are mainly used to study compositional data for a mixture of three different substances. Below is an example of a typical ternary diagram.

What is a ternary search tree?

A ternary search tree is a special trie data structure where the child nodes of a standard trie are ordered as a binary search tree. 1. The left pointer points to the node whose value is less than the value in the current node.


2 Answers

You could translate the above into the Cartesian coordinate system, but there I think you're missing the important aspect of what it would take to actually do this.

In the above diagram, you are looking at the polygons as two-dimensional structures, but you have a three-dimensional point. It's not impossible, after all, we see the triangle on a two-dimensional plane above, but the translation into Cartesian space isn't exactly straightforward.

Rather, this looks very much like a graphical representation of a decision tree. It looks like readings for the amount of clay, silt and sand were taken to develop this representation.

Basically, you have a tuple of three values, and you want to get a classification depending on those values.

In the case above, you can model the decision tree for the classification "silt" as:

// These are approximate based on above.  Each branch of the
// tree is evaluated on one value out of all the values.
if (silt >= .8)
{
    // True case.  Check sand content.
    if (sand >= .20)
    {
        // Something else, other branches.
    }
    else
    {
        // False case, can possibly be silt.
        if (clay >= .125)
        {
            // Something else, other branches.
        }
        else
        {
            // Leaf, this is a classification.
            // Can return more strongly typed classification if you want.
            return "silt";
        }            
    }

}
else
{
    // Something else, other branches.
}

Each if/else statement represents a branch in the decision tree. At each branch, you'll want to evaluate value of the variable that at that point in the branch gives you the most information gain (pretty much what it sounds like, how many classifications can you make off that split?) which is based on the entropy (or uncertainty) that making a decision on that variable has.

The trees can be generated automatically or, you can manually code it yourself. While the latter is possible, I strongly suggest an automated/code-based method to develop it. I'd strongly suggest taking a look at Accord.NET (which requires AForge.NET, both are excellent). To get you started, you should take a look at this blog post showing how to create decision trees using Accord.NET.

Whichever way you go, you'll ultimately get a function that takes the three values (clay, silt, and sand) and returns your classification, traversing the tree depending on the values of each.

Note that you aren't necessarily going to have a one-to-one set of criteria (branches) that will map to each classification (as seen in the first code sample). Depending on the number of vertices on the polygons in the example above, you will need additional branches to handle those situations.

If you have the original sample data, then you just have to run the sample data through your decision tree builder and it should create a decision tree like the one above.

If you don't have the original sample data, you can create it using the vertices from above and classifying those. For example:

silt    sand    clay    classification
----    ----    ----    --------------
   0      50     100    clay (top point)
 100       0      50    silt (right bottom point)
  50     100       0    sand (left bottom point)
  15      45      40    sandy clay OR clay cloam OR clay (depending on splits)
...

Regarding the last row (and subsequent ones), the decision tree will set a boundary based on those values, and because it's continuous it will usually make a decision based on all values greater than or equal to that value.

like image 179
casperOne Avatar answered Oct 05 '22 22:10

casperOne


The way to do this is as you have suggested, using the 'point in polygon approach', I mean, when you do this visually, that is exactly what you are doing in your head. The problem with the accepted answer, is that whilst in this case (and simple cases) you might be able to build a decision tree, for a situation where there are many more categories, this process would become inextricably more complicated IMHO. Perhaps categories overlap, perhaps categories are only a single point, the process that I propose here, these artefacts would have no consequence to the result.

I have already covered HERE, the production of the USDA soil classification diagram, the result I attach below, based off the dataset provided in the ggtern package.

enter image description here

But what may not be so clear, is that the ggtern package has some functions that make this a little less cumbersome than it has to be. Specifically, there are some internal functions in the ggtern package (routinely used in the backend) to make the transformations necessary in order to evaluate the point in polygon truth table for each point against the reference categories.

Such approach is fairly straight forward with the ddply function from the plyr package, and, the point.in.polygon function, from the sp package.

First let us load the necessary packages, and load the USDA data from ggtern. Let us also create some sample data, testing this process for a point lying on a vertice, and a point lying in the centre of a classification region!.

library(ggtern)
library(sp)
library(plyr)

#The Main Data to lookup against
data(USDA)

#The sample Data (Try a point at a vertice, and a point in the middle...)
testData = rbind(data.frame(Clay=.4,Sand=.2,Silt=.4), #Vertice point
                 data.frame(Clay=1,Sand=1,Silt=1)/3)  #Simple middle point

I then suggest to use the internal function, transform_tern_to_cart(...), to convert both datasets to cartesian coordinates.

#Do the transformation to cartesian
USDA[,c("x","y")]     = transform_tern_to_cart(USDA$Clay,USDA$Sand,USDA$Silt)
testData[,c("x","y")] = transform_tern_to_cart(testData$Clay,testData$Sand,testData$Silt)

By using a combination of ddply(...) and apply(...), we can then test each point in the test set, against each category in the reference set via use of the point.in.polygon(...) function.

#Create a function to do the lookup
lookup <- function(data=testData,lookupdata=USDA,groupedby="Label"){  
  if(!groupedby %in% colnames(lookupdata))
    stop("Groupedby value is not a column of the lookupdata")

  #For each row in the data
  outer = apply(data[,c("x","y")],1,function(row){

    #for each groupedby in the lookupdata
    inner = ddply(lookupdata,groupedby,function(df){
      if(point.in.polygon(row[1],row[2],df$x,df$y) > 0) #Is in polygon?
        return(df) #Return a valid dataframe
      else
        return(NULL) #Return nothing
    })

    #Extract the groupedby data from the table
    inner = unique(inner[,which(colnames(inner) == groupedby)])

    #Join together in csv string and return to 'outer'
    return(paste(as.character(inner),collapse=","))
  })

  #Combine with the original data and return
  return(cbind(data,Lookups=outer))
}

Which can then be called in the following manner:

#Execute
lookup()

You will notice that the first point satisfies four (4) categories, and the second, only one (1), as to be expected.

like image 34
Nicholas Hamilton Avatar answered Oct 05 '22 22:10

Nicholas Hamilton