 

What is the optimal way to match list entries after rounding in Mathematica?

I have two lists in Mathematica:

list1 = {{a1, b1, c1}, ... , {an, bn, cn}} 

and

list2 = {{d1, e1, f1}, ... , {dn, en, fn}}

The lists contain numerical results and consist of roughly 50000 triplets each. Each triplet represents two coordinates and the numerical value of some property at those coordinates. The lists have different lengths, and their coordinates do not span quite the same range. I want to correlate the property values (the third element of each triplet) from the two lists, so I need to scan through the lists and identify the entries whose coordinates match. My output will be something like

list3 = {{ci, fj}, ... , {cl, fm}}

where

{ai, bi}, ..., {al, bl}

will be (roughly) equal to, respectively

{dj, ej}, ..., {dm, em}

By "roughly" I mean the coordinates will match once rounded to some desired accuracy:

list1 = Round[#, {1000, 500, 0.1}] & /@ list1;
list2 = Round[#, {1000, 500, 0.1}] & /@ list2;

So after this process I'd have two lists that share some matching coordinates. My question is: how do I identify the matches and pick out the pairs of properties in the optimal way?

An example of a six-element list would be:

list1 = {{-1.16371*10^6, 548315., 14903.}, {-1.16371*10^6, 548322., 14903.9}, 
   {-1.16371*10^6, 548330., 14904.2}, {-1.16371*10^6, 548337., 14904.8}, 
   {-1.16371*10^6, 548345., 14905.5}, {-1.16371*10^6, 548352., 14911.5}}
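As an illustration of the rounding step (a sketch added for clarity, not part of the original question), applying the question's own rounding to this six-point sample shows that every point lands on the same rounded coordinate pair, because the y values span less than 500. Duplicates within a single list after rounding are therefore likely with real data. `rounded` and `uniqueCoords` are illustrative names.

```mathematica
(* sample data from the question *)
list1 = {{-1.16371*10^6, 548315., 14903.}, {-1.16371*10^6, 548322., 14903.9},
   {-1.16371*10^6, 548330., 14904.2}, {-1.16371*10^6, 548337., 14904.8},
   {-1.16371*10^6, 548345., 14905.5}, {-1.16371*10^6, 548352., 14911.5}};

(* Round is Listable in both arguments, so each element is rounded to its own
   multiple: x to the nearest 1000, y to the nearest 500, the value to 0.1 *)
rounded = Round[#, {1000, 500, 0.1}] & /@ list1;

(* only the coordinate part matters for matching; all six points collapse
   onto a single rounded coordinate pair *)
uniqueCoords = DeleteDuplicates[rounded[[All, {1, 2}]]]
```

One caveat of binning by rounding: two nearly equal coordinates on either side of a rounding boundary end up in different bins. For example, with a step of 500, 548249 rounds to 548000 while 548251 rounds to 548500.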
asked Nov 16 '11 14:11 by gpap



2 Answers

You may want to use something like this:

{Round[{#, #2}], #3} & @@@ Join[list1, list2];

% ~GatherBy~ First ~Select~ (Length@# > 1 &)

This will group all data points that have matching coordinates after rounding. You can give Round a second argument to specify the multiple to round to, for example Round[{#, #2}, {1000, 500}].

This assumes that there are no duplicate points within a single list. If there are, you will need to remove them to get useful pairs. Tell me if this is the case and I will update my answer.
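To turn the groups into the {{ci, fj}, ...} form the question asks for, one can keep only the third element of each grouped entry. The sketch below uses made-up toy data (not from the question) and assumes no duplicates within either list. Because GatherBy keeps elements in their order of appearance and list1 comes first in the Join, the list1 value comes first in each pair.

```mathematica
(* toy data {x, y, value}: the first two points of each list coincide after rounding *)
list1 = {{1.2, 3.9, 10.}, {5.1, 6.8, 20.}, {9.0, 9.0, 30.}};
list2 = {{0.9, 4.1, 100.}, {4.8, 7.1, 200.}, {2.0, 2.0, 300.}};

(* round the coordinates, keep the value, group by the rounded coordinates,
   and keep only groups with more than one member *)
groups = Select[
   GatherBy[{Round[{#, #2}], #3} & @@@ Join[list1, list2], First],
   Length[#] > 1 &];

(* drop the coordinates, leaving the pairs of property values *)
list3 = groups[[All, All, 2]]
(* -> {{10., 100.}, {20., 200.}} *)
```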

Here is another method using Sow and Reap. The same caveats apply. Both of these examples are simply guidelines for how you may implement your functionality.

Reap[
  Sow[#3, {Round[{#, #2}]}] & @@@ Join[list1, list2],
  _,
  List
][[2]] ~Cases~ {_, {_, __}}
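The Reap output is a list of {tag, {values...}} entries, so the value pairs can be pulled out directly with a replacement rule in Cases. This is a sketch on the same made-up toy data as above, again assuming no duplicates within either list.

```mathematica
list1 = {{1.2, 3.9, 10.}, {5.1, 6.8, 20.}, {9.0, 9.0, 30.}};
list2 = {{0.9, 4.1, 100.}, {4.8, 7.1, 200.}, {2.0, 2.0, 300.}};

(* Sow each value tagged by its rounded coordinates; the tag is wrapped in a
   list so that the pair {x, y} is treated as one tag rather than two *)
tagged = Reap[
    Sow[#3, {Round[{#, #2}]}] & @@@ Join[list1, list2],
    _, List][[2]];

(* keep tags that collected two or more values and return just the values *)
list3 = Cases[tagged, {_, vals : {_, __}} :> vals]
(* -> {{10., 100.}, {20., 200.}} *)
```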

To deal with elements that become duplicates after rounding within a single list, you may use Round and GatherBy on each list as follows, keeping only the first entry for each rounded position:

newList1 = GatherBy[{Round[{#, #2}], #3} & @@@ list1, First][[All, 1]];

newList2 = GatherBy[{Round[{#, #2}], #3} & @@@ list2, First][[All, 1]];

and then proceed with:

newList1 ~Join~ newList2 ~GatherBy~ First ~Select~ (Length@# > 1 &)
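The deduplication and matching steps can be packaged into one function. This is a sketch; `matchedValues` and its `step` argument are hypothetical names, where `step` is the multiple passed to Round (a number, or a list such as {1000, 500} for per-coordinate steps). It keeps the first entry for each rounded position, exactly like the newList1/newList2 code above.

```mathematica
(* matchedValues is a hypothetical helper name; step is the second argument to Round *)
matchedValues[l1_, l2_, step_] :=
 Module[{dedup},
  (* round the coordinates, then keep only the first entry per rounded position *)
  dedup[l_] := GatherBy[{Round[{#, #2}, step], #3} & @@@ l, First][[All, 1]];
  (* group positions present in both lists and return their value pairs *)
  Select[GatherBy[Join[dedup[l1], dedup[l2]], First], Length[#] > 1 &][[All, All, 2]]
  ]

(* toy data: the first two points of list1 collide after rounding *)
list1 = {{1.2, 3.9, 10.}, {1.4, 4.2, 11.}, {5.1, 6.8, 20.}};
list2 = {{0.9, 4.1, 100.}, {4.8, 7.1, 200.}, {2.0, 2.0, 300.}};

result = matchedValues[list1, list2, 1]
(* -> {{10., 100.}, {20., 200.}} *)
```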
answered Oct 14 '22 03:10 by Mr.Wizard


Here's my approach, relying on Nearest to match the points.

Let's assume that list1 doesn't have fewer elements than list2 (otherwise you can swap them using {list1, list2} = {list2, list1}).

(* extract points *)

points1=list1[[All,{1,2}]];
points2=list2[[All,{1,2}]];

(* build a "nearest-function" for matching them *)

nf=Nearest[points1]

(* two points match only if they're closer than threshold *)
threshold=100;

(* This function will find the match of a point from points2 in points1.  
   If there's no match, the point is discarded using Sequence[]. *)
match[point_]:= 
   With[{m=First@nf[point]}, 
       If[Norm[m-point]<threshold, {m,point}, Unevaluated@Sequence[]]
   ]

(* find matching point-pairs *)
matches=match/@points2;

(* build hash tables to retrieve the properties associated with points quickly *)
Clear[values1,values2]
Set[values1[{#1,#2}],#3]&@@@list1;
Set[values2[{#1,#2}],#3]&@@@list2;

(* get the property-pairs *)
{values1[#1],values2[#2]}&@@@matches
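The values1/values2 lookup tables above rely on DownValues. In Mathematica 10 and later an Association can play the same role (a sketch on made-up toy data). Since an Association is applied to a key the same way, {values1[#1], values2[#2]}&@@@matches would work unchanged. One difference is that a missing key returns a Missing[...] object instead of the unevaluated expression.

```mathematica
list1 = {{0., 0., 10.}, {1., 1., 20.}};
list2 = {{0.1, 0., 100.}, {1., 1.1, 200.}};

(* keys are the exact {x, y} coordinates, values are the properties *)
values1 = AssociationThread[list1[[All, {1, 2}]] -> list1[[All, 3]]];
values2 = AssociationThread[list2[[All, {1, 2}]] -> list2[[All, 3]]];

(* exact lookup by coordinate pair *)
found = values1[{1., 1.}]
(* -> 20. *)

(* a coordinate pair that is not a key gives a Missing[...] object *)
absent = values2[{5., 5.}]
```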

An alternative is to use a custom DistanceFunction in Nearest to avoid the use of values1 and values2, and have a shorter program. This may be slower or faster; I didn't test it with large data at all.
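A different way to avoid the lookup tables (not the DistanceFunction approach mentioned above) is the rule form of Nearest, which returns the values attached to the nearest points. Combined with a {n, r} specification, it also handles the threshold. This is a sketch on made-up toy data. `matchValue` is a hypothetical name, and `Nothing` is only available in newer versions of Mathematica; in older versions the Unevaluated@Sequence[] trick used above serves the same purpose.

```mathematica
list1 = {{0., 0., 10.}, {10., 10., 20.}, {20., 0., 30.}};
list2 = {{0.5, 0., 100.}, {10., 10.4, 200.}, {50., 50., 300.}};
threshold = 1;

(* Nearest[points -> values] returns the values attached to the nearest points *)
nf = Nearest[list1[[All, {1, 2}]] -> list1[[All, 3]]];

(* nf[pt, {1, r}] gives at most one value whose point lies within distance r;
   an empty result means no match, and Nothing is dropped from the final list *)
matchValue[{x_, y_, v_}] :=
  With[{hit = nf[{x, y}, {1, threshold}]},
   If[hit === {}, Nothing, {First[hit], v}]];

list3 = matchValue /@ list2
(* -> {{10., 100.}, {20., 200.}} *)
```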

Note: How complicated the implementation needs to be really depends on your particular dataset. Does each point from the first set have a match in the second one? Are there any duplicates? How close can points from the same dataset be? Etc. I tried to provide something which can be tweaked to be relatively robust, at the cost of having longer code.

answered Oct 14 '22 02:10 by Szabolcs