Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Comparing two lists for unique items in each

I have two collections (they happen to be arrays, but it doesn't really matter, I think): L and R. They are both sorted and now I want to compare them. I want to end up with two collections: one for each input array containing the items which were not in the other.

I could just take the first item from L and then search R and, if there isn't a match, add it to my "unique" collection (Lu). But that's extremely inefficient, and I am expecting to have some very large collections to process in the near future.

I though about possibly "playing hopscotch":

  • Step 1: Take two lists, L and R, and compare the head of each list ( l :: L and r :: R):

    • Branch 1: if l < r, then add l to Lu and recurse, passing in L and r :: R

    • Branch 2: if l > r, then add r to Ru and recurse, passing in l :: L and R

    • Branch 3: if l = r, then recurse, passing in L and R

  • Step 2: return Lu and Ru

I can write this function, but before I put in the effort I was wondering if a function already exists which can do this for me. It seems like a not-to-uncommon scenario, and I'd always rather use an existing solution to rolling my own.

(Also, if there's a more recognizable name for this algorithm, I'd like to know what it's called.)

like image 808
JDB Avatar asked Jan 27 '14 19:01

JDB


People also ask

How do you compare elements of two different lists?

counter() method can be used to compare lists efficiently. The counter() function counts the frequency of the items in a list and stores the data as a dictionary in the format <value>:<frequency>. If two lists have the exact same dictionary output, we can infer that the lists are the same.

How do I compare two lists in Excel for duplicates?

Select both columns of data that you want to compare. On the Home tab, in the Styles grouping, under the Conditional Formatting drop down choose Highlight Cells Rules, then Duplicate Values. On the Duplicate Values dialog box select the colors you want and click OK. Notice Unique is also a choice.

How do I compare unique values in Excel?

Navigate to the "Home" option and select duplicate values in the toolbar. Next, navigate to Conditional Formatting in Excel Option. A new window will appear on the screen with options to select "Duplicate" and "Unique" values. You can compare the two columns with matching values or unique values.


1 Answers

(I wrote the question above about 2 hours ago. Since then, I found the answer on my own. The following is what I discovered.)

In set theory, the "list" of items in L but not in R is known as "the relative complement of R in L", also known as "set-theoretic difference of L and R"

(See Wikipedia's Complement (set theory) article)

F#, being a mathematical language, has this concept baked right in to it's Core library. First, you need to build your collections as sets:

// example arrays:
let arr1 = [| 1; 2; 3 |]
let arr2 = [| 2; 3; 4 |]

// build the L and R sets
let L = set arr1
let R = set arr2

Now you can call the "difference" function and quickly get the relative complement for each array:

let Lu = Set.difference L R |> Set.toArray
let Ru = Set.difference R L |> Set.toArray
> val Lu : int [] = [|1|]
> val Ru : int [] = [|4|]

There's also a shorter syntax. The Set type has overloaded the minus operator. Set.difference just subtracts the second parameter from the first, so you can actually just use the following:

let Lu = L - R |> Set.toArray
let Ru = R - L |> Set.toArray
> val Lu : int [] = [|1|]
> val Ru : int [] = [|4|]
like image 192
JDB Avatar answered Sep 24 '22 11:09

JDB