I have two collections (they happen to be arrays, but it doesn't really matter, I think): L
and R
. They are both sorted and now I want to compare them. I want to end up with two collections: one for each input array containing the items which were not in the other.
I could just take the first item from L
and then search R
and, if there isn't a match, add it to my "unique" collection (Lu
). But that's extremely inefficient, and I am expecting to have some very large collections to process in the near future.
I though about possibly "playing hopscotch":
Step 1: Take two lists, L
and R
, and compare the head of each list ( l :: L
and r :: R
):
Branch 1: if l
< r
, then add l
to Lu
and recurse, passing in L
and r :: R
Branch 2: if l
> r
, then add r
to Ru
and recurse, passing in l :: L
and R
Branch 3: if l
= r
, then recurse, passing in L
and R
Step 2: return Lu
and Ru
I can write this function, but before I put in the effort I was wondering if a function already exists which can do this for me. It seems like a not-to-uncommon scenario, and I'd always rather use an existing solution to rolling my own.
(Also, if there's a more recognizable name for this algorithm, I'd like to know what it's called.)
counter() method can be used to compare lists efficiently. The counter() function counts the frequency of the items in a list and stores the data as a dictionary in the format <value>:<frequency>. If two lists have the exact same dictionary output, we can infer that the lists are the same.
Select both columns of data that you want to compare. On the Home tab, in the Styles grouping, under the Conditional Formatting drop down choose Highlight Cells Rules, then Duplicate Values. On the Duplicate Values dialog box select the colors you want and click OK. Notice Unique is also a choice.
Navigate to the "Home" option and select duplicate values in the toolbar. Next, navigate to Conditional Formatting in Excel Option. A new window will appear on the screen with options to select "Duplicate" and "Unique" values. You can compare the two columns with matching values or unique values.
(I wrote the question above about 2 hours ago. Since then, I found the answer on my own. The following is what I discovered.)
In set theory, the "list" of items in L but not in R is known as "the relative complement of R in L", also known as "set-theoretic difference of L and R"
(See Wikipedia's Complement (set theory) article)
F#, being a mathematical language, has this concept baked right in to it's Core library. First, you need to build your collections as sets:
// example arrays:
let arr1 = [| 1; 2; 3 |]
let arr2 = [| 2; 3; 4 |]
// build the L and R sets
let L = set arr1
let R = set arr2
Now you can call the "difference" function and quickly get the relative complement for each array:
let Lu = Set.difference L R |> Set.toArray
let Ru = Set.difference R L |> Set.toArray
> val Lu : int [] = [|1|]
> val Ru : int [] = [|4|]
There's also a shorter syntax. The Set type has overloaded the minus operator. Set.difference
just subtracts the second parameter from the first, so you can actually just use the following:
let Lu = L - R |> Set.toArray
let Ru = R - L |> Set.toArray
> val Lu : int [] = [|1|]
> val Ru : int [] = [|4|]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With