Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Removing elements in a functional style

Tags:

algorithm

f#

I have been struggling with something that looks like a simple algorithm, but can't find a clean way to express it in a functional style so far. Here is an outline of the problem: suppose I have 2 arrays X and Y,

X = [| 1; 2; 2; 3; 3 |]
Y = [| 5; 4; 4; 3; 2; 2 |]

What I want is to retrieve the elements that match, and the unmatched elements, like:

matched = [| 2; 2; 3 |]
unmatched = [| 1; 3 |], [| 4; 4; 5 |]

In pseudo-code, this is how I would think of approaching the problem:

let rec match matches x y =
    let m = find first match from x in y
    if no match, (matches, x, y)
    else
        let x' = remove m from x
        let y' = remove m from y
        let matches' = add m to matches
        match matches' x' y'

The problem I run into is the "remove m from x" part - I can't find a clean way to do this (I have working code, but it's ugly as hell). Is there a nice, idiomatic functional way to approach that problem, either the removal part, or a different way to write the algorithm itself?

like image 403
Mathias Avatar asked Mar 23 '23 15:03

Mathias


2 Answers

This could be solved easily using the right data structures, but in case you wanted to do it manually, here's how I would do it in Haskell. I don't know F# well enough to translate this, but I hope it is similar enough. So, here goes, in (semi-)literate Haskell.

overlap xs ys =

I start by sorting the two sequences to get away from the problem of having to know about previous values.

  go (sort xs) (sort ys)
    where

The two base cases for the recursion are easy enough to handle -- if either list is empty, the result includes the other list in the list of elements that are not overlapping.

      go xs []           = ([], (xs, []))
      go [] ys           = ([], ([], ys))

I then inspect the first elements in each list. If they match, I can be sure that the lists overlap on that element, so I add that to the included elements, and I let the excluded elements be. I continue the search for the rest of the list by recursing on the tails of the lists.

      go (x:xs) (y:ys)
        | x == y = let (  included, excluded)     = go xs ys
                   in  (x:included, excluded)

Then comes the interesting part! What I essentially want to know is if the first element of one of the lists does not exist in the second list – in that case I should add it to the excluded lists and then continue the search.

        | x < y  = let (included, (  xex,   yex)) = go xs (y:ys)
                   in  (included, (x:xex,   yex))
        | y < x  = let (included, (  xex,   yex)) = go (x:xs) ys
                   in  (included, (  xex, y:yex))

And this is actually it. It seems to work for at least the example you gave.

> let (matched, unmatched) = overlap x y
> matched
[2,2,3]
> unmatched
([1,3],[4,4,5])
like image 68
kqr Avatar answered Apr 01 '23 21:04

kqr


It seems that you're describing multiset (bag) and its operations.

If you use the appropriate data structures, operations are very easy to implement:

// Assume that X, Y are initialized bags
let matches = X.IntersectWith(Y)
let x = X.Difference(Y)
let y = Y.Difference(X)

There's no built-in Bag collection in .NET framework. You could use Power Collection library including Bag class where the above function signature is taken.

UPDATE:

You can represent a bag by a weakly ascending list. Here is an improved version of @kqr's answer in F# syntax:

let overlap xs ys =
    let rec loop (matches, ins, outs) xs ys =
        match xs, ys with
        // found a match
        | x::xs', y::ys' when x = y -> loop (x::matches, ins, outs) xs' ys'
        // `x` is smaller than every element in `ys`, put `x` into `ins`
        | x::xs', y::ys' when x < y -> loop (matches, x::ins, outs) xs' ys
        // `y` is smaller than every element in `xs`, put `y` into `outs`
        | x::xs', y::ys' -> loop (matches, ins, y::outs) xs ys'
        // copy remaining elements in `xs` to `ins`
        | x::xs', [] -> loop (matches, x::ins, outs) xs' ys
        // copy remaining elements in `ys` to `outs`
        | [], y::ys' -> loop (matches, ins, y::outs) xs ys'
        | [], [] -> (List.rev matches, List.rev ins, List.rev outs)
    loop ([], [], []) (List.sort xs) (List.sort ys)

After two calls to List.sort, which are probably O(nlogn), finding matches is linear to the sum of the lengths of two lists.

If you need a quick-and-dirty bag module, I would suggest a module signature like this:

type Bag<'T> = Bag of 'T list

module Bag =
    val count : 'T -> Bag<'T> -> int
    val insert : 'T -> Bag<'T> -> Bag<'T>
    val intersect : Bag<'T> -> Bag<'T> -> Bag<'T>
    val union : Bag<'T> -> Bag<'T> -> Bag<'T>
    val difference : Bag<'T> -> Bag<'T> -> Bag<'T>
like image 20
pad Avatar answered Apr 01 '23 20:04

pad