I'm trying to solve an exercise in haskell about alphabets. Given a list of words in a new alphabetical order, find this new alphabet.
For example, given the words ["ab","abd","abc","ba","bd","cc"],
the new possible alphabets are "abdc" or "adbc".
I started by computing all possible alphabetical orders
alfabet :: Eq a => [[a]] -> [[a]]
alfabet list = permutations $ nub $ concat $ list
After this, I think I should filter out those alphabets that are wrong, but I can't seem to pass along enough info. I've tried using the built-in filter function, I've tried to write my own sort function so that when I sort the words by the new order, the resulting list is the same as the input list thus the alphabet is correct. All to no avail.
I guess the biggest problem I have is that I need to be able to work with two lists (the words and the different alphabets) at the same time and loop over them in a different way.
Any tips or help? Thanks
If your lists have always same size then just A == B . Also if your lists don't have the same size, just as == bs tells you if they are equal. @Ingo @mort's "working" solution treats, for example, [1,2,3] and [1,2,3,4] as equal, there (==) would not.
In Haskell, lists are a homogenous data structure. It stores several elements of the same type. That means that we can have a list of integers or a list of characters but we can't have a list that has a few integers and then a few characters. And now, a list!
There's more than one way to approach this. Your suggested way is to generate all possible alphabets on the letters you have, then filter them out by which ones are consistent with the example data. I'll show you a way of doing that first.
Another way would be to distill the information from the example data into some information about what order the letters can come in (mathematicians call this a partial ordering), then extend that into all the possible orderings.
import Data.List (permutations, nub, sort)
I'll use a type synonym Alphabet to make clearer which lists are potential alphabets and which are words, and define an ordering based on an alphabet (byAlphabet), and extend it to apply to a list by lexiographic ordering. 
type Alphabet a = [a]
byAlphabet :: Eq a => Alphabet a -> a -> a -> Ordering
byAlphabet alphabet x y 
    | x == y = EQ 
    | otherwise = if y `elem` (dropWhile (/=x) alphabet) then LT else GT
lexiographic :: (a->a->Ordering) -> [a]->[a]->Ordering
lexiographic cmp [] [] = EQ
lexiographic cmp [] _ = LT
lexiographic cmp _ [] = GT
lexiographic cmp (x:xs) (y:ys) = case cmp x y of
    EQ -> lexiographic cmp xs ys
    x -> x
We need to check whether a given list of words is consistentWith the given data:
consistentWith :: Eq a => [[a]] -> Alphabet a -> Bool
consistentWith xss alphabet = all (/=GT) $ 
       zipWith (lexiographic $ byAlphabet alphabet) xss (tail xss)
You seemed to be struggling with using that across a list of potential alphabets, but did know you can use filter:
anyOKby :: Eq a => [[a]] -> [Alphabet a] -> [Alphabet a]
anyOKby sortedWords = filter (consistentWith sortedWords)
Giving a slightly edited alfabet function, which filters out ones that don't work.
alfabet :: Eq a => [[a]] -> [Alphabet a]
alfabet list = anyOKby list $ permutations $ nub $ concat $ list
example = ["ab","abd","abc","ba","bd","cc"]
This works as expected:
ghci> byAlphabet "abc" 'c' 'a'
GT
ghci> lexiographic (byAlphabet "abc") "ccba" "ccbc"
LT
ghci> consistentWith example "abcd"
False
ghci> consistentWith example "abdc"
True
ghci> alfabet example
["abdc","adbc"]
Now that's a rather slow way because it generates many many potential alphabets then slowly filters them out. The first time I tried, I gave up waiting for alfabet (sort $ words "hello there the their he and at ah eh") to print. 
I'll use a data type to show which characters are before which others, so 'a' :<: 'b' would represent that 'a' has to be before 'b' in the alphabet
data CMP a = a :<: a deriving (Eq,Show)
I'm going to use [CMP a] instead of Maybe (CMP a) just because it's easier to concat than to import Data.Maybe (catMaybes), but each adjacent pair of words can give at most one comparison fact about the alphabet. The facts functions use the nice zipWith f xs (tail xs) trick to use f to make one thing from each adjacent pair in a list.
justTheFirst :: [a] -> [a]
justTheFirst [] = []
justTheFirst (a:_) = [a]
fact :: Eq a => [a] -> [a] -> [CMP a]
fact xs ys = justTheFirst . filter neq $ zipWith (:<:) xs ys where
   neq (a:<:b) = a /= b
facts :: Eq a => [[a]] -> [CMP a]
facts xss = nub . concat $ zipWith fact xss (tail xss)
Examples:
ghci> fact "wellbeing" "wellington"
['b' :<: 'i']
*Main ghci> facts example
['d' :<: 'c','a' :<: 'b','a' :<: 'd','b' :<: 'c']
We'll use a data type to represent a partial ordering - a list of characters and a set of comparisons, and we'll use the facts function to generate the comparisons from sample sorted words, and your nub.concat trick to get the letters themselves:
data Partial a = Partial {chars :: [a], order :: [CMP a]} deriving Show
partial :: Eq a => [[a]] -> Partial a
partial xss = Partial {chars = nub $ concat xss, order = facts xss}
Example:
ghci> partial example
Partial{chars = "abdc",order = ['d' :<: 'c','a' :<: 'b','a' :<: 'd','b' :<: 'c']}
To grow a list of possible alphabets from a partial ordering, we first need to find which elements can go at the front. It's OK to be at the front as long as you're not bigger than anything, so let's make a list of nonBigs. If we put the potential first letter at the front of the alphabet, we can remove it from the remaining partial order:
nonBigs :: Eq a => [CMP a] -> [a] -> [a]
nonBigs lts as = filter (not.big) as where
   big a = a `elem` (map (\ (_ :<: a) -> a) lts)
remove :: Eq a => a -> [CMP a] -> [CMP a]
remove a = filter no_a where
   no_a (x :<: y) = not $ a `elem` [x,y]
Examples: (the only thing that's not bigger than something in the example is 'a', and there are two facts that don't feature 'a')
ghci> facts example
['d' :<: 'c','a' :<: 'b','a' :<: 'd','b' :<: 'c']
ghci> nonBigs (facts example) "abcd"
"a"
ghci> remove 'a' (facts example)
['d' :<: 'c','b' :<: 'c']
Let's pair up the nonBigs with the partial ordering with that letter removed to get all the possible minimal elements and how to carry on from there:
minima :: Eq a => Partial a -> [(a,Partial a)]
minima (Partial as lts) = 
   [(a,Partial (filter (/=a) as) (remove a lts) )|a <- nonBigs lts as]
Example: You have to have 'a' first in the example, but after that you could either have 'b' or 'd':
ghci> minima $ partial example
         [('a',Partial {chars = "bdc", order = ['d' :<: 'c','b' :<: 'c']})]
ghci> minima $ Partial {chars = "bdc", order = ['d' :<: 'c','b' :<: 'c']}
         [('b',Partial {chars = "dc", order = ['d' :<: 'c']}),
          ('d',Partial {chars = "bc", order = ['b' :<: 'c']})]
The complicated bit is growing all the possible treelike paths using the "directed graph" that the partial ordering gives. We'll use a tree growing function f :: input -> [(output,input)] that tells you all the possible ways to carry on. If that doesn't give you any answers we need [[]], a single empty path, which we'll grow recursively by putting possible first elements in front (map (o:)) of each possibility (treePaths f i'):
treePaths :: (input -> [(output,input)]) -> input -> [[output]]
treePaths f i = case f i of 
   [] -> [[]]
   pairs -> concat [map (o:) (treePaths f i') | (o,i') <- pairs]
alphabets list = treePaths minima (partial list)
Example: The alphabets length calculation is allmost instant, but the alfabet length calculation takes more than 2 minutes on my (rather old) laptop; It's quicker to only generate the output you want than to generate every output and discard. 
ghci> alphabets example
["abdc","adbc"]
ghci> length $ alphabets (sort $ words "hello there the their he and at ah eh")
15120
ghci> length $ alfabet (sort $ words "hello there the their he and at ah eh")
15120
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With