Would you use if/else to write this algorithm in Haskell? Is there a way to express it without them? It's hard to extract meaningful functions from the middle of it, because the tree is just the output of a machine learning system.
I'm implementing the algorithm for classifying segments of HTML content as Content or Boilerplate described here. The weights are already hard-coded.
curr_linkDensity <= 0.333333
| prev_linkDensity <= 0.555556
| | curr_numWords <= 16
| | | next_numWords <= 15
| | | | prev_numWords <= 4: BOILERPLATE
| | | | prev_numWords > 4: CONTENT
| | | next_numWords > 15: CONTENT
| | curr_numWords > 16: CONTENT
| prev_linkDensity > 0.555556
| | curr_numWords <= 40
| | | next_numWords <= 17: BOILERPLATE
| | | next_numWords > 17: CONTENT
| | curr_numWords > 40: CONTENT
curr_linkDensity > 0.333333: BOILERPLATE
Without simplifying the logic manually (assuming you might generate this code automatically), I think using MultiWayIf is pretty clean and direct.
{-# LANGUAGE MultiWayIf #-}

data Stats = Stats {
    curr_linkDensity :: Double,
    prev_linkDensity :: Double,
    ...
}

data Classification = Content | Boilerplate

classify :: Stats -> Classification
classify s = if
    | curr_linkDensity s <= 0.333333 -> if
        | prev_linkDensity s <= 0.555556 -> if
            | curr_numWords s <= 16 -> if
                | next_numWords s <= 15 -> if
                    | prev_numWords s <= 4 -> Boilerplate
                    | prev_numWords s > 4 -> Content
                | next_numWords s > 15 -> Content
            ...
and so on.
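Filled out for the whole tree in the question, the same style might look like the sketch below. The three word-count fields and their Double type are assumptions, since the record above elides them. Each "greater than" branch is written as otherwise, which is a small deviation from the generated rules: a NaN input falls into that branch instead of failing at run time with no matching guard.

```haskell
{-# LANGUAGE MultiWayIf #-}

-- Field names follow the tree in the question. Making every field a
-- Double is an assumption; the original record elides the word counts.
data Stats = Stats
  { curr_linkDensity :: Double
  , prev_linkDensity :: Double
  , prev_numWords    :: Double
  , curr_numWords    :: Double
  , next_numWords    :: Double
  }

data Classification = Content | Boilerplate
  deriving (Eq, Show)

-- Each nested multi-way if mirrors one level of the generated tree.
classify :: Stats -> Classification
classify s = if
  | curr_linkDensity s <= 0.333333 -> if
      | prev_linkDensity s <= 0.555556 -> if
          | curr_numWords s <= 16 -> if
              | next_numWords s <= 15 -> if
                  | prev_numWords s <= 4 -> Boilerplate
                  | otherwise            -> Content
              | otherwise -> Content
          | otherwise -> Content
      | otherwise -> if
          | curr_numWords s <= 40 -> if
              | next_numWords s <= 17 -> Boilerplate
              | otherwise             -> Content
          | otherwise -> Content
  | otherwise -> Boilerplate
```

The indentation matters here: each nested if opens its own layout block at the column of its first guard, so the guards of one level must line up.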
However, since this is so structured (just a tree of if/else with comparisons), also consider creating a decision tree data structure and writing an interpreter for it. This will allow you to transform, manipulate, and inspect the tree as ordinary data. Maybe it will buy you something; defining miniature languages for your specifications can be surprisingly beneficial.
data DecisionTree i o
    = Comparison (i -> Double) Double (DecisionTree i o) (DecisionTree i o)
    | Leaf o

runDecisionTree :: DecisionTree i o -> i -> o
runDecisionTree (Comparison f v ifLess ifGreater) i
    | f i <= v  = runDecisionTree ifLess i
    | otherwise = runDecisionTree ifGreater i
-- A leaf ignores the input and returns its stored result.
runDecisionTree (Leaf o) _ = o

-- DecisionTree is an encoding of a function, and you can write
-- Functor, Applicative, and Monad instances!
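As a rough illustration of that last comment, a Functor instance could rewrite every leaf while keeping the comparisons untouched. This is only a sketch: smallTree and labelLengths are hypothetical names made up for the example, and the tree type is repeated so the block stands alone.

```haskell
data DecisionTree i o
  = Comparison (i -> Double) Double (DecisionTree i o) (DecisionTree i o)
  | Leaf o

runDecisionTree :: DecisionTree i o -> i -> o
runDecisionTree (Comparison f v ifLess ifGreater) i
  | f i <= v  = runDecisionTree ifLess i
  | otherwise = runDecisionTree ifGreater i
runDecisionTree (Leaf o) _ = o

-- fmap changes the results stored in the leaves; the tests stay as they are.
instance Functor (DecisionTree i) where
  fmap g (Leaf o)             = Leaf (g o)
  fmap g (Comparison f v l r) = Comparison f v (fmap g l) (fmap g r)

-- Hypothetical example: a one-test tree whose input is a plain Double.
smallTree :: DecisionTree Double String
smallTree = Comparison id 0.5 (Leaf "low") (Leaf "high")

-- Mapping over the tree is the same as post-processing its result.
labelLengths :: DecisionTree Double Int
labelLengths = fmap length smallTree
```

With this instance, runDecisionTree (fmap g t) i gives the same result as g (runDecisionTree t i), which is what you would expect from composing a function after the classifier.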
Then
classifier :: DecisionTree Stats Classification
classifier =
    Comparison curr_linkDensity 0.333333
        (Comparison prev_linkDensity 0.555556
            (Comparison curr_numWords 16
                (Comparison next_numWords 15
                    (Comparison prev_numWords 4
                        (Leaf Boilerplate)
                        (Leaf Content))
                    (Leaf Content)
    ...
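For reference, here is a sketch of the whole tree from the question in this encoding, together with two small inspection functions of the kind the data structure makes possible. The word-count fields and their Double type are assumptions (the Comparison constructor needs an i -> Double accessor), and depth and leaves are hypothetical helpers, not part of the answer above.

```haskell
-- All fields are assumed to be Doubles so they fit Comparison's accessor type.
data Stats = Stats
  { curr_linkDensity :: Double
  , prev_linkDensity :: Double
  , prev_numWords    :: Double
  , curr_numWords    :: Double
  , next_numWords    :: Double
  }

data Classification = Content | Boilerplate
  deriving (Eq, Show)

data DecisionTree i o
  = Comparison (i -> Double) Double (DecisionTree i o) (DecisionTree i o)
  | Leaf o

runDecisionTree :: DecisionTree i o -> i -> o
runDecisionTree (Comparison f v ifLess ifGreater) i
  | f i <= v  = runDecisionTree ifLess i
  | otherwise = runDecisionTree ifGreater i
runDecisionTree (Leaf o) _ = o

-- The first subtree is the "<=" branch, the second the ">" branch.
classifier :: DecisionTree Stats Classification
classifier =
  Comparison curr_linkDensity 0.333333
    (Comparison prev_linkDensity 0.555556
      (Comparison curr_numWords 16
        (Comparison next_numWords 15
          (Comparison prev_numWords 4
            (Leaf Boilerplate)
            (Leaf Content))
          (Leaf Content))
        (Leaf Content))
      (Comparison curr_numWords 40
        (Comparison next_numWords 17
          (Leaf Boilerplate)
          (Leaf Content))
        (Leaf Content)))
    (Leaf Boilerplate)

-- Hypothetical inspection helper: the longest chain of comparisons.
depth :: DecisionTree i o -> Int
depth (Leaf _)             = 0
depth (Comparison _ _ l r) = 1 + max (depth l) (depth r)

-- Hypothetical inspection helper: every outcome, left to right.
leaves :: DecisionTree i o -> [o]
leaves (Leaf o)             = [o]
leaves (Comparison _ _ l r) = leaves l ++ leaves r
```

Classifying a segment is then runDecisionTree classifier stats, and the same classifier value can be measured, printed, or rewritten without touching the interpreter.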