 

Words returns wrong output when a String has apostrophes in it

I am trying to split a String into a list of words

split_string :: String -> [String]

split_string x = words x

"Functional Programming is Fun, isn’t it?" should give me back:

["Functional","Programming","is","Fun,","isn’t","it?"]

but it returns me this instead: ["Functional","Programming","is","Fun,","isn\8217t","it?"]

How can I avoid this problem with apostrophes? I am new to Haskell, so sorry in advance if this is a stupid question.

miwa_p asked Mar 02 '23 23:03


2 Answers

There is a common misunderstanding about what Haskell's Show mechanism is there for. Many beginners think it is supposed to yield pretty visualisations, but actually its purpose is specifically to generate representations that are valid Haskell code.

That means in particular that they shouldn't contain anything that would cause an error if you copy and paste it back into a Haskell file.
For example, consider a string whose content is: you tried to show "something" on the terminal. If GHCi displayed this as

"you tried to show "something" on the terminal"

that would incur a parse error. The quotation marks need to be escaped:

"you tried to show \"something\" on the terminal"

and that's the form that the Show String instance generates.
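To make the "valid Haskell code" point concrete, here is a minimal sketch showing that the output of show can be parsed back with read. The helper name roundTrips is made up for this illustration and is not part of the original answer.

```haskell
-- Hypothetical helper (not from the original answer): checks that the
-- text produced by `show` parses back to the same string with `read`.
roundTrips :: String -> Bool
roundTrips s = read (show s) == s

main :: IO ()
main = do
  let original = "you tried to show \"something\" on the terminal"
  putStrLn (show original)        -- the escaped source form, quotes included
  print (roundTrips original)     -- True
  print (roundTrips "isn\8217t")  -- True
```

The escapes are what make this round trip possible, whether the string contains quotation marks or non-ASCII characters.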

Generally, the representation is not unique. For example

"you tried to show \34something\34 on the terminal"

would also work, where \34 is a decimal escape for the ASCII code of the " character. This form can in fact be used for any character:

Prelude> "\72\101\108\108\111\44\32\87\111\114\108\100"
"Hello, World"

Of course it's silly to do that for all characters, but the Haskell standard plays it safe in the sense that all non-ASCII characters are displayed in the escaped way:

Prelude> "Amila Bečirović"
"Amila Be\269irovi\263"

The advantage is that you're safe from quirks that could be introduced by incompatible character encodings. In the early 2000s this would often happen when web pages used language-specific 8-bit encodings. By now this shouldn't really be an issue anymore.

As Willem Van Onsem wrote, you can always just raw-dump a string with putStrLn, which doesn't escape anything – though this isn't directly applicable to a list of strings.
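For a list of strings, one possible approach is to apply putStrLn to each element with mapM_. This is only a sketch: splitString is a renamed version of the asker's function, and it assumes the terminal's encoding can display the ’ character.

```haskell
-- The asker's function, renamed to camelCase for this sketch.
splitString :: String -> [String]
splitString = words

main :: IO ()
main =
  -- Prints one word per line, unescaped, instead of the `show` form of the list.
  mapM_ putStrLn (splitString "Functional Programming is Fun, isn\8217t it?")
```

If the words should stay on one line, putStrLn (unwords ws) prints them separated by spaces.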

For more flexibility, you can opt for a different Show class that doesn't have this behaviour, such as the one from the pragmatic-show package:

Prelude> import qualified Text.Show.Pragmatic as SP
Prelude SP> SP.print ["Functional","Programming","is","Fun,","isn’t","it?"]
["Functional","Programming","is","Fun,","isn’t","it?"]
Prelude SP> SP.print "Amila Bečirović"
"Amila Bečirović"

Note that this will still escape characters that really are unsafe:

Prelude SP> SP.print "bla\34blub"
"bla\"blub"
leftaroundabout answered Mar 15 '23 23:03


This is not the string content. This is the representation of a string. If you write "foo" for a string literal, then the double quotes are not part of the content of the string. These are used to write a string literal.

Normally, show returns a string whose content looks like a Haskell expression for the value. This makes it convenient to copy and paste the value back into code later.

You can print the content of a string with putStrLn :: String -> IO (). For example:

Prelude> putStrLn "isn’t"
isn’t
Prelude> putStrLn "isn\8217t"
isn’t
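The difference between content and representation can also be checked without printing any non-ASCII text. A small sketch, where the name content is chosen just for this example:

```haskell
-- The string from the question; \8217 is a single character (’).
content :: String
content = "isn\8217t"

main :: IO ()
main = do
  print (length content)          -- 5: the escape denotes one character
  print (length (show content))   -- 11: quotes and escape digits are counted
  print (content == "isn’t")      -- True: both literals denote the same string
```

So words did not change the string; only the way GHCi displays the result differs.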
Willem Van Onsem answered Mar 16 '23 00:03