 

Automatic conversion between String and Data.Text in haskell

As Nikita Volkov mentioned in his question "Data.Text vs String", I also wondered why I have to deal with two different string implementations in Haskell: type String = [Char] and Data.Text. In my code I use the pack and unpack functions very often.

My question: Is there a way to have an automatic conversion between both string types so that I can avoid writing pack and unpack so often?

In other programming languages like Python or JavaScript, integers are, for example, automatically converted to floats when needed. Can I get something like this in Haskell too? I know that those languages are dynamically typed, but I have heard that C++ has a similar feature.

Note: I already know the language extension {-# LANGUAGE OverloadedStrings #-}. But as I understand it, this extension only applies to string literals written as "...". I want an automatic conversion for strings that I get from other functions or receive as arguments in function definitions.

Extended question: "Haskell: Text or Bytestring" also covers the difference between Data.Text and Data.ByteString. Is there a way to have an automatic conversion between all three string types, String, Data.Text and Data.ByteString?

Stephan Kulla asked Mar 25 '14 at 17:03


2 Answers

No.

Haskell doesn't have implicit coercions for technical, philosophical, and almost religious reasons.
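Haskell is consistent about this even for the numeric case the question mentions: an Int is never silently widened to a Double. A minimal illustration (the function average is a hypothetical example, not from the answer): writing sum xs / length xs is rejected by the type checker because length returns an Int, so the conversion has to be spelled out with fromIntegral.

```haskell
-- Hypothetical example: the Int from length must be converted explicitly
-- before it can be used with (/) on Doubles.
average :: [Double] -> Double
average [] = 0  -- design choice of this sketch: define the empty average as 0
average xs = sum xs / fromIntegral (length xs)
```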

As a comment: converting between these representations isn't free, and most people don't like the idea of hidden, potentially expensive computations lurking around. Additionally, since a String is a lazy list that may be infinite, coercing it to a (strict) Text value might not terminate.
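The termination point can be seen in a small sketch. Instead of an infinite list it uses a String whose tail is an error (an assumption made so that the example finishes): taking a prefix of the lazy String is fine, but packing it into a strict Data.Text.Text has to walk the entire list. The names partialString, prefixOk and packForcesWholeList are hypothetical.

```haskell
import Control.Exception (ErrorCall, evaluate, try)
import qualified Data.Text as T

-- Stand-in for a very long or infinite String: two usable characters,
-- then a tail that blows up if anything ever forces it.
partialString :: String
partialString = "ab" ++ error "tail was never meant to be forced"

-- Consuming a prefix of the lazy list only forces that prefix.
prefixOk :: String
prefixOk = take 2 partialString

-- Building a strict Text must traverse the whole list, so forcing the
-- packed value reaches the erroring tail. Returns True if it threw.
packForcesWholeList :: IO Bool
packForcesWholeList = do
  r <- try (evaluate (T.pack partialString)) :: IO (Either ErrorCall T.Text)
  pure (either (const True) (const False) r)
```

With a genuinely infinite String, for example cycle "ab", the same T.pack call would simply never return.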

We can convert literals to Text automatically with OverloadedStrings, which desugars a string literal "foo" to fromString "foo"; fromString for Text just calls pack.
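A minimal sketch of that desugaring, assuming the text package: the three definitions below denote the same Text value, while a String computed at runtime still needs an explicit pack. The names greeting, greetingExplicit, greetingPacked and shout are only illustrative.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.String (fromString)
import Data.Text (Text)
import qualified Data.Text as T

-- With OverloadedStrings this literal is desugared to fromString "hello".
greeting :: Text
greeting = "hello"

-- The same value with the desugaring written out.
greetingExplicit :: Text
greetingExplicit = fromString "hello"

-- And again via pack, which is what fromString does for Text.
greetingPacked :: Text
greetingPacked = T.pack "hello"

-- The extension does not help with a String that arrives at runtime:
-- s still needs an explicit pack; only the "!" literal becomes Text.
shout :: String -> Text
shout s = T.pack s <> "!"
```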

The better question might be why you're converting so much. Why do you need to unpack Text values so often? If you are constantly changing them to Strings, it somewhat defeats the purpose of using Text.
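As an illustration of staying in Text (hypothetical function names, assuming the text package): instead of unpacking, mapping over the characters and packing again, use the corresponding Data.Text function directly. The two are not exactly equivalent: T.toUpper applies full Unicode case mapping, so ß becomes SS, while Data.Char.toUpper works one character at a time and leaves ß unchanged.

```haskell
import Data.Char (toUpper)
import Data.Text (Text)
import qualified Data.Text as T

-- Round trip through String: unpack, work on [Char], pack again.
shoutViaString :: Text -> Text
shoutViaString = T.pack . map toUpper . T.unpack

-- Staying in Text: Data.Text already provides the operation.
shoutInText :: Text -> Text
shoutInText = T.toUpper
```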

Daniel Gratzer answered Nov 09 '22 at 14:11


Almost Yes: Data.String.Conversions

Haskell libraries use different string types, so there are many situations in which there is no choice but to convert heavily, distasteful as that is; rewriting the libraries doesn't count as a real option.

I see two concrete problems, either of which is potentially a significant obstacle to Haskell adoption:

  • coding ends up requiring specific knowledge of how the libraries you want to use are implemented. This is a big issue for a high-level language.

  • performance on simple tasks is bad, which is a big issue for a general-purpose language.

Abstracting from the specific types

In my experience, the first problem shows up as time spent guessing which package holds the right function for plumbing between libraries that basically operate on the same data.
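For reference, this is roughly what the plumbing looks like when written by hand with the text and bytestring packages. The helper names are hypothetical; the point is that String and Text differ only in representation, whereas moving to a ByteString requires choosing an encoding (UTF-8 in this sketch, via Data.Text.Encoding).

```haskell
import qualified Data.ByteString as BS
import Data.Text (Text)
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- String <-> Text: pure repackaging, no encoding involved.
toText :: String -> Text
toText = T.pack

fromText :: Text -> String
fromText = T.unpack

-- Text -> ByteString: an encoding must be chosen; UTF-8 here.
textToBytes :: Text -> BS.ByteString
textToBytes = TE.encodeUtf8

-- ByteString -> Text: decoding can fail, so decodeUtf8' returns Either
-- instead of throwing; the error is rendered as a String here.
bytesToText :: BS.ByteString -> Either String Text
bytesToText = either (Left . show) Right . TE.decodeUtf8'
```

The encoding is visible in the sizes: "é" is one character as Text but two bytes once encoded as UTF-8.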

For that problem there is a really handy solution: the string-conversions package (module Data.String.Conversions), provided you are comfortable with UTF-8 as your default encoding.

This package provides a single cs conversion function between a number of different types.

  • String
  • Data.ByteString.ByteString
  • Data.ByteString.Lazy.ByteString
  • Data.Text.Text
  • Data.Text.Lazy.Text

So you just import Data.String.Conversions and use cs; type inference picks the right conversion according to the input and output types.

Example:

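    import Data.Aeson              (decode)
    import Data.Text               (Text)
    import Data.String.Conversions (cs)

    -- MyStructure stands for your own type and needs a FromJSON instance.
    -- decode returns Maybe because parsing can fail; cs converts the strict
    -- Text into the lazy ByteString that decode expects.
    decodeTextStoredJson' :: Text -> Maybe MyStructure
    decodeTextStoredJson' x = decode (cs x)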

NB: In GHCi you generally have no context that fixes the target type, so you direct the conversion by stating the result type explicitly, as you would for read:

let z = cs x :: ByteString 
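To see how a single function like cs can cover many type pairs, here is a minimal sketch of the underlying idea: a multi-parameter type class with one instance per (source, target) pair, so that type inference selects the conversion. This is a hypothetical Convert class written for illustration, not the string-conversions package's own code, and it covers only String, strict Text and strict ByteString.

```haskell
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE FlexibleInstances #-}
import qualified Data.ByteString as BS
import Data.Text (Text)
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import qualified Data.Text.Encoding.Error as TEE

-- Hypothetical class: one instance per (source, target) pair.
class Convert a b where
  convert :: a -> b

instance Convert String Text where
  convert = T.pack

instance Convert Text String where
  convert = T.unpack

-- Text to ByteString assumes UTF-8, like the answer's cs.
instance Convert Text BS.ByteString where
  convert = TE.encodeUtf8

-- Lenient decoding: invalid bytes become U+FFFD instead of an error.
instance Convert BS.ByteString Text where
  convert = TE.decodeUtf8With TEE.lenientDecode

-- The signature fixes both types, so the Text -> ByteString instance
-- is chosen without naming it.
bytesOf :: Text -> BS.ByteString
bytesOf = convert
```

As with cs in GHCi, an annotation such as convert x :: Text picks the instance when nothing else fixes the target type. Using lenientDecode for the ByteString to Text direction is a design choice of this sketch: it never fails, but it silently replaces invalid bytes.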

Performance and the cry for a "true" solution

I am not aware of any true solution yet, but we can already guess the direction:

  • it is not legitimate to require a conversion when the data itself does not change;
  • the best performance is achieved by not converting data from one type to another for administrative purposes;
  • coercion is evil (coercive, even).

So the direction must be to make these types not different, i.e. to reconcile them under (or over) an archetype from which they would all derive, so that functions written against different derivations compose without any need to convert.
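One possible reading of that idea in today's Haskell is a type class that plays the role of the archetype: a function is written once against the class and runs natively on each representation, without pack or unpack. This is only an illustrative sketch with a hypothetical StringLike class and countVowels function; it does not remove conversions at library boundaries that demand one concrete type.

```haskell
{-# LANGUAGE FlexibleInstances #-}
import Data.Text (Text)
import qualified Data.Text as T

-- Hypothetical "archetype": the operations a function needs,
-- implemented natively by each representation.
class StringLike s where
  sFilter :: (Char -> Bool) -> s -> s
  sLength :: s -> Int

instance StringLike [Char] where
  sFilter = filter
  sLength = length

instance StringLike Text where
  sFilter = T.filter
  sLength = T.length

-- Written once; works on String and Text without converting either.
countVowels :: StringLike s => s -> Int
countVowels = sLength . sFilter (`elem` "aeiou")
```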

Note: I absolutely cannot evaluate the feasibility or potential drawbacks of this idea. There may be some very sound obstacles.

Titou answered Nov 09 '22 at 13:11