Range of Unicode characters GHC accepts

Tags: string, haskell, unicode, ghc

This may sound a bit ridiculous, but GHC fails to compile my string containing bacon, a croissant, cucumber, and a potato:


main = putStrLn "🥓  🥐  🥒  🥔"

I realize I could easily write


main = putStrLn "\x1F953  \x1F950  \x1F952  \x1F954"

to the same effect, but I had always assumed GHC would accept any Unicode in its source. So: what are the actual restrictions on the Unicode characters GHC accepts in source files?

BTW: I realize that supporting this sort of thing is hell for the GHC lexer (in fact, I ran into the problem above while writing test cases for a lexer of my own), but I am still a tad disappointed.
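For reference, here is a minimal sketch of the escape-based workaround as a complete program. The only addition beyond the question is the hSetEncoding call, which is an assumption about the environment: putStrLn encodes output using the locale's encoding, so on a terminal whose locale is not UTF-8 the emoji may fail to print even though the program compiles.

```haskell
import System.IO (hSetEncoding, stdout, utf8)

-- The question's four food emoji written as hexadecimal escapes, so the
-- source file stays pure ASCII and the lexer never sees the raw code points.
foodEmoji :: String
foodEmoji = "\x1F953  \x1F950  \x1F952  \x1F954"

main :: IO ()
main = do
  -- Assumption: stdout follows the locale and may not be UTF-8, so force it;
  -- otherwise writing these characters can raise an encoding error at runtime.
  hSetEncoding stdout utf8
  putStrLn foodEmoji
```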


asked Jan 03 '17 07:01

Alec

1 Answer

Saving main = putStrLn "🥓 🥐 🥒 🥔" as UTF-8 and running it with GHC 8.0.1 on macOS, I got:


lexical error in string/character literal at character '\129365'
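GHC reports the offending character as a decimal escape, while the question uses hexadecimal \x escapes. The small helper below (toHexEscape is a hypothetical name, not part of any library) converts a character into the question's notation. Applied to the reported '\129365', it gives \x1F955, which sits next to the four code points in the question (U+1F950 to U+1F954) rather than matching one of them.

```haskell
import Data.Char (toUpper)
import Numeric (showHex)

-- Hypothetical helper: render a character as an uppercase hexadecimal
-- escape in the same style as the question (\x1F953).
toHexEscape :: Char -> String
toHexEscape c = "\\x" ++ map toUpper (showHex (fromEnum c) "")

main :: IO ()
main = do
  -- The character from GHC's error message, written as a decimal escape.
  putStrLn (toHexEscape '\129365') -- prints \x1F955
  -- The question's four characters, written here as decimal escapes.
  mapM_ (putStrLn . toHexEscape) "\129363\129360\129362\129364"
```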

I found this related (but closed) GHC bug report:

The cause (for both problems) was that older versions of GHC support an older version of Unicode:

$ ghc-7.0.3 -e "Data.Char.generalCategory '\8342'"
NotAssigned

So the problem seems to be that the version of GHC we're using doesn't support these newer emoji yet: it treats their code points as unassigned and errors out, even though newer versions of Unicode assign them to these emoji.
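The same generalCategory check can be run against the question's characters. This sketch writes them as escapes so it compiles regardless of the GHC version, then prints the category that the installed base library reports for each. By analogy with the ghc-7.0.3 example, a compiler whose Unicode tables predate these emoji would be expected to report NotAssigned; a newer one should report OtherSymbol. The helper isAssignedHere is a hypothetical name used only for illustration.

```haskell
import Data.Char (GeneralCategory (..), generalCategory, toUpper)
import Numeric (showHex)

-- The question's four characters, written as escapes so this file compiles
-- even with a GHC whose Unicode tables predate them.
foodEmoji :: [Char]
foodEmoji = ['\x1F953', '\x1F950', '\x1F952', '\x1F954']

-- Hypothetical helper: True if the installed base library considers the
-- character assigned in its (possibly outdated) Unicode tables.
isAssignedHere :: Char -> Bool
isAssignedHere c = generalCategory c /= NotAssigned

main :: IO ()
main = mapM_ report foodEmoji
  where
    -- Print "U+XXXXX: <category>", flagging characters this base calls unassigned.
    report c =
      putStrLn ("U+" ++ map toUpper (showHex (fromEnum c) "") ++ ": "
                ++ show (generalCategory c)
                ++ (if isAssignedHere c then "" else " (unassigned in this base)"))
```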

There is also a related open GHC ticket, although it mostly discusses which whitespace characters are allowed.

Finally, the error seems to be raised by the lit_error function in Lexer.x. Several functions in that file call it, though, so I'm not sure exactly which one triggers it here.
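Since the question came up while testing a lexer, here is a simplified, hypothetical check that mimics the behaviour described above: a character inside a string literal is rejected when the running base library reports it as NotAssigned and, as a further simplification, when it is a raw control character. This is not GHC's actual Lexer.x logic; isLegalInStringLiteral and checkStringLiteral are made-up names, and the error text only imitates the format of GHC's message.

```haskell
import Data.Char (GeneralCategory (..), generalCategory)

-- Hypothetical rule, loosely modelled on the behaviour described above.
-- A real Haskell lexer has more cases; this only covers the two relevant here.
isLegalInStringLiteral :: Char -> Bool
isLegalInStringLiteral c = case generalCategory c of
  NotAssigned -> False -- unknown to this base's Unicode tables
  Control     -> False -- raw tabs, newlines, etc. (simplification)
  _           -> True

-- Return the literal's body unchanged, or an error naming the first offending
-- character; show renders non-ASCII characters as decimal escapes like '\129365'.
checkStringLiteral :: String -> Either String String
checkStringLiteral body =
  case filter (not . isLegalInStringLiteral) body of
    []      -> Right body
    (c : _) -> Left ("lexical error in string/character literal at character "
                     ++ show c)

main :: IO ()
main = mapM_ (print . checkStringLiteral)
  [ "plain ASCII"
  , "\x1F953  \x1F950" -- result depends on the Unicode version of base
  , "tab\there"        -- rejected: contains a raw control character
  ]
```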

answered Oct 12 '22 02:10

mb21
