I'm trying to load a list of emoji characters in a simple Python 3.6 script. The YAML structure is essentially as follows:
- 🙂
- 😁
- 😬
My Python script looks like this:
import yaml
f = open('emojis.yml')
EMOJIS = yaml.load(f)
f.close()
I'm getting the following exception:
yaml.reader.ReaderError: unacceptable character #x001d: special characters are not allowed in "emojis.yml", position 2
I have seen the allow_unicode=True option, but that seems to be available only for yaml.dump. It appears that people have had trouble with similar issues in Python 2, but since all strings in Python 3 should be Unicode, I'm having trouble figuring out why this isn't working.
I've also tried wrapping my emojis in quotes and using a custom constructor for 'tag:yaml.org,2002:str'. My custom constructor is never even hit, presumably because the YAML library fails to recognize my emoji as a string. I also observe the same behavior when I define my emoji directly as a string in source.
Is there a way to load a yaml file containing emojis with PyYAML?
You should upgrade to ruamel.yaml (disclaimer: I am the author of that package), which has this, and many other long-standing PyYAML issues, fixed:
import sys
from ruamel.yaml import YAML

yaml = YAML()

# Dump the code point of every character in the file, four per row.
with open('emojis.yml') as fp:
    idx = 0
    for c in fp.read():
        print('{:08x}'.format(ord(c)), end=' ')
        idx += 1
        if idx % 4 == 0:
            print()

# Load the YAML document and write it back out.
with open('emojis.yml') as fp:
    data = yaml.load(fp)
yaml.dump(data, sys.stdout)
gives:
0000002d 00000020 0001f642 0000000a
0000002d 00000020 0001f601 0000000a
0000002d 00000020 0001f62c 0000000a
['🙂', '😁', '😬']
If you really have to stick with PyYAML, you can do:
import yaml.reader
import re
yaml.reader.Reader.NON_PRINTABLE = re.compile(
    u'[^\x09\x0A\x0D\x20-\x7E\x85\xA0-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]')
to get rid of the error.
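To see why widening the character class helps, here is a minimal stdlib-only sketch that does not touch PyYAML itself. NARROW is my assumption of what the BMP-only pattern in older PyYAML releases looked like, and first_rejected is a hypothetical helper written for this illustration. WIDE is the pattern from the workaround above.

```python
import re

# Assumption: the BMP-only pattern used by older PyYAML releases.
NARROW = re.compile(
    '[^\x09\x0A\x0D\x20-\x7E\x85\xA0-\uD7FF\uE000-\uFFFD]')

# The widened pattern from the workaround, which also admits the
# supplementary planes (U+10000 through U+10FFFF).
WIDE = re.compile(
    '[^\x09\x0A\x0D\x20-\x7E\x85\xA0-\uD7FF\uE000-\uFFFD'
    '\U00010000-\U0010FFFF]')


def first_rejected(pattern, text):
    """Return (position, code point) of the first rejected character, or None."""
    match = pattern.search(text)
    if match is None:
        return None
    return match.start(), ord(match.group())


# One list item from emojis.yml: dash, space, U+1F642, newline.
sample = '- \U0001f642\n'

narrow_result = first_rejected(NARROW, sample)
wide_result = first_rejected(WIDE, sample)

print(narrow_result)  # the emoji at index 2 is rejected
print(wide_result)    # nothing is rejected
```

With the narrow pattern the emoji at index 2 is flagged, which lines up with the "position 2" in the error message. With the widened pattern nothing in the sample is flagged.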
Starting with version 0.15.16, ruamel.yaml now also dumps all supplementary plane Unicode without reverting to \Uxxxxxxxx (controllable in the new API via .unicode_supplementary, and depending on allow_unicode).
The latest version of PyYAML has fixed this bug; upgrade to pyyaml>=5.
This seems to be a bug in PyYAML. A workaround is to write the emojis as YAML escape sequences inside double-quoted strings:
$ cat test.yaml
- "\U0001f642"
- "\U0001f601"
- "\U0001f62c"
$ python
...
>>> yaml.load(open('test.yaml'))
['🙂', '😁', '😬']
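If you have many emojis, writing the escapes by hand gets tedious. Below is a rough sketch that generates the escaped lines using only the standard library. escape_for_yaml is a hypothetical helper name of my own, not something PyYAML provides, and it only covers the \x, \u and \U escapes of YAML double-quoted scalars.

```python
def escape_for_yaml(text):
    """Return text as a double-quoted YAML scalar with non-ASCII escaped."""
    parts = []
    for ch in text:
        cp = ord(ch)
        if ch in '"\\':
            # Quote and backslash must be backslash-escaped.
            parts.append('\\' + ch)
        elif 0x20 <= cp <= 0x7E:
            # Printable ASCII passes through unchanged.
            parts.append(ch)
        elif cp <= 0xFF:
            parts.append('\\x{:02x}'.format(cp))
        elif cp <= 0xFFFF:
            parts.append('\\u{:04x}'.format(cp))
        else:
            # Supplementary-plane characters such as emoji.
            parts.append('\\U{:08x}'.format(cp))
    return '"' + ''.join(parts) + '"'


# The three emojis from the question, written as escapes so this
# source file itself stays ASCII-only.
emojis = ['\U0001f642', '\U0001f601', '\U0001f62c']
lines = ['- ' + escape_for_yaml(e) for e in emojis]
print('\n'.join(lines))
```

The printed lines should match the contents of test.yaml above, so the output can be written straight to a file and loaded with an affected PyYAML version.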