
Save sentence as server filename

I'm saving the recording of a set of sentences to a corresponding set of audio files.

Sentences include:

Ich weiß es nicht!
¡No lo sé! 
Ég veit ekki!

How would you recommend I convert each sentence to a human-readable filename that will later be served from an online server? I don't yet know which languages I might have to deal with in the future.


Please note that two different sentences must not clash with each other. For example:

É bär icke dej.
E bår icke dej.

must not resolve to the same filename, as the files would overwrite each other. This is the problem with the slugify function mentioned in "Turn a string into a valid filename?".

The best I have come up with is to use urllib.parse.quote. However, I think the resulting output is harder to read than I would have hoped. Any suggestions?
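
To illustrate the readability concern, here is a minimal sketch of what urllib.parse.quote produces for the three sample sentences. The variable names are only illustrative.

```python
from urllib.parse import quote

sentences = ["Ich weiß es nicht!", "¡No lo sé!", "Ég veit ekki!"]

# quote() percent-encodes the UTF-8 bytes of every character outside its
# safe set (ASCII letters, digits and "_.-~", plus "/" by default).
quoted = [quote(s) for s in sentences]
for q in quoted:
    print(q)
```

This prints Ich%20wei%C3%9F%20es%20nicht%21, %C2%A1No%20lo%20s%C3%A9%21 and %C3%89g%20veit%20ekki%21. The encoding can be reversed with urllib.parse.unquote, so distinct sentences never clash, but every space, punctuation mark and accented letter becomes a percent escape.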

Baz asked Nov 07 '22 13:11


1 Answer

What about unidecode?

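import unidecode  # third-party package: pip install Unidecode

# Transliterate each sentence to ASCII and replace spaces with underscores.
sentences = ['Ich weiß es nicht!', '¡No lo sé!', 'Ég veit ekki!']
for s in sentences:
    print(unidecode.unidecode(s).replace(' ', '_'))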

This gives pure ASCII strings, which are easy to post-process if they still contain unwanted characters such as '!'. Keeping spaces distinct in the form of underscores helps with readability.


If uniqueness is a problem, a hash of the original sentence can be appended to the string.
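
To see why uniqueness can be a problem, the sketch below folds the two clashing sentences from the question to ASCII. It uses unicodedata from the standard library as a stand-in for unidecode (an assumption made so the snippet runs without extra packages), and strip_accents is a hypothetical helper name. unidecode is expected to treat these two inputs the same way.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Stdlib stand-in for transliteration: decompose, then drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


first = strip_accents("É bär icke dej.")
second = strip_accents("E bår icke dej.")
print(first, second, first == second)
```

Both sentences come out as "E bar icke dej.", so the transliterated text alone cannot serve as a unique filename.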


Some clarification seems to be required with respect to the hashing. Many hash functions are explicitly designed to give very different outputs for similar inputs. For example, Python's built-in hash function gives (the exact values differ between interpreter sessions):

In [1]: hash('¡No lo sé!')
Out[1]: 6428242682022633791

In [2]: hash('¡No lo se!')
Out[2]: 4215591310983444451

With that you can do something like

unidecode.unidecode(s).replace(' ', '_') + '_' + str(hash(s))[:10]

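in order to keep the strings reasonably short. Even with such shortened hashes, clashes are pretty unlikely. Note that Python 3 salts the hash of a str per process unless PYTHONHASHSEED is fixed, and the value may be negative, so this suffix is only reproducible within one interpreter session.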
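
For filenames that have to stay the same across runs, a digest from hashlib can take the place of hash(). The following is a sketch under assumptions, not a definitive implementation: readable_part and sentence_to_filename are hypothetical names, SHA-256 truncated to 10 hex characters is an arbitrary choice, and the ASCII folding again uses unicodedata as a stdlib stand-in for unidecode.

```python
import hashlib
import re
import unicodedata


def readable_part(sentence: str) -> str:
    """ASCII-only, underscore-separated approximation of the sentence."""
    decomposed = unicodedata.normalize("NFKD", sentence)
    # Drop everything that has no ASCII form (this stand-in also drops "ß",
    # whereas unidecode transliterates it to "ss").
    ascii_text = decomposed.encode("ascii", "ignore").decode("ascii")
    # Collapse each run of characters outside [A-Za-z0-9] into one underscore.
    return re.sub(r"[^A-Za-z0-9]+", "_", ascii_text).strip("_")


def sentence_to_filename(sentence: str, digest_chars: int = 10) -> str:
    """Readable prefix plus a short digest of the original, unmodified sentence."""
    digest = hashlib.sha256(sentence.encode("utf-8")).hexdigest()
    return f"{readable_part(sentence)}_{digest[:digest_chars]}"


name_a = sentence_to_filename("É bär icke dej.")
name_b = sentence_to_filename("E bår icke dej.")
print(name_a)
print(name_b)
```

Both names start with E_bar_icke_dej_ and differ only in the digest suffix, which is computed from the original sentence and is therefore identical on every run and every machine. With this stand-in, a sentence in a non-Latin script would reduce to little more than the digest. unidecode would keep more of it readable.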

piripiri answered Nov 15 '22 13:11
