
Save sentence as server filename

I'm saving the recording of a set of sentences to a corresponding set of audio files.

Sentences include:

Ich weiß es nicht!
¡No lo sé! 
Ég veit ekki!

How would you recommend I convert each sentence to a human-readable filename that will later be served from an online server? I don't yet know which languages I might be dealing with in the future.

UPDATE:

Please note that two sentences can't clash with each other. For example:

É bär icke dej.
E bår icke dej.

must not resolve to the same filename, as the files would overwrite each other. This is the problem with the slugify function mentioned in "Turn a string into a valid filename?".

The best I have come up with is to use urllib.parse.quote. However, the resulting output is harder to read than I had hoped. Any suggestions?

Ich%20wei%C3%9F%20es%20nicht%21
%C2%A1No%20lo%20s%C3%A9%21
%C3%89g%20veit%20ekki%21
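
For reference, a minimal sketch of how output like the above can be produced. It assumes Python 3 and passes safe="" (my assumption, not something stated above) so that a "/" inside a sentence would also be escaped; for these three sentences the result is the same as with the default. Because the encoding can be reversed with unquote, two different sentences never map to the same name.

```python
from urllib.parse import quote, unquote

sentences = ["Ich weiß es nicht!", "¡No lo sé!", "Ég veit ekki!"]

for sentence in sentences:
    # Percent-encode the UTF-8 bytes; safe="" also escapes "/".
    encoded = quote(sentence, safe="")
    # The encoding is reversible, so distinct sentences give distinct names.
    assert unquote(encoded) == sentence
    print(encoded)
```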

asked Nov 07 '22 13:11 by Baz


1 Answer

What about unidecode?

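import unidecode  # third-party package: pip install Unidecode

sentences = ['Ich weiß es nicht!', '¡No lo sé!', 'Ég veit ekki!']
for s in sentences:
    # Transliterate to ASCII, then replace spaces with underscores
    print(unidecode.unidecode(s).replace(' ', '_'))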

This gives pure ASCII strings, which are easy to process further if they still contain unwanted characters (such as the exclamation marks). Turning spaces into underscores keeps the words separate, which helps readability.

Ich_weiss_es_nicht!
!No_lo_se!
Eg_veit_ekki!

If uniqueness is a problem, a hash of the original sentence can be appended to each string.

Edit:

Some clarification seems to be required with respect to the hashing. Many hash functions are explicitly designed to give very different outputs for similar inputs. For example, Python's built-in hash function gives:

In [1]: hash('¡No lo sé!')
Out[1]: 6428242682022633791

In [2]: hash('¡No lo se!')
Out[2]: 4215591310983444451

With that you can do something like

unidecode.unidecode(s).replace(' ', '_') + '_' + str(hash(s))[:10]

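to keep the strings reasonably short. Even with such shortened hashes, clashes are pretty unlikely. Be aware, though, that Python 3 salts the built-in hash of strings per process unless PYTHONHASHSEED is fixed, so this suffix changes between interpreter runs and may start with a minus sign. A digest from hashlib is stable across runs and is the safer choice for filenames that must stay the same.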
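
A minimal sketch of the same idea with a run-independent digest. To keep it standard-library only, it swaps unidecode for a rough unicodedata-based fold, which simply drops characters it cannot decompose (ß disappears instead of becoming ss), so unidecode remains the nicer choice for the readable part. The names ascii_fold and sentence_to_filename, the digest length of 10 and the ".wav" extension are my assumptions for illustration.

```python
import hashlib
import re
import unicodedata


def ascii_fold(text):
    """Rough stdlib stand-in for unidecode: strip accents, keep ASCII only."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def sentence_to_filename(sentence, digest_len=10, ext=".wav"):
    """Readable ASCII stem plus a short digest of the original sentence."""
    # Collapse every run of non-alphanumeric characters into one underscore.
    stem = re.sub(r"[^A-Za-z0-9]+", "_", ascii_fold(sentence)).strip("_")
    # SHA-256 of the UTF-8 bytes does not depend on the interpreter run.
    digest = hashlib.sha256(sentence.encode("utf-8")).hexdigest()[:digest_len]
    return f"{stem}_{digest}{ext}"


# Both sentences fold to the same stem; only the digest tells them apart.
for s in ["É bär icke dej.", "E bår icke dej."]:
    print(sentence_to_filename(s))
```

The readable stem is only for humans; uniqueness rests entirely on the digest, which is computed from the original sentence rather than from the folded one.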

answered Nov 15 '22 13:11 by piripiri