 

Turn a string into a valid filename?

I have a string that I want to use as a filename, so, using Python, I want to remove all characters that wouldn't be allowed in filenames.

I'd rather be strict than otherwise, so let's say I want to retain only letters, digits, and a small set of other characters like "_-.() ". What's the most elegant solution?
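The whitelist described above maps directly onto a set-membership filter. Here is a minimal sketch, assuming "letters" means ASCII letters only. The name `whitelist_filename` is hypothetical.

```python
import string

# Allowed characters from the question: letters, digits, and "_-.() ".
# Assumption: "letters" means ASCII letters only, so accented letters are dropped.
VALID_CHARS = frozenset("-_.() " + string.ascii_letters + string.digits)


def whitelist_filename(text):
    """Hypothetical helper: keep only characters that appear in VALID_CHARS."""
    return "".join(c for c in text if c in VALID_CHARS)


print(whitelist_filename('AC/DC: "Back in Black?" (Live).mp3'))
# prints: ACDC Back in Black (Live).mp3
```

An explicit allowed set keeps the policy easy to audit. The trade-off is that non-ASCII letters such as "é" are silently removed rather than transliterated.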

The filename needs to be valid on multiple operating systems (Windows, Linux and Mac OS): it's an MP3 file in my library with the song title as the filename, and it is shared and backed up between three machines.
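Removing characters is only part of cross-platform validity. Windows also rejects `< > : " / \ | ? *` and control characters, and its shell does not support names ending in a space or period. It also reserves device names such as `CON`, `PRN`, `AUX`, `NUL`, `COM1` to `COM9` and `LPT1` to `LPT9`, even with an extension (e.g. `con.mp3`). The sketch below adds those checks. The helper name `portable_filename`, the `_` prefix, and the 255-character default limit are illustrative choices, not a standard.

```python
import re

# Windows reserved device names (matched case-insensitively below).
_RESERVED = {"CON", "PRN", "AUX", "NUL",
             *(f"COM{i}" for i in range(1, 10)),
             *(f"LPT{i}" for i in range(1, 10))}


def portable_filename(name, max_len=255):
    """Hypothetical helper: make *name* usable on Windows, Linux and macOS."""
    # Drop characters Windows rejects; this also covers '/' (POSIX) and
    # ':' (macOS Finder), plus ASCII control characters.
    name = re.sub(r'[<>:"/\\|?*\x00-\x1f]', "", name)
    # Truncate, then remove trailing spaces/periods, which Windows rejects.
    # Assumption: the limit counts characters, not bytes.
    name = name[:max_len].rstrip(" .")
    # Prefix names whose part before the first dot is a reserved device name.
    if name.split(".")[0].upper() in _RESERVED:
        name = ("_" + name)[:max_len].rstrip(" .")
    # Never return an empty name.
    return name or "_"
```

For example, `portable_filename("con.mp3")` gives `"_con.mp3"`, and a title made only of forbidden characters falls back to `"_"`.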

asked Nov 17 '08 09:11 by Sophie Gage





1 Answer

You can look at how the Django framework creates a "slug" from arbitrary text. A slug is URL-friendly and filename-friendly.

The Django text utils define a function, slugify(), that's probably the gold standard for this kind of thing. Essentially, their code is the following.

```python
import unicodedata
import re


def slugify(value, allow_unicode=False):
    """
    Taken from https://github.com/django/django/blob/master/django/utils/text.py
    Convert to ASCII if 'allow_unicode' is False. Convert spaces or repeated
    dashes to single dashes. Remove characters that aren't alphanumerics,
    underscores, or hyphens. Convert to lowercase. Also strip leading and
    trailing whitespace, dashes, and underscores.
    """
    value = str(value)
    if allow_unicode:
        value = unicodedata.normalize('NFKC', value)
    else:
        value = unicodedata.normalize('NFKD', value).encode('ascii', 'ignore').decode('ascii')
    value = re.sub(r'[^\w\s-]', '', value.lower())
    return re.sub(r'[-\s]+', '-', value).strip('-_')
```
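Be careful when applying this to the question's MP3 files. The regex `[^\w\s-]` deletes periods, so `slugify("Title.mp3")` returns `"titlemp3"`. The sketch below slugifies the stem and the extension separately. The wrapper `slugify_filename` and the `"untitled"` fallback are hypothetical and not part of Django.

```python
import os
import re
import unicodedata


def slugify(value, allow_unicode=False):
    # Same logic as the Django slugify() quoted above.
    value = str(value)
    if allow_unicode:
        value = unicodedata.normalize('NFKC', value)
    else:
        value = unicodedata.normalize('NFKD', value).encode('ascii', 'ignore').decode('ascii')
    value = re.sub(r'[^\w\s-]', '', value.lower())
    return re.sub(r'[-\s]+', '-', value).strip('-_')


def slugify_filename(filename):
    """Hypothetical helper: slugify stem and extension separately."""
    stem, ext = os.path.splitext(filename)
    # Assumed placeholder for titles that slugify to an empty string.
    stem_slug = slugify(stem) or "untitled"
    ext_slug = slugify(ext)  # ".MP3" -> "mp3" (the dot is removed)
    # Re-attach the dot only if something survived, to avoid a trailing '.'.
    return stem_slug + ("." + ext_slug if ext_slug else "")


print(slugify_filename("Beyoncé - Déjà Vu (feat. Jay-Z).mp3"))
# prints: beyonce-deja-vu-feat-jay-z.mp3
```

The fallback matters because, with `allow_unicode=False`, a title written entirely in a non-Latin script (for example Japanese) folds to an empty string.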

And the older version:

```python
import re
import unicodedata


def slugify(value):
    """
    Normalizes string, converts to lowercase, removes non-alpha characters,
    and converts spaces to hyphens.
    """
    # Python 2 code: the unicode() builtin does not exist in Python 3.
    value = unicodedata.normalize('NFKD', value).encode('ascii', 'ignore')
    value = unicode(re.sub(r'[^\w\s-]', '', value).strip().lower())
    value = unicode(re.sub(r'[-\s]+', '-', value))
    # ...
    return value
```
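Both versions use the same accent-folding step. NFKD normalization splits a letter like "é" into a base letter plus a combining mark, and encoding to ASCII with `'ignore'` then drops the mark. The sketch below isolates that step; the name `ascii_fold` is hypothetical. It also shows the step's limit: characters with no decomposition, such as "Ø" or "ß", are dropped rather than transliterated.

```python
import unicodedata


def ascii_fold(text):
    """Hypothetical helper: NFKD-decompose, then drop every non-ASCII code point."""
    return unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")


# "é" decomposes into "e" followed by U+0301 COMBINING ACUTE ACCENT.
print(unicodedata.normalize("NFKD", "é") == "e\u0301")  # prints: True
print(ascii_fold("Café Ñandú"))  # prints: Cafe Nandu
print(ascii_fold("ﬁle"))         # prints: file (compatibility ligature expanded)
print(ascii_fold("Straße"))      # prints: Strae (ß has no decomposition)
```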

There's more in the original function, but I left it out because it deals with escaping rather than slugification.

answered Oct 05 '22 21:10 by S.Lott
