
Why are these strings escaping from my regular expression in Python?

In my code, I load up an entire folder into a list and then try to get rid of every file in the list except the .mp3 files.

import os
import re
path = '/home/user/mp3/'
dirList = os.listdir(path)
dirList.sort()
i = 0
for names in dirList:
  match = re.search(r'\.mp3', names)
  if match:
    i = i+1
  else:
    dirList.remove(names)
print(dirList)
print(i)

After I run the file, the code does get rid of some files in the list but keeps these two specifically:

['00. Various Artists - Indie Rock Playlist October 2008.m3u', '00. Various Artists - Indie Rock Playlist October 2008.pls']

I can't understand what's going on. Why are those two specifically escaping my search?

marcoamorales asked Jan 08 '11 21:01



1 Answer

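You are modifying your list while looping over it. When you remove an item, every later item shifts one position to the left, but the loop's internal index still moves forward, so the item right after each removed one is never examined. The .m3u and .pls files were most likely each sitting right after a file that got removed. You should loop over a copy of the list instead (for name in dirList[:]:), or create a new list.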

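i = 0  # number of .mp3 files found
modifiedDirList = []
for name in dirList:
    # Build a new list instead of removing from the one being iterated
    match = re.search(r'\.mp3', name)
    if match:
        i += 1
        modifiedDirList.append(name)

print(modifiedDirList)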
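
If you would rather keep your original remove-based loop, the copy variant mentioned above might look roughly like this. It is only a sketch: the file names are made up so the snippet runs without your folder, and in your script dirList would come from os.listdir(path) instead.

```python
import re

# Hypothetical file names standing in for sorted(os.listdir(path))
dirList = sorted(['a.jpg', 'b.m3u', 'c.mp3', 'd.nfo', 'e.pls', 'f.mp3'])

i = 0
for name in dirList[:]:           # dirList[:] is a shallow copy of the list
    if re.search(r'\.mp3', name):
        i += 1
    else:
        dirList.remove(name)      # safe: the copy being iterated is untouched

print(dirList)
print(i)
```

The loop walks the copy while the removals happen on the original, so no element gets skipped.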

Or even better, use a list comprehension:

dirList = [name for name in sorted(os.listdir(path))
           if re.search(r'\.mp3', name)]

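Nearly the same thing, without a regular expression (endswith only matches at the end of the name, while the unanchored pattern above matches '.mp3' anywhere in it):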

dirList = [name for name in sorted(os.listdir(path))
           if name.endswith('.mp3')]
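
To see why exactly two non-mp3 files can survive, here is a small reproduction. I don't know what else was in your folder, so the names below are invented; the only assumption that matters is that each surviving file sorts directly after another non-mp3 file.

```python
import re

# Invented names; only their order matters for the demonstration
names = ['00.jpg', '00.m3u', '00.nfo', '00.pls', '01.mp3', '02.mp3']

# Buggy version: remove from the same list that is being iterated
buggy = list(names)
for name in buggy:
    if not re.search(r'\.mp3', name):
        # After this removal the next item slides into the current
        # position, and the loop then steps past it without testing it
        buggy.remove(name)

# Fixed version: build a new list and leave the original alone
fixed = [name for name in names if name.endswith('.mp3')]

print(buggy)  # ['00.m3u', '00.pls', '01.mp3', '02.mp3']
print(fixed)  # ['01.mp3', '02.mp3']
```

Removing '00.jpg' makes the loop skip '00.m3u', and removing '00.nfo' makes it skip '00.pls', which matches the pattern in your output.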

Seth answered Sep 21 '22 00:09