
Find an alternating (letter or number) + letter pattern in a file name

Tags: python, regex

From the following list of filenames I am trying to retrieve the highlighted parts:

  • something something Ah6d8c.txt
  • something Qd6h7s.txt
  • somethingAcKhJssomething.txt
  • 7h6c8c something.txt

The pattern is:

  • 6 characters long
  • starts with 2-9 or A K Q J T, both lower and uppercase
  • the second character is always h s c d, both lower and uppercase
  • the third and fourth match the first and second
  • same for the fifth and sixth

import os
import re

root = "C:/root"
data = dict()

re_pattern = "[a-zA-Z|2-9][h|s|c|d][a-zA-Z|2-9][h|s|c|d][a-zA-Z|2-9][h|s|c|d]"

for folder in os.listdir(root):
    data[folder] = dict()
    for item in os.listdir(f"{root}/{folder}"):
        board_id = re.findall(item, re_pattern)
        print(board_id)
        data[folder][item] = f"{root}/{folder}/{item}"

I thought my regex would work, but re.findall returns an empty list. Is my regex wrong, or my code? The goal is to use the board_id as the dictionary key and the entire path as the value.

EDIT: The improved pattern looks like this:

import os
import re

root = "C:/root"
data = dict()

re_pattern = "(?i)(?:[2-9AJKQT][hscd]){3}"

for folder in os.listdir(root):
    data[folder] = dict()
    for item in os.listdir(f"{root}/{folder}"):
        board_id = re.search(item, re_pattern)
        print(f"{item} :: {board_id}")
        data[folder][item] = f"{root}/{folder}/{item}"

The results are still not right, so the problem must be in the code instead:

  • As6d5d BTN 2.5x vs BB.txt :: None
  • SRP 3x 5h6d3c.txt :: None

Best regards

fisheatshark asked Nov 20 '19 15:11



1 Answer

How about using character classes in a quantified group?

(?i)(?:[2-9AJKQT][hscd]){3}

See this demo at regex101 or this Python demo

For caseless matching use (?i) flag or re.IGNORECASE.
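As a quick check, here is a minimal sketch that runs the pattern against the four file names from the question. The names BOARD_RE and filenames are only illustrative, and re.IGNORECASE is used in place of the inline (?i) flag.

```python
import re

# Three (rank, suit) pairs in a row, matched case-insensitively.
BOARD_RE = re.compile(r"(?:[2-9AJKQT][hscd]){3}", re.IGNORECASE)

# File names taken from the question.
filenames = [
    "something something Ah6d8c.txt",
    "something Qd6h7s.txt",
    "somethingAcKhJssomething.txt",
    "7h6c8c something.txt",
]

for name in filenames:
    match = BOARD_RE.search(name)
    # match is None when the name contains no board id
    print(name, "::", match.group(0) if match else None)
```

Note that the pattern is not anchored, so any six-character run that happens to look like three rank/suit pairs would also match. If that becomes a problem, the surrounding context would have to be constrained as well.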


Looking more closely at your code, also be aware of the order in which arguments are passed to re.findall (and re.search): the pattern comes first, the string to search comes second.

 re.findall(pattern, string, flags=0)
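With the arguments in that order, the loop from the question could be rearranged roughly as follows. This is only a sketch: the function name index_boards is made up for illustration, and it assumes that each file name contains at most one board id and that ids do not repeat within a folder (a repeated id would overwrite the earlier entry).

```python
import os
import re

RE_PATTERN = r"(?i)(?:[2-9AJKQT][hscd]){3}"

def index_boards(root):
    """Return {folder: {board_id: full_path}} for files whose name holds a board id."""
    data = {}
    for folder in os.listdir(root):
        folder_path = os.path.join(root, folder)
        if not os.path.isdir(folder_path):
            continue  # ignore plain files sitting directly under root
        data[folder] = {}
        for item in os.listdir(folder_path):
            # pattern first, string second
            match = re.search(RE_PATTERN, item)
            if match is None:
                continue  # file name without a board id
            board_id = match.group(0)
            data[folder][board_id] = os.path.join(folder_path, item)
    return data
```

Calling index_boards("C:/root") would then give one dictionary per folder, keyed by board id, which is what the question asks for.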

For the future, also consider using raw string notation (r"...") for regex patterns in general, although this is not an issue with your current pattern.
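To show where raw strings start to matter, here is a small sketch using \b, which does not appear in your pattern but is a common addition. The variable names plain and raw are only illustrative.

```python
import re

plain = "\bAh6d8c\b"   # Python turns "\b" into a backspace character here
raw = r"\bAh6d8c\b"    # the backslash survives, so the regex engine sees a word boundary

# The plain version looks for literal backspace characters and finds nothing.
print(re.search(plain, "board Ah6d8c.txt"))
# The raw version matches the board id between word boundaries.
print(re.search(raw, "board Ah6d8c.txt"))
```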

bobble bubble answered Oct 19 '22 02:10