I need help writing a regular expression for detecting chords

Tags: python, regex

I'm writing a text to cdr (chordpro) converter and I'm having trouble detecting chord lines of the form:

               Cmaj7    F#m           C7    
Xxx xxxxxx xxx xxxxx xx x xxxxxxxxxxx xxx 

This is my python code:

def getChordMatches(line):
    import re
    notes = "[CDEFGAB]";
    accidentals = "(#|##|b|bb)?";
    chords = "(maj|min|m|sus|aug|dim)?";
    additions = "[0-9]?"
    return re.findall(notes + accidentals + chords + additions, line)

I want it to return the list ["Cmaj7", "F#m", "C7"]. The above code doesn't work. I've struggled with the documentation, but I'm not getting anywhere.

Why doesn't it work to just chain the classes and groups together?

edit

Thanks. I ended up with the following, which covers most of my needs (it won't match E#m11, for instance).

def getChordMatches(line):
    import re

    notes = "[ABCDEFG]"
    accidentals = "(?:#|##|b|bb)?"
    chords = "(?:maj|min|m|sus|aug|dim)?"
    additions = "[0-9]?"
    chordFormPattern = notes + accidentals + chords + additions
    # Optional slash bass note (e.g. C/G), then one whitespace character.
    # Raw string so that \s is not treated as a string escape.
    fullPattern = chordFormPattern + r"(?:/%s)?\s" % (notes + accidentals)
    matches = [x.replace(' ', '').replace('\n', '') for x in re.findall(fullPattern, line)]
    positions = [x.start() for x in re.finditer(fullPattern, line)]

    return matches, positions
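
Because the pattern above ends in \s, a chord that is the very last thing on a line (with no trailing whitespace) is not matched, and the text has to be cleaned afterwards. A possible variant, sketched here under assumptions rather than taken from the original post, uses lookarounds so that nothing but the chord is consumed, and collects text and positions in a single finditer pass. The name get_chord_matches, the [0-9]* quantifier (to allow E#m11) and the lookbehind are illustrative choices.

```python
import re

# Hypothetical variant of the pattern above. The lookarounds and [0-9]*
# are assumptions for illustration, not part of the original code.
CHORD = re.compile(
    r"(?<!\S)"                                        # not preceded by non-whitespace
    r"[A-G](?:##?|bb?)?(?:maj|min|m|sus|aug|dim)?[0-9]*"
    r"(?:/[A-G](?:##?|bb?)?)?"                        # optional slash bass note
    r"(?=\s|$)"                                       # followed by whitespace or end
)

def get_chord_matches(line):
    """Return (chord names, start positions) found in one pass."""
    found = list(CHORD.finditer(line))
    return [m.group() for m in found], [m.start() for m in found]

line = " " * 15 + "Cmaj7" + " " * 4 + "F#m" + " " * 11 + "C7"
matches, positions = get_chord_matches(line)
print(matches, positions)
```

Like the original, this still treats a lone capital such as "A" in a lyric line as a chord, so deciding whether a whole line is a chord line remains a separate step.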

MdaG asked Feb 18 '23 01:02


2 Answers

You should make your groups non-capturing by changing (...) to (?:...).

accidentals = "(?:#|##|b|bb)?";
chords = "(?:maj|min|m|sus|aug|dim)?";

See it working online: ideone


The reason why it doesn't work when you have capturing groups is that it only returns those groups and not the entire match. From the documentation:

re.findall(pattern, string, flags=0)

Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result unless they touch the beginning of another match.
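
To make the quoted behaviour concrete, here is a small sketch that runs both forms of the pattern on the chord line from the question. The variable names are only illustrative.

```python
import re

line = "               Cmaj7    F#m           C7    "

# Capturing groups: findall returns one tuple of group contents per match,
# with '' for any group that did not take part in the match.
capturing = r"[CDEFGAB](#|##|b|bb)?(maj|min|m|sus|aug|dim)?[0-9]?"
groups = re.findall(capturing, line)

# Non-capturing groups: findall returns the full text of each match.
non_capturing = r"[CDEFGAB](?:#|##|b|bb)?(?:maj|min|m|sus|aug|dim)?[0-9]?"
chords = re.findall(non_capturing, line)

print(groups)   # [('', 'maj'), ('#', 'm'), ('', '')]
print(chords)   # ['Cmaj7', 'F#m', 'C7']
```

The note letter and the trailing digit are lost in the first result because they sit outside any group.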

Mark Byers answered Feb 20 '23 18:02


Python has a dedicated syntax for writing verbose regexes, the re.VERBOSE flag:

import re

regex = re.compile(
    r"""[CDEFGAB]                  # Notes
        (?:\#|\#\#|b|bb)?          # Accidentals (a bare # would start a comment here)
        (?:maj|min|m|sus|aug|dim)? # Chords
        [0-9]?                     # Additions
     """, re.VERBOSE
)
result_list = regex.findall(line)

It's arguably a bit clearer than joining the strings together.
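
One thing to watch for with re.VERBOSE in a pattern like this: outside a character class, an unescaped # begins a comment that runs to the end of the line, so a literal sharp sign has to be written as \# (or [#]). The sketch below checks this; compiles_verbose is a hypothetical helper written just for the demonstration.

```python
import re

def compiles_verbose(pattern):
    """Return True if the pattern compiles under re.VERBOSE."""
    try:
        re.compile(pattern, re.VERBOSE)
        return True
    except re.error:
        return False

# A bare '#' starts a comment, so the group below is never closed.
unescaped = r"[CDEFGAB] (?:#|##|b|bb)?"

# Escaping the '#' keeps it literal; the spaces are ignored in verbose mode.
escaped = r"[CDEFGAB] (?:\#|\#\#|b|bb)? (?:maj|min|m|sus|aug|dim)? [0-9]?"

print(compiles_verbose(unescaped))   # False
print(compiles_verbose(escaped))     # True
print(re.findall(escaped, "Cmaj7    F#m    C7", re.VERBOSE))
```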

aychedee answered Feb 20 '23 16:02