Finding equality between different strings that should be equal

I have data about soccer teams from three different sources. However, the team name for the same team differs in style from source to source. For example:

[Source1]             [Source2]  [Source3]
Arsenal               ARS        Arsenal
Manchester United     MNU        ManUtd
West Bromwich Albion  WBA        WestBrom

Very often I have to compare these team names (from different sources or from the same one) to check whether they refer to the same team. For example:

Arsenal == ARS  : True
MNU == WBA      : False
WBA == WestBrom : True

I wanted to know if there is a neat pythonic way of achieving this.

My idea is the following: create a class Team that holds a list of tuples, each tuple grouping the three matching names for one team. Instantiate a Team object for each team name. Then override the class's __eq__ method to do a reduce over the list of tuples and check whether the two team names in question belong to the same tuple, which would indicate equality.

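A rough sketch:

from functools import reduce

class Team:
  def __init__(self, teamname):
    self.teams = [("Arsenal", "ARS", "Arsenal"),
                  ("Manchester United", "MNU", "ManUtd"),
                  ("West Bromwich Albion", "WBA", "WestBrom")]
    self.teamname = teamname

  def __eq__(self, other):
    # True if both names appear together in any one tuple
    return reduce(
        lambda found, names: found or (self.teamname in names
                                       and other.teamname in names),
        self.teams, False)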

Thoughts?

P.S.: Please suggest a better title for this question, as I don't think I've done a good job with this one.

Edit: Expanded my suggested solution

keithxm23 asked Mar 19 '14 15:03


2 Answers

For simplicity, you can just put everything in a flat canonical lookup:

canonical = {'Arsenal':'ARS',
             'ARS':'ARS',
             'Manchester United':'MNU',
             'MNU':'MNU',
             'ManUtd':'MNU',
             ...}

Then equivalence testing is easy:

if canonical[x] == canonical[y]:
    # they're the same team
    print("same team")

There are a lot of good alternative answers here, so here is the broad picture: this approach works well if you never expect your canonical lookup to change. You can generate it once and then forget about it. If it changes frequently, it is going to be miserable to maintain by hand, so you should look elsewhere.
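
If typing out the flat dictionary by hand feels error-prone, one option is to generate it from grouped aliases. The following is only a minimal sketch: the names ALIAS_GROUPS, build_canonical and same_team are made up for illustration, and it assumes each alias belongs to exactly one team.

```python
# Hypothetical alias groups; the first entry of each tuple is used
# as the canonical code (an assumption made for this sketch).
ALIAS_GROUPS = [
    ("ARS", "Arsenal"),
    ("MNU", "Manchester United", "ManUtd"),
    ("WBA", "West Bromwich Albion", "WestBrom"),
]


def build_canonical(groups):
    """Map every alias in every group to that group's first entry."""
    return {alias: group[0] for group in groups for alias in group}


canonical = build_canonical(ALIAS_GROUPS)


def same_team(x, y):
    """Return True only if both names are known and share a canonical code."""
    code = canonical.get(x)
    return code is not None and code == canonical.get(y)
```

Using .get here means an unknown name compares as "not the same team" instead of raising KeyError; whether that is the right behaviour depends on how much you trust your sources.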

roippi answered Oct 06 '22 00:10


You could have some kind of equivalence mapping:

equivalents = {"Arsenal": ["ARS",], 
               "Manchester United": ["MNU", "ManUtd"], ...}

And use this to process your data:

>>> name = "ManUtd"
>>> for main, equivs in equivalents.items():
    if name == main or name in equivs:
        name = main
        break

>>> name
'Manchester United'

This allows you to easily see what you consider to be the "canonical name" for the team (i.e. the key) and other names that are considered to be the same team (i.e. the list value).
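
The same loop can be wrapped in a function so that data from every source is normalised once, up front. This is a rough sketch: canonical_name and the rows list are hypothetical, and raising KeyError for an unknown name is just one possible choice.

```python
equivalents = {
    "Arsenal": ["ARS"],
    "Manchester United": ["MNU", "ManUtd"],
    "West Bromwich Albion": ["WBA", "WestBrom"],
}


def canonical_name(name):
    """Return the main (key) name for any known spelling of a team."""
    for main, equivs in equivalents.items():
        if name == main or name in equivs:
            return main
    # Unknown spelling: fail loudly rather than guess.
    raise KeyError(name)


# Hypothetical rows of (team name, points) from mixed sources.
rows = [("ARS", 3), ("ManUtd", 1), ("WestBrom", 0)]
normalized = [(canonical_name(team), points) for team, points in rows]
```

Once every row carries the canonical name, ordinary string equality is enough for later comparisons.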


If you do go down the class route, you should make the list of team tuples a class attribute:

class Team:

    TEAMS = [("Arsenal", "ARS"), ("Manchester United", "MNU", "ManUtd"), ...]

    def __init__(self, name):
        if not any(name in names for names in self.TEAMS):
            raise ValueError("Not a valid team name.")
        self.name = name

    def __eq__(self, other):
        for names in self.TEAMS:
            if self.name in names and other.name in names:
                return True
        return False

The output from this:

>>> mnu1 = Team("ManUtd")
>>> mnu2 = Team("MNU")
>>> mnu1 == mnu2
True

>>> ars = Team("ARS")
>>> ars == mnu1
False

>>> fail = Team("Not a name")
Traceback (most recent call last):
  File "<pyshell#49>", line 1, in <module>
    fail = Team("Not a name")
  File "<pyshell#43>", line 7, in __init__
    raise ValueError("Not a valid team name.")
ValueError: Not a valid team name.
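
One thing to be aware of: in Python 3, a class that defines __eq__ without also defining __hash__ becomes unhashable, so these Team objects could not go into a set or be used as dict keys. A possible variant, sketched below, stores the first name of the matching tuple as a private key and bases both equality and hashing on it. The _key attribute is an addition for illustration, and it assumes each name appears in only one tuple.

```python
class Team:

    TEAMS = [
        ("Arsenal", "ARS"),
        ("Manchester United", "MNU", "ManUtd"),
        ("West Bromwich Albion", "WBA", "WestBrom"),
    ]

    def __init__(self, name):
        for names in self.TEAMS:
            if name in names:
                self.name = name
                # First entry of the tuple identifies the team.
                self._key = names[0]
                break
        else:
            raise ValueError("Not a valid team name.")

    def __eq__(self, other):
        if not isinstance(other, Team):
            # Let Python fall back to its default comparison.
            return NotImplemented
        return self._key == other._key

    def __hash__(self):
        # Equal teams must hash equally, so hash the shared key.
        return hash(self._key)
```

With this, {Team("ManUtd"), Team("MNU")} collapses to a single element, and comparing a Team with a plain string returns False instead of raising AttributeError.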

Alternatively, just a simple function would do the same job if your Team won't have other attributes:

def equivalent(team1, team2):
    teams = [("Arsenal", "ARS"), ("Manchester United", "MNU", "ManUtd"), ...]
    for names in teams:
        if team1 in names and team2 in names:
            return True
    return False

Output from this:

>>> equivalent("MNU", "ManUtd")
True
>>> equivalent("MNU", "Arsenal")
False
>>> equivalent("MNU", "Not a name")
False
jonrsharpe answered Oct 05 '22 23:10
