I have data about soccer teams from three different sources. However, the team name for the same team differs in style from source to source. For example:
[Source1]             [Source2]  [Source3]
Arsenal               ARS        Arsenal
Manchester United     MNU        ManUtd
West Bromwich Albion  WBA        WestBrom
Very often I have to compare these team names (from different sources or the same one) to check whether they refer to the same team. For example:
Arsenal == ARS : True
MNU == WBA : False
WBA == WestBrom : True
I wanted to know if there is a neat pythonic way of achieving this.
My idea is the following:
Create a class Team that holds a list of tuples, each tuple containing the 3 matching team names. Instantiate an object of Team for each of the team names. Then override the __eq__ method for the class, where I'll do a reduce over the list of tuples to find whether the two team names in question belong to the same tuple, which would indicate equality.
Some pseudocode:
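from functools import reduce

class Team:
    def __init__(self, teamname):
        # Each tuple groups the names that refer to the same team.
        self.teams = [("Arsenal", "ARS", "Arsenal"),
                      ("Manchester United", "MNU", "ManUtd"),
                      ("West Bromwich Albion", "WBA", "WestBrom")]
        self.teamname = teamname

    def __eq__(self, other):
        # True if both team names appear in the same tuple.
        return reduce(
            lambda found, names: found or (self.teamname in names and other.teamname in names),
            self.teams, False)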
Thoughts?
P.S.: Please suggest a better Title for this question as I don't think I've done a good job with the same.
Edit: Expanded my suggested solution
For simplicity, you can just put everything in a flat canonical lookup:
canonical = {'Arsenal': 'ARS',
             'ARS': 'ARS',
             'Manchester United': 'MNU',
             'MNU': 'MNU',
             'ManUtd': 'MNU',
             ...}
Then equivalence testing is easy:
if canonical[x] == canonical[y]:
    # they're the same team
There are a lot of good alternative answers here, so here is the broad picture: this approach works well if you never expect your canonical lookup to change. You can generate it once and then forget about it. If it changes frequently, it will be miserable to maintain, so you should look elsewhere.
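As a rough sketch of "generate it once", the flat lookup could be built from groups of aliases instead of being typed out by hand. The names GROUPS, build_canonical and same_team are hypothetical, and treating the first name in each group as the canonical one is an arbitrary assumption:

```python
# Hypothetical alias groups; the first name in each group is
# (arbitrarily) treated as the canonical one.
GROUPS = [
    ("ARS", "Arsenal"),
    ("MNU", "Manchester United", "ManUtd"),
    ("WBA", "West Bromwich Albion", "WestBrom"),
]

def build_canonical(groups):
    """Map every alias in every group to that group's first name."""
    canonical = {}
    for group in groups:
        key = group[0]
        for name in group:
            # Guard against one alias being listed under two teams.
            if name in canonical and canonical[name] != key:
                raise ValueError(f"{name!r} appears in more than one group")
            canonical[name] = key
    return canonical

canonical = build_canonical(GROUPS)

def same_team(x, y):
    # Unknown names raise KeyError, which surfaces typos early.
    return canonical[x] == canonical[y]

print(same_team("Arsenal", "ARS"))   # True
print(same_team("MNU", "WBA"))       # False
print(same_team("WBA", "WestBrom"))  # True
```

Adding a new spelling then means editing one tuple rather than adding another key/value pair to the dictionary.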
You could have some kind of equivalence mapping:
equivalents = {"Arsenal": ["ARS"],
               "Manchester United": ["MNU", "ManUtd"], ...}
And use this to process your data:
>>> name = "ManUtd"
>>> for main, equivs in equivalents.items():
...     if name == main or name in equivs:
...         name = main
...         break
...
>>> name
'Manchester United'
This allows you to easily see what you consider to be the "canonical name" for the team (i.e. the key) and other names that are considered to be the same team (i.e. the list value).
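If the loop over every entry becomes a bottleneck, one possible variation is to invert the mapping once so that each lookup is a single dictionary access. This is only a sketch: to_main and normalise are hypothetical names, the West Bromwich Albion entry is taken from the question's table, and returning unknown names unchanged is an assumption that mirrors the behaviour of the loop above:

```python
equivalents = {
    "Arsenal": ["ARS"],
    "Manchester United": ["MNU", "ManUtd"],
    "West Bromwich Albion": ["WBA", "WestBrom"],
}

# Map every known name (main or alternative) to its main name.
to_main = {main: main for main in equivalents}
for main, equivs in equivalents.items():
    for alias in equivs:
        to_main[alias] = main

def normalise(name):
    # Hypothetical helper; an unknown name is returned unchanged.
    return to_main.get(name, name)

print(normalise("ManUtd"))   # Manchester United
print(normalise("Arsenal"))  # Arsenal
print(normalise("Chelsea"))  # Chelsea
```

The equivalents dictionary stays the human-readable source of truth, and to_main is derived from it.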
If you do go down the class route, you should make the list of team tuples a class attribute:
class Team:

    TEAMS = [("Arsenal", "ARS"), ("Manchester United", "MNU", "ManUtd"), ...]

    def __init__(self, name):
        if not any(name in names for names in self.TEAMS):
            raise ValueError("Not a valid team name.")
        self.name = name

    def __eq__(self, other):
        # Let Python handle comparison with non-Team objects.
        if not isinstance(other, Team):
            return NotImplemented
        # Equal if both names appear in the same tuple.
        for names in self.TEAMS:
            if self.name in names and other.name in names:
                return True
        return False
The output from this:
>>> mnu1 = Team("ManUtd")
>>> mnu2 = Team("MNU")
>>> mnu1 == mnu2
True
>>> ars = Team("ARS")
>>> ars == mnu1
False
>>> fail = Team("Not a name")
Traceback (most recent call last):
  File "<pyshell#49>", line 1, in <module>
    fail = Team("Not a name")
  File "<pyshell#43>", line 7, in __init__
    raise ValueError("Not a valid team name.")
ValueError: Not a valid team name.
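One caveat with the class above: in Python 3, a class that defines __eq__ without also defining __hash__ becomes unhashable, so its instances cannot be put in a set or used as dictionary keys. A possible variation, sketched here with the hypothetical attributes _GROUP_OF and _group and with the West Bromwich Albion names taken from the question, is to store each team's tuple on the instance and base both equality and hashing on it:

```python
class Team:
    TEAMS = [
        ("Arsenal", "ARS"),
        ("Manchester United", "MNU", "ManUtd"),
        ("West Bromwich Albion", "WBA", "WestBrom"),
    ]
    # Built once when the class is created: each name maps to its tuple.
    _GROUP_OF = {name: names for names in TEAMS for name in names}

    def __init__(self, name):
        if name not in self._GROUP_OF:
            raise ValueError("Not a valid team name.")
        self.name = name
        self._group = self._GROUP_OF[name]

    def __eq__(self, other):
        if not isinstance(other, Team):
            return NotImplemented
        return self._group == other._group

    def __hash__(self):
        # Equal teams share the same tuple, so they share the same hash.
        return hash(self._group)

    def __repr__(self):
        return f"Team({self.name!r})"


print(Team("ManUtd") == Team("MNU"))                     # True
print(Team("ARS") == Team("MNU"))                        # False
print(len({Team("ManUtd"), Team("MNU"), Team("ARS")}))   # 2
```

This assumes each name belongs to exactly one tuple; if a name appeared in two tuples, the later tuple would silently win in _GROUP_OF.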
Alternatively, a simple function would do the same job if your Team won't have other attributes:
def equivalent(team1, team2):
    teams = [("Arsenal", "ARS"), ("Manchester United", "MNU", "ManUtd"), ...]
    for names in teams:
        if team1 in names and team2 in names:
            return True
    return False
Output from this:
>>> equivalent("MNU", "ManUtd")
True
>>> equivalent("MNU", "Arsenal")
False
>>> equivalent("MNU", "Not a name")
False