Group strings with values in Python

Question

I'm working on Twitter hashtags and I've already counted the number of times each one appears. My CSV file looks like this:

GilletsJaunes, 100
Macron, 50
gilletsjaune, 20
tax, 10

Now I would like to merge terms that are close, such as "GilletsJaunes" and "gilletsjaune", using the fuzzywuzzy library. If the similarity score between two terms is greater than 80, their counts should be added together under one of the two terms and the other term deleted. This would give:

GilletsJaunes, 120
Macron, 50
tax, 10

With fuzzywuzzy, the similarity score is computed like this:

from fuzzywuzzy import fuzz
from fuzzywuzzy import process

fuzz.ratio("GiletsJaunes", "giletsjaune")
82 #output
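fuzz.ratio is case-sensitive, which matters for this data. The sketch below approximates it with the standard library's difflib.SequenceMatcher (fuzzywuzzy falls back to SequenceMatcher when python-Levenshtein is not installed, so the two should agree closely); the helper name similarity is hypothetical. For the two spellings actually in the CSV it gives exactly 80, which a strict "greater than 80" test would not merge, while lowercasing both strings first raises the score to 96.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str, ignore_case: bool = False) -> int:
    """Hypothetical helper approximating fuzz.ratio: a 0-100 similarity score."""
    if ignore_case:
        a, b = a.lower(), b.lower()
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


case_sensitive = similarity("GilletsJaunes", "gilletsjaune")  # 80
case_insensitive = similarity("GilletsJaunes", "gilletsjaune", ignore_case=True)  # 96
print(case_sensitive, case_insensitive)
```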

Wok · Accepted Answer

First, define these two helper functions to compute the argmax:

# given an iterable of pairs return the key corresponding to the greatest value
def argmax(pairs):
    return max(pairs, key=lambda x: x[1])[0]


# given an iterable of values return the index of the greatest value
def argmax_index(values):
    return argmax(enumerate(values))
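If you would rather not define the helpers, the index of the best score can also be found with a single max call over the positions. A minimal sketch with made-up scores (the list below is hypothetical):

```python
scores = [21, 80, 25]  # hypothetical similarity scores against three kept terms

# index of the largest score; max() returns the first maximal item on ties,
# which matches the behaviour of argmax_index above
best_index = max(range(len(scores)), key=scores.__getitem__)
print(best_index)  # 1
```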

Second, load the content of your CSV into a Python dictionary and proceed as follows:
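The code that follows writes the dictionary out by hand for brevity. A minimal sketch of the loading step itself, assuming each line of the file has the form "hashtag, count" with no header row, as in the question: the csv module's skipinitialspace option drops the space after the comma. The function name load_counts is hypothetical, and an in-memory string stands in for the file; with a real file you would pass something like open("hashtags.csv", newline="") (hypothetical filename).

```python
import csv
import io


def load_counts(lines):
    """Read 'hashtag, count' rows into a dict (hypothetical helper).

    If a hashtag appears on several rows, its counts are summed.
    """
    counts = {}
    for row in csv.reader(lines, skipinitialspace=True):
        if not row:  # csv.reader yields [] for blank lines
            continue
        tag, count = row[0], int(row[1])
        counts[tag] = counts.get(tag, 0) + count
    return counts


sample = io.StringIO("GilletsJaunes, 100\nMacron, 50\ngilletsjaune, 20\ntax, 10\n")
counts = load_counts(sample)
print(counts)  # {'GilletsJaunes': 100, 'Macron': 50, 'gilletsjaune': 20, 'tax': 10}
```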

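from fuzzywuzzy import fuzz

# hashtag counts loaded from the CSV file
# (named `counts` rather than `input` to avoid shadowing the built-in input())
counts = {
    'GilletsJaunes': 100,
    'Macron': 50,
    'gilletsjaune': 20,
    'tax': 10,
}

# fuzz.ratio is case-sensitive: "GilletsJaunes" vs "gilletsjaune" scores 80,
# which does not pass a strict "> 80" test, so a lower threshold is used here
threshold = 50

output = dict()
for query in counts:
    references = list(output.keys())  # important: this is output.keys(), not counts.keys()!
    scores = [fuzz.ratio(query, ref) for ref in references]
    if any(s > threshold for s in scores):
        # merge this count into the most similar term already kept
        best_reference = references[argmax_index(scores)]
        output[best_reference] += counts[query]
    else:
        # no close match yet: keep this term as a new group
        output[query] = counts[query]

print(output)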

{'GilletsJaunes': 120, 'Macron': 50, 'tax': 10}
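A variant of the same greedy grouping, sketched with only the standard library so it runs without fuzzywuzzy: difflib.SequenceMatcher stands in for fuzz.ratio, strings are lowercased before comparison, and terms are visited from most to least frequent so the most common spelling becomes the name of its group. With case folding, the question's original "greater than 80" rule is enough to merge the two spellings. The names score, group_counts, threshold and ignore_case are illustrative, not part of any library.

```python
from difflib import SequenceMatcher


def score(a: str, b: str, ignore_case: bool) -> int:
    """0-100 similarity, approximating fuzz.ratio via difflib."""
    if ignore_case:
        a, b = a.lower(), b.lower()
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


def group_counts(counts: dict, threshold: int = 80, ignore_case: bool = True) -> dict:
    """Greedily merge each term into its most similar already-kept term."""
    output = {}
    # most frequent terms first, so they become the group representatives
    for term, count in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        scores = {ref: score(term, ref, ignore_case) for ref in output}
        best = max(scores, key=scores.get, default=None)
        if best is not None and scores[best] > threshold:
            output[best] += count  # close enough: add to the existing group
        else:
            output[term] = count  # no close match: start a new group
    return output


counts = {'GilletsJaunes': 100, 'Macron': 50, 'gilletsjaune': 20, 'tax': 10}
print(group_counts(counts))
# {'GilletsJaunes': 120, 'Macron': 50, 'tax': 10}
```

Visiting terms by descending count is a design choice: in the loop above, the first term encountered becomes the group name, so the result depends on the order of rows in the CSV.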


Tags: python, levenshtein-distance, grouping, fuzzywuzzy

Steph
