I'm new to Python and PySpark and I'm practicing TF-IDF. I split the sentences in a txt file into words, removed punctuation, removed the words that appear in a stop-words list, and saved the counts as a dictionary with the code below.
x = text_file.flatMap(lambda line: str_clean(line).split())
x = x.filter(lambda word: word not in stopwords)
x = x.map(lambda word: (word, 1))      # pair each word with a count of 1
x = x.reduceByKey(lambda a, b: a + b)
x = x.collectAsMap()
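For reference, here is a minimal runnable version of the same pipeline; str_clean and the stop-word list are simplified stand-ins for what I actually use:

from pyspark import SparkContext
import string

sc = SparkContext("local[*]", "tfidf-practice")

# simplified stand-ins for my real helpers
stopwords = {"the", "a", "an", "and", "of", "to", "in"}

def str_clean(line):
    # lowercase a line and strip punctuation
    return line.lower().translate(str.maketrans("", "", string.punctuation))

text_file = sc.textFile("doc1.txt")  # one of the 10 files

counts = (text_file
          .flatMap(lambda line: str_clean(line).split())
          .filter(lambda word: word not in stopwords)
          .map(lambda word: (word, 1))
          .reduceByKey(lambda a, b: a + b)
          .collectAsMap())
# counts is a plain Python dict: {'word1': 1, 'word2': 1, 'word3': 2, ...}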
I have 10 different txt files to run through this same process, and I'd like to append a string like "@d1" to the keys in the dictionary so I can tell that a key comes from document 1. How can I add "@d1" to all keys in the dictionary?
Essentially my dictionary is in the form:
{'word1': 1, 'word2': 1, 'word3': 2, ....}
And I would like it to be:
{'word1@d1': 1, 'word2@d1': 1, 'word3@d1': 2, ...}
Try a dictionary comprehension:
{k+'@d1': v for k, v in d.items()}
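For example, applied to the dictionary from the question (d being the result of collectAsMap()):

d = {'word1': 1, 'word2': 1, 'word3': 2}
tagged = {k + '@d1': v for k, v in d.items()}
print(tagged)  # {'word1@d1': 1, 'word2@d1': 1, 'word3@d1': 2}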
In Python 3.6+, you can use f-strings:
{f'{k}@d1': v for k, v in d.items()}
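Since you have 10 files, you could also build the tag from the document number in a loop. This is just a sketch: the file names doc1.txt ... doc10.txt are assumed, and sc, str_clean, and stopwords are the ones from your question.

def count_words(sc, path, stopwords):
    # clean, filter, and count words in one file, returning a plain dict
    return (sc.textFile(path)
              .flatMap(lambda line: str_clean(line).split())
              .filter(lambda word: word not in stopwords)
              .map(lambda word: (word, 1))
              .reduceByKey(lambda a, b: a + b)
              .collectAsMap())

all_counts = {}
for i in range(1, 11):                       # doc1.txt ... doc10.txt (assumed names)
    d = count_words(sc, f'doc{i}.txt', stopwords)
    all_counts.update({f'{k}@d{i}': v for k, v in d.items()})

Alternatively, you can rename the keys on the RDD itself before collecting, e.g. .map(lambda kv: (f'{kv[0]}@d{i}', kv[1])) right before collectAsMap().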