I want to remove duplicate lines from a text file.
I have a text file which contains lines like the following:
None_None
ConfigHandler_56663624
ConfigHandler_56663624
ConfigHandler_56663624
ConfigHandler_56663624
None_None
ColumnConverter_56963312
ColumnConverter_56963312
PredicatesFactory_56963424
PredicatesFactory_56963424
PredicateConverter_56963648
PredicateConverter_56963648
ConfigHandler_80134888
ConfigHandler_80134888
ConfigHandler_80134888
ConfigHandler_80134888
The resulting output needs to be:
None_None
ConfigHandler_56663624
ColumnConverter_56963312
PredicatesFactory_56963424
PredicateConverter_56963648
ConfigHandler_80134888
I have tried just this: en = set(open('file.txt')), but it does not work.
Could anyone help me extract only the unique set of lines from the file?
Thank you
Here is a simple solution using sets to remove the duplicates from the text file.
# Read all lines first, then deduplicate them with a set
with open('workfile.txt', 'r') as fin:
    lines_set = set(fin.readlines())

# Write the unique lines back to the same file (order is not preserved)
with open('workfile.txt', 'w') as out:
    for line in lines_set:
        out.write(line)
Here's an option that preserves order (unlike a set) but otherwise behaves the same (note that the trailing newline is deliberately stripped and blank lines are ignored)...
from collections import OrderedDict

with open('/home/jon/testdata.txt') as fin:
    # Strip trailing newlines and skip blank lines, keeping first-seen order
    lines = (line.rstrip() for line in fin)
    unique_lines = OrderedDict.fromkeys(line for line in lines if line)

print(list(unique_lines))
# ['None_None', 'ConfigHandler_56663624', 'ColumnConverter_56963312', 'PredicatesFactory_56963424', 'PredicateConverter_56963648', 'ConfigHandler_80134888']
Then you just need to write the above to your output file.
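For example, a minimal sketch of that last step, assuming the unique_lines built above and a hypothetical output file named 'unique.txt':

# 'unique.txt' is just an example output path
with open('unique.txt', 'w') as fout:
    for line in unique_lines:          # iterating an OrderedDict yields keys in insertion order
        fout.write(line + '\n')        # re-add the newline that was stripped earlier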