I have a long list of strings to look for in a very large file. I know I can do this with two nested for loops:
import sys

dns = sys.argv[2]
search_words = var  # [list of 100+ strings]
with open(dns) as file:
    for line in file:
        for word in search_words:
            if word in line:
                print(line)
However, I'm looking for a more efficient way to do this so that I don't have to wait half an hour for it to run. Can anyone help?
Part of the problem is that you read the file line by line instead of loading the entire text file into RAM at once, which would save a lot of time in this case. That accounts for much of the runtime, but the text search itself can also be improved in ways that are less straightforward.
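For instance, here is a minimal sketch of both ideas using only the standard library: the whole file is read into memory in one go, and the 100+ separate substring checks are collapsed into a single compiled regular expression. The variable names mirror the question and are only illustrative.

import re
import sys

dns = sys.argv[2]
search_words = [...]  # list of 100+ strings, as in the question

# Load the whole file into RAM with one read instead of iterating line by line.
with open(dns) as f:
    text = f.read()

# One alternation pattern replaces the inner Python loop over search_words;
# re.escape() keeps any regex metacharacters in the words literal.
pattern = re.compile("|".join(re.escape(word) for word in search_words))

for line in text.splitlines():
    if pattern.search(line):
        print(line)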
That said, there are several packages genuinely designed for efficient text search in Python. I suggest you have a look at AhocoraPy, which is based on the Aho-Corasick algorithm, the same algorithm used by the well-known grep tool for matching many fixed strings at once. The package's GitHub page explains how to accomplish your task efficiently, so I won't go into further detail here.
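For completeness, a rough sketch of what that could look like with AhocoraPy follows; the class and method names (KeywordTree, add, finalize, search) are taken from the project's README, so double-check them against the current documentation before relying on them.

import sys

from ahocorapy.keywordtree import KeywordTree  # pip install ahocorapy

dns = sys.argv[2]
search_words = [...]  # list of 100+ strings, as in the question

# Build the Aho-Corasick automaton once; after finalize(), each search
# scans the text a single time no matter how many words are in the list.
kwtree = KeywordTree(case_insensitive=True)
for word in search_words:
    kwtree.add(word)
kwtree.finalize()

with open(dns) as f:
    for line in f:
        # search() returns the first (keyword, start_index) hit, or None.
        if kwtree.search(line) is not None:
            print(line)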