Python is not my best language, so I'm not very good at finding the most efficient solutions to some of my problems. I have a very large string (coming from a 30 MB file) and I need to check whether that string contains a smaller substring (only a few dozen characters long). The way I am currently doing it is:
if small_string in large_string:
    # logic here
But this seems very inefficient, because it will check every possible sequence of characters in the file. I know that there will only be an exact match on a newline, so would it be better to read the file in as a list of lines and iterate through that list to find a match?
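Something like this sketch is what I have in mind (the file name is just a placeholder), assuming a match always has to be an entire line:

with open("big_file.txt") as f:
    for line in f:
        if small_string == line.rstrip("\n"):
            # logic here
            break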
EDIT: To clear up some confusion on "matching on a newline only", here's an example:
small_string = "This is a line"
big_string = "This is a line\nThis is another line\nThis is yet another"
If I'm not mistaken, the in keyword will check all sequences of characters, not just whole lines.
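As a quick sanity check of what I mean (the spanning substring below is just for illustration):

small_string = "This is a line"
big_string = "This is a line\nThis is another line\nThis is yet another"

print("a line\nThis" in big_string)  # True: `in` matches a substring that spans a newline
print(any(line == "a line\nThis" for line in big_string.splitlines()))  # False: no whole line equals it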
How slow is too slow? I just did an a in b test for a unique string at the end of a 170 MB string. It finished before my finger left the Enter key.
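If you want a number instead of my reflexes, here's a rough timeit sketch (the haystack is synthetic and the sizes are arbitrary):

import timeit

big_string = "x" * (170 * 1024 * 1024) + "needle"  # ~170 MB of filler with a unique string at the end
small_string = "needle"

# Worst case for `in`: the only match is at the very end of the haystack.
print(timeit.timeit(lambda: small_string in big_string, number=10))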