
Python: efficient way to check if a very large string contains a substring

Python is not my best language, and so I'm not all that good at finding the most efficient solutions to some of my problems. I have a very large string (coming from a 30 MB file) and I need to check if that file contains a smaller substring (this string is only a few dozen characters). The way I am currently doing it is:

if small_string in large_string:
    # logic here

But this seems to be very inefficient because it will check every possible sequence of characters in the file. I know that there will only be an exact match on a newline, so would it be better to read the file in as a list and iterate through that list to match?

EDIT: To clear up some confusion on "matching on a newline only", here's an example:

small_string = "This is a line"
big_string = "This is a line\nThis is another line\nThis is yet another"

If I'm not mistaken, the in keyword will check all sequences, not just every line.
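To make the distinction concrete, here is a minimal sketch contrasting a plain substring test with a whole-line match. The helper name contains_line is hypothetical (it is not a built-in), and the strings are the ones from the example above:

```python
small_string = "This is a line"
big_string = "This is a line\nThis is another line\nThis is yet another"


def contains_line(haystack, needle):
    """Return True only if needle equals one complete line of haystack.

    Hypothetical helper for illustration: it splits on newlines and
    compares each line for exact equality.
    """
    return any(line == needle for line in haystack.split("\n"))


# The in operator matches anywhere in the string, including mid-line.
substring_hit = "is another" in big_string

# The line-based check only succeeds on an exact, complete line.
partial_line_hit = contains_line(big_string, "is another")
full_line_hit = contains_line(big_string, small_string)

print(substring_hit, partial_line_hit, full_line_hit)
```

Note that splitting builds a list of every line first, so this trades extra memory and work for the stricter match; it is not necessarily faster than the plain in test.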

Jon Martin asked Aug 24 '11 11:08


1 Answer

How slow is too slow? I just did an a in b test for a unique string at the end of a 170 MB string. It finished before my finger left the Enter key.
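A rough way to try this yourself is sketched below. It is not the exact test described above: the string is synthetic (about 30 MB of repeated filler, an assumption standing in for the real file contents), the needle is made up, and timings will vary by machine.

```python
import time

# Assumption: synthetic data standing in for the contents of the 30 MB file.
# Each filler line is 21 characters, so 1,500,000 copies is about 31.5 MB.
filler = "This is another line\n" * 1_500_000

# Hypothetical needle placed at the very end, so the search has to
# scan the whole string before it can succeed.
needle = "A unique line that appears only once"
large_string = filler + needle

start = time.perf_counter()
found = needle in large_string
elapsed = time.perf_counter() - start

print(f"found={found}, elapsed={elapsed:.4f}s")
```

If the measured time is acceptable for your use case, the plain in test is probably good enough and the line-by-line approach is not needed.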

Marcelo Cantos answered Oct 18 '22 18:10