Python efficient way to check if very large string contains a substring

Question

Python is not my best language, so I'm not all that good at finding the most efficient solutions to some of my problems. I have a very large string (read from a 30 MB file) and I need to check whether it contains a much smaller substring (only a few dozen characters long). The way I am currently doing it is:

if small_string in large_string:
    # logic here

But this seems to be very inefficient because it will check every possible sequence of characters in the file. I know that there will only be an exact match on a newline, so would it be better to read the file in as a list and iterate through that list to match?

EDIT: To clear up some confusion on "matching on a newline only", here's an example:

small_string = "This is a line"
big_string = """This is a line
This is another line
This is yet another"""

If I'm not mistaken, the in keyword will check all sequences, not just every line.
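To make the distinction concrete, here is a minimal sketch using the example strings from this question. It assumes the whole text fits in memory; the variable names substring_hit, lines, line_hit_exact and line_hit_partial are illustrative only.

```python
small_string = "This is a line"
big_string = """This is a line
This is another line
This is yet another"""

# Substring search: matches anywhere, including in the middle of a line.
substring_hit = "another" in big_string

# Whole-line match: true only when some line equals the needle exactly.
lines = big_string.splitlines()
line_hit_exact = small_string in lines
line_hit_partial = "another" in lines

print(substring_hit, line_hit_exact, line_hit_partial)
```

Splitting into lines builds a list of every line first, so it is not necessarily faster than a plain substring test; it only changes what counts as a match.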

Marcelo Cantos · Accepted Answer

How slow is too slow? I just did an a in b test for a unique string at the end of a 170 MB string. It finished before my finger left the Enter key.
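A rough way to reproduce that kind of test is sketched below. The filler data, the 30 MB size and the names needle and haystack are assumptions made for illustration, not the exact test described above, and the timing will vary by machine.

```python
import time

# Hypothetical data: about 30 MB of filler with a unique needle at the very end.
needle = "This is a line that appears only once"
haystack = "x" * (30 * 1024 * 1024) + "\n" + needle

start = time.perf_counter()
found = needle in haystack  # plain substring test, as in the question
elapsed = time.perf_counter() - start

print(f"found={found} in {elapsed:.4f} s")
```

If the measured time is acceptable for your data, the plain in test is probably good enough and reading the file as a list of lines is unlikely to be worth the extra memory.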

Tags:

performance

python

Jon Martin
