I'm using this to calculate the number of sentences in a text:
import codecs
import re

fileObj = codecs.open("someText.txt", "r", "utf-8")
shortText = fileObj.read()
pat = '[.]'
nSentences = 0  # initialize the counter before the loop
for match in re.finditer(pat, shortText, re.UNICODE):
    nSentences = nSentences + 1
Someone told me this is better:
result = re.findall(pat, shortText)
nSentences = len(result)
Is there a difference? Don't they do the same thing?
The second is probably going to be a little faster, since the iteration is done entirely in C. How much faster? About 15% in my tests (matching 'a' in 'a' * 16), though that percentage will get smaller as the regex gets more complex and takes a larger proportion of the running time. But it will use more memory, since it's actually going to create a list for you. Assuming you don't have a ton of matches, though, not too much more memory.
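If you want to check the difference on your own data, a rough timing sketch along these lines should do. The sample text, the helper names count_with_finditer and count_with_findall, and the repeat count are all made up for illustration; the numbers will vary by machine and Python version.

```python
import re
import timeit

pat = '[.]'
# Hypothetical sample text; substitute the contents of your own file.
text = "One. Two. Three. " * 1000

def count_with_finditer(pattern, s):
    # Loop in Python over lazily produced match objects.
    n = 0
    for _ in re.finditer(pattern, s):
        n += 1
    return n

def count_with_findall(pattern, s):
    # Build the full list of matches, then take its length.
    return len(re.findall(pattern, s))

# Both approaches should agree on the count before we compare speed.
assert count_with_finditer(pat, text) == count_with_findall(pat, text)

t_iter = timeit.timeit(lambda: count_with_finditer(pat, text), number=20)
t_all = timeit.timeit(lambda: count_with_findall(pat, text), number=20)
print(f"finditer: {t_iter:.4f}s  findall: {t_all:.4f}s")
```

Run it a few times before drawing conclusions, since single timings are noisy.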
As to which I'd prefer, I do kind of like the second's conciseness, especially when combined into a single statement:
nSentences = len(re.findall(pat, shortText))
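If memory ever did become a concern, one possible middle ground (my suggestion, not something from the question) is to count the matches from finditer with a generator expression, which stays a single statement but never builds the list. The sample text here is hypothetical.

```python
import re

pat = '[.]'
shortText = "First sentence. Second sentence. Third."  # hypothetical sample

# findall builds a list of every match, then we take its length.
n_list = len(re.findall(pat, shortText))

# finditer yields matches one at a time; sum() counts them without a list.
n_lazy = sum(1 for _ in re.finditer(pat, shortText))

print(n_list, n_lazy)  # both are 3 for this sample
```

The counting loop still runs in Python here, so this is likely closer in speed to the original for loop than to findall.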