I have some code that does something like:
if string in comma_delimited_string.split(','):
return True
This website says that membership testing with sets and dicts is much faster that with lists or tuples. I know doing set(comma_delimited_string.split(','))
wouldn't improve speed because a list is still being created before it's converted into a set (or at least, it appeared to slow it down when I timed it).
I was wondering then, (mostly out of curiosity than real benefit to my code), is there a way to achieve the same effect of comma_delimited_string.split(',')
but directly creating a set, instead of a list, with the intention of speeding up the above operation?
You're ignoring the fact that in order to convert anything to a set, you need to iterate it. And that iteration is exactly the same as you are already doing in order to search the original list. So there cannot be any advantage in doing this, only overhead.
Searching against a set is more efficient if you're doing it multiple times, as that allows you to amortise the cost of the conversion. But the conversion itself is always going to be a linear scan; there's no way of avoiding that.
No, the str.split
operation always returns a list and trying to convert that into a set
will cost time. Also writing your own handmade split
that produces directly a set will be slower, because str.split
is implemente in C (the source code should be under Objects/stringlib/split.h
)
Note however that if your string
does not contain a comma and you expect string
to not be a substring of the elements returned by split
, then you can just do:
if string in comma_delimited_string:
If string
contains a comma, then your test will always fail (because by definition the elements of text.split(',')
will never contain one.
The case in which the above condition fail is when you have something like:
if "a" in "aaa,bb,c".split(',')
because in this case "a" in ["aaa", "bb", "c"]
fails.
Alternatively you could use a regex:
import re
if re.search(r'(^{0},)|(,{0},)|(,{0}$)|(^{0}$)'.format(re.escape(string)), comma_delimited_string):
However I don't know whether this would be faster, it probably depends on your inputs.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With