You can extract a substring from a string before a specific character using the rpartition() method. rpartition() method partitions the given string based on the last occurrence of the delimiter and it generates tuples that contain three elements where.
Use the split() method to cut string before the character in Python. The split() method splits a string into a list.
Just use the split
function. It returns a list, so you can keep the first element:
>>> s1.split(':')
['Username', ' How are you today?']
>>> s1.split(':')[0]
'Username'
Using index
:
>>> string = "Username: How are you today?"
>>> string[:string.index(":")]
'Username'
The index will give you the position of :
in string, then you can slice it.
If you want to use regex:
>>> import re
>>> re.match("(.*?):",string).group()
'Username'
match
matches from the start of the string.
you can also use itertools.takewhile
>>> import itertools
>>> "".join(itertools.takewhile(lambda x: x!=":", string))
'Username'
You don't need regex
for this
>>> s = "Username: How are you today?"
You can use the split
method to split the string on the ':'
character
>>> s.split(':')
['Username', ' How are you today?']
And slice out element [0]
to get the first part of the string
>>> s.split(':')[0]
'Username'
I have benchmarked these various technics under Python 3.7.0 (IPython).
c
is known): pre-compiled regex.s.partition(c)[0]
.c
may not be in s
): partition, split.import string, random, re
SYMBOLS = string.ascii_uppercase + string.digits
SIZE = 100
def create_test_set(string_length):
for _ in range(SIZE):
random_string = ''.join(random.choices(SYMBOLS, k=string_length))
yield (random.choice(random_string), random_string)
for string_length in (2**4, 2**8, 2**16, 2**32):
print("\nString length:", string_length)
print(" regex (compiled):", end=" ")
test_set_for_regex = ((re.compile("(.*?)" + c).match, s) for (c, s) in test_set)
%timeit [re_match(s).group() for (re_match, s) in test_set_for_regex]
test_set = list(create_test_set(16))
print(" partition: ", end=" ")
%timeit [s.partition(c)[0] for (c, s) in test_set]
print(" index: ", end=" ")
%timeit [s[:s.index(c)] for (c, s) in test_set]
print(" split (limited): ", end=" ")
%timeit [s.split(c, 1)[0] for (c, s) in test_set]
print(" split: ", end=" ")
%timeit [s.split(c)[0] for (c, s) in test_set]
print(" regex: ", end=" ")
%timeit [re.match("(.*?)" + c, s).group() for (c, s) in test_set]
String length: 16
regex (compiled): 156 ns ± 4.41 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
partition: 19.3 µs ± 430 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
index: 26.1 µs ± 341 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
split (limited): 26.8 µs ± 1.26 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
split: 26.3 µs ± 835 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
regex: 128 µs ± 4.02 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
String length: 256
regex (compiled): 167 ns ± 2.7 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
partition: 20.9 µs ± 694 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
index: 28.6 µs ± 2.73 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
split (limited): 27.4 µs ± 979 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
split: 31.5 µs ± 4.86 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
regex: 148 µs ± 7.05 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
String length: 65536
regex (compiled): 173 ns ± 3.95 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
partition: 20.9 µs ± 613 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
index: 27.7 µs ± 515 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
split (limited): 27.2 µs ± 796 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
split: 26.5 µs ± 377 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
regex: 128 µs ± 1.5 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
String length: 4294967296
regex (compiled): 165 ns ± 1.2 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
partition: 19.9 µs ± 144 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
index: 27.7 µs ± 571 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
split (limited): 26.1 µs ± 472 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
split: 28.1 µs ± 1.69 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
regex: 137 µs ± 6.53 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
partition() may be better then split() for this purpose as it has the better predicable results for situations you have no delimiter or more delimiters.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With