Consider the following examples:
string_now = 'apple and avocado'
stringthen = string_now.swap('apple', 'avocado') # stringthen = 'avocado and apple'
and:
string_now = 'fffffeeeeeddffee'
stringthen = string_now.swap('fffff', 'eeeee') # stringthen = 'eeeeefffffddffee'
Approaches discussed in "Swap character of string in Python" do not work, as the mapping technique used there only takes one character into consideration. Python's built-in str.maketrans() also supports only one-character keys; when I try to use multi-character keys, it throws the following error:
Traceback (most recent call last):
  File "main.py", line 4, in <module>
    s.maketrans(mapper)
ValueError: string keys in translate table must be of length 1
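For reference, here is a minimal sketch of what str.maketrans() accepts and what it rejects. The sample strings are illustrative only and are not taken from the original main.py.

```python
# Single-character swaps work with a translation table.
table = str.maketrans('ab', 'ba')
swapped_chars = 'abc'.translate(table)  # 'bac'

# Multi-character keys are rejected when the table is built.
try:
    str.maketrans({'apple': 'avocado', 'avocado': 'apple'})
except ValueError as exc:
    error_message = str(exc)
```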
A chain of replace() calls is far from ideal: I have many replacements to do, so chaining them would be a big chunk of code, and because the calls run sequentially, the result is wrong. For example:
string_now = 'apple and avocado'
stringthen = string_now.replace('apple', 'avocado').replace('avocado', 'apple')
gives 'apple and apple' instead of 'avocado and apple'.
What's the best way to achieve this?
Given that we want to swap words x and y, and that we don't care about the situation where they overlap, we can split the string on occurrences of x, replace y with x within each piece, and join the pieces with y.
Essentially, we use split points within the string as a temporary marker to avoid the problem with sequential replacements.
Thus:
def swap_words(s, x, y):
    return y.join(part.replace(y, x) for part in s.split(x))
Test it:
>>> swap_words('apples and avocados and avocados and apples', 'apples', 'avocados')
'avocados and apples and apples and avocados'
>>>
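To make the intermediate steps visible, the same idea can be unrolled into separate statements. This is only a sketch for illustration; the variable names parts, swapped and result are not part of the answer above.

```python
s = 'apples and avocados and avocados and apples'
x, y = 'apples', 'avocados'

# Step 1: splitting on x removes every x; the split points remember where x was.
parts = s.split(x)
# ['', ' and avocados and avocados and ', '']

# Step 2: within each piece, y can safely become x, since no x remains.
swapped = [part.replace(y, x) for part in parts]
# ['', ' and apples and apples and ', '']

# Step 3: joining with y puts y where x used to be.
result = y.join(swapped)
# 'avocados and apples and apples and avocados'
```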
Two regex solutions, plus one for other people who do have a character that can't appear (there are over a million different possible characters, after all) and who don't dislike replace chains :-)
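import re

def swap_words_regex1(s, x, y):
    # Match either word; the callback returns the other one.
    return re.sub(re.escape(x) + '|' + re.escape(y),
                  lambda m: (x if m[0] == y else y),
                  s)

def swap_words_regex2(s, x, y):
    # Group 1 captures only x, so m[1] is None exactly when y matched.
    return re.sub(f'({re.escape(x)})|{re.escape(y)}',
                  lambda m: x if m[1] is None else y,
                  s)

def swap_words_replaces(s, x, y):
    # Assumes chr(0) never occurs in s, x or y.
    return s.replace(x, chr(0)).replace(y, x).replace(chr(0), y)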
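Since the question mentions having many replacements to do, the regex approach could plausibly be generalized to a whole mapping at once. The following is a sketch under assumptions: swap_many and its mapping parameter are hypothetical names, the mapping is assumed to be non-empty, and keys are tried longest first so that a longer word wins over one of its prefixes.

```python
import re

def swap_many(s, mapping):
    # Longest keys first, so a longer word is preferred over its own prefix.
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile('|'.join(re.escape(k) for k in keys))
    # Each match is replaced in a single pass, so results are never re-replaced.
    return pattern.sub(lambda m: mapping[m[0]], s)

swapped_words = swap_many('apple and avocado',
                          {'apple': 'avocado', 'avocado': 'apple'})
swapped_runs = swap_many('fffffeeeeeddffee',
                         {'fffff': 'eeeee', 'eeeee': 'fffff'})
```

Because the string is scanned only once, the sequential-replacement problem from the question does not arise here.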
Some benchmark results:
3.7 ms 1966 kB swap_words_split
10.7 ms 2121 kB swap_words_regex1
17.8 ms 2121 kB swap_words_regex2
1.3 ms 890 kB swap_words_replaces
Full code (Try it online!):
from timeit import repeat
import re
import tracemalloc as tm

def swap_words_split(s, x, y):
    return y.join(part.replace(y, x) for part in s.split(x))

def swap_words_regex1(s, x, y):
    return re.sub(re.escape(x) + '|' + re.escape(y),
                  lambda m: (x if m[0] == y else y),
                  s)

def swap_words_regex2(s, x, y):
    return re.sub(f'({re.escape(x)})|{re.escape(y)}',
                  lambda m: x if m[1] is None else y,
                  s)

def swap_words_replaces(s, x, y):
    return s.replace(x, chr(0)).replace(y, x).replace(chr(0), y)

funcs = swap_words_split, swap_words_regex1, swap_words_regex2, swap_words_replaces

args = 'apples and avocados and bananas and oranges and ' * 10000, 'apples', 'avocados'

for _ in range(3):
    for func in funcs:
        t = min(repeat(lambda: func(*args), number=1))
        tm.start()
        func(*args)
        memory = tm.get_traced_memory()[1]
        tm.stop()
        print(f'{t * 1e3:4.1f} ms {memory // 1000:4} kB {func.__name__}')
    print()
Using str.format():
string_now = "apple and avocado"
stringthen = (  # "avocado and apple"
    string_now.replace("apple", "{apple}")
    .replace("avocado", "{avocado}")
    .format(apple="avocado", avocado="apple")
)
# Edit: as a function
def swap_words(s, x, y):
    # Parentheses are required so the chained calls can span several lines.
    return (
        s.replace(x, "{" + x + "}")
        .replace(y, "{" + y + "}")
        .format(**{x: y, y: x})
    )
It first adds curly brackets before and after keywords to turn them into placeholders. Then str.format() is used to replace the placeholders.
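One caveat with str.format() is that literal braces already present in the input are parsed as placeholders. A possible variant, sketched here under assumptions, escapes them first and uses positional placeholders instead of the words themselves. The name swap_words_format is hypothetical, and the sketch assumes that neither word contains a brace, that the words do not overlap, and that y is not "0".

```python
def swap_words_format(s, x, y):
    # Double any literal braces so str.format() emits them unchanged.
    escaped = s.replace("{", "{{").replace("}", "}}")
    # Positional placeholders avoid using the words as format field names.
    template = escaped.replace(x, "{0}").replace(y, "{1}")
    # Argument 0 stands where x was and receives y, and vice versa.
    return template.format(y, x)

with_braces = swap_words_format("apple {and} avocado", "apple", "avocado")
```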
Why not just use a temporary string that will never appear in the original string?
For example:
>>> a = 'apples and avocados and avocados and apples'
>>> b = a.replace('apples', '#IamYourFather#').replace('avocados', 'apples').replace('#IamYourFather#', 'avocados')
>>> print(b)
avocados and apples and apples and avocados
where #IamYourFather# is a string that will never appear in the original string.
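If it is not known in advance which marker is safe, one could be picked at run time. The sketch below is an assumption-laden illustration, not part of the answer above: swap_words_marker is a hypothetical name, and it searches the Unicode private use area for a character that occurs in none of the inputs. It raises StopIteration in the unlikely case that every candidate is already in use.

```python
def swap_words_marker(s, x, y):
    combined = s + x + y
    # Pick the first private-use character that appears in none of the inputs.
    marker = next(chr(cp) for cp in range(0xE000, 0xF900)
                  if chr(cp) not in combined)
    # Same three-step chain as above, with the computed marker.
    return s.replace(x, marker).replace(y, x).replace(marker, y)

a = 'apples and avocados and avocados and apples'
b = swap_words_marker(a, 'apples', 'avocados')
```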