I have a data frame in which the txt column contains a list. I want to clean the txt column using the function clean_text().
import re
import pandas

data = {'value': ['abc.txt', 'cda.txt'],
        'txt': ['[''2019/01/31-11:56:23.288258 1886 7F0ED4CDC704 asfasnfs: remove datepart'']',
                '[''2019/02/01-11:56:23.288258 1886 7F0ED4CDC704 asfasnfs: remove datepart'']']}
df = pandas.DataFrame(data=data)

def clean_text(text):
    """
    :param text: it is the plain text
    :return: cleaned text
    """
    patterns = [r"^{53}",
                r"[A-Za-z]+[\d]+[\w]*|[\d]+[A-Za-z]+[\w]*",
                r"[-=/':,?${}\[\]-_()>.~" ";+]"]
    for p in patterns:
        text = re.sub(p, '', text)
    return text
My Solution:
df['txt'] = df['txt'].apply(lambda x: clean_text(x))
But I am getting the following error:
sre_constants.error: nothing to repeat at position 1
^{53} is not a valid regular expression, since the quantifier {53} must be preceded by a character or a pattern that can be repeated. If you mean to match the first 53 characters of a string that is at least 53 characters long, you can use the following pattern instead:
^.{53}
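As a minimal sketch of that suggestion, assuming the goal is to strip a fixed-width 53-character prefix: the sample string is taken from the question, while the names prefix, stripped, and short are only illustrative. Note that . does not match a newline unless re.DOTALL is set.

```python
import re

# Sample value from the question's txt column.
sample = "[2019/01/31-11:56:23.288258 1886 7F0ED4CDC704 asfasnfs: remove datepart]"

# "." gives the {53} quantifier something to repeat:
# any 53 characters (except newlines) at the start of the string.
prefix = re.compile(r"^.{53}")

# The first 53 characters are removed, the rest is kept.
stripped = prefix.sub("", sample)

# A string shorter than 53 characters does not match and is left unchanged.
short = prefix.sub("", "too short")

print(stripped)
print(short)
```

If the prefix to remove is not always exactly 53 characters wide, a pattern that describes the date and ID fields themselves would probably be more robust than a fixed count.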
The culprit is the first pattern in the list, r"^{53}". It reads: ^ matches the beginning of the string, and then {53} repeats the previous character or group 53 times. Wait... but there is no previous character other than ^, which cannot be repeated! Indeed. Add a character that you want to match 53 repetitions of. Or escape the sequence {53} if you want to match it verbatim, e.g. using re.escape.
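The following sketch reproduces the error and shows the re.escape option. The variable names and the test string "{53} items" are made up for illustration. On older Python versions the exception is reported as sre_constants.error, which is the same class as re.error.

```python
import re

# Compiling the original pattern fails: "^" is an anchor, not something
# that the {53} quantifier can repeat.
message = None
try:
    re.compile(r"^{53}")
except re.error as exc:
    message = str(exc)
print(message)

# To match the text "{53}" verbatim, escape it so the braces
# lose their special meaning.
literal = re.escape("{53}")
found = re.search("^" + literal, "{53} items")
print(found.group())
```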