
Extracting the part of a string in parentheses using Python

I have a csv file with a column with strings. Part of the string is in parentheses. I wish to move the part of string in parentheses to a different column and retain the rest of the string as it is.

For instance: I wish to convert:

LC(Carbamidomethyl)RLK     

to

LCRLK Carbamidomethyl
kkhatri99 asked Mar 20 '26 09:03


1 Answer

Regex solution

If your string contains only one parenthesized group, you can use this regex:

>>> import re
>>> a = "LC(Carbamidomethyl)RLK"
>>> re.sub(r'(.*)\((.+)\)(.*)', r'\g<1>\g<3> \g<2>', a)
'LCRLK Carbamidomethyl'
>>> a = "LCRLK"
>>> re.sub(r'(.*)\((.+)\)(.*)', r'\g<1>\g<3> \g<2>', a)
'LCRLK'  # works with no parentheses too

Regex decomposed:

(.*)       #! Capture the beginning of the string
\(         # Match the opening parenthesis
  (.+)     #! Capture the content inside the parentheses
\)         # Match the closing parenthesis
(.*)       #! Capture everything after

---------------
\g<1>\g<3> \g<2>  # Write the captures back in the desired order
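
The decomposition above can also be kept inside the pattern itself. Here is a minimal sketch using re.VERBOSE, which lets the regex carry its own comments; the helper name move_group_to_end is hypothetical and not part of the original answer.

```python
import re

# Same pattern as in the answer, written with re.VERBOSE so that
# whitespace is ignored and "#" starts a comment inside the regex.
PATTERN = re.compile(r"""
    (.*)     # group 1: text before the opening parenthesis
    \(       # literal "("
    (.+)     # group 2: content inside the parentheses
    \)       # literal ")"
    (.*)     # group 3: text after the closing parenthesis
""", re.VERBOSE)


def move_group_to_end(s):
    """Hypothetical helper: move the parenthesized part to the end."""
    return PATTERN.sub(r"\g<1>\g<3> \g<2>", s)


print(move_group_to_end("LC(Carbamidomethyl)RLK"))  # LCRLK Carbamidomethyl
print(move_group_to_end("LCRLK"))                   # LCRLK
```

Compiling the pattern once also avoids re-parsing it on every call, which may matter when the function runs over every row of a large file.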

String manipulation solution

A faster solution for large data sets is:

begin, end  = a.find('('), a.find(')')
if begin != -1 and end != -1: 
    a = a[:begin] + a[end+1:] + " " + a[begin+1:end]

The idea is to find the positions of the parentheses (if there are any), slice the string around them, and concatenate the pieces in the new order.
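
Since the question asks for the parenthesized part in a separate column, it may be more convenient to return the two pieces as a pair instead of one joined string. A minimal sketch, assuming at most one group per string; split_parenthesized is a hypothetical name, and searching for ")" only after the "(" is an added safeguard that the original snippet does not have.

```python
def split_parenthesized(s):
    """Return (text without the first parenthesized group, group content)."""
    begin = s.find("(")
    # Look for ")" only after the "(" so a stray ")" earlier in the
    # string is not mistaken for the closing parenthesis.
    end = s.find(")", begin + 1) if begin != -1 else -1
    if begin == -1 or end == -1:
        return s, ""  # no complete group: leave the string untouched
    return s[:begin] + s[end + 1:], s[begin + 1:end]


print(split_parenthesized("LC(Carbamidomethyl)RLK"))  # ('LCRLK', 'Carbamidomethyl')
print(split_parenthesized("LCRLK"))                   # ('LCRLK', '')
```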

Performance of each method

In this measurement, string manipulation is roughly ten times faster than the regex (timeit runs each statement 1,000,000 times by default):

>>> import timeit
>>> timeit.timeit(r"re.sub(r'(.*)\((.+)\)(.*)', r'\g<1>\g<3> \g<2>', a)", setup="a = 'LC(Carbamidomethyl)RLK'; import re")
15.214869976043701


>>> timeit.timeit("begin, end  = a.find('('), a.find(')') ; b = a[:begin] + a[end+1:] + ' ' + a[begin+1:end]", setup="a = 'LC(Carbamidomethyl)RL'")
1.44008207321167
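
To repeat the comparison on your own machine, something like the following sketch could be used. It wraps both approaches in functions (with_regex and with_find are hypothetical names) so that each one does the same work, including the check for missing parentheses, and it uses a smaller repeat count than the timeit default. Absolute timings will differ from the figures above.

```python
import re
import timeit

SAMPLE = "LC(Carbamidomethyl)RLK"
PATTERN = re.compile(r"(.*)\((.+)\)(.*)")


def with_regex(s):
    # Regex approach from the answer, with the pattern precompiled.
    return PATTERN.sub(r"\g<1>\g<3> \g<2>", s)


def with_find(s):
    # String-slicing approach from the answer, including the -1 check.
    begin, end = s.find("("), s.find(")")
    if begin != -1 and end != -1:
        s = s[:begin] + s[end + 1:] + " " + s[begin + 1:end]
    return s


for func in (with_regex, with_find):
    seconds = timeit.timeit(lambda: func(SAMPLE), number=100_000)
    print(f"{func.__name__}: {seconds:.3f} s for 100,000 calls")
```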

Multiple sets of parentheses

If a string contains several parenthesized groups, repeat the extraction until no parentheses remain:

>>> a = "DRC(Carbamidomethyl)KPVNTFVHESLADVQAVC(Carbamidomethyl)SQKNVACK"
>>> while True:
...     begin, end  = a.find('('), a.find(')')
...     if begin != -1 and end != -1:
...         a = a[:begin] + a[end+1:] + " " + a[begin+1:end]
...     else:
...         break
...
>>> a
'DRCKPVNTFVHESLADVQAVCSQKNVACK Carbamidomethyl Carbamidomethyl'
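
To apply this to the CSV file from the question, one option is the standard csv module. The sketch below is only an illustration: the column names "peptide" and "modifications" and the helpers split_groups and convert are assumptions, not taken from the question. It also swaps the loop for a regex with a negated character class, which handles several non-nested groups in a single pass.

```python
import csv
import io
import re

# One parenthesized group without nested parentheses.
GROUP = re.compile(r"\(([^()]*)\)")


def split_groups(s):
    """Return (string with all groups removed, list of group contents)."""
    return GROUP.sub("", s), GROUP.findall(s)


def convert(src, dst, column="peptide"):
    """Copy CSV rows from src to dst, moving groups to a new column."""
    reader = csv.DictReader(src)
    fields = list(reader.fieldnames) + ["modifications"]  # assumed column name
    writer = csv.DictWriter(dst, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row[column], groups = split_groups(row[column])
        row["modifications"] = " ".join(groups)
        writer.writerow(row)


# In-memory stand-ins for real files opened with open(..., newline="").
src = io.StringIO("id,peptide\n1,LC(Carbamidomethyl)RLK\n2,LCRLK\n")
dst = io.StringIO()
convert(src, dst)
print(dst.getvalue())
```

With real files, pass the objects returned by open("in.csv", newline="") and open("out.csv", "w", newline="") instead of the StringIO buffers; the file names here are placeholders.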
Maxime Lorant answered Mar 21 '26 23:03


