
Python re.findall prints output as list instead of string

My re.findall search is matching and returning the right string, but when I try to print the result, it prints it as a list instead of a string. Example below:

> line = "ID=id5;Parent=rna1;Dbxref=GeneID:653635,Genbank:NR_024540.1,HGNC:38034;gbkey=misc_RNA;gene=WASH7P;product=WAS protein family homolog 7 pseudogene;transcript_id=NR_024540.1"

> print re.findall(r'gene=[^;\n]+', line)

>     ['gene=WASH7P']

I would like print to output just gene=WASH7P, without the brackets and quotes around it. How can I adjust my code to do that?

Thank you!

Ilea asked Mar 29 '15 05:03


2 Answers

Thank you for everyone's help!

Both of the snippets below successfully returned the match as a string rather than a list.

> re.findall(r'gene=[^;\n]+', line)[0]  

> re.search(r'gene=[^;\n]+', line).group()

However, I kept getting "list index out of range" errors with one of my regexes, even though results printed when I used re.findall() on its own.

> re.findall(r'transcript_id=[^\s]+',line)

I realized that this seemingly impossible result was because I was calling re.findall() within a for loop that was iterating over every line in a file. There were matches for some lines but not for others, so I was receiving the "list index out of range" error for those lines in which there was no match.

The code below resolved the issue:

> if re.findall(r'transcript_id=[^\s]+', line):

>     transcript = re.findall(r'transcript_id=[^\s]+', line)[0]

> else:

>     transcript = "NA"
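
The same per-line fallback can be written so the regex runs only once per line. This is a minimal sketch in Python 3, assuming the lines are already in a list; the sample lines and the names `lines`, `TRANSCRIPT_RE`, and `transcripts` are hypothetical and not from the original code.

```python
import re

# Hypothetical sample lines; only the first one contains a transcript_id field.
lines = [
    "ID=id5;gene=WASH7P;transcript_id=NR_024540.1",
    "ID=id6;gene=WASH7P",
]

# Compile the pattern once, outside the loop.
TRANSCRIPT_RE = re.compile(r"transcript_id=[^\s]+")

transcripts = []
for line in lines:
    match = TRANSCRIPT_RE.search(line)  # None when the line has no match
    transcripts.append(match.group() if match else "NA")

print(transcripts)
```

Using re.search() here avoids indexing into an empty list, because a line without a match yields None instead of raising an error.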

Thank you!

Ilea answered Sep 29 '22 08:09


It prints it as a list because it is a list.

findall():

Return all non-overlapping matches of pattern in string, as a list of strings.

To print only the string use print(re.findall(r'Name=[^;]+', line)[0]) instead.

That code assumes you have exactly one match. With 0 matches, you'll get an IndexError. With more than one, you'll print only the first match.
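
The three cases can be checked directly. The sketch below is in Python 3 and reuses the gene pattern from the question; the input strings are made up for illustration, and the second gene value (DDX11L1) is a hypothetical example.

```python
import re

pattern = r"gene=[^;\n]+"

# Illustrative inputs with zero, one, and two matches.
no_match = re.findall(pattern, "ID=id5;Parent=rna1")
one = re.findall(pattern, "ID=id5;gene=WASH7P;gbkey=misc_RNA")
many = re.findall(pattern, "gene=WASH7P;x=1;gene=DDX11L1")

print(no_match)         # empty list, so no_match[0] would raise IndexError
print(one[0])           # the single match as a plain string
print(", ".join(many))  # every match, printed without brackets or quotes
```

Joining the list with str.join() is one way to print all matches as plain text when more than one is expected.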

To ensure you are not getting an error, check if a match was found before you use [0] (or .group() for re.search()).

s = re.search(r'Name=[^;]+', my_str)
if s:
    print(s.group())

or print(s[0]) (match objects support indexing from Python 3.6 onward)

user answered Sep 29 '22 09:09