complex regex matches in python

Question

I have a txt file that contains the following data:

chrI

ATGCCTTGGGCAACGGT...(multiple lines)

chrII

AGGTTGGCCAAGGTT...(multiple lines)

I want to first find 'chrI' and then iterate through the multiple lines of ATGC until I reach the xth character, then print everything from the xth character through the yth character. I have been using regex, but once I have located the line containing chrI, I don't know how to keep iterating to find the xth character.

Here is my code:

for i, line in enumerate(sacc_gff):
    for match in re.finditer(chromo_val, line):
        print(line)
        for match in re.finditer(r"[ATGC]{%d},{%d}\Z" % (int(amino_start), int(amino_end)), line):
            print(match.group())

What the variables mean:

chromo_val = chrI

amino_start = (some start point my program found)

amino_end = (some end point my program found)

Note: amino_start and amino_end need to be in variable form.

Please let me know if I can clarify anything. Thank you.

afinit · Accepted Answer

It looks like you are working with FASTA data, so I will answer with that in mind. If you aren't, you can still use the subsequence selection part.

fasta_data = {}  # maps sequence ID -> full sequence string
seq_id = None
with open(fasta_file, 'r') as fh:  # fasta_file: path to your FASTA file
    for line in fh:
        line = line.rstrip()  # strip the trailing newline
        if not line:
            continue  # skip blank lines
        if line.startswith('>'):
            seq_id = line[1:]  # remove the leading '>' character
            fasta_data[seq_id] = ''
        elif seq_id is not None:
            fasta_data[seq_id] += line

# Python slices are 0-based and need ints; convert first if the positions are strings
amino_start, amino_end = int(amino_start), int(amino_end)
# return substring from chromosome 'chrI' with a first character at amino_start up to but not including amino_end
sequence_string1 = fasta_data['chrI'][amino_start:amino_end]
# return substring from chromosome 'chrII' with a first character at amino_start up to and including amino_end
sequence_string2 = fasta_data['chrII'][amino_start:amino_end+1]
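One thing to watch is the coordinate convention. Annotation formats such as GFF use 1-based, inclusive positions, while Python slices are 0-based and exclude the end index. Whether your amino_start and amino_end follow the GFF convention is an assumption on my part (the variable name sacc_gff hints at it). If they do, a minimal sketch of the conversion could look like this, where `subsequence` is a hypothetical helper name:

```python
def subsequence(sequence, start, end):
    """Return the characters from position start through end.

    Hypothetical helper for illustration. Assumes positions are
    1-based and inclusive, with 1 <= start <= end.
    """
    start, end = int(start), int(end)  # positions may arrive as strings
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    # 1-based inclusive [start, end] maps to the 0-based half-open slice [start-1, end)
    return sequence[start - 1:end]


chromosome = "ATTTATATATATATGGCGCGATCG"
print(subsequence(chromosome, 1, 4))        # ATTT
print(subsequence(chromosome, "13", "15"))  # ATG
```

If your positions are already 0-based, plain slicing as shown above is enough.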

FASTA format:

>chr1
ATTTATATATAT
ATGGCGCGATCG
>chr2
AATCGCTGCTGC
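To see the parsing and slicing together, here is a self-contained sketch that reads the sample records above from an in-memory string instead of a file. The function name `read_fasta` is made up for illustration. It collects each record's lines in a list and joins them once at the end, which is a swapped-in alternative to repeated string concatenation, not a required change.

```python
import io


def read_fasta(handle):
    """Parse FASTA text from a file-like object into {sequence_id: sequence}."""
    chunks = {}      # sequence_id -> list of sequence lines
    seq_id = None
    for line in handle:
        line = line.strip()
        if not line:
            continue                     # ignore blank lines
        if line.startswith(">"):
            seq_id = line[1:]            # drop the leading '>'
            chunks[seq_id] = []
        elif seq_id is not None:
            chunks[seq_id].append(line)  # sequence line for the current record
    # join each record's lines once at the end
    return {name: "".join(parts) for name, parts in chunks.items()}


sample = """>chr1
ATTTATATATAT
ATGGCGCGATCG
>chr2
AATCGCTGCTGC
"""

fasta_data = read_fasta(io.StringIO(sample))
print(fasta_data["chr1"])         # ATTTATATATATATGGCGCGATCG
print(fasta_data["chr1"][10:14])  # ATAT (spans the line break in the file)
print(fasta_data["chr2"][0:3])    # AAT
```

With a real file, passing the handle from `open(path)` to the same function should work the same way, because the slice operates on the joined sequence rather than on individual lines.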

Tags:

python

regex

bioinformatics

Medici
