Opening and editing multiple files in a folder with python

I am trying to modify my .fasta files from this:

>YP_009208724.1 hypothetical protein ADP65_00072 [Achromobacter phage phiAxp-3]
MSNVLLKQ...

>YP_009220341.1 terminase large subunit [Achromobacter phage phiAxp-1]
MRTPSKSE...

>YP_009226430.1 DNA packaging protein [Achromobacter phage phiAxp-2]
MMNSDAVI...

to this:

>Achromobacter phage phiAxp-3
MSNVLLKQ...

>Achromobacter phage phiAxp-1
MRTPSKSE...

>Achromobacter phage phiAxp-2
MMNSDAVI...

I already have a script that can do this for a single file:

with open('Achromobacter.fasta', 'r') as fasta_file:
    out_file = open('./fastas3/Achromobacter.fasta', 'w')
    for line in fasta_file:
        line = line.rstrip()
        if '[' in line:
            line = line.split('[')[-1]
            out_file.write('>' + line[:-1] + "\n")
        else:
            out_file.write(str(line) + "\n")

but I can't manage to automate the process for all 120 files in my folder.

I tried using glob.glob, but I can't seem to make it work:

import glob

for fasta_file in glob.glob('*.fasta'):
    outfile = open('./fastas3/'+fasta_file, 'w')
    with open(fasta_file, 'r'):
        for line in fasta_file:
            line = line.rstrip()
            if '[' in line:
                line2 = line.split('[')[-1]
                outfile.write('>' + line2[:-1] + "\n")
            else:
                outfile.write(str(line) + "\n")

It gives me this output:

A
c
i
n
e
t
o
b
a
c
t
e
r
.
f
a
s
t
a

I also managed to build a list of all the files in the folder, but I can't open the individual files using the entries in that list:

import os


file_list = []
for file in os.listdir("./fastas2/"):
    if file.endswith(".fasta"):
        file_list.append(file)

tahunami asked May 16 '26 13:05


2 Answers

Since you can already rewrite the contents of a single file, all that remains is to automate the process. I turned your single-file script into a function that uses one file handle instead of two: it reads the file, builds the new contents, and then overwrites the file in place.

def file_changer(filename):
    data_to_put = ''
    # 'r+' opens the existing file for both reading and writing
    with open(filename, 'r+') as fasta_file:
        for line in fasta_file:
            line = line.rstrip()
            if '[' in line:
                # keep only the text inside the trailing [...]
                line = line.split('[')[-1]
                data_to_put += '>' + line[:-1] + "\n"
            else:
                data_to_put += line + "\n"
        # rewind, overwrite from the start, and cut off any leftover old content
        fasta_file.seek(0)
        fasta_file.write(data_to_put)
        fasta_file.truncate()

Now we need to iterate over all your files, so let's use the glob module:

import glob
for file in glob.glob('*.fasta'):
    file_changer(file)
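
Rewriting files in place can leave a half-written file behind if the script is interrupted. A minimal sketch of a more cautious variant, assuming you are free to create a temporary file next to the original: write the new contents to the temporary file first, then swap it in with os.replace. The names strip_header and rewrite_in_place are hypothetical helpers introduced here for illustration only.

```python
import os
import tempfile


def strip_header(line):
    """Return '>organism' for a line containing '[organism]', else the stripped line."""
    line = line.rstrip()
    if '[' in line:
        # Same logic as the single-file script: keep the text after the
        # last '[' and drop the closing ']'.
        return '>' + line.split('[')[-1][:-1]
    return line


def rewrite_in_place(filename):
    """Write the converted contents to a temp file, then replace the original."""
    directory = os.path.dirname(os.path.abspath(filename))
    with open(filename, 'r') as src, tempfile.NamedTemporaryFile(
            'w', dir=directory, delete=False) as tmp:
        for line in src:
            tmp.write(strip_header(line) + "\n")
    # Both files are closed at this point, so the swap is safe.
    os.replace(tmp.name, filename)
```

It can be dropped into the same glob loop in place of file_changer. Keeping a backup copy of the folder is still a good idea.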

Rajan Chauhan answered May 19 '26 01:05


You are iterating over the file name, which yields the characters of the name rather than the lines of the file. Here is a corrected version of the code:

import glob

for fasta_file_name in glob.glob('*.fasta'):
    with open(fasta_file_name, 'r') as fasta_file, \
            open('./fastas3/' + fasta_file_name, 'w') as outfile:
        for line in fasta_file:
            line = line.rstrip()
            if '[' in line:
                line2 = line.split('[')[-1]
                outfile.write('>' + line2[:-1] + "\n")
            else:
                outfile.write(str(line) + "\n")
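
To see why the original loop wrote one character per line, it may help to compare iterating over a string with iterating over a file object. This is only an illustrative sketch: the io.StringIO handle stands in for a real opened file, and the header text in it is made up for the demonstration.

```python
import io

# A file name is just a string; iterating over it yields single characters.
fasta_file_name = 'Achromobacter.fasta'
chars = list(fasta_file_name)
print(chars[:5])

# A file object yields whole lines. io.StringIO stands in for an opened file here.
fasta_file = io.StringIO(">header [Achromobacter phage phiAxp-3]\nMSNVLLKQ\n")
lines = [line.rstrip() for line in fasta_file]
print(lines)
```

The fix is therefore to bind the opened file to its own name with "as" and loop over that object, not over the name returned by glob.glob.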

As an alternative to the Python script, you can simply use sed from the command line:

sed -i 's/^>.*\[\(.*\)\].*$/>\1/' *.fasta

This will modify all files in place, so consider copying them first.
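
If you would rather keep the same pattern inside Python, the sed expression translates fairly directly to re.sub. The sketch below is an illustration under that assumption; rename_header and HEADER_PATTERN are hypothetical names. Because the pattern is anchored on a leading ">", only header lines are rewritten, and sequence lines pass through unchanged.

```python
import re

# Same idea as the sed expression: on a line starting with '>',
# keep only the text between the last '[' and the final ']'.
HEADER_PATTERN = re.compile(r'^>.*\[(.*)\].*$')


def rename_header(line):
    """Return the line with the bracketed organism name promoted to the header."""
    return HEADER_PATTERN.sub(r'>\1', line)


print(rename_header('>YP_009220341.1 terminase large subunit [Achromobacter phage phiAxp-1]'))
print(rename_header('MRTPSKSE...'))
```

A trailing newline on the input line is preserved, so the function can be used directly on lines read from a file.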

Sven Marnach answered May 19 '26 01:05