Ahoy, I'm writing a Python script to filter some large CSV files.
I only want to keep rows which meet my criteria.
My input is a CSV file in the following format:
Locus         Total_Depth  Average_Depth_sample  Depth_for_17
chr1:6484996  1030         1030                  1030
chr1:6484997  14           14                    14
chr1:6484998  0            0                     0
I want to return lines where the Total_Depth is 0.
I've been following this answer to read the data, but I'm stuck trying to loop over the rows and pull out the lines that meet my condition.
Here is the code I have so far:
import csv

f = open("file path", 'rb')
reader = csv.reader(f)  # reader object which iterates over a csv file (f)
headers = reader.next()  # assign the first row to the headers variable
column = {}  # dictionary mapping each header to a list of its values
for h in headers:  # for each header
    column[h] = []
for row in reader:  # for each row in the reader object
    for h, v in zip(headers, row):  # combine header names with row values (v) in a series of tuples
        column[h].append(v)  # append each value to the relevant column
I understand that my data is now in a dictionary, and I want to filter it on the "Total_Depth" key. I was planning to use an 'if' statement to select the relevant rows, but I'm not sure how to do that with this dictionary structure.
Any advice would be greatly appreciated. SB :)
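reader.next() returns the next row from the reader and advances it, so calling it once before the loop consumes the header row and leaves only the data rows for the for loop. Note that .next() is Python 2 only; in Python 3 the same call is written next(reader).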
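As a rough sketch of how the column dictionary from the question could be filtered, here is a Python 3 version that reads the header with next(reader) and then rebuilds the rows whose Total_Depth is 0. The comma-delimited sample data, the io.StringIO stand-in for the real file, and the helper names load_columns and rows_where are assumptions made for illustration, not part of the original code.

```python
import csv
import io

# Assumed sample data, comma-delimited; stands in for the real file.
SAMPLE = """Locus,Total_Depth,Average_Depth_sample,Depth_for_17
chr1:6484996,1030,1030,1030
chr1:6484997,14,14,14
chr1:6484998,0,0,0
"""


def load_columns(f):
    """Build the header -> list-of-values dictionary used in the question."""
    reader = csv.reader(f)
    headers = next(reader)  # Python 3 spelling of reader.next()
    column = {h: [] for h in headers}
    for row in reader:
        for h, v in zip(headers, row):
            column[h].append(v)
    return headers, column


def rows_where(headers, column, key, wanted):
    """Hypothetical helper: rebuild the rows whose `key` column equals `wanted`."""
    # Positions in the key column that match the wanted value.
    hits = [i for i, v in enumerate(column[key]) if v == wanted]
    # Pull the value at each matching position out of every column.
    return [[column[h][i] for h in headers] for i in hits]


headers, column = load_columns(io.StringIO(SAMPLE))
zero_depth = rows_where(headers, column, "Total_Depth", "0")
print(zero_depth)
```

The csv module returns every value as a string, so the comparison is against '0' rather than the integer 0. Holding every column in memory may also be costly for very large files, in which case filtering row by row is likely the better fit.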
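The functions iteritems(), iterrows() and itertuples() belong to pandas DataFrames, not to the csv module, so they only apply if you load the file with pandas instead. Of the three, iterrows() and itertuples() iterate over rows, while iteritems() (named items() in current pandas) iterates over columns.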
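Use a list comprehension with csv.DictReader, which yields each row as a dictionary keyed by the header names.
import csv

with open("filepath", 'rb') as f:  # Python 2; on Python 3 use open("filepath", newline='')
    reader = csv.DictReader(f)
    # keep only the rows whose Total_Depth is 0 (csv values are strings)
    rows = [row for row in reader if row['Total_Depth'] == '0']

for row in rows:
    print(row)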
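Since the question mentions large files, building the full list of rows may be unnecessary. The following is a minimal Python 3 sketch that streams matching rows straight to an output, one row at a time. The function name filter_zero_depth, the comma-delimited sample data and the in-memory io.StringIO buffers are assumptions for illustration only.

```python
import csv
import io


def filter_zero_depth(src, dst, key="Total_Depth"):
    """Copy rows whose `key` column is 0 from src to dst; return the number written."""
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    kept = 0
    for row in reader:  # only one row is held in memory at a time
        if int(row[key]) == 0:  # assumes the column always holds integers
            writer.writerow(row)
            kept += 1
    return kept


# Assumed sample data; with real files, open both handles with newline="".
src = io.StringIO(
    "Locus,Total_Depth,Average_Depth_sample,Depth_for_17\n"
    "chr1:6484996,1030,1030,1030\n"
    "chr1:6484997,14,14,14\n"
    "chr1:6484998,0,0,0\n"
)
dst = io.StringIO()
kept = filter_zero_depth(src, dst)
print(kept)
print(dst.getvalue())
```

Converting with int() treats values such as "00" as zero too, but it raises ValueError on blank or non-numeric cells, so a string comparison against '0' may be preferable if the column is not clean.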
See csv.DictReader in the Python csv module documentation.