My scenario: I am reading a CSV file, and I want access to both the dictionary of fields generated from each line and the raw, unparsed line.
The goal is ultimately to do some processing on the fields, use the result to decide which lines I am interested in, and write those lines only into an output file.
An easy solution, which involves reading the file twice, looks something like:
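from csv import DictReader

def dict_and_row(filename):
    with open(filename) as f:
        tmp = list(DictReader(f))
    with open(filename) as f:
        next(f)  # skip header
        # DictReader skips empty lines, so drop them before enumerating
        # to keep the indices aligned with tmp
        lines = (line.strip() for line in f if line.strip())
        for i, line in enumerate(lines):
            yield line, tmp[i]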
Any better suggestions?
Edit: to be more specific about the usage scenario. I intended to index the lines by some of the data in the dict, and then use this index to find lines I am interested in. Something like:
d = {}
for raw, parsed in dict_and_row(somefile):
    d[(parsed["SOMEFIELD"], parsed["ANOTHERFIELD"])] = raw
and then later on
for pair in some_other_source_of_pairs:
    if pair in d:
        output.write(d[pair])
I ended up wrapping the file with an object that saves the last line read, and then handing this object to the DictReader.
class FileWrapper:
    def __init__(self, f):
        self.f = f
        self.last_line = None

    def __iter__(self):
        return self

    def __next__(self):
        # remember the raw line before handing it to the consumer
        self.last_line = next(self.f)
        return self.last_line
This could then be used this way:
import csv

f = FileWrapper(file_object)  # file_object is an already-open text file
for row in csv.DictReader(f):
    print(row)  # that's the dict
    print(f.last_line)  # that's the line
Or I can implement dict_and_row:
from csv import DictReader

def dict_and_row(filename):
    with open(filename) as f:
        wrapper = FileWrapper(f)
        reader = DictReader(wrapper)
        for row in reader:
            # note the order: the dict first, then the raw line
            yield row, wrapper.last_line
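To tie this back to the indexing scenario from the question, here is a minimal end-to-end sketch. The helper name index_raw_lines and the sample data are made up for illustration, and io.StringIO stands in for the real input and output files:

```python
import csv
import io


class FileWrapper:
    """Iterator over a file object that remembers the last line it returned."""

    def __init__(self, f):
        self.f = f
        self.last_line = None

    def __iter__(self):
        return self

    def __next__(self):
        self.last_line = next(self.f)
        return self.last_line


def index_raw_lines(file_object, key_fields):
    """Hypothetical helper: map a tuple of field values to the raw line."""
    wrapper = FileWrapper(file_object)
    index = {}
    for row in csv.DictReader(wrapper):
        index[tuple(row[name] for name in key_fields)] = wrapper.last_line
    return index


# Made-up sample data standing in for the real file.
sample = io.StringIO("SOMEFIELD,ANOTHERFIELD,VALUE\na,1,x\nb,2,y\nc,3,z\n")
index = index_raw_lines(sample, ("SOMEFIELD", "ANOTHERFIELD"))

output = io.StringIO()
for pair in [("b", "2"), ("q", "9"), ("a", "1")]:
    if pair in index:
        output.write(index[pair])  # the raw line still ends with its newline

result = output.getvalue()
```

One caveat with this approach: if a quoted field contains an embedded newline, the record spans several physical lines and last_line only holds the final one, so this works best when every record fits on a single line.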
The wrapper can also be extended to track other properties, such as the number of characters read so far.
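As a rough sketch of what tracking extra properties might look like, the variant below keeps running counts as well. The class name CountingFileWrapper and the attributes chars_read and lines_read are my own names, not part of the csv module:

```python
import csv
import io


class CountingFileWrapper:
    """Like FileWrapper, but also tracks how much has been consumed."""

    def __init__(self, f):
        self.f = f
        self.last_line = None
        self.chars_read = 0  # total characters handed out so far
        self.lines_read = 0  # total physical lines, header included

    def __iter__(self):
        return self

    def __next__(self):
        line = next(self.f)
        self.last_line = line
        self.chars_read += len(line)
        self.lines_read += 1
        return line


# Made-up sample data; io.StringIO stands in for an open file.
wrapper = CountingFileWrapper(io.StringIO("name,qty\nfoo,1\nbar,22\n"))
progress = [
    (row["name"], wrapper.lines_read, wrapper.chars_read)
    for row in csv.DictReader(wrapper)
]
```

Because the counters are updated at the moment each line is handed over, they reflect the position just after the row currently being processed.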
Not sure that's the best solution but it does have the advantage of retaining access to strings as they were originally read from the file.
You could use pandas, which is an excellent library for this kind of processing:
import pandas as pd
# read the csv file
data = pd.read_csv('data.csv')
# do some calculation on a column and store it in another column
data['column2'] = data['column1'] * 2
# If you decide that you need only a particular set of rows
# that match some condition of yours
data = data[data['column2'] > 100]
# store only particular columns back
cols = ['column1', 'column2', 'column3']
data[cols].to_csv('data_edited.csv')