I need to write a Python generator that yields tuples (X, Y) coming from two different CSV files.
It should receive a batch size on init, read the two CSVs line by line, and yield a tuple (X, Y) for each line, where X and Y are arrays (the columns of the CSV files).
I've looked at examples of lazy reading, but I'm finding it difficult to adapt them to CSVs.
Also, unfortunately, pandas DataFrames are not an option in this case.
Any snippet I can start from?
Thanks
Using a lazy generator: since a CSV file is line-based, you can simply use the open function and loop through the data one line at a time. The file object that open returns is already a lazy iterator, so the entire file is never loaded into memory.
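As a minimal sketch of that idea, the generator below loops over the object returned by open and lets the csv module split each line into fields. The name iter_rows is a hypothetical helper, not something from the original text.

```python
import csv


def iter_rows(path):
    """Yield one parsed CSV row (a list of strings) at a time."""
    # newline="" is the mode the csv module documentation recommends
    # for files that are handed to csv.reader.
    with open(path, newline="") as handle:
        for row in csv.reader(handle):
            # Only the current row is held in memory.
            yield row
```

Because the function is a generator, nothing is read until the caller starts iterating, and the file is closed once the loop finishes.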
Reading a CSV is a very common use case as Python continues to grow in the data analytics community. Data is also growing, and it’s now often the case that the data people are trying to work with will not fit in memory. It’s also not always necessary to load all the data into memory.
To answer this question, let’s assume that csv_reader() just opens the file and reads it into an array. This function opens a given file and uses file.read() along with .split() to add each line as a separate element to a list.
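The code for csv_reader is not shown here, so the following is only a sketch that matches the description. It assumes the separator passed to .split() is a newline.

```python
def csv_reader(file_name):
    """Eagerly read a whole file and return its lines as a list."""
    # Assumption: lines are split on "\n"; the original separator is not shown.
    with open(file_name) as file:
        # file.read() loads the entire file into a single string first.
        result = file.read().split("\n")
    return result
```

Since the whole file is read before it is split, memory use grows with the file size, which is the situation the lazy approaches are meant to avoid.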
We use the open function to open the file and a for loop that runs as long as there is data to be read. In each iteration it simply prints the output of the read_in_chunks function, which returns one chunk of data. Using iterators: you may also use iterators to easily read and process CSV or other files one chunk at a time.
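The read_in_chunks loop described above could be sketched as follows. The signature and the default chunk size are assumptions, since the original function is not shown, and large_file.csv is a placeholder name.

```python
def read_in_chunks(file_object, chunk_size=1024):
    """Yield successive chunks of at most chunk_size characters."""
    while True:
        data = file_object.read(chunk_size)
        if not data:
            # read() returns an empty string at end of file.
            return
        yield data


# Hypothetical usage, printing one chunk per iteration:
# with open("large_file.csv") as f:
#     for chunk in read_in_chunks(f):
#         print(chunk)
```

Note that a chunk boundary can fall in the middle of a CSV row, so this pattern suits raw processing better than row-by-row parsing.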
You can write a generator that reads lines from two different CSV readers and yields them as pairs of arrays. The code for that is:
import csv
import numpy as np

def getData(filename1, filename2):
    # Text mode with newline="" is what csv.reader expects in Python 3
    with open(filename1, newline="") as csv1, open(filename2, newline="") as csv2:
        reader1 = csv.reader(csv1)
        reader2 = csv.reader(csv2)
        # zip stops as soon as the shorter file runs out of rows
        for row1, row2 in zip(reader1, reader2):
            # This will give arrays of floats, for other types change dtype
            # (np.float was removed from NumPy; the builtin float is equivalent)
            yield (np.array(row1, dtype=float),
                   np.array(row2, dtype=float))

for tup in getData("file1", "file2"):
    print(tup)
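The question also asks for a batch size given on init, which the generator above does not cover. One possible sketch, under stated assumptions: PairedCsvBatches, to_floats and the convert parameter are hypothetical names, and the default conversion uses plain lists of floats so that the example needs only the standard library.

```python
import csv
from itertools import islice


def to_floats(rows):
    """Convert rows of strings to rows of floats (stdlib stand-in for an array)."""
    return [[float(value) for value in row] for row in rows]


class PairedCsvBatches:
    """Iterate over two CSV files in lockstep, batch_size rows at a time."""

    def __init__(self, filename1, filename2, batch_size, convert=to_floats):
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        self.filename1 = filename1
        self.filename2 = filename2
        self.batch_size = batch_size
        self.convert = convert

    def __iter__(self):
        with open(self.filename1, newline="") as csv1, \
                open(self.filename2, newline="") as csv2:
            # zip pairs row i of the first file with row i of the second.
            pairs = zip(csv.reader(csv1), csv.reader(csv2))
            while True:
                # Pull at most batch_size row pairs without reading further.
                batch = list(islice(pairs, self.batch_size))
                if not batch:
                    return  # the shorter file has run out of rows
                rows1, rows2 = zip(*batch)
                yield self.convert(rows1), self.convert(rows2)
```

Passing convert=lambda rows: np.array(rows, dtype=float) should give NumPy arrays as in the answer above. The last batch may hold fewer than batch_size rows.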