I am trying to read and filter a CSV file in chunks, and then put the result into a DataFrame.
Here is what I use for reading and filtering the CSV:
csv_chunks = pandas.read_csv(filepath, sep=DELIMITER, skiprows=2, chunksize=1000, converters={"A": str, "B": str})
for chunk in csv_chunks:
    chunk = chunk[(chunk["B"] + chunk["A"]).isin(acids.tolist())]
When I then concatenate the chunks with
df = pandas.concat(chunk for chunk in csv_chunks)
I get this error:
  File "C:\Program Files\Python\Anaconda\lib\site-packages\pandas\tools\merge.py", line 872, in concat
    verify_integrity=verify_integrity)
  File "C:\Program Files\Python\Anaconda\lib\site-packages\pandas\tools\merge.py", line 913, in __init__
    raise Exception('All objects passed were None')
Exception: All objects passed were None
A couple of the chunks are empty, but others are not, so I'm not sure which objects are being seen as None. Any thoughts welcome!
Thanks, Anne
Try:
csv_chunks = [chunk[(chunk["B"] + chunk["A"]).isin(acids.tolist())]
              for chunk in csv_chunks]
df = pandas.concat(csv_chunks)
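A self-contained version of this fix can be run without the original file. It is only a sketch under assumptions: the in-memory CSV text, the ";" value of DELIMITER, the contents of acids and the small chunksize are all made up for illustration.

```python
import io

import pandas as pd

# Hypothetical stand-ins for the question's file, DELIMITER and acids.
DELIMITER = ";"
acids = pd.Series(["xa", "yb"])
raw = io.StringIO(
    "junk line 1\n"
    "junk line 2\n"
    "A;B;value\n"
    "a;x;1\n"
    "b;y;2\n"
    "c;z;3\n"
    "a;y;4\n"
)

# skiprows=2 drops the two junk lines; chunksize=2 yields 2-row chunks.
csv_chunks = pd.read_csv(raw, sep=DELIMITER, skiprows=2, chunksize=2,
                         converters={"A": str, "B": str})

# Filter each chunk during the single pass over the reader, keeping only
# rows whose B + A key appears in acids.
wanted = set(acids.tolist())
filtered = [chunk[(chunk["B"] + chunk["A"]).isin(wanted)] for chunk in csv_chunks]

df = pd.concat(filtered, ignore_index=True)
print(df)
```

Here the second chunk filters down to zero rows. An empty DataFrame is still a DataFrame, not None, and pandas.concat accepts it, so empty chunks on their own should not trigger the error.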
The code
for chunk in csv_chunks:
    chunk = chunk[(chunk["B"] + chunk["A"]).isin(acids.tolist())]
is probably not doing what you intend. With each iteration of the for-loop, for chunk in csv_chunks assigns an item in csv_chunks to chunk. Then,
chunk = chunk[(chunk["B"] + chunk["A"]).isin(acids.tolist())]
immediately assigns a new value to chunk. Fine, but this does not change the items in csv_chunks. You are just twiddling the value of an independent variable, chunk.
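A small illustration of that rebinding, using two hypothetical in-memory DataFrames in place of the real chunks: chunk[mask] builds a new DataFrame, and assigning it to chunk leaves the list itself untouched.

```python
import pandas as pd

# Hypothetical stand-ins for the chunks produced by read_csv.
chunks = [pd.DataFrame({"A": ["a", "b"]}), pd.DataFrame({"A": ["c", "a"]})]

for chunk in chunks:
    # chunk[mask] returns a new DataFrame; this assignment only rebinds
    # the loop variable and does not replace the item in the list.
    chunk = chunk[chunk["A"] == "a"]

print([len(c) for c in chunks])  # [2, 2]: the list still holds the unfiltered frames
```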
To modify the values in csv_chunks, you could use a list comprehension to build a new list which is then reassigned to the variable csv_chunks:
csv_chunks = [chunk[(chunk["B"] + chunk["A"]).isin(acids.tolist())]
              for chunk in csv_chunks]
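There is also a likely reason why concat saw nothing at all. With chunksize set, read_csv returns an iterator that reads the file sequentially, so it can only be consumed once. In the question, the for loop had already pulled every chunk out of csv_chunks before pandas.concat iterated it a second time, so concat most likely received no objects, which is consistent with the error. The sketch below shows the same one-pass behavior with a plain generator; fake_reader is a hypothetical stand-in, not the pandas reader.

```python
def fake_reader():
    """Hypothetical stand-in for a chunked reader: yields each chunk once."""
    for chunk in (["a", "b"], ["c"], ["d", "e"]):
        yield chunk


csv_chunks = fake_reader()

# First pass: the loop pulls every chunk out of the generator.
for chunk in csv_chunks:
    chunk = [x for x in chunk if x != "c"]  # filtered result is discarded

# Second pass: nothing is left for a later consumer such as concat.
second_pass = list(csv_chunks)
print(second_pass)  # []

# Filtering and collecting in a single pass keeps the results.
csv_chunks = fake_reader()
kept = [[x for x in chunk if x != "c"] for chunk in csv_chunks]
print(kept)  # [['a', 'b'], [], ['d', 'e']]
```

This is why the list comprehension does the filtering and the collecting in the same pass over the reader.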