I have a large CSV file that I cannot load into a DataFrame with read_csv() because of memory issues. However, the first column of the CSV holds a {0,1} flag, and I only need the rows flagged '1', which will easily fit in a DataFrame. Is there a way to load the data with a condition, or to filter the CSV before loading it (similar to grep)?
You can use pd.read_csv's comment parameter and set it to '0'. Every line that starts with '0' is then treated as a comment line and skipped.
import pandas as pd
from io import StringIO
txt = """col1,col2
1,a
0,b
1,c
0,d"""
pd.read_csv(StringIO(txt), comment='0')
   col1 col2
0     1    a
1     1    c
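One caveat with comment='0': pandas stops parsing a line at the first comment character wherever it appears, not only at the start of the line, so any '0' inside a kept row (for example in the value 10) silently truncates that row. The sketch below shows this, then a grep-like alternative closer to the second half of the question: filter the lines in plain Python, keeping the header and every line whose first field is exactly '1', and pass only those lines to read_csv. The helper name read_flagged is hypothetical, and the line-based filter assumes no quoted field contains an embedded newline.

```python
from io import StringIO

import pandas as pd

txt = """col1,col2
1,10
0,b
1,c"""

# comment='0' cuts every line at its first '0', so the row "1,10" is read as "1,1"
risky = pd.read_csv(StringIO(txt), comment='0')
print(risky)


def read_flagged(lines, flag='1'):
    """Hypothetical helper: keep the header plus every line whose first field equals flag."""
    it = iter(lines)
    header = next(it)
    # Compare only the first comma-separated field, so '0' elsewhere in a row is harmless
    kept = [header] + [line for line in it if line.split(',', 1)[0] == flag]
    return pd.read_csv(StringIO(''.join(kept)))


safe = read_flagged(StringIO(txt))
print(safe)
```

For a real file, something like `with open('data.csv', newline='') as f: df = read_flagged(f)` (the path is hypothetical) streams the file line by line, so only the header and the kept rows are ever held in memory.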
You can also pass chunksize to turn pd.read_csv into an iterator of DataFrames, filter each chunk with query, and combine the filtered chunks with pd.concat.
NOTE: As the OP pointed out, a chunk size of 1 isn't realistic; I used it for demonstration purposes only. Increase it to suit your data and available memory.
pd.concat([df.query('col1 == 1') for df in pd.read_csv(StringIO(txt), chunksize=1)])
# Equivalent to, but slower than, the boolean-indexing version below; use the commented line for better performance
# pd.concat([df[df.col1 == 1] for df in pd.read_csv(StringIO(txt), chunksize=1)])
   col1 col2
0     1    a
2     1    c
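For a file on disk, the same chunked pattern might look like the sketch below, with a realistic chunk size and plain boolean indexing instead of query. The function name load_flagged_chunked, the default chunk size of 100,000 rows, and the assumption that the flag sits in the first column and parses as an integer are all illustrative; a temporary file stands in for the real CSV.

```python
import os
import tempfile

import pandas as pd


def load_flagged_chunked(path, chunksize=100_000, flag_col=0):
    """Hypothetical helper: read `path` in chunks and keep rows whose flag column == 1."""
    parts = []
    for chunk in pd.read_csv(path, chunksize=chunksize):
        # Boolean mask on the flag column (selected by position); only matching rows are kept
        parts.append(chunk[chunk.iloc[:, flag_col] == 1])
    # Guard against the iterator yielding no chunks at all
    return pd.concat(parts, ignore_index=True) if parts else pd.DataFrame()


# Demo on a small temporary file standing in for the large CSV
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False) as f:
    f.write('col1,col2\n1,a\n0,b\n1,c\n0,d\n')
    path = f.name
try:
    result = load_flagged_chunked(path, chunksize=2)
    print(result)
finally:
    os.remove(path)
```

Only one chunk plus the filtered rows is in memory at any time. ignore_index=True renumbers the result from 0; drop it if you want to keep each row's position in the original file, as in the output above.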