When reading a CSV file with pandas' read_csv method, how do I skip the leading lines if their number is not known in advance?
I have a CSV file that contains some metadata lines at the beginning, followed by the header and the actual data.
Example of the file sample_file.csv:
# Meta-Data Line 1
# Meta-Data Line 2
# Meta-Data Line 3
col1,col2,col3
a,b,c
d,e,f
g,h,i
How would I use the pandas read_csv function and its skiprows parameter to read this CSV?
df = pd.read_csv('sample_file.csv', skiprows=?)
Does pandas 0.19.x or later support this use case?
You can use the pandas read_csv() function to read the CSV file and pass it a skiprows argument with the number of rows to skip while reading.
But I have multiple files, each with a different number of rows to skip.
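If the files really do have different numbers of metadata lines, one option is to count the leading lines yourself and pass that count to skiprows. This is a minimal sketch assuming every metadata line starts with '#' and all of them come before the header; count_leading_comment_lines is a hypothetical helper (not part of pandas), and io.StringIO stands in for a real file.

```python
import io

import pandas as pd


def count_leading_comment_lines(lines, prefix="#"):
    """Hypothetical helper: count consecutive lines at the start that begin with prefix."""
    count = 0
    for line in lines:
        if line.startswith(prefix):
            count += 1
        else:
            break  # stop at the first non-metadata line (the header)
    return count


sample = """# Meta-Data Line 1
# Meta-Data Line 2
# Meta-Data Line 3
col1,col2,col3
a,b,c
d,e,f
g,h,i
"""

# Iterating over a file-like object yields one line at a time.
n_skip = count_leading_comment_lines(io.StringIO(sample))
df = pd.read_csv(io.StringIO(sample), skiprows=n_skip)
print(n_skip)            # 3
print(list(df.columns))  # ['col1', 'col2', 'col3']
```

For a real file you would pass the open file handle instead, e.g. `with open('sample_file.csv') as f: n_skip = count_leading_comment_lines(f)`, then call read_csv on the path. This reads the start of the file twice, which is usually negligible.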
Another option is to import pandas as pd and call read_csv with skiprows=[0], which skips the first line of the file (normally the header row) while reading the data. Note that with the default header=0, the next line is then used as the header.
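A small sketch of what skiprows=[0] does, using an in-memory string (via io.StringIO) in place of a file. It also shows combining it with header=None, which is one way to drop the header row while keeping every data row.

```python
import io

import pandas as pd

data = "col1,col2,col3\na,b,c\nd,e,f\n"

# skiprows=[0] drops line 0 (the original header), so the next line
# ("a,b,c") is parsed as the header instead.
df_skipped = pd.read_csv(io.StringIO(data), skiprows=[0])
print(list(df_skipped.columns))  # ['a', 'b', 'c']

# Adding header=None keeps both data rows and assigns integer column labels.
df_no_header = pd.read_csv(io.StringIO(data), skiprows=[0], header=None)
print(df_no_header.shape)  # (2, 3)
```

skiprows also accepts other list-likes such as range(3), but any fixed value still assumes you know the count in advance.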
In a pandas DataFrame you can select a row by its position with the iloc indexer, passing the row number in square brackets (for example df.iloc[2]) rather than calling it like a function.
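A short illustration of positional selection with iloc, built on a DataFrame holding the same values as the sample file's data rows:

```python
import pandas as pd

df = pd.DataFrame({"col1": ["a", "d", "g"],
                   "col2": ["b", "e", "h"],
                   "col3": ["c", "f", "i"]})

# iloc is an indexer, so it takes square brackets, not parentheses.
row = df.iloc[1]        # second row (position 1) as a Series
print(row.tolist())     # ['d', 'e', 'f']

value = df.iloc[1, 2]   # row position 1, column position 2
print(value)            # f
```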
The comment parameter is what you're searching for:
df = pd.read_csv('sample_file.csv', comment='#')
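Applied to the sample from the question (held in an io.StringIO here instead of sample_file.csv), this sketch shows that the number of metadata lines does not need to be known:

```python
import io

import pandas as pd

sample = """# Meta-Data Line 1
# Meta-Data Line 2
# Meta-Data Line 3
col1,col2,col3
a,b,c
d,e,f
g,h,i
"""

# Lines starting with '#' are ignored entirely, however many there are,
# so the first non-comment line becomes the header.
df = pd.read_csv(io.StringIO(sample), comment="#")
print(list(df.columns))  # ['col1', 'col2', 'col3']
print(len(df))           # 3
```

One caveat: comment applies to every line, so a '#' appearing inside a data row would also cut off the rest of that row.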
From the documentation:
comment : str, default None
Indicates remainder of line should not be parsed. If found at the beginning of a line, the line will be ignored altogether. This parameter must be a single character. Like empty lines (as long as skip_blank_lines=True), fully commented lines are ignored by the parameter header but not by skiprows. For example, if comment='#', parsing '#empty\na,b,c\n1,2,3' with header=0 will result in 'a,b,c' being treated as the header.
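The documentation's example can be checked directly. This sketch also contrasts it with skiprows, which counts the raw '#empty' line: skiprows=1 removes exactly that line, so 'a,b,c' remains the header (if skiprows ignored comment lines, it would have skipped 'a,b,c' instead).

```python
import io

import pandas as pd

text = "#empty\na,b,c\n1,2,3\n"

# header=0 counts only non-comment lines, so 'a,b,c' becomes the header.
df_header = pd.read_csv(io.StringIO(text), comment="#", header=0)
print(list(df_header.columns))  # ['a', 'b', 'c']

# skiprows counts raw lines, including the commented one.
df_skip = pd.read_csv(io.StringIO(text), comment="#", skiprows=1)
print(list(df_skip.columns))    # ['a', 'b', 'c']
```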