Pandas Read CSV file with variable rows to skip with special character at the beginning of row

Tags: python, pandas, csv

When reading a CSV file with the pandas read_csv function, how do I skip lines at the top of the file if the number of lines to skip is not known in advance?

I have a CSV file that contains some metadata at the beginning, followed by the header and the actual data.

  • The metadata lines always start with a # sign and are always at the top of the CSV file.
  • The number of metadata lines is not fixed.

Example contents of the file sample_file.csv:

# Meta-Data Line 1
# Meta-Data Line 2
# Meta-Data Line 3
col1,col2,col3
a,b,c
d,e,f
g,h,i

How would I use the pandas read_csv function and its skiprows parameter to read this CSV?

df = pd.read_csv('sample_file.csv', skiprows=?)

Does pandas 0.19.x or later support this use case?

Spandan Brahmbhatt asked Jan 30 '17 21:01


1 Answer

The comment parameter is what you're looking for:

df = pd.read_csv('sample_file.csv', comment='#')
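To try this without creating a file, here is a minimal sketch that feeds the question's sample contents through io.StringIO. The in-memory buffer stands in for sample_file.csv and is only an assumption made for the demo:

```python
import io

import pandas as pd

# Contents of sample_file.csv from the question, held in memory for the demo.
sample = """# Meta-Data Line 1
# Meta-Data Line 2
# Meta-Data Line 3
col1,col2,col3
a,b,c
d,e,f
g,h,i
"""

# Lines that start with '#' are dropped, however many there are.
df = pd.read_csv(io.StringIO(sample), comment='#')
print(df)
```

The first line that is not a comment becomes the header, so no skiprows value is needed.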

From the documentation:

comment : str, default None

Indicates remainder of line should not be parsed. If found at the beginning of a line, the line will be ignored altogether. This parameter must be a single character. Like empty lines (as long as skip_blank_lines=True), fully commented lines are ignored by the parameter header but not by skiprows. For example, if comment='#', parsing '#empty\na,b,c\n1,2,3' with header=0 will result in 'a,b,c' being treated as the header.
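Two consequences of that description may be worth checking, sketched below. First, the documentation's own example. Second, because the remainder of a line is not parsed, a '#' inside a data value also cuts that row short. If the data can contain '#', one possible fallback is to count the leading metadata lines and pass the count to skiprows. The helper count_leading_comment_lines and the extra sample row are hypothetical and not part of pandas or the question:

```python
import io
import itertools

import pandas as pd

# The documentation's example: the commented line is not counted by header=0.
doc_text = "#empty\na,b,c\n1,2,3"
doc_df = pd.read_csv(io.StringIO(doc_text), comment="#", header=0)
print(list(doc_df.columns))

# A made-up sample where one data value contains '#'.
sample = "# Meta-Data Line 1\n# Meta-Data Line 2\ncol1,col2\na,b#1\nc,d\n"

# With comment='#', everything after the '#' in "b#1" is dropped.
truncated = pd.read_csv(io.StringIO(sample), comment="#")


def count_leading_comment_lines(lines, marker="#"):
    """Hypothetical helper: count consecutive lines at the top starting with marker."""
    return sum(1 for _ in itertools.takewhile(lambda line: line.startswith(marker), lines))


# Count the metadata lines first, then skip exactly that many rows.
n_skip = count_leading_comment_lines(io.StringIO(sample))
intact = pd.read_csv(io.StringIO(sample), skiprows=n_skip)
print(truncated)
print(intact)
```

With a real file, the same helper could be given an open file handle instead of a StringIO buffer, at the cost of reading the top of the file twice.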

Zeugma answered Oct 20 '22 11:10