
How to ignore the first line of data when processing CSV data?

Tags: python, csv



You can use the csv module's Sniffer class to detect whether a header row is present (its has_header() method is a heuristic, so treat the result as a guess), and then call the built-in next() function to skip the first row only when necessary:

import csv

with open('all16.csv', 'r', newline='') as file:
    has_header = csv.Sniffer().has_header(file.read(1024))
    file.seek(0)  # Rewind.
    reader = csv.reader(file)
    if has_header:
        next(reader)  # Skip header row.
    column = 1
    datatype = float
    data = (datatype(row[column]) for row in reader)
    least_value = min(data)

print(least_value)

Since datatype and column are hardcoded in your example, it would be slightly faster to process the row like this:

    data = (float(row[1]) for row in reader)

Note: the code above is for Python 3.x. For Python 2.x use the following line to open the file instead of what is shown:

with open('all16.csv', 'rb') as file:

To skip the first line, just call next() on the open file object (here named inf) before reading the rest:

next(inf)

Files in Python are iterators over lines.
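
As a minimal sketch of that idea, the snippet below uses an in-memory io.StringIO in place of the real file; the sample data is made up for illustration. Calling next() on the file object consumes one line, so the csv.reader only sees what follows.

```python
import csv
import io

# Hypothetical sample data standing in for the real file.
inf = io.StringIO("name,value\na,1.5\nb,0.5\n")

next(inf)  # Consume the header line; the file object iterates over lines.
rows = list(csv.reader(inf))  # The reader starts at the second line.
least_value = min(float(row[1]) for row in rows)
print(least_value)
```

Note that this skips one physical line. If the header could contain a quoted field with an embedded newline, calling next() on the csv.reader instead skips the whole logical row.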


Borrowed from the Python Cookbook, a more concise template might look like this:

import csv

with open('stocks.csv', newline='') as f:
    f_csv = csv.reader(f)
    headers = next(f_csv)  # Read the header row so the loop starts at the data.
    for row in f_csv:
        # Process row ...
        pass
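
One possible way to put the captured header row to use is sketched below. It assumes made-up sample data in an io.StringIO rather than a real stocks.csv, and the records name is hypothetical. Passing None as the default to next() avoids a StopIteration if the input turns out to be empty.

```python
import csv
import io

# Hypothetical sample data standing in for stocks.csv.
f = io.StringIO("symbol,price\nAA,39.48\nIBM,91.10\n")

f_csv = csv.reader(f)
headers = next(f_csv, None)  # None instead of StopIteration on empty input.
# Pair each data row with the header names (empty list if there was no header).
records = [dict(zip(headers, row)) for row in f_csv] if headers else []
print(records)
```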

In a similar use case, I had to skip unwanted lines that came before the line with my actual column names. This solution worked nicely: skip those lines on the file object first, then pass the file object to csv.DictReader, which takes the next line as the header.

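import csv

with open('all16.csv', newline='') as tmp:
    # Skip the first line (if any); the None default avoids StopIteration on an empty file.
    next(tmp, None)

    # {row_index: row}; DictReader uses the next line as the column names.
    data = dict(enumerate(csv.DictReader(tmp)))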
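
To see what this produces, here is a small self-contained sketch. The file contents are invented for illustration (one junk line followed by the real header), and io.StringIO stands in for the opened file.

```python
import csv
import io

# Hypothetical contents: one junk line, then the real header row, then data.
tmp = io.StringIO("exported by some tool\nname,value\na,1.5\nb,0.5\n")

next(tmp, None)  # Skip the junk line.
# Keys are 0-based row indexes; values map column names to strings.
data = dict(enumerate(csv.DictReader(tmp)))
print(data[0]["name"], data[1]["value"])
```

The values stay strings, so convert them (for example with float()) before doing arithmetic.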