
Find number of columns in csv file

Tags: python, csv

My program needs to read csv files which may have 1, 2 or 3 columns, and it needs to modify its behaviour accordingly. Is there a simple way to check the number of columns without "consuming" a row before the iterator runs? The following code is the most elegant I could manage, but I would prefer to run the check before the for loop starts:

import csv

class CSVError(Exception):
    pass

f = open('testfile.csv', newline='')  # csv.reader needs an open file, not a file name
d = '\t'

reader = csv.reader(f, delimiter=d)
for row in reader:
    if reader.line_num == 1:
        fields = len(row)  # the first row fixes the expected column count
    if len(row) != fields:
        raise CSVError("Number of fields should be %s: %s" % (fields, str(row)))
    if fields == 1:
        pass
    elif fields == 2:
        pass
    elif fields == 3:
        pass
    else:
        raise CSVError("Too many columns in input file.")

Edit: I should have included more information about my data. If there is only one field, it must contain a name in scientific notation. If there are two fields, the first must contain a name, and the second a linking code. If there are three fields, the additional field contains a flag which specifies whether the name is currently valid. Therefore, whether the file has 1, 2 or 3 columns, every row must have the same number of columns.

rudivonstaden asked Jul 03 '12 11:07





2 Answers

You can use itertools.tee:

itertools.tee(iterable[, n=2])
Return n independent iterators from a single iterable.

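For example:

import csv
import itertools

# f is an open file object and d is the delimiter character
reader1, reader2 = itertools.tee(csv.reader(f, delimiter=d))
columns = len(next(reader1))  # peek at the first row through reader1
del reader1                   # drop the reference so tee stops buffering
for row in reader2:           # reader2 still starts at the first row
    ...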

Note that it is important to delete the reference to reader1 once you are finished with it. Otherwise tee has to keep every row that reader2 consumes in memory, in case you ever call next(reader1) again.
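As a self-contained illustration of the tee approach, here is a minimal sketch. The helper name `read_with_column_count` and the sample data are made up for this example, and `io.StringIO` stands in for a real open file.

```python
import csv
import io
import itertools


def read_with_column_count(lines, delimiter="\t"):
    """Return (column_count, rows) without losing the first row.

    Hypothetical helper: `lines` is any iterable of text lines,
    such as an open file object.
    """
    peek, rows = itertools.tee(csv.reader(lines, delimiter=delimiter))
    first = next(peek, None)  # None when the input is empty
    del peek                  # drop the extra iterator so tee stops buffering
    columns = len(first) if first is not None else 0
    return columns, rows


# Made-up sample data; a real program would pass an open file instead.
sample = io.StringIO("Homo sapiens\tHS01\tY\nPan troglodytes\tPT02\tN\n")
columns, rows = read_with_column_count(sample)
data = list(rows)  # still includes the first row
```

Because the count is known before the loop starts, the 1, 2 or 3 column behaviour can be chosen once up front rather than on every row.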

John La Rooy answered Sep 22 '22 15:09



This seems to work as well:

import csv

datafilename = 'testfile.csv'
d = '\t'
f = open(datafilename, 'r')

reader = csv.reader(f, delimiter=d)
ncol = len(next(reader))  # Read first line and count columns
f.seek(0)                 # go back to beginning of file
for row in reader:
    pass  # do stuff
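
The same seek-based idea can be combined with the per-row check from the question. The following is a rough sketch, not a definitive implementation: `count_columns` and `check_rows` are hypothetical helper names, `ValueError` stands in for the question's `CSVError`, and `io.StringIO` replaces the real file so the example runs on its own.

```python
import csv
import io


def count_columns(f, delimiter="\t"):
    """Count the fields in the first row, then rewind f to the start."""
    first = next(csv.reader(f, delimiter=delimiter), None)
    f.seek(0)  # requires a seekable file object
    return 0 if first is None else len(first)


def check_rows(f, ncol, delimiter="\t"):
    """Yield every row, raising if one has a different field count."""
    for row in csv.reader(f, delimiter=delimiter):
        if len(row) != ncol:
            raise ValueError("Number of fields should be %d: %r" % (ncol, row))
        yield row


# Made-up sample data standing in for testfile.csv.
sample = io.StringIO("Homo sapiens\tHS01\nPan troglodytes\tPT02\n")
ncol = count_columns(sample)
rows = list(check_rows(sample, ncol))
```

This only works when the input can be rewound. For a pipe or standard input, seek is generally not available, and buffering with itertools.tee is the safer choice.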
mgilson answered Sep 18 '22 15:09
