My program needs to read CSV files which may have 1, 2 or 3 columns, and it needs to modify its behaviour accordingly. Is there a simple way to check the number of columns without "consuming" a row before the iterator runs? The following code is the most elegant I could manage, but I would prefer to run the check before the for loop starts:
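    import csv

    class CSVError(Exception):
        """Raised when the input file has an unexpected shape."""

    datafilename = 'testfile.csv'
    d = '\t'
    # csv.reader needs an open file (or other iterable of lines), not a filename
    with open(datafilename, newline='') as f:
        reader = csv.reader(f, delimiter=d)
        for row in reader:
            if reader.line_num == 1:
                fields = len(row)  # the first row fixes the expected count
            if len(row) != fields:
                raise CSVError("Number of fields should be %s: %s" % (fields, str(row)))
            if fields == 1:
                pass
            elif fields == 2:
                pass
            elif fields == 3:
                pass
            else:
                raise CSVError("Too many columns in input file.")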
Edit: I should have included more information about my data. If there is only one field, it must contain a name in scientific notation. If there are two fields, the first must contain a name, and the second a linking code. If there are three fields, the additional field contains a flag which specifies whether the name is currently valid. Therefore, whether the first row has 1, 2 or 3 columns, every other row must have the same number.
All that is left is to use the wc command to count the number of characters. The file has 5 columns. In case you wonder why there are only 4 commas yet wc -c returned 5 characters: wc also counted the trailing newline (\n) as an extra character.
To get the number of rows and columns of a pandas DataFrame df, we can use len(df.axes[0]) and len(df.axes[1]) respectively.
In this method (Python 3) we import the csv library, open the file in read mode, and use DictReader() to read the CSV data. DictReader works like a regular reader, but it maps each row to a dictionary whose keys are the column names and whose values are that row's fields.
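A minimal sketch of the DictReader approach, assuming the file has a header row (the question's files may not). The sample data and the column names name, code and valid are made up for illustration. Reading reader.fieldnames pulls in only the header line, so no data row is consumed before the loop.

```python
import csv
import io

# Hypothetical tab-separated data with a header row (not from the question).
sample = io.StringIO(
    "name\tcode\tvalid\n"
    "Homo sapiens\tHS01\t1\n"
    "Pan troglodytes\tPT02\t0\n"
)

reader = csv.DictReader(sample, delimiter="\t")

# Accessing fieldnames reads the header line only; data rows are untouched.
ncol = len(reader.fieldnames)

# Every data row is still available to the loop.
rows = [row for row in reader]
```

If the file has no header row, DictReader would treat the first data row as the column names, so this only fits files that carry a header.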
You can use itertools.tee
itertools.tee(iterable[, n=2])
Return n independent iterators from a single iterable.
e.g.

    reader1, reader2 = itertools.tee(csv.reader(f, delimiter=d))
    columns = len(next(reader1))
    del reader1
    for row in reader2:
        ...
Note that it's important to delete the reference to reader1 when you are finished with it; otherwise tee will have to store all the rows in memory in case you ever call next(reader1) again.
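Here is a self-contained sketch of the tee approach. It assumes an in-memory io.StringIO with made-up rows in place of testfile.csv, and it raises a plain ValueError rather than the question's CSVError.

```python
import csv
import io
import itertools

# Hypothetical tab-separated data standing in for testfile.csv.
f = io.StringIO("Homo sapiens\tHS01\nPan troglodytes\tPT02\n")

reader1, reader2 = itertools.tee(csv.reader(f, delimiter="\t"))
columns = len(next(reader1))  # peek at the first row through reader1
del reader1                   # lets tee drop rows once reader2 has consumed them

rows = []
for row in reader2:           # reader2 still starts at the first row
    if len(row) != columns:
        raise ValueError("Number of fields should be %s: %s" % (columns, row))
    rows.append(row)
```

One trade-off: the objects returned by tee are plain iterators, so csv.reader attributes such as line_num are not available on reader2. An empty file makes next(reader1) raise StopIteration, which you may want to catch.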
This seems to work as well:
    import csv

    datafilename = 'testfile.csv'
    d = '\t'
    with open(datafilename, 'r', newline='') as f:
        reader = csv.reader(f, delimiter=d)
        ncol = len(next(reader))  # read the first line and count columns
        f.seek(0)                 # go back to the beginning of the file
        for row in reader:
            pass  # do stuff
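A variation on the same idea, sketched under some assumptions: count_columns is a hypothetical helper name, and an in-memory io.StringIO with made-up rows stands in for the real file. It peeks with a throwaway reader and rewinds, so the reader used in the main loop starts with line_num at 0. In the snippet above, line_num keeps counting across the seek, so a check like reader.line_num == 1 would no longer match the first row.

```python
import csv
import io

def count_columns(fileobj, delimiter="\t"):
    """Return the field count of the first row, then rewind the file.

    Hypothetical helper: needs a seekable file object; returns 0 if empty.
    """
    first_row = next(csv.reader(fileobj, delimiter=delimiter), None)
    fileobj.seek(0)  # rewind so the caller sees the file from the start
    return 0 if first_row is None else len(first_row)

# Made-up tab-separated data standing in for testfile.csv.
f = io.StringIO("Homo sapiens\tHS01\t1\nPan troglodytes\tPT02\t0\n")

ncol = count_columns(f)

# A fresh reader for the real pass; its line_num starts from zero.
reader = csv.reader(f, delimiter="\t")
rows = list(reader)
```

This only works on seekable inputs such as regular files; data piped in through sys.stdin generally cannot be rewound, in which case the itertools.tee approach is the safer choice.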