Is it possible to read the header of a CSV file in a whitespace- and case-insensitive way? At the moment I use csv.DictReader
like this:
import csv
# the 'rU' mode was removed in Python 3.11; newline='' is what the csv docs recommend
csvDict = csv.DictReader(open('csv-file.csv', newline=''))
# determine column_A name
if 'column_A' in csvDict.fieldnames:
    column_A = 'column_A'
elif ' column_A' in csvDict.fieldnames:
    # extra space
    column_A = ' column_A'
elif 'Column_A' in csvDict.fieldnames:
    # capital C
    column_A = 'Column_A'
# get column_A data
for lineDict in csvDict:
    print(lineDict[column_A])
As you can see from the code, the headers in my CSV files sometimes differ in extra white space or capitalization, for example 'column_A', ' column_A' or 'Column_A'.
I want to use something like this:
column_A = ' Column_A'.strip().lower()
print(lineDict[column_A])
Any ideas?
Why use a case-insensitive CSV DictReader? With the plain CSV reader we read data by column index, and with DictReader we read it by column name. With the plain reader, data extraction goes wrong if the column order changes; to overcome this we use DictReader.
The csv module has a reader() function that we can use to read CSV files. It returns an iterator that we can loop over to get each row of the file as a list of strings. Reading the file this way takes O(n) time in the number of rows. With the plain reader, the first row returned is the header and the remaining rows hold the values.
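As a minimal sketch of the difference between index-based and name-based access, the following uses made-up in-memory data. The sample text, column names and the two helper functions are illustrative assumptions, not taken from any real file:

```python
import csv
import io

# Hypothetical sample data; the column names are made up for illustration.
SAMPLE = "name,city\nAda,London\nLinus,Helsinki\n"

def read_by_index(text):
    """Return the header row and the data rows as lists, using csv.reader."""
    rows = list(csv.reader(io.StringIO(text)))
    return rows[0], rows[1:]

def read_by_name(text):
    """Return the data rows as dicts keyed by header name, using csv.DictReader."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]

header, data = read_by_index(SAMPLE)
records = read_by_name(SAMPLE)

print(header)              # the first row is the header
print(data[0][1])          # positional access: breaks if columns are reordered
print(records[0]["city"])  # name-based access: independent of column order
```

Name-based access still depends on the header text matching exactly, which is the problem the question is about.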
Pandas: read, skip and customize column headers with read_csv. The pandas read_csv() function automatically parses the header while loading a CSV file. By default it assumes that the top row (row index 0) contains the column names. This default behavior can be changed to customize the column names.
You can redefine reader.fieldnames:
import csv
import io
content = '''column_A " column_B"
1 2'''
# csv needs text input in Python 3, so use io.StringIO rather than io.BytesIO
reader = csv.DictReader(io.StringIO(content), delimiter=' ')
reader.fieldnames = [field.strip().lower() for field in reader.fieldnames]
for line in reader:
    print(line)
yields
{'column_a': '1', 'column_b': '2'}
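For reuse across several files, the same idea could be wrapped in a small function. This is only a sketch: normalized_dict_reader is a hypothetical helper name, not part of the csv module, and the checks for an empty input and for colliding names are additions that the answer above does not include.

```python
import csv
import io

def normalized_dict_reader(fileobj, **kwargs):
    """Return a csv.DictReader whose header names are stripped and lowercased.

    Hypothetical helper, not part of the csv module.
    """
    reader = csv.DictReader(fileobj, **kwargs)
    names = reader.fieldnames  # reads the header row; None if the input is empty
    if names is not None:
        normalized = [name.strip().lower() for name in names]
        # Two different headers may collapse to the same key, e.g. "A" and " a".
        if len(set(normalized)) != len(normalized):
            raise ValueError("duplicate column names after normalization: %r" % normalized)
        reader.fieldnames = normalized
    return reader

sample = io.StringIO(" Column_A,column_B \n1,2\n")
rows = list(normalized_dict_reader(sample))
print(rows[0]["column_a"])
```

With a real file you would pass the open file object instead of the StringIO, for example one opened with newline='' as the csv documentation recommends.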
How about overriding the DictReader.fieldnames property?
from csv import DictReader

class MyDictReader(DictReader):
    @property
    def fieldnames(self):
        # normalize whatever header names the base class read from the file
        return [field.strip().lower() for field in super(MyDictReader, self).fieldnames]
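A sketch of how such a subclass might be used, written for Python 3. The class name NormalizedDictReader and the sample data are assumptions for illustration, and the guard for a missing header row is an addition that is not in the answer above:

```python
import csv
import io

class NormalizedDictReader(csv.DictReader):
    """DictReader variant whose header names are stripped and lowercased."""

    @property
    def fieldnames(self):
        names = super().fieldnames  # None when the input has no header row
        if names is None:
            return None
        return [name.strip().lower() for name in names]

sample = io.StringIO("Column_A, column_B\n1,2\n")
reader = NormalizedDictReader(sample)
rows = list(reader)
print(reader.fieldnames)
print(rows[0]["column_b"])
```

One trade-off of this approach: because the subclass defines the property without a setter, assigning to reader.fieldnames afterwards raises AttributeError. If you need to reassign the names, redefining reader.fieldnames on a plain DictReader is the simpler route.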