
Reading csv header white space and case insensitive

Is it possible to read the header of a CSV file in a whitespace- and case-insensitive way? For now I use csv.DictReader like this:

import csv
csvDict = csv.DictReader(open('csv-file.csv', newline=''))

# determine column_A name
if 'column_A' in csvDict.fieldnames:
    column_A = 'column_A'
elif ' column_A' in csvDict.fieldnames:
    # extra space
    column_A = ' column_A'
elif 'Column_A' in csvDict.fieldnames:
    # capital A
    column_A = 'Column_A'

# get column_A data
for lineDict in csvDict:
    print(lineDict[column_A])

As you can see from the code, the headers of my CSV files sometimes differ in extra whitespace or capitalization, for example:

  • "column_A"
  • " column_A"
  • "Column_A"
  • " Column_A"
  • ...

I want to use something like this:

    column_A = ' Column_A'.strip().lower()
    print(lineDict[column_A])

Any ideas?

user1251007 asked Oct 17 '12 12:10



2 Answers

You can redefine reader.fieldnames:

import csv
import io

content = '''column_A " column_B"
1 2'''
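reader = csv.DictReader(io.StringIO(content), delimiter=' ')
# Accessing fieldnames reads the header row; reassign it with normalized names.
reader.fieldnames = [field.strip().lower() for field in reader.fieldnames]
for line in reader:
    print(line)

yields (on Python 3.8+, where each row is a plain dict)

{'column_a': '1', 'column_b': '2'}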
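
Applied to the column_A lookup from the question, the same idea could be wrapped in a small helper. This is only a sketch: `open_normalized` is a hypothetical name (not part of the csv module), and an in-memory `io.StringIO` stands in for the real CSV file.

```python
import csv
import io


def open_normalized(csvfile, **kwargs):
    """Return a DictReader whose header names are stripped and lowercased.

    Hypothetical helper for illustration; not part of the csv module.
    """
    reader = csv.DictReader(csvfile, **kwargs)
    # fieldnames is None when the file is empty, so guard before normalizing.
    if reader.fieldnames is not None:
        reader.fieldnames = [name.strip().lower() for name in reader.fieldnames]
    return reader


# In-memory stand-in for a file whose header has a stray space and a capital.
sample = io.StringIO(" Column_A,column_B\n1,2\n3,4\n")
values = [row["column_a"] for row in open_normalized(sample)]
print(values)
```

With a real file, `sample` would be replaced by something like `open('csv-file.csv', newline='')`, and the lookup key would always be the lowercase, stripped name.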
unutbu answered Sep 29 '22 23:09


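How about overriding the DictReader.fieldnames property?

from csv import DictReader

class MyDictReader(DictReader):

    @property
    def fieldnames(self):
        # Normalize on every access; the parent returns None for an empty file.
        names = super(MyDictReader, self).fieldnames
        if names is None:
            return None
        return [field.strip().lower() for field in names]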
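
A quick way to check the subclass approach is to feed it each header variant from the question and confirm that the same key works for all of them. The sketch below restates the class so it runs on its own; the sample data (the `other` column and the value `42`) is made up for illustration.

```python
import csv
import io


class MyDictReader(csv.DictReader):
    @property
    def fieldnames(self):
        # The parent property reads the header row on first access.
        names = super().fieldnames
        if names is None:  # empty input: nothing to normalize
            return None
        return [name.strip().lower() for name in names]


# The four header spellings listed in the question.
headers = ["column_A", " column_A", "Column_A", " Column_A"]
results = []
for header in headers:
    data = io.StringIO(header + ",other\n42,x\n")
    rows = list(MyDictReader(data))
    results.append(rows[0]["column_a"])
print(results)
```

One trade-off worth noting: because the property has no setter and recomputes the list on each access, header names are re-normalized for every row, whereas reassigning `reader.fieldnames` once does the work a single time.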
defuz answered Sep 30 '22 00:09