Can I import a CSV file and automatically infer the delimiter?

I want to import two kinds of CSV files: some use ";" as the delimiter and others use ",". So far I have been switching between the following two lines:

reader=csv.reader(f,delimiter=';')

reader=csv.reader(f,delimiter=',')

Is it possible not to specify the delimiter and to let the program check for the right delimiter?

The solutions below (Blender and sharth) seem to work well for comma-separated files (generated with LibreOffice) but not for semicolon-separated files (generated with MS Office). Here are the first lines of one semicolon-separated file:

ReleveAnnee;ReleveMois;NoOrdre;TitreRMC;AdopCSRegleVote;AdopCSAbs;AdoptCSContre;NoCELEX;ProposAnnee;ProposChrono;ProposOrigine;NoUniqueAnnee;NoUniqueType;NoUniqueChrono;PropoSplittee;Suite2LecturePE;Council PATH;Notes
1999;1;1;1999/83/EC: Council Decision of 18 January 1999 authorising the Kingdom of Denmark to apply or to continue to apply reductions in, or exemptions from, excise duties on certain mineral oils used for specific purposes, in accordance with the procedure provided for in Article 8(4) of Directive 92/81/EEC;U;;;31999D0083;1998;577;COM;NULL;CS;NULL;;;;Propos* are missing on Celex document
1999;1;2;1999/81/EC: Council Decision of 18 January 1999 authorising the Kingdom of Spain to apply a measure derogating from Articles 2 and 28a(1) of the Sixth Directive (77/388/EEC) on the harmonisation of the laws of the Member States relating to turnover taxes;U;;;31999D0081;1998;184;COM;NULL;CS;NULL;;;;Propos* are missing on Celex document

275

asked May 01 '13 02:05

rom

1 Answer

The csv module documentation seems to recommend csv.Sniffer for this problem.

It gives the following example, which I've adapted for your case.

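import csv

with open('example.csv', 'rb') as csvfile:  # Python 3: 'r', newline=""
    # Guess the dialect from the first 1024 bytes, considering only ';' and ','
    dialect = csv.Sniffer().sniff(csvfile.read(1024), delimiters=";,")
    csvfile.seek(0)  # rewind so the reader starts at the first row
    reader = csv.reader(csvfile, dialect)
    # ... process CSV file contents here ...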
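For Python 3, the same idea might look like the sketch below. It assumes the file is opened in text mode with newline="", as the comment in the snippet above hints. The function name read_rows and its parameters are made up for illustration and are not part of the csv module.

```python
import csv


def read_rows(path, candidates=";,", sample_size=1024):
    """Return all rows of the CSV file at `path`, guessing the delimiter.

    `read_rows`, `candidates` and `sample_size` are illustrative names,
    not part of the csv module.
    """
    # newline="" lets the csv module handle line endings itself (Python 3).
    with open(path, "r", newline="") as csvfile:
        sample = csvfile.read(sample_size)
        # Restrict the guess to the delimiters we expect to see.
        dialect = csv.Sniffer().sniff(sample, delimiters=candidates)
        csvfile.seek(0)  # rewind so the reader starts at the first row
        return list(csv.reader(csvfile, dialect))
```

Calling read_rows('semicolon_separated.csv') or read_rows('comma_separated.csv') should then work without changing the code between the two kinds of file.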

Let's try it out.

[9:13am][wlynch@watermelon /tmp] cat example
#!/usr/bin/env python
import csv

def parse(filename):
    with open(filename, 'rb') as csvfile:
        dialect = csv.Sniffer().sniff(csvfile.read(), delimiters=';,')
        csvfile.seek(0)
        reader = csv.reader(csvfile, dialect)

        for line in reader:
            print line

def main():
    print 'Comma Version:'
    parse('comma_separated.csv')

    print
    print 'Semicolon Version:'
    parse('semicolon_separated.csv')

    print
    print 'An example from the question (kingdom.csv)'
    parse('kingdom.csv')

if __name__ == '__main__':
    main()

And our sample inputs:

[9:13am][wlynch@watermelon /tmp] cat comma_separated.csv
test,box,foo
round,the,bend

[9:13am][wlynch@watermelon /tmp] cat semicolon_separated.csv
round;the;bend
who;are;you

[9:22am][wlynch@watermelon /tmp] cat kingdom.csv
ReleveAnnee;ReleveMois;NoOrdre;TitreRMC;AdopCSRegleVote;AdopCSAbs;AdoptCSContre;NoCELEX;ProposAnnee;ProposChrono;ProposOrigine;NoUniqueAnnee;NoUniqueType;NoUniqueChrono;PropoSplittee;Suite2LecturePE;Council PATH;Notes
1999;1;1;1999/83/EC: Council Decision of 18 January 1999 authorising the Kingdom of Denmark to apply or to continue to apply reductions in, or exemptions from, excise duties on certain mineral oils used for specific purposes, in accordance with the procedure provided for in Article 8(4) of Directive 92/81/EEC;U;;;31999D0083;1998;577;COM;NULL;CS;NULL;;;;Propos* are missing on Celex document
1999;1;2;1999/81/EC: Council Decision of 18 January 1999 authorising the Kingdom of Spain to apply a measure derogating from Articles 2 and 28a(1) of the Sixth Directive (77/388/EEC) on the harmonisation of the laws of the Member States relating to turnover taxes;U;;;31999D0081;1998;184;COM;NULL;CS;NULL;;;;Propos* are missing on Celex document

And if we execute the example program:

[9:14am][wlynch@watermelon /tmp] ./example
Comma Version:
['test', 'box', 'foo']
['round', 'the', 'bend']

Semicolon Version:
['round', 'the', 'bend']
['who', 'are', 'you']

An example from the question (kingdom.csv)
['ReleveAnnee', 'ReleveMois', 'NoOrdre', 'TitreRMC', 'AdopCSRegleVote', 'AdopCSAbs', 'AdoptCSContre', 'NoCELEX', 'ProposAnnee', 'ProposChrono', 'ProposOrigine', 'NoUniqueAnnee', 'NoUniqueType', 'NoUniqueChrono', 'PropoSplittee', 'Suite2LecturePE', 'Council PATH', 'Notes']
['1999', '1', '1', '1999/83/EC: Council Decision of 18 January 1999 authorising the Kingdom of Denmark to apply or to continue to apply reductions in, or exemptions from, excise duties on certain mineral oils used for specific purposes, in accordance with the procedure provided for in Article 8(4) of Directive 92/81/EEC', 'U', '', '', '31999D0083', '1998', '577', 'COM', 'NULL', 'CS', 'NULL', '', '', '', 'Propos* are missing on Celex document']
['1999', '1', '2', '1999/81/EC: Council Decision of 18 January 1999 authorising the Kingdom of Spain to apply a measure derogating from Articles 2 and 28a(1) of the Sixth Directive (77/388/EEC) on the harmonisation of the laws of the Member States relating to turnover taxes', 'U', '', '', '31999D0081', '1998', '184', 'COM', 'NULL', 'CS', 'NULL', '', '', '', 'Propos* are missing on Celex document']

It's also worth noting which version of Python I'm using.

[9:20am][wlynch@watermelon /tmp] python -V
Python 2.7.2
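Since the question reports that sniffing did not work on some semicolon-separated files, one possible safety net is to catch csv.Error (which Sniffer.sniff raises when it cannot determine a delimiter) and fall back to counting the candidate delimiters in the first line. This is only a sketch in Python 3: the helper name guess_delimiter, its parameters, and the counting heuristic are assumptions for illustration, not something the csv module provides.

```python
import csv


def guess_delimiter(sample, candidates=";,", default=","):
    """Guess the delimiter of a CSV text sample.

    Hypothetical helper: tries csv.Sniffer first, then falls back to a
    simple count of each candidate in the first line of the sample.
    """
    try:
        return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter
    except csv.Error:
        # Sniffer could not decide; count candidates in the first line instead.
        first_line = sample.splitlines()[0] if sample else ""
        counts = {d: first_line.count(d) for d in candidates}
        best = max(counts, key=counts.get)
        # If no candidate appears at all, return the caller's default.
        return best if counts[best] > 0 else default
```

The result can be passed straight to csv.reader(f, delimiter=guess_delimiter(sample)). The first-line count is a crude heuristic and can be fooled by delimiter characters inside quoted fields, so it is best treated as a last resort.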

114

answered Sep 28 '22 11:09

Bill Lynch

Tags: python, file, import, csv, delimiter
