I want to import two kinds of CSV files, some use ";" for delimiter and others use ",". So far I have been switching between the next two lines:
reader=csv.reader(f,delimiter=';')
or
reader=csv.reader(f,delimiter=',')
Is it possible not to specify the delimiter and to let the program check for the right delimiter?
The solutions below (Blender and sharth) seem to work well for comma-separated files (generated with Libroffice) but not for semicolon-separated files (generated with MS Office). Here are the first lines of one semicolon-separated file:
ReleveAnnee;ReleveMois;NoOrdre;TitreRMC;AdopCSRegleVote;AdopCSAbs;AdoptCSContre;NoCELEX;ProposAnnee;ProposChrono;ProposOrigine;NoUniqueAnnee;NoUniqueType;NoUniqueChrono;PropoSplittee;Suite2LecturePE;Council PATH;Notes 1999;1;1;1999/83/EC: Council Decision of 18 January 1999 authorising the Kingdom of Denmark to apply or to continue to apply reductions in, or exemptions from, excise duties on certain mineral oils used for specific purposes, in accordance with the procedure provided for in Article 8(4) of Directive 92/81/EEC;U;;;31999D0083;1998;577;COM;NULL;CS;NULL;;;;Propos* are missing on Celex document 1999;1;2;1999/81/EC: Council Decision of 18 January 1999 authorising the Kingdom of Spain to apply a measure derogating from Articles 2 and 28a(1) of the Sixth Directive (77/388/EEC) on the harmonisation of the laws of the Member States relating to turnover taxes;U;;;31999D0081;1998;184;COM;NULL;CS;NULL;;;;Propos* are missing on Celex document
Mac/Windows Select the CSV file that has the data clustered into one column. Select Delimited, then make sure the File Origin is Unicode UTF-8. Select Comma (this is Affinity's default list separator). The preview will show the columns being separated.
A CSV file stores data in rows and the values in each row is separated with a separator, also known as a delimiter. Although the file is defined as Comma Separated Values, the delimiter could be anything.
csv file format has data items separated by a delimiter other than a comma. This includes semicolon, colon, tab space, vertical bars, etc. In such cases, we need to use the sep parameter inside the read. csv() function.
The csv
module seems to recommend using the csv sniffer for this problem.
They give the following example, which I've adapted for your case.
with open('example.csv', 'rb') as csvfile: # python 3: 'r',newline="" dialect = csv.Sniffer().sniff(csvfile.read(1024), delimiters=";,") csvfile.seek(0) reader = csv.reader(csvfile, dialect) # ... process CSV file contents here ...
Let's try it out.
[9:13am][wlynch@watermelon /tmp] cat example #!/usr/bin/env python import csv def parse(filename): with open(filename, 'rb') as csvfile: dialect = csv.Sniffer().sniff(csvfile.read(), delimiters=';,') csvfile.seek(0) reader = csv.reader(csvfile, dialect) for line in reader: print line def main(): print 'Comma Version:' parse('comma_separated.csv') print print 'Semicolon Version:' parse('semicolon_separated.csv') print print 'An example from the question (kingdom.csv)' parse('kingdom.csv') if __name__ == '__main__': main()
And our sample inputs
[9:13am][wlynch@watermelon /tmp] cat comma_separated.csv test,box,foo round,the,bend [9:13am][wlynch@watermelon /tmp] cat semicolon_separated.csv round;the;bend who;are;you [9:22am][wlynch@watermelon /tmp] cat kingdom.csv ReleveAnnee;ReleveMois;NoOrdre;TitreRMC;AdopCSRegleVote;AdopCSAbs;AdoptCSContre;NoCELEX;ProposAnnee;ProposChrono;ProposOrigine;NoUniqueAnnee;NoUniqueType;NoUniqueChrono;PropoSplittee;Suite2LecturePE;Council PATH;Notes 1999;1;1;1999/83/EC: Council Decision of 18 January 1999 authorising the Kingdom of Denmark to apply or to continue to apply reductions in, or exemptions from, excise duties on certain mineral oils used for specific purposes, in accordance with the procedure provided for in Article 8(4) of Directive 92/81/EEC;U;;;31999D0083;1998;577;COM;NULL;CS;NULL;;;;Propos* are missing on Celex document 1999;1;2;1999/81/EC: Council Decision of 18 January 1999 authorising the Kingdom of Spain to apply a measure derogating from Articles 2 and 28a(1) of the Sixth Directive (77/388/EEC) on the harmonisation of the laws of the Member States relating to turnover taxes;U;;;31999D0081;1998;184;COM;NULL;CS;NULL;;;;Propos* are missing on Celex document
And if we execute the example program:
[9:14am][wlynch@watermelon /tmp] ./example Comma Version: ['test', 'box', 'foo'] ['round', 'the', 'bend'] Semicolon Version: ['round', 'the', 'bend'] ['who', 'are', 'you'] An example from the question (kingdom.csv) ['ReleveAnnee', 'ReleveMois', 'NoOrdre', 'TitreRMC', 'AdopCSRegleVote', 'AdopCSAbs', 'AdoptCSContre', 'NoCELEX', 'ProposAnnee', 'ProposChrono', 'ProposOrigine', 'NoUniqueAnnee', 'NoUniqueType', 'NoUniqueChrono', 'PropoSplittee', 'Suite2LecturePE', 'Council PATH', 'Notes'] ['1999', '1', '1', '1999/83/EC: Council Decision of 18 January 1999 authorising the Kingdom of Denmark to apply or to continue to apply reductions in, or exemptions from, excise duties on certain mineral oils used for specific purposes, in accordance with the procedure provided for in Article 8(4) of Directive 92/81/EEC', 'U', '', '', '31999D0083', '1998', '577', 'COM', 'NULL', 'CS', 'NULL', '', '', '', 'Propos* are missing on Celex document'] ['1999', '1', '2', '1999/81/EC: Council Decision of 18 January 1999 authorising the Kingdom of Spain to apply a measure derogating from Articles 2 and 28a(1) of the Sixth Directive (77/388/EEC) on the harmonisation of the laws of the Member States relating to turnover taxes', 'U', '', '', '31999D0081', '1998', '184', 'COM', 'NULL', 'CS', 'NULL', '', '', '', 'Propos* are missing on Celex document']
It's also probably worth noting what version of python I'm using.
[9:20am][wlynch@watermelon /tmp] python -V Python 2.7.2
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With