Parsing CSV / tab-delimited txt file with Python

Tags:

I currently have a CSV file which, when opened in Excel, has a total of 5 columns. Only columns A and C are of any significance to me and the data in the remaining columns is irrelevant.

Starting on line 8 and then working in multiples of 7 (ie. lines 8, 15, 22, 29, 36 etc...), I am looking to create a dictionary with Python 2.7 with the information from these fields. The data in column A will be the key (a 6-digit integer) and the data in column C being the respective value for the key. I've tried to highlight this below but the formatting isn't the best:-

    A        B      C          D 1                           CDCDCDCD   2                           VDDBDDB 3 4 5 6 7  DDEFEEF                   FEFEFEFE 8  123456         JONES 9 10 11 12 13 14 15 293849         SMITH

As per the above, I am looking to extract the value from A7 (DDEFEEF) as a key in my dictionary and "FEFEFEFE" being the respective data and then add another entry to my dictionary, jumping to line 15 with "2938495" being my key and "Smith" being the respective value.

Any suggestions? The source file is a .txt file with entries being tab-delimited. Thanks

Clarification:

Just to clarify, so far, I have tried the below:-

import csv  mydict = {:} f = open("myfile", 'rt') reader = csv.reader(f)     for row in reader:         print row

The above simply prints out all content though a row at a time. I did try "for row(7) in reader" but this returned an error. I then researched it and had a go at the below but it didn't work neither:

import csv from itertools import islice  entries = csv.reader(open("myfile", 'rb')) mydict = {'key' : 'value'}  for i in xrange(6):     mydict['i(0)] = 'I(2)    # integers representing columns     range = islice(entries,6)     for entry in range:         mydict[entries(0) = entries(2)] # integers representing columns

708

asked Oct 21 '11 23:10

thefragileomen

2 Answers

Start by turning the text into a list of lists. That will take care of the parsing part:

lol = list(csv.reader(open('text.txt', 'rb'), delimiter='\t'))

The rest can be done with indexed lookups:

d = dict() key = lol[6][0]      # cell A7 value = lol[6][3]    # cell D7 d[key] = value       # add the entry to the dictionary  ...

128

answered Sep 18 '22 16:09

Raymond Hettinger

Although there is nothing wrong with the other solutions presented, you could simplify and greatly escalate your solutions by using python's excellent library pandas.

Pandas is a library for handling data in Python, preferred by many Data Scientists.

Pandas has a simplified CSV interface to read and parse files, that can be used to return a list of dictionaries, each containing a single line of the file. The keys will be the column names, and the values will be the ones in each cell.

In your case:

    import pandas      def create_dictionary(filename):         my_data = pandas.DataFrame.from_csv(filename, sep='\t', index_col=False)         # Here you can delete the dataframe columns you don't want!         del my_data['B']         del my_data['D']         # ...         # Now you transform the DataFrame to a list of dictionaries         list_of_dicts = [item for item in my_data.T.to_dict().values()]         return list_of_dicts  # Usage: x = create_dictionary("myfile.csv")

answered Sep 17 '22 16:09

José Fonseca

Related questions
                            
                                list extend() to index, inserting list elements not only to the end
                            
                                Is it possible to get color gradients under curve in matplotlib?
                            
                                Generate SQL statements from a Pandas Dataframe
                            
                                This model has not yet been built error on model.summary()
                            
                                Convert fraction to float?
                            
                                How do I get the username in Python? [duplicate]
                            
                                Python out of memory on large CSV file (numpy)
                            
                                String format a JSON string gives KeyError
                            
                                python how to get BytesIO allocated memory length?
                            
                                XML writing tools for Python
                            
                                Find all the occurrences of a character in a string
                            
                                Is possible to create Column in SQLAlchemy which is going to be automatically populated with time when it inserted/updated last time?
                            
                                Pandas converting String object to lower case and checking for string
                            
                                Determine if a function is available in a Python module
                            
                                How to debug gunicorn failure issues? (Worker failed to boot)
                            
                                What is more 'pythonic' for 'not' [duplicate]
                            
                                Is Python's == an equivalence relation on the floats?
                            
                                How to create a shapely Polygon from a list of shapely Points?
                            
                                How to convert int to float in python?
                            
                                How to list all class properties

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Parsing CSV / tab-delimited txt file with Python

Tags:

python

dictionary

parsing

csv

thefragileomen

People also ask

2 Answers

Raymond Hettinger

José Fonseca

Recent Activity

Donate For Us