Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

converting a space delimited file to a CSV

Tags:

python

I have a text file containing tabular data. What I need to do is automate the task of writing to a new text file that is comma delimited instead of space delimited, extract a few columns from existing data, reorder the columns.

This is a snippet of the first 4 lines of the original data:

Number of rows: 8542
 Algorithm  |Date   |Time   |Longitude  |Latitude   |Country    
 1  2000-01-03  215926.688  -0.262  35.813  Algeria 
 1  2000-01-03  215926.828  -0.284  35.817  Algeria

Here is what I want in the end:

Longitude,Latitude,Country,Date,Time
-0.262,35.813,Algeria,2000-01-03,215926.688

Any tips on how to approach this?

like image 265
MLau Avatar asked Jan 14 '12 00:01

MLau


2 Answers

I guess the file is separated by tabs, not spaces.

If so, you can try something like:

input_file = open('some_tab_separated_file.txt', 'r')
output_file = open('some_tab_separated_file.csv', 'w')
input_file.readline() # skip first line 
for line in input_file:
    (a, date, time, lon, lat, country) = line.strip().split('\t')
    output_file.write(','.join([lon, lat, country, date, time]) + '\n')
input_file.close()
output_file.close()

This code is untested, any bug is left for you as exercise.

like image 158
Paulo Scardine Avatar answered Sep 30 '22 10:09

Paulo Scardine


You could use the csv module and a reader with the ' ' delimiter to read your data in, and use the a writer from the same module (with a comma delimiter) to produce the output.

In fact, the first example in the csv module documentation uses delimiter=' '.

You can use a DictReader/DictWriter and specify the order of the columns in its constructor (fieldnames list: different for reader/writer if you want to re-order) to output the entries in the order you wish.

(You may need to skip/ignore your first two rows when producing the output.)

EDIT:

Here is an example for dealing with multi-word country names:

import cStringIO
import csv

f = cStringIO.StringIO("""A B C
1 2 Costa Rica
3 4 Democratic Republic of the Congo
""")

r = csv.DictReader(f, delimiter=' ', restkey='rest')
for row in r:
    if row.get('rest'):
        row['C'] += " %s" % (" ".join(row['rest']))
    print 'A: %s, B: %s, C: %s' % (row['A'], row['B'], row['C'])

Use the restkey= and concatenate the dict entry for that value, which is a list of what's left over (here restkey='rest'). This prints:

A: 1, B: 2, C: Costa Rica
A: 3, B: 4, C: Democratic Republic of the Congo
like image 39
Bruno Avatar answered Sep 30 '22 11:09

Bruno