I have a text file containing tabular data. What I need to do is automate the task of writing to a new text file that is comma delimited instead of space delimited, extract a few columns from existing data, reorder the columns.
This is a snippet of the first 4 lines of the original data:
Number of rows: 8542 Algorithm |Date |Time |Longitude |Latitude |Country 1 2000-01-03 215926.688 -0.262 35.813 Algeria 1 2000-01-03 215926.828 -0.284 35.817 Algeria
Here is what I want in the end:
Longitude,Latitude,Country,Date,Time -0.262,35.813,Algeria,2000-01-03,215926.688
Any tips on how to approach this?
I guess the file is separated by tabs, not spaces.
If so, you can try something like:
input_file = open('some_tab_separated_file.txt', 'r')
output_file = open('some_tab_separated_file.csv', 'w')
input_file.readline() # skip first line
for line in input_file:
(a, date, time, lon, lat, country) = line.strip().split('\t')
output_file.write(','.join([lon, lat, country, date, time]) + '\n')
input_file.close()
output_file.close()
This code is untested, any bug is left for you as exercise.
You could use the csv
module and a reader with the ' '
delimiter to read your data in, and use the a writer from the same module (with a comma delimiter) to produce the output.
In fact, the first example in the csv
module documentation uses delimiter=' '
.
You can use a DictReader
/DictWriter
and specify the order of the columns in its constructor (fieldnames
list: different for reader/writer if you want to re-order) to output the entries in the order you wish.
(You may need to skip/ignore your first two rows when producing the output.)
EDIT:
Here is an example for dealing with multi-word country names:
import cStringIO
import csv
f = cStringIO.StringIO("""A B C
1 2 Costa Rica
3 4 Democratic Republic of the Congo
""")
r = csv.DictReader(f, delimiter=' ', restkey='rest')
for row in r:
if row.get('rest'):
row['C'] += " %s" % (" ".join(row['rest']))
print 'A: %s, B: %s, C: %s' % (row['A'], row['B'], row['C'])
Use the restkey=
and concatenate the dict entry for that value, which is a list of what's left over (here restkey='rest'
). This prints:
A: 1, B: 2, C: Costa Rica
A: 3, B: 4, C: Democratic Republic of the Congo
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With