
Renaming the header when using DictReader

Tags: python, csv, header

I'm looking for the best way to rename my header using DictReader / DictWriter, in addition to the other steps I already have working.

This is what I am trying to do to the source data example below:

  1. Remove the first 2 lines
  2. Reorder the columns (header & data) to 2, 1, 3 vs the source file
  3. Rename the header to ASXCode, CompanyName, GICS

Where I'm at:

If I use 'reader = csv.reader(inf)', the first lines are removed and the columns are reordered but, as expected, the header is not renamed.

Alternatively, when I run the DictReader line 'reader = csv.DictReader(inf, fieldnames=('ASXCode', 'CompanyName', 'GICS'))', I receive the error 'dict contains fields not in fieldnames:' and it shows the first row of data rather than the header.

I'm a bit stuck on how I get around this so any tips appreciated.

Source Data example

ASX listed companies as at Mon May 16 17:01:04 EST 2016     

Company name    ASX code    GICS industry group
1-PAGE LIMITED  1PG Software & Services
1300 SMILES LIMITED ONT Health Care Equipment & Services
1ST AVAILABLE LTD   1ST Health Care Equipment & Services

My Code

import csv
import urllib.request
from itertools import islice

local_filename = "C:\\myfile.csv"
url = ('http://mysite/afile.csv')

temp_filename, headers = urllib.request.urlretrieve(url)

with open(temp_filename, 'r', newline='') as inf, \
        open(local_filename, 'w', newline='') as outf:

    # reader = csv.DictReader(inf, fieldnames=('ASXCode', 'CompanyName', 'GICS'))
    reader = csv.reader(inf)
    fieldnames = ['ASX code', 'Company name', 'GICS industry group']
    writer = csv.DictWriter(outf, fieldnames=fieldnames)

    # 1. Remove top 2 rows
    next(islice(reader, 2, 2), None)

    # 2. Reorder Columns
    writer.writeheader()
    for row in csv.DictReader(inf):
        writer.writerow(row)

bassmann asked Oct 31 '22 03:10


1 Answer

IIUC, here is a solution using pandas and its read_csv function:

import pandas as pd

# Assuming your data is in a tab-separated file called 'stock.txt'.
# read_csv skips blank lines by default, so the header row ends up
# at index 1, hence header=1.
df = pd.read_csv('stock.txt', sep='\t', header=1)
# Rename the columns as required
df.columns = ['CompanyName', 'ASXCode', 'GICS']
# Reorder the columns as required
df = df[['ASXCode', 'CompanyName', 'GICS']]

Running this in IPython shows the DataFrame with the columns renamed and in the order ASXCode, CompanyName, GICS.
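
If you would rather stay with the csv module, as in the question, something along these lines might work. The error in the question most likely comes from the DictWriter being given the old names ('ASX code', ...) while the DictReader produces rows keyed by the new names, so the keys do not match. This is only a sketch under a few assumptions: the file is tab-separated (as assumed above), the three lines to skip are the title line, the blank line and the original header, and rename_and_reorder and SAMPLE are made-up names used for illustration.

```python
import csv
import io

# Sample input mirroring the question's source data (assumed tab-separated).
SAMPLE = (
    "ASX listed companies as at Mon May 16 17:01:04 EST 2016\t\t\n"
    "\n"
    "Company name\tASX code\tGICS industry group\n"
    "1-PAGE LIMITED\t1PG\tSoftware & Services\n"
    "1300 SMILES LIMITED\tONT\tHealth Care Equipment & Services\n"
    "1ST AVAILABLE LTD\t1ST\tHealth Care Equipment & Services\n"
)


def rename_and_reorder(inf, outf, delimiter="\t"):
    """Hypothetical helper: skip the preamble, rename and reorder the columns."""
    # 1. Skip the title line, the blank line and the original header line.
    for _ in range(3):
        next(inf, None)

    # 3. Supply the new names in the SOURCE column order, so every data row
    #    is read into a dict keyed by the new names.
    reader = csv.DictReader(
        inf, fieldnames=["CompanyName", "ASXCode", "GICS"], delimiter=delimiter
    )

    # 2. The writer's fieldnames set the OUTPUT order; they must use the same
    #    keys as the reader, otherwise writerow() raises ValueError.
    writer = csv.DictWriter(
        outf, fieldnames=["ASXCode", "CompanyName", "GICS"], lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)


out = io.StringIO()
rename_and_reorder(io.StringIO(SAMPLE), out)
print(out.getvalue())
```

With real files you would pass the two handles from the question's with-block (inf and outf) in place of the StringIO objects, and set delimiter="," if the download turns out to be comma-separated.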

Abbas answered Nov 15 '22 05:11