Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Removing rows and columns in python CSV module

Tags:

python

csv

I promise I searched and read several pages of google before I came to make this post. Due diligence has been done I swear.

I am trying to open a CSV file in python, read the file, make changes to it, and then write out a new file.

I got this far:

import csv
def water_data ():
    with open('aquastat.csv', 'r') as csv_file:
        csv_reader = csv.reader(csv_file)
        final_file_name = "final_water.data.csv"
        final_file = open(final_file_name,'w')
        csv_writer = csv.writer(final_file,delimiter="\t")
        for row in csv_reader:
            csv_writer.writerow(row)

But I'm struggling to get any further. I want to remove certain columns, but I cannot comprehend how python will know the difference between a row and a column. For example, the columns are Area, Area ID, Year, Value, etc. I only want Area, Year, Value. I tried

for row in final_file:

final_file.writerow(row[0] + row[2] + row[4] + row[5])

but I kept getting the following error: IndexError: list index out of range

[I would also like to replace blank cells with a *, but the column thing is the priority]

Note that I cannot use Pandas

If possible I would really appreciate if someone could not just tell me the code but explain it to me so I can figure it out further myself.

TLDR: How can I remove empty rows from the CVS file and write only certain columns into the new file?

INPUT:

"Area","Area Id","Variable Name","Variable Id","Year","Value","Symbol","Md" 
"Afghanistan",2,"Total area of the country",4100,1977,65286.0,"E","",""
"Afghanistan",2,"Total area of the country",4100,1982,65286.0,"E","","" 
"Afghanistan",2,"Total area of the country",4100,1987,65286.0,"E","",""
"Afghanistan",2,"Total area of the country",4100,1992,65286.0,"E","","" 
"Afghanistan",2,"Total area of the country",4100,1997,65286.0,"E","",""
"Afghanistan",2,"Total area of the country",4100,2002,65286.0,"E","",""
like image 775
Ashley F Avatar asked Feb 05 '26 08:02

Ashley F


1 Answers

I have tried to provide you an answer as close as possible than what you have done so far.

Prototype:

import csv

with open('aquastat.csv', 'r') as csv_file:
  csv_reader = csv.reader(csv_file)
  final_file_name = "final_water.data.csv"
  final_file = open(final_file_name,'w')
  csv_writer = csv.writer(final_file,delimiter="\t")
  for row in csv_reader:
    if len(row) >= 6:
        row = [row[0], row[4], row[5]]
        csv_writer.writerow(row)
  final_file.close()

explanations:

  • Before the line csv_writer.writerow(row) where you output the row in the output csv file. I have added the line row = [row[0], row[4], row[5]] where I overwrite the content of the array row by an array containing only 3 cells, those cells are respectively taken from the Area, Year, Value columns
  • On top of this, I have added a the if condition if len(row) >= 6: to check that you have at least enough elements in your row to extract the columns until Value.

input:

"Area","Area Id","Variable Name","Variable Id","Year","Value","Symbol","Md"
"Afghanistan",2,"Total area of the country",4100,1977,65286.0,"E","",""
"Afghanistan",2,"Total area of the country",4100,1982,65286.0,"E","",""
"Afghanistan",2,"Total area of the country",4100,1987,65286.0,"E","",""
"Afghanistan",2,"Total area of the country",4100,1992,65286.0,"E","",""
"Afghanistan",2,"Total area of the country",4100,1997,65286.0,"E","",""
"Afghanistan",2,"Total area of the country",4100,2002,65286.0,"E","",""

output:

Area    Year    Value
Afghanistan     1977    65286.0
Afghanistan     1982    65286.0
Afghanistan     1987    65286.0
Afghanistan     1992    65286.0
Afghanistan     1997    65286.0
Afghanistan     2002    65286.0
like image 155
Allan Avatar answered Feb 07 '26 01:02

Allan