I promise I searched and read several pages of google before I came to make this post. Due diligence has been done I swear.
I am trying to open a CSV file in python, read the file, make changes to it, and then write out a new file.
I got this far:
import csv
def water_data ():
with open('aquastat.csv', 'r') as csv_file:
csv_reader = csv.reader(csv_file)
final_file_name = "final_water.data.csv"
final_file = open(final_file_name,'w')
csv_writer = csv.writer(final_file,delimiter="\t")
for row in csv_reader:
csv_writer.writerow(row)
But I'm struggling to get any further. I want to remove certain columns, but I cannot comprehend how python will know the difference between a row and a column. For example, the columns are Area, Area ID, Year, Value, etc. I only want Area, Year, Value. I tried
for row in final_file:
final_file.writerow(row[0] + row[2] + row[4] + row[5])
but I kept getting the following error: IndexError: list index out of range
[I would also like to replace blank cells with a *, but the column thing is the priority]
Note that I cannot use Pandas
If possible I would really appreciate if someone could not just tell me the code but explain it to me so I can figure it out further myself.
TLDR: How can I remove empty rows from the CVS file and write only certain columns into the new file?
INPUT:
"Area","Area Id","Variable Name","Variable Id","Year","Value","Symbol","Md"
"Afghanistan",2,"Total area of the country",4100,1977,65286.0,"E","",""
"Afghanistan",2,"Total area of the country",4100,1982,65286.0,"E","",""
"Afghanistan",2,"Total area of the country",4100,1987,65286.0,"E","",""
"Afghanistan",2,"Total area of the country",4100,1992,65286.0,"E","",""
"Afghanistan",2,"Total area of the country",4100,1997,65286.0,"E","",""
"Afghanistan",2,"Total area of the country",4100,2002,65286.0,"E","",""
I have tried to provide you an answer as close as possible than what you have done so far.
Prototype:
import csv
with open('aquastat.csv', 'r') as csv_file:
csv_reader = csv.reader(csv_file)
final_file_name = "final_water.data.csv"
final_file = open(final_file_name,'w')
csv_writer = csv.writer(final_file,delimiter="\t")
for row in csv_reader:
if len(row) >= 6:
row = [row[0], row[4], row[5]]
csv_writer.writerow(row)
final_file.close()
explanations:
csv_writer.writerow(row) where you output the row in the output csv file. I have added the line row = [row[0], row[4], row[5]] where I overwrite the content of the array row by an array containing only 3 cells, those cells are respectively taken from the Area, Year, Value columnsif len(row) >= 6: to check that you have at least enough elements in your row to extract the columns until Value.input:
"Area","Area Id","Variable Name","Variable Id","Year","Value","Symbol","Md"
"Afghanistan",2,"Total area of the country",4100,1977,65286.0,"E","",""
"Afghanistan",2,"Total area of the country",4100,1982,65286.0,"E","",""
"Afghanistan",2,"Total area of the country",4100,1987,65286.0,"E","",""
"Afghanistan",2,"Total area of the country",4100,1992,65286.0,"E","",""
"Afghanistan",2,"Total area of the country",4100,1997,65286.0,"E","",""
"Afghanistan",2,"Total area of the country",4100,2002,65286.0,"E","",""
output:
Area Year Value
Afghanistan 1977 65286.0
Afghanistan 1982 65286.0
Afghanistan 1987 65286.0
Afghanistan 1992 65286.0
Afghanistan 1997 65286.0
Afghanistan 2002 65286.0
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With