How would I extract specific data from a CSV file in Python, based on the header? For example, say the CSV file contained this information:
Height,Weight,Age
6.0,78,25
How could I retrieve just the age in Python?
Step 1: To read rows in Python, first load the CSV file into a file object using the open() function.
Step 2: Create a reader object by passing that file object to the csv.reader function.
Step 3: Use a for loop on the reader object to get each row.
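The three steps above could look roughly like this in Python 3. This is only a sketch: the file name hwa.csv is hypothetical, and the sample file is written to a temporary directory first so the snippet runs on its own.

```python
import csv
import os
import tempfile

# Write the sample data from the question to a temporary file so the
# sketch is self-contained; "hwa.csv" is a hypothetical file name.
path = os.path.join(tempfile.mkdtemp(), "hwa.csv")
with open(path, "w", newline="") as fp:
    fp.write("Height,Weight,Age\n6.0,78,25\n")

# Step 1: open the file. Step 2: wrap it in a reader. Step 3: loop over rows.
with open(path, newline="") as fp:
    reader = csv.reader(fp)
    header = next(reader)           # first row holds the column names
    age_idx = header.index("Age")   # position of the "Age" column
    ages = [row[age_idx] for row in reader]

print(ages)  # ['25']
```

The value comes back as the string '25'; convert it with int() or float() if you need a number.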
I second the csv recommendation, but I think here using csv.DictReader would be simpler:
(Python 2; in Python 3, open the file in text mode with open("hwa.csv", newline="") instead of "rb"):
>>> import csv
>>> with open("hwa.csv", "rb") as fp:
...     reader = csv.DictReader(fp)
...     data = next(reader)
...
>>> data
{'Age': '25', 'Weight': '78', 'Height': '6.0'}
>>> data["Age"]
'25'
>>> float(data["Age"])
25.0
Here I've used next just to get the first row, but you could loop over the rows and/or extract a full column of information if you liked.
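For instance, looping over the rows to pull out a whole column might look like the sketch below (Python 3). The second data row is made up for illustration, and io.StringIO stands in for an open file so the snippet runs without hwa.csv on disk.

```python
import csv
import io

# io.StringIO stands in for an open file; the second data row is invented
# purely to show more than one value in the column.
sample = io.StringIO("Height,Weight,Age\n6.0,78,25\n5.5,64,31\n")

reader = csv.DictReader(sample)   # uses the first row as the keys
ages = [float(row["Age"]) for row in reader]

print(ages)  # [25.0, 31.0]
```

With a real file, replace the StringIO object with the file object from open("hwa.csv", newline="").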
The process to follow is: read in the first line, find the index (location) on that line of the data you're looking for, then use that index to pull the data out of the remaining lines.
Python offers a very helpful csv.reader function for doing all the reading, so it's quite simple.
import csv
filename = 'yourfilenamehere'
column = 'Age'
data = [] # This will contain our data
# Open the file (Python 3: text mode with newline='') and create a csv reader
with open(filename, newline='') as fp:
    reader = csv.reader(fp, delimiter=',', dialect='excel')
    hrow = next(reader) # Get the top row
    idx = hrow.index(column) # Find the column of the data you're looking for
    for row in reader: # Iterate the remaining rows
        data.append(row[idx])
print(data)
Note that the values will come out as strings. You can convert to int by wrapping the row[idx], e.g. data.append( int( row[idx] ) )
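If some cells might be blank or non-numeric, a bare int() call raises ValueError. One possible way to guard against that is a small helper like the one below; to_int is a hypothetical name, not part of the csv module, and the sample cells are made up.

```python
def to_int(value, default=None):
    """Convert a CSV cell to int; return `default` if it is blank or not a whole number."""
    try:
        return int(value.strip())
    except ValueError:
        return default

# Made-up cells showing a clean value, padded whitespace, a blank, and text.
cells = ["25", " 31 ", "", "n/a"]
print([to_int(c) for c in cells])  # [25, 31, None, None]
```

In the loop above you would then write data.append(to_int(row[idx])).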