I can access elements of a named tuple by name as follows(*):
    from collections import namedtuple

    Car = namedtuple('Car', 'color mileage')
    my_car = Car('red', 100)
    print my_car.color
But how can I use a variable to specify the name of the field I want to access? E.g.
    field = 'color'
    my_car[field]  # doesn't work
    my_car.field   # doesn't work

My actual use case is that I'm iterating through a pandas dataframe with for row in data.itertuples(). I am doing an operation on the value from a particular column, and I want to be able to specify the column to use by name as a parameter to the method containing this loop.
(*) example taken from here. I am using Python 2.7.
Accessing the values of a namedtuple: values can be read by index, by attribute name, or with getattr(). The fields are ordered, so each one has a fixed index, and namedtuple exposes every field name as an attribute. Dictionary-style access by key is only available after converting the tuple with _asdict().
namedtuple() is a factory function for tuple subclasses. Its first argument is the type name: in Point = namedtuple('whatsmypurpose', ['x', 'y']), a class named whatsmypurpose is created internally. In Python 2.7 you can see this by passing verbose=True, which prints the generated class definition: Point = namedtuple('whatsmypurpose', ['x', 'y'], verbose=True). The verbose argument was removed in Python 3.7.
Answer: To create a named tuple, import the namedtuple factory function from the collections module. It takes the name of the new type (which is what type() will report) and the field names, for example as a single string separated by whitespace. It returns a new tuple subclass with the specified fields.
Because namedtuples are subclasses of tuple, fields can be accessed by index or by field name. The index of a field is fixed by the order in which the fields were declared. For example, if street is the first field declared on an Address namedtuple, you can access it by name or by using 0 as the index.
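A minimal sketch of both access styles. The Address type and its field names are hypothetical, chosen only for illustration:

```python
from collections import namedtuple

# Hypothetical Address type; the field names are assumptions for this sketch.
Address = namedtuple('Address', 'street city zipcode')
home = Address('1 Main St', 'Springfield', '00000')

by_name = home.street             # access by field name
by_index = home[0]                # access by position (declaration order)
field_names = Address._fields     # tuple of field names, in declaration order
city = home._asdict()['city']     # key-style access goes through _asdict()

print(by_name)
print(by_index)
print(field_names)
print(city)
```

The _fields attribute is what ties a field name to its position, so Address._fields.index('city') gives the index to use for positional access.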
You can use getattr:

    getattr(my_car, field)
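As a rough sketch of the use case in the question, the field name can be passed as a parameter and resolved with getattr inside the loop. The function name total and the sample data are hypothetical; plain namedtuples stand in for the rows that itertuples() would yield:

```python
from collections import namedtuple

Car = namedtuple('Car', 'color mileage')
cars = [Car('red', 100), Car('blue', 250)]


def total(rows, field):
    """Sum the attribute named by `field` across all rows (hypothetical helper)."""
    return sum(getattr(row, field) for row in rows)


field = 'color'
first_color = getattr(cars[0], field)          # same as cars[0].color
total_mileage = total(cars, 'mileage')         # field chosen by the caller
missing = getattr(cars[0], 'price', None)      # optional default avoids AttributeError

print(first_color)
print(total_mileage)
print(missing)
```

Passing a third argument to getattr is useful when the requested column might not exist on the row.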
The 'getattr' answer works, but there is another option which is slightly faster.
    idx = {name: i for i, name in enumerate(list(df), start=1)}
    for row in df.itertuples(name=None):
        example_value = row[idx['product_price']]
Make a dictionary mapping the column names to the row position. Call 'itertuples' with "name=None". Then access the desired values in each tuple using the indexes obtained using the column name from the dictionary.
Note: Use start=0 in enumerate if you call itertuples with index=False.
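The effect of the offset can be sketched on a tiny dataframe. The column names and values here are made up for illustration; the point is only that the index occupies position 0 of each tuple unless index=False is passed:

```python
import pandas as pd

# Hypothetical data; column names are assumptions for this sketch.
df = pd.DataFrame({'product_price': [2.5, 4.0], 'quantity': [3, 1]})

# Default: each tuple is (index, col0, col1, ...), so columns start at position 1.
idx_with_index = {name: i for i, name in enumerate(list(df), start=1)}
prices_a = [row[idx_with_index['product_price']]
            for row in df.itertuples(name=None)]

# index=False: each tuple is (col0, col1, ...), so columns start at position 0.
idx_no_index = {name: i for i, name in enumerate(list(df), start=0)}
prices_b = [row[idx_no_index['product_price']]
            for row in df.itertuples(index=False, name=None)]

print(prices_a)
print(prices_b)
```

Both loops read the same column; only the starting offset of the mapping changes.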
Here is a working example showing both methods and the timing of both methods.
    import numpy as np
    import pandas as pd
    import timeit

    data_length = 3 * 10**5
    fake_data = {
        "id_code": list(range(data_length)),
        "letter_code": np.random.choice(list('abcdefgz'), size=data_length),
        "pine_cones": np.random.randint(low=1, high=100, size=data_length),
        "area": np.random.randint(low=1, high=100, size=data_length),
        "temperature": np.random.randint(low=1, high=100, size=data_length),
        "elevation": np.random.randint(low=1, high=100, size=data_length),
    }
    df = pd.DataFrame(fake_data)

    def iter_with_idx():
        # Plain tuples, with columns looked up by position via a name-to-index dict.
        result_data = []
        idx = {name: i for i, name in enumerate(list(df), start=1)}
        for row in df.itertuples(name=None):
            row_calc = row[idx['pine_cones']] / row[idx['area']]
            result_data.append(row_calc)
        return result_data

    def iter_with_getattr():
        # Named tuples, with columns looked up by name via getattr.
        result_data = []
        for row in df.itertuples():
            row_calc = getattr(row, 'pine_cones') / getattr(row, 'area')
            result_data.append(row_calc)
        return result_data

    dict_idx_method = timeit.timeit(iter_with_idx, number=100)
    get_attr_method = timeit.timeit(iter_with_getattr, number=100)
    print(f'Dictionary index Method {dict_idx_method:0.4f} seconds')
    print(f'Get attribute method {get_attr_method:0.4f} seconds')
Result:
    Dictionary index Method 49.1814 seconds
    Get attribute method 80.1912 seconds
I assume the difference comes from the lower overhead of creating a plain tuple rather than a named tuple, and of accessing it by index rather than through getattr, but both of those are just guesses. If anyone knows better, please comment.
I have not explored how the number of columns versus the number of rows affects the timing results.