 

How to access a field of a namedtuple using a variable for the field name?

I can access elements of a named tuple by name as follows (*):

from collections import namedtuple

Car = namedtuple('Car', 'color mileage')
my_car = Car('red', 100)
print(my_car.color)

But how can I use a variable to specify the name of the field I want to access? E.g.

field = 'color'
my_car[field]  # doesn't work
my_car.field   # doesn't work
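The two attempts fail for different reasons. Indexing a named tuple with a string raises TypeError because tuple indices must be integers or slices, and my_car.field looks for an attribute literally called field rather than the attribute whose name is stored in the variable. A minimal sketch (Python 3, reusing the Car example above) that captures which exception each attempt raises:

```python
from collections import namedtuple

Car = namedtuple('Car', 'color mileage')
my_car = Car('red', 100)
field = 'color'

# Indexing with a string fails: tuple indices must be integers or slices.
try:
    my_car[field]
except TypeError as exc:
    index_error = exc  # keep a reference; `exc` is cleared after the block

# Dot access looks for an attribute literally named "field", not "color".
try:
    my_car.field
except AttributeError as exc:
    attr_error = exc

print(type(index_error).__name__)  # TypeError
print(type(attr_error).__name__)   # AttributeError
```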

My actual use case is that I'm iterating through a pandas DataFrame with for row in data.itertuples(). I perform an operation on the value in a particular column, and I want to be able to specify that column by name as a parameter to the method containing this loop.

(*) example taken from here. I am using Python 2.7.

asked Jun 19 '17 15:06 by LangeHaare


People also ask

How do you access fields in a Namedtuple?

From a named tuple you can access values by index, by attribute name, or with the getattr() function. The fields of a named tuple are ordered, so they can be accessed by index like those of any tuple, and namedtuple also turns each field name into an attribute.
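A minimal sketch of these access methods, reusing the Car example from the question (Python 3). The _asdict() call is an extra not mentioned above: it is a documented namedtuple method that returns a mapping from field names to values, which is the closest thing a named tuple has to key-based access.

```python
from collections import namedtuple

Car = namedtuple('Car', 'color mileage')
my_car = Car('red', 100)

by_index = my_car[0]                   # position, as with any tuple
by_attribute = my_car.color            # field name written into the code
by_getattr = getattr(my_car, 'color')  # field name held in a string
as_dict = my_car._asdict()             # field name -> value mapping

print(by_index, by_attribute, by_getattr)  # red red red
print(as_dict['mileage'])                  # 100
```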

What is Typename in Namedtuple?

namedtuple() is a factory function that creates tuple subclasses. Its first argument is the type name: in Point = namedtuple('whatsmypurpose', ['x', 'y']), a class named whatsmypurpose is created internally and bound to the variable Point. In Python 2.7 you can see this by passing the verbose argument, e.g. Point = namedtuple('whatsmypurpose', ['x', 'y'], verbose=True), which prints the generated class definition; that argument was removed in Python 3.7.
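A short sketch (Python 3, so without verbose) showing that the type name passed as the first argument, not the variable the class is assigned to, is what the class reports about itself:

```python
from collections import namedtuple

# The first argument is the class name; the variable name is independent of it.
Point = namedtuple('whatsmypurpose', ['x', 'y'])
p = Point(1, 2)

print(Point.__name__)    # whatsmypurpose
print(p)                 # whatsmypurpose(x=1, y=2)
print(type(p) is Point)  # True
```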

Which of the following is used to get named entries into a tuple?

Answer: To create a named tuple, import the namedtuple factory function from the collections module. It takes the name of the new type (which is what type() will report) and the field names, given either as a single string with the names separated by whitespace and/or commas, or as a sequence of strings. It returns a new tuple subclass with those fields.
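A minimal sketch (Python 3) of the equivalent ways to pass the field names, reusing the Car example from the question. All three produce the same _fields, and type() reports the type name given as the first argument.

```python
from collections import namedtuple

# Three equivalent ways to specify the field names.
CarA = namedtuple('Car', 'color mileage')       # whitespace-separated string
CarB = namedtuple('Car', 'color, mileage')      # comma-separated string
CarC = namedtuple('Car', ['color', 'mileage'])  # sequence of strings

print(CarA._fields)                                   # ('color', 'mileage')
print(CarA._fields == CarB._fields == CarC._fields)   # True
print(type(CarA('red', 100)).__name__)                # Car
```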

Can you subclass Namedtuple?

Yes. namedtuple() returns an ordinary class, so it can be subclassed like any other class, for example to add methods. Because named tuples are themselves subclasses of tuple, their fields can be accessed either by index or by field name. The index of a field follows the order in which the fields were declared, so the first declared field is at index 0.
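A sketch of subclassing, following the pattern shown in the Python documentation. The Address type and its street and city fields are hypothetical. Setting __slots__ = () in the subclass stops Python from adding a per-instance __dict__, so instances stay as lightweight as plain named tuples. The f-string needs Python 3.6+.

```python
from collections import namedtuple


# Hypothetical example type: a named tuple with an extra method.
class Address(namedtuple('Address', 'street city')):
    # Empty __slots__: no per-instance __dict__ is created.
    __slots__ = ()

    def one_line(self):
        return f'{self.street}, {self.city}'


home = Address('1 Main St', 'Springfield')
print(home[0] == home.street)  # True: index 0 is the first declared field
print(home.one_line())         # 1 Main St, Springfield
```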


2 Answers

You can use the built-in getattr function:

getattr(my_car, field) 
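Applied to the use case in the question, a sketch might look like the following. The function name column_total and the sample DataFrame are hypothetical, and the code assumes Python 3 and a column name that is a valid Python identifier: itertuples() renames columns that are not valid identifiers, are repeated, or start with an underscore to positional names, so getattr with the original name would fail for those. Passing a third argument, as in getattr(row, column, None), returns that default instead of raising AttributeError.

```python
import pandas as pd


def column_total(df, column):
    """Hypothetical helper: sum one column, chosen by name at call time."""
    total = 0
    for row in df.itertuples():
        # getattr looks the field up by the name stored in `column`.
        total += getattr(row, column)
    return total


data = pd.DataFrame({'color': ['red', 'blue'], 'mileage': [100, 250]})
print(column_total(data, 'mileage'))  # 350
```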
answered Oct 13 '22 05:10 by juanpa.arrivillaga


The getattr answer works, but there is another option that is faster: in the benchmark at the end of this answer it took about 49 seconds, versus about 80 seconds for getattr.

idx = {name: i for i, name in enumerate(list(df), start=1)}
for row in df.itertuples(name=None):
    example_value = row[idx['product_price']]

Explanation

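Make a dictionary that maps each column name to its position in the row tuple. Call itertuples with name=None so that it yields plain tuples instead of named tuples. Then access the desired values in each tuple using the positions looked up by column name in the dictionary. The positions start at 1 because, with the default index=True, the first element of each tuple is the row's index value.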

  1. Make a dictionary to find the indexes.

idx = {name: i for i, name in enumerate(list(df), start=1)}

  2. Use the dictionary to access the desired values by name in the row tuples.

for row in df.itertuples(name=None):
    example_value = row[idx['product_price']]

Note: Use start=0 in enumerate if you call itertuples with index=False, since the tuples then no longer begin with the index value.
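The steps and the note above can be folded into a small helper so that the start value always matches the index argument. This is a sketch, not part of the original answer: column_positions and the sample product data are hypothetical. The last line uses df.columns.get_loc(), which returns a column's integer position, as a second way to get the same offset.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({'product_name': ['pen', 'ink'],
                   'product_price': [1.5, 4.0]})


def column_positions(frame, index=True):
    # With index=True each row tuple starts with the index value,
    # so the first column sits at position 1; otherwise at position 0.
    start = 1 if index else 0
    return {name: i for i, name in enumerate(frame.columns, start=start)}


idx = column_positions(df)  # pairs with itertuples(index=True, name=None)
prices = [row[idx['product_price']] for row in df.itertuples(name=None)]

idx_no_index = column_positions(df, index=False)
prices_no_index = [row[idx_no_index['product_price']]
                   for row in df.itertuples(index=False, name=None)]

print(prices)
print(prices == prices_no_index)  # True
print(idx['product_price'] == df.columns.get_loc('product_price') + 1)  # True
```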

Here is a working example that implements both methods and times them.

import numpy as np
import pandas as pd
import timeit

data_length = 3 * 10**5
fake_data = {
    "id_code": list(range(data_length)),
    "letter_code": np.random.choice(list('abcdefgz'), size=data_length),
    "pine_cones": np.random.randint(low=1, high=100, size=data_length),
    "area": np.random.randint(low=1, high=100, size=data_length),
    "temperature": np.random.randint(low=1, high=100, size=data_length),
    "elevation": np.random.randint(low=1, high=100, size=data_length),
}
df = pd.DataFrame(fake_data)


def iter_with_idx():
    result_data = []

    # Map each column name to its position in the plain row tuple
    # (start=1 because position 0 holds the index value).
    idx = {name: i for i, name in enumerate(list(df), start=1)}

    for row in df.itertuples(name=None):
        row_calc = row[idx['pine_cones']] / row[idx['area']]
        result_data.append(row_calc)

    return result_data


def iter_with_getattr():
    result_data = []

    for row in df.itertuples():
        row_calc = getattr(row, 'pine_cones') / getattr(row, 'area')
        result_data.append(row_calc)

    return result_data


dict_idx_method = timeit.timeit(iter_with_idx, number=100)
get_attr_method = timeit.timeit(iter_with_getattr, number=100)

print(f'Dictionary index Method {dict_idx_method:0.4f} seconds')
print(f'Get attribute method {get_attr_method:0.4f} seconds')

Result:

Dictionary index Method 49.1814 seconds
Get attribute method 80.1912 seconds

I assume the difference comes from lower overhead in creating a plain tuple rather than a named tuple, and in accessing values by index rather than with getattr, but both of those are just guesses. If anyone knows better, please comment.

I have not explored how the number of columns versus the number of rows affects the timing results.
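One way to check the two guesses about the speed difference separately is to time them in isolation. The sketch below (Python 3; the Row type and data are hypothetical) compares building rows with zip() alone against wrapping each zipped tuple with a named tuple's _make(), and compares getattr() on a named tuple against a dict lookup plus indexing on a plain tuple. It only approximates what itertuples() does; that is an assumption, not a description of pandas' implementation. No timings are claimed here, so run it on your own machine.

```python
import timeit
from collections import namedtuple

# Hypothetical stand-in for one row: index value plus two columns.
Row = namedtuple('Row', 'Index pine_cones area')
n = 100_000
columns = [range(n), range(1, n + 1), range(2, n + 2)]

# Guess 1: creating named tuples costs more than creating plain tuples.
build_plain = timeit.timeit(lambda: list(zip(*columns)), number=10)
build_named = timeit.timeit(lambda: list(map(Row._make, zip(*columns))),
                            number=10)

# Guess 2: getattr by name costs more than a dict lookup plus indexing.
named_row = Row(0, 42, 7)
plain_row = (0, 42, 7)
pos = {'pine_cones': 1, 'area': 2}  # hypothetical name -> position map
read_getattr = timeit.timeit(lambda: getattr(named_row, 'pine_cones'),
                             number=100_000)
read_index = timeit.timeit(lambda: plain_row[pos['pine_cones']],
                           number=100_000)

print(f'build: tuple {build_plain:.4f} s, namedtuple {build_named:.4f} s')
print(f'read:  dict+index {read_index:.4f} s, getattr {read_getattr:.4f} s')
```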

answered Oct 13 '22 05:10 by Mint