 

Pandas select rows based on a function of a column

Tags: python, pandas

I am trying to learn pandas. I have found several examples of how to construct a DataFrame and how to add columns, and they work nicely. I would now like to select all rows based on the value of a column. I have found multiple examples of selecting rows where a column's value is smaller or greater than a certain number, and that also works. My question is how to do a more general selection: first compute a function of a column, then select all rows for which the value of that function is greater or smaller than a certain number.

import names
import numpy as np
import pandas as pd
from datetime import date
import random

def randomBirthday(startyear, endyear):
    T1 = date(startyear, 1, 1).toordinal()
    T2 = date(endyear, 1, 1).toordinal()
    return date.fromordinal(random.randint(T1, T2))

def age(birthday):
    today = date.today()
    return today.year - birthday.year - ((today.month, today.day) < (birthday.month, birthday.day))

N_PEOPLE = 20
dict_people = { }
dict_people['gender'] = np.array(['male','female'])[np.random.randint(0, 2, N_PEOPLE)]
dict_people['names'] = [names.get_full_name(gender=g) for g in dict_people['gender']]

peopleFrame = pd.DataFrame(dict_people)

# Example 1: Add new columns to the data frame
peopleFrame['birthday'] = [randomBirthday(1920, 2020) for i in range(N_PEOPLE)]

# Example 2: Select all people with a certain age
peopleFrame.loc[age(peopleFrame['birthday']) >= 20]

This code works except for the last line. What is the correct way to write it? I have considered adding an extra column holding the value of the function age and then selecting on that column, which would work, but I am wondering whether I have to. What if I don't want to store a person's age and only want to use it for selection?

Aleksejs Fomins asked Mar 04 '23 19:03


1 Answer

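Use Series.apply, which calls age on each birthday individually and returns a Series that can be compared with a number: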

peopleFrame.loc[peopleFrame['birthday'].apply(age) >= 20]
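
The original line fails because age expects a single date, while peopleFrame['birthday'] is a whole Series, which has no .year attribute. As a minimal sketch of the apply approach on reproducible data: the sample names, the birthdays, and the fixed reference_day below are made up for illustration, and age takes the reference day as a second argument (unlike the version in the question) so that the result does not depend on when the code is run.

```python
from datetime import date

import pandas as pd


def age(birthday, today):
    """Age in whole years of a single birthday on the reference day `today`."""
    had_birthday = (today.month, today.day) >= (birthday.month, birthday.day)
    return today.year - birthday.year - (not had_birthday)


# Hypothetical sample data; a fixed reference day keeps the result reproducible.
reference_day = date(2023, 3, 4)
people = pd.DataFrame({
    "names": ["Ann", "Bob", "Cid"],
    "birthday": [date(1990, 5, 17), date(2010, 1, 2), date(2003, 3, 5)],
})

# apply calls age once per element and returns a Series of ints;
# the comparison turns it into a boolean mask that .loc uses to pick rows.
mask = people["birthday"].apply(age, args=(reference_day,)) >= 20
adults = people.loc[mask]

print(adults)
print(list(people.columns))  # the frame itself gains no age column
```

The mask exists only as a temporary Series, so nothing is stored in the frame. Note that apply runs a Python-level call per row, which is usually fine for small frames but may be slow on very large ones.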
jezrael answered Mar 13 '23 03:03