Multiple results for each individual row (one-to-many) with Pandas

Tags: pandas

If I have a DataFrame where each row is an individual and each column is an attribute of that individual, how can I get a new DataFrame that maps each individual to multiple results?

I've tried doing this with DataFrame.apply(), which seems the most intuitive approach, but it raises exceptions, as in the example below. Adding broadcast=False or reduce=False does not help.

Below is a trivial example, obviously, but consider any scenario where each row maps to multiple rows. What's the best way to handle that? In reality, each row can map to a different number of results. This is basically computing a one-to-many relationship.

Example: I have a DataFrame with the following structure, and for each individual I want to get the three upcoming birthdays (trivial example, I know). So, from:

+---+-------+------------+
|   | name  | birthdate  |
+---+-------+------------+
| 1 | John  | 1990-01-01 |
| 2 | Jane  | 1957-04-03 |
| 3 | Max   | 1987-02-03 |
| 4 | David | 1964-02-12 |
+---+-------+------------+

to something like:

+-------+------------+
| name  | birthday   |
+-------+------------+
| John  | 2016-01-01 |
| John  | 2017-01-01 |
| John  | 2018-01-01 |
| Jane  | 2016-04-03 |
| Jane  | 2017-04-03 |
| Jane  | 2018-04-03 |
| Max   | 2016-02-03 |
| Max   | 2017-02-03 |
| Max   | 2018-02-03 |
| David | 2016-02-12 |
| David | 2017-02-12 |
| David | 2018-02-12 |
+-------+------------+

Intuitively, I would have attempted something like this:

def get_birthdays(person):
    birthdays = []
    for year in range(2016, 2019):
        birthdays.append({
            'name': person.name,
            'birthday': person.birthdate.replace(year=year)
        })

    return pd.DataFrame(birthdays)

# with data as my original DataFrame
data.apply(get_birthdays, axis=1)

However, this raises:

ValueError: could not broadcast input array from shape (3,2) into shape (3)

During handling of the above exception, another exception occurred:

[...]

ValueError: cannot copy sequence with size 2 to array axis with dimension 3

asked Sep 09 '15 14:09

vicvicvic

1 Answer

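The groupby version of apply supports returning a DataFrame from the applied function, which is what you intended. Note that the function below reads only the first row of each group, so it assumes names are unique: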

import pandas as pd
from datetime import datetime

df = pd.DataFrame({
    'name': ['John', 'Jane', 'Max', 'David'],
    'birthdate': [datetime(1990,1,1), datetime(1957,4,3), datetime(1987,2,3), datetime(1964,2,12)],
})

def get_birthdays(df_x):
    d = {'name': [], 'birthday': []}
    name = df_x.iloc[0]['name']
    original = df_x.iloc[0]['birthdate']
    for year in range(2016, 2019):
        d['name'].append(name)
        d['birthday'].append(original.replace(year=year))
    return pd.DataFrame(d)

print(df.groupby('name', group_keys=False).apply(get_birthdays).reset_index(drop=True))

Output:

     birthday   name
0  2016-02-12  David
1  2017-02-12  David
2  2018-02-12  David
3  2016-04-03   Jane
4  2017-04-03   Jane
5  2018-04-03   Jane
6  2016-01-01   John
7  2017-01-01   John
8  2018-01-01   John
9  2016-02-03    Max
10 2017-02-03    Max
11 2018-02-03    Max
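
Since the question mentions that each row may map to a different number of results, here is a minimal alternative sketch that does not depend on grouping by a unique key. It assumes pandas 0.25 or later (for DataFrame.explode); the helper name upcoming_birthdays and its years parameter are made up for illustration. Each row gets a list of results, and explode turns every list element into its own row:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['John', 'Jane', 'Max', 'David'],
    'birthdate': pd.to_datetime(
        ['1990-01-01', '1957-04-03', '1987-02-03', '1964-02-12']),
})

def upcoming_birthdays(birthdate, years=range(2016, 2019)):
    # Hypothetical helper: returns a plain list, which is allowed to have
    # a different length for each row.
    return [birthdate.replace(year=year) for year in years]

result = (
    df.assign(birthday=df['birthdate'].apply(upcoming_birthdays))  # one list per row
      .explode('birthday')          # one output row per list element
      .drop(columns='birthdate')
      .reset_index(drop=True)
)
print(result)
```

Unlike the groupby version, this keeps the original row order and treats two people with the same name as separate rows. The exploded column has object dtype, so it may need pd.to_datetime afterwards if datetime operations are required.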

answered Oct 31 '22 10:10

YS-L
