How can I use Python to aggregate data from multiple directors in various companies into one figure per company using Blau's Index?

Question

I have a dataframe which contains categorized data about the educational backgrounds of the directors of several companies. Currently, each company (recorded by its ticker) has multiple entries, one per director, and the df looks something like this:

Ticker  Education
ABC     1
ABC     1
ABC     5
ABC     7
ABC     5
DEF     3
DEF     4
DEF     4
DEF     4
DEF     6

I want to use Blau's Index (the same formula as the Gini-Simpson Index) to create a new dataframe with only one entry per company, as follows:

Ticker  Education Diversity
ABC     0.64
DEF     0.56

The formula used is (1 - ∑p_i²), where p_i is the proportion of a company's directors who fall in education category i; e.g. for company ABC, two of the five directors are in category 1, so p₁ = 2/5.
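To make the expected figure concrete, here is the arithmetic for company ABC worked through in plain Python, without pandas. This is only a sanity check of the formula against the sample table above; the variable names are illustrative.

```python
from collections import Counter

# Education codes for company ABC, copied from the sample table
education = [1, 1, 5, 7, 5]

counts = Counter(education)   # category 1: 2, category 5: 2, category 7: 1
n = len(education)

# p_i = share of directors in category i
proportions = {category: count / n for category, count in counts.items()}

# Blau's Index: 1 minus the sum of squared proportions
blau = 1 - sum(p ** 2 for p in proportions.values())
print(round(blau, 2))  # 0.64
```

The proportions are 2/5, 2/5 and 1/5, so the index is 1 - (0.16 + 0.16 + 0.04) = 0.64, matching the target value for ABC.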

Can anyone help me implement this in Python (3.7)? Any help would be greatly appreciated!

Chris Adams · Accepted Answer

You could try implementing your own function, then apply it with groupby.apply. Finally, use Series.reset_index to get back to DataFrame format:

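def blaus_index(arr):
    # Proportion of rows in each category, squared and summed, subtracted from 1
    return 1 - sum((arr.value_counts() / len(arr)) ** 2)

# One value per Ticker, then back to a DataFrame with a named column
df.groupby('Ticker')['Education'].apply(blaus_index).reset_index(name='Education Diversity')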

  Ticker  Education Diversity
0    ABC                 0.64
1    DEF                 0.56
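For anyone who wants to run this end to end, below is a self-contained sketch that rebuilds the sample data from the question and applies the same groupby approach. It is a variant rather than the exact code above: it assumes value_counts(normalize=True) is an acceptable way to get the proportions, and the names df and result are just illustrative.

```python
import pandas as pd

# Sample data from the question: one row per director
df = pd.DataFrame({
    "Ticker": ["ABC"] * 5 + ["DEF"] * 5,
    "Education": [1, 1, 5, 7, 5, 3, 4, 4, 4, 6],
})

def blaus_index(series):
    # normalize=True returns each category's share of the non-missing values
    proportions = series.value_counts(normalize=True)
    return 1 - (proportions ** 2).sum()

result = (
    df.groupby("Ticker")["Education"]
      .apply(blaus_index)
      .reset_index(name="Education Diversity")
)
print(result.round(2))
```

One difference to be aware of: value_counts drops NaN by default, so with normalize=True the proportions are taken over non-missing entries only, whereas dividing by len(arr) also counts rows with a missing Education value in the denominator. On the sample data, which has no missing values, both versions give 0.64 and 0.56.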

Tags: python, python-3.x, pandas, dataframe

amiskov
