I am working with school schedule data and I need to distinguish between different sessions of the same course.
If two different classes take the same course, each one is effectively a separate session of that course and needs to be told apart. To do this, I want an extra column holding a session index.
import pandas as pd
cols = ['course', 'class_name', 'professor']
data = [ ['Math', 'X', 'Bob'],
['Math', 'X', 'Bob'],
['Math', 'Y', 'Bob'],
['English', 'Y', 'Tim'],
['English', 'X', 'Jim'],
['English', 'X', 'Jim'],
]
df = pd.DataFrame(columns=cols, data=data)
# Add session
df['session'] = '?'
print(df)
The result should look something like this:
    course class_name professor  session
0     Math          X       Bob        0
1     Math          X       Bob        0
2     Math          Y       Bob        1
3  English          Y       Tim        1
4  English          X       Jim        0
5  English          X       Jim        0
I have come up with a convoluted procedural solution. What would be a more idiomatic pandas way of doing this?
groups = df.groupby(['course', 'class_name'])
d_sessions = {}
counter = 0
pclass = ""
pcourse = ""
for m_idx in list(groups.groups):
course = m_idx[0]
class_ = m_idx[1]
if class_ != pclass:
counter += 1
if pcourse != course:
counter = 0
pclass = class_
pcourse = course
d_sessions[m_idx] = counter
# DataFrame.set_value was removed in pandas 1.0 (and col='index' targeted the
# wrong column), so look up each row's (course, class_name) key directly instead
df['session'] = [d_sessions[key] for key in zip(df['course'], df['class_name'])]
df
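The loop above walks the (course, class_name) groups in sorted order and restarts its counter at each new course. As a rough sketch (not the original poster's code), the same 0-based, alphabetical numbering can be expressed without an explicit loop by combining `GroupBy.ngroup` with a per-course minimum. It assumes the default `sort=True` of `groupby`, so that all classes of one course receive consecutive group numbers; `group_no` is just an illustrative variable name.

```python
import pandas as pd

cols = ['course', 'class_name', 'professor']
data = [['Math', 'X', 'Bob'],
        ['Math', 'X', 'Bob'],
        ['Math', 'Y', 'Bob'],
        ['English', 'Y', 'Tim'],
        ['English', 'X', 'Jim'],
        ['English', 'X', 'Jim']]
df = pd.DataFrame(columns=cols, data=data)

# ngroup() numbers each (course, class_name) pair 0, 1, 2, ... in sorted key order,
# so the classes belonging to one course get consecutive numbers.
group_no = df.groupby(['course', 'class_name']).ngroup()

# Subtracting the smallest group number within each course restarts the count at 0.
df['session'] = group_no - group_no.groupby(df['course']).transform('min')
print(df)
```

With the sample data this yields sessions 0, 0, 1, 1, 0, 0, matching the expected output in the question.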
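Let's try numbering the classes within each course in order of first appearance (this starts at 1; subtract 1 if you want 0-based indices):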
df['session'] = df.groupby('course')['class_name'].transform(lambda x: (~x.duplicated()).cumsum())
Output:
    course class_name professor  session
0     Math          X       Bob        1
1     Math          X       Bob        1
2     Math          Y       Bob        2
3  English          Y       Tim        1
4  English          X       Jim        2
5  English          X       Jim        2
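To see why the one-liner works, here is a small sketch of its intermediate steps for a single course. The `math_classes` Series is a hypothetical stand-in for the `class_name` values that `transform` receives for the Math group.

```python
import pandas as pd

# Stand-in for the class_name values of one course, as transform would pass them in
math_classes = pd.Series(['X', 'X', 'Y'], name='class_name')

is_repeat = math_classes.duplicated()  # True where this class was already seen in the course
is_new = ~is_repeat                    # True at each class's first appearance
session = is_new.cumsum()              # running count of distinct classes seen so far

print(pd.DataFrame({'class_name': math_classes,
                    'duplicated': is_repeat,
                    'new_class': is_new,
                    'session': session}))
```

Because `duplicated()` compares against all earlier rows in the group, a class that reappears later (not only in adjacent rows) keeps the session number it got on its first appearance.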