Insert Conditional Rows in Columns

I currently have a dataset that tracks 5 completed tests. However, it only shows those who have completed a test, not those who have yet to take it. Example below:

Name    Test        Completed
John    Math-Test1  Yes
John    Math-Test2  Yes
John    Math-Test3  Yes
John    Math-Test4  Yes
John    Math-Test5  Yes
Lauren  Math-Test1  Yes
Lauren  Math-Test2  Yes
Lauren  Math-Test3  Yes
Tom     Math-Test1  Yes
Tom     Math-Test2  Yes
Tom     Math-Test3  Yes
Tom     Math-Test4  Yes
Tom     Math-Test5  Yes
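To experiment with the approaches discussed here, the sample data can be rebuilt as a pandas DataFrame. This is a minimal sketch that assumes the real data has exactly these three string columns; the name `df` matches the variable used in the code further down.

```python
import pandas as pd

# Completed tests only: one row per (Name, Test) pair that has been taken
rows = [
    ("John", "Math-Test1"), ("John", "Math-Test2"), ("John", "Math-Test3"),
    ("John", "Math-Test4"), ("John", "Math-Test5"),
    ("Lauren", "Math-Test1"), ("Lauren", "Math-Test2"), ("Lauren", "Math-Test3"),
    ("Tom", "Math-Test1"), ("Tom", "Math-Test2"), ("Tom", "Math-Test3"),
    ("Tom", "Math-Test4"), ("Tom", "Math-Test5"),
]
df = pd.DataFrame(rows, columns=["Name", "Test"])
# Every recorded row is a completed test
df["Completed"] = "Yes"

print(df)
```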

As you can see, Lauren has not yet taken 'Math-Test4' and 'Math-Test5', so there are no rows for her for those tests. I would like to add those rows automatically, with the 'Completed' column set to 'No' whenever someone has not completed a test.

Desired output is below:

Name    Test        Completed
John    Math-Test1  Yes
John    Math-Test2  Yes
John    Math-Test3  Yes
John    Math-Test4  Yes
John    Math-Test5  Yes
Lauren  Math-Test1  Yes
Lauren  Math-Test2  Yes
Lauren  Math-Test3  Yes
*Lauren  Math-Test4  No*   <- add these rows automatically
*Lauren  Math-Test5  No*
Tom     Math-Test1  Yes
Tom     Math-Test2  Yes
Tom     Math-Test3  Yes
Tom     Math-Test4  Yes
Tom     Math-Test5  Yes
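Put another way, the rows to add are the Cartesian product of all names and all tests, minus the pairs already present. A minimal plain-Python sketch of that set arithmetic on the sample rows from the question (the names `completed`, `names`, `tests`, and `missing` are illustrative only):

```python
from itertools import product

# (Name, Test) pairs that appear in the data set
completed = (
    {("John", f"Math-Test{i}") for i in range(1, 6)}
    | {("Lauren", f"Math-Test{i}") for i in range(1, 4)}
    | {("Tom", f"Math-Test{i}") for i in range(1, 6)}
)

# Distinct names and tests seen anywhere in the data
names = sorted({name for name, _ in completed})
tests = sorted({test for _, test in completed})

# Every possible pair, minus the ones already recorded, gives the 'No' rows
missing = sorted(set(product(names, tests)) - completed)
print(missing)  # [('Lauren', 'Math-Test4'), ('Lauren', 'Math-Test5')]
```

The pandas answer below builds the same "all pairs" grid with `pd.MultiIndex.from_product`.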

How could this be achieved with Python/pandas/NumPy?

Thanks to all who can assist!

Edit: after trying @Scott Boston's code, I get this output:

import pandas as pd

idx = pd.MultiIndex.from_product([df['Name'].unique(),
                                  df['Test'].unique()],
                                 names=['Name', 'Test'])

newidx = idx[~idx.isin(df.set_index(['Name', 'Test']).index)]
pd.concat([df,
           newidx.to_series().reset_index().assign(Completed="No*")[['Name', 'Test', 'Completed']]],
          ignore_index=True)

Output:

Name1   Test    Completed
John    Math-Test1      Yes
John    Math-Test2      Yes
John    Math-Test3      Yes
John    Math-Test4      Yes
John    Math-Test5      Yes
Lauren  Math-Test1      Yes
Lauren  Math-Test2      Yes
Lauren  Math-Test3      Yes
Tom     Math-Test1      Yes
Tom     Math-Test2      Yes
Tom     Math-Test3      Yes
Tom     Math-Test4      Yes
Tom     Math-Test5      Yes
John    Math-Test3      No*
John    Math-Test4      No*
John    Math-Test5      No*
John    Math-Test2      No*
Lauren  Math-Test3      No*
Lauren  Math-Test4      No*
Lauren  Math-Test5      No*
Lauren  Math-Test2      No*
Lauren  Math-Test5      No*
Lauren  Math-Test1      No*
Lauren  Math-Test2      No*
Lauren  Math-Test4      No*
Lauren  Math-Test5      No*

Now I just need to find a way to remove the unwanted rows to get the desired output.

SlimJim asked Oct 15 '22 08:10



1 Answer

Try building a MultiIndex with from_product, then use set_index and reindex.

This method works for all "seen" values. If a value never appears in the data, you'll need to pass a hardcoded list to from_product instead:

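import pandas as pd

# Every combination of the names and tests that appear in the data
idx = pd.MultiIndex.from_product([df['Name'].unique(),
                                  df['Test'].unique()],
                                 names=['Name', 'Test'])

# Reindex onto that full grid; combinations missing from df get 'No*'
df.set_index(['Name', 'Test']).reindex(idx, fill_value='No*').reset_index()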

Output:

      Name        Test Completed
0     John  Math-Test1       Yes
1     John  Math-Test2       Yes
2     John  Math-Test3       Yes
3     John  Math-Test4       Yes
4     John  Math-Test5       Yes
5   Lauren  Math-Test1       Yes
6   Lauren  Math-Test2       Yes
7   Lauren  Math-Test3       Yes
8   Lauren  Math-Test4       No*
9   Lauren  Math-Test5       No*
10     Tom  Math-Test1       Yes
11     Tom  Math-Test2       Yes
12     Tom  Math-Test3       Yes
13     Tom  Math-Test4       Yes
14     Tom  Math-Test5       Yes
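As noted above, `unique()` only sees tests that at least one person has taken. If the full list of tests is known up front, it can be passed to `from_product` directly. A minimal sketch, assuming a hypothetical sixth test `Math-Test6` that nobody has taken yet (it does not appear in the question's data):

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Name": ["John"] * 5 + ["Lauren"] * 3 + ["Tom"] * 5,
    "Test": [f"Math-Test{i}" for i in range(1, 6)]
            + [f"Math-Test{i}" for i in range(1, 4)]
            + [f"Math-Test{i}" for i in range(1, 6)],
    "Completed": "Yes",
})

# Hardcoded list of every test; Math-Test6 is hypothetical and absent from df
all_tests = [f"Math-Test{i}" for i in range(1, 7)]

idx = pd.MultiIndex.from_product([df["Name"].unique(), all_tests],
                                 names=["Name", "Test"])
# Same reindex step as above, but over the hardcoded test list
result = (df.set_index(["Name", "Test"])
            .reindex(idx, fill_value="No*")
            .reset_index())

print(result[result["Completed"] == "No*"])
```

Every person now gets a `No*` row for `Math-Test6`, even though `unique()` would never have produced it.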

Update

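import pandas as pd

idx = pd.MultiIndex.from_product([df['Name'].unique(),
                                  df['Test'].unique()],
                                 names=['Name', 'Test'])

# Keep only the (Name, Test) pairs that are not already in df
newidx = idx[~idx.isin(df.set_index(['Name', 'Test']).index)]
# Append them as 'No*' rows; pd.concat's sort= option affects column order,
# not row order, so sort the rows explicitly to keep each person's tests together
(pd.concat([df, newidx.to_frame(index=False).assign(Completed="No*")],
           ignore_index=True)
   .sort_values(['Name', 'Test'], ignore_index=True))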
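A different technique that avoids the MultiIndex (offered as an alternative, not as the answer's own method) is to cross-join the distinct names and tests, then left-merge the original rows onto that grid. This sketch assumes pandas 1.2 or later, where `merge(how="cross")` is available, and reuses the `No*` marker from above.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Name": ["John"] * 5 + ["Lauren"] * 3 + ["Tom"] * 5,
    "Test": [f"Math-Test{i}" for i in range(1, 6)]
            + [f"Math-Test{i}" for i in range(1, 4)]
            + [f"Math-Test{i}" for i in range(1, 6)],
    "Completed": "Yes",
})

# Full grid of (Name, Test) pairs: cross join of the distinct values
grid = (df[["Name"]].drop_duplicates()
          .merge(df[["Test"]].drop_duplicates(), how="cross"))

# Left merge keeps every grid row; pairs absent from df get NaN in Completed,
# which is then replaced by the 'No*' marker
result = (grid.merge(df, on=["Name", "Test"], how="left")
              .fillna({"Completed": "No*"}))

print(result)
```

Because a left merge preserves the order of the grid, the missing rows land next to each person's other tests without a separate sort.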
Scott Boston answered Oct 20 '22 11:10
