I have a DataFrame with 16 columns. My goal is to add a 17th column that holds, as a list or tuple, the names of all columns whose cell in that row contains a certain value. The purpose is to store data from a multi-select survey question efficiently, so that pandas' .explode or SQL's UNNEST can be used to count the items in the 17th column.
A sample dataset:
| Q1 | Q2 | Q3 |
|-------|-------|-------|
| True | True | False |
| False | True | True |
| True | True | False |
What I'd like to return:
| Q1 | Q2 | Q3 | List |
|-------|-------|-------|----------|
| True | True | False | [Q1, Q2] |
| False | True | True | [Q2, Q3] |
| True | True | False | [Q1, Q2] |
I'm open to other solutions if I'm not quite thinking about this issue the right way.
While this solution works for the specific question, I think it only works when the dict is square (N keys, each holding a list of N values). For example, add a 'Q4' key with a list of length 3, or drop a value from each list, and it will break. I found the following to be more robust, even if it is not the most Pythonic:
import itertools

data = {'Q1': ['True', 'False', 'True'], 'Q2': ['True', 'True', 'True'], 'Q3': ['False', 'True', 'False']}

# For each column, keep the column name where the value is 'True', else None.
output = []
for k, v in data.items():
    z = []
    for i in v:
        if i == 'True':
            z.append(k)
        else:
            z.append(None)
    output.append(z)
print(output)
# [['Q1', None, 'Q1'], ['Q2', 'Q2', 'Q2'], [None, 'Q3', None]]

# Transpose columns into rows; zip_longest pads shorter columns with None.
output1 = list(map(list, itertools.zip_longest(*output, fillvalue=None)))
output2 = output1.copy()  # shallow copy: the inner lists are shared with output1
print(output2)
# [['Q1', 'Q2', None], [None, 'Q2', 'Q3'], ['Q1', 'Q2', None]]

# Strip the None placeholders from every row.
for x in output2:
    while None in x:
        x.remove(None)
print(output2)
# [['Q1', 'Q2'], ['Q2', 'Q3'], ['Q1', 'Q2']]
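The same idea can be condensed into a small function. This is only a sketch, not part of the original answer: the name `selected_keys`, its `target` parameter, and the shorter `Q4` column are hypothetical and were added to show how `zip_longest` copes with columns of unequal length.

```python
from collections import Counter
from itertools import zip_longest


def selected_keys(data, target="True"):
    """Return, for each row position, the keys whose value equals target."""
    # Transpose the columns into rows; shorter columns are padded with None.
    rows = zip_longest(*data.values(), fillvalue=None)
    # Pair each row value with its key and keep the keys that match target.
    return [[key for key, value in zip(data, row) if value == target] for row in rows]


ragged = {
    "Q1": ["True", "False", "True"],
    "Q2": ["True", "True", "True"],
    "Q3": ["False", "True", "False"],
    "Q4": ["True", "True"],  # hypothetical extra column, one answer short
}

rows = selected_keys(ragged)
print(rows)
# [['Q1', 'Q2', 'Q4'], ['Q2', 'Q3', 'Q4'], ['Q1', 'Q2']]

# Count how often each key was selected across all rows.
counts = Counter(key for row in rows for key in row)
print(counts["Q2"])
# 3
```

The padded `None` values never equal `target`, so no separate clean-up pass is needed. The flattened count at the end plays the same role that exploding the list column and counting would play on the DataFrame side.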