From the table below, I need to create 4 different barplots, one for each of the 4 places (L1, L2, L3, L4), based on the columns TST1 TST2 TST3 TST4 TST5.
Each barplot should have 8 ticks for NOT_DONE INCOMP UNTESTED 30 35 40 45 50, in that order if possible. The bar at each tick will correspond to the number of times each "value" appears for that given place. (The places are one of 4 options: L1 L2 L3 L4)
However:
Only the right-most value in each row is to be considered, meaning that if no value is found in TST5, the program should check TST4, and so on, until it finds a value. If no value is found in any of these 5 columns, then no value is counted. If a value is found, it does not matter what is to the left of it.
My thought process would be to create a new dataframe column with the values I need (so the right-most value for each row) alongside their corresponding place. I am new to all this and unsure how to do it, so any help on which direction to go would be greatly appreciated.
I am required to use Python 2.7, and I am using seaborn for the plotting.
+-------+----------+----------+----------+--------+----------+
| PLACE | TST1 | TST2 | TST3 | TST4 | TST5 |
+-------+----------+----------+----------+--------+----------+
| L1 | | NOT_DONE | | | 50 |
+-------+----------+----------+----------+--------+----------+
| L1 | | | 35 | | |
+-------+----------+----------+----------+--------+----------+
| L4 | | | | | |
+-------+----------+----------+----------+--------+----------+
| L3 | | | INCOMP | | |
+-------+----------+----------+----------+--------+----------+
| L2 | UNTESTED | | | INCOMP | |
+-------+----------+----------+----------+--------+----------+
| L3 | | | | | |
+-------+----------+----------+----------+--------+----------+
| L4 | | 30 | | | |
+-------+----------+----------+----------+--------+----------+
| L3 | | INCOMP | 40 | | |
+-------+----------+----------+----------+--------+----------+
| L4 | | | | | UNTESTED |
+-------+----------+----------+----------+--------+----------+
| L1 | | | | | |
+-------+----------+----------+----------+--------+----------+
| L3 | | INCOMP | | | |
+-------+----------+----------+----------+--------+----------+
| L2 | | | | | |
+-------+----------+----------+----------+--------+----------+
| L2 | | 50 | | | |
+-------+----------+----------+----------+--------+----------+
| L3 | | | UNTESTED | 35 | NOT_DONE |
+-------+----------+----------+----------+--------+----------+
| L1 | | | | | |
+-------+----------+----------+----------+--------+----------+
| L2 | | 40 | | INCOMP | |
+-------+----------+----------+----------+--------+----------+
| L3 | | | | | |
+-------+----------+----------+----------+--------+----------+
| L1 | | | | | |
+-------+----------+----------+----------+--------+----------+
| L4 | | NOT_DONE | | 30 | NOT_DONE |
+-------+----------+----------+----------+--------+----------+

Tested on python 2.7.18 and pandas 0.24.2 (though it works fine in python 3):
Propagate the right-most values (ignoring PLACE) using ffill along columns:
df['TST'] = df.drop(columns='PLACE').ffill(axis='columns').iloc[:, -1]
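To see what the ffill step does, here is a minimal sketch on a hypothetical three-row sample taken from the table. It assumes the empty cells might arrive as empty strings rather than NaN (this depends on how the table was loaded), in which case they have to be masked first, because ffill only fills over NaN:

```python
import pandas as pd

# Hypothetical three-row sample; empty cells are empty strings here (an assumption).
df = pd.DataFrame({
    'PLACE': ['L1', 'L4', 'L3'],
    'TST1': ['', '', ''],
    'TST2': ['NOT_DONE', '', ''],
    'TST3': ['', '', 'UNTESTED'],
    'TST4': ['', '', '35'],
    'TST5': ['50', '', 'NOT_DONE'],
})

# Keep only the test columns and turn empty strings into NaN.
tests = df.drop(columns='PLACE')
tests = tests.mask(tests == '')

# Fill each row from left to right; the last column then holds
# the right-most value of the row (or NaN if the row was empty).
df['TST'] = tests.ffill(axis='columns').iloc[:, -1]

print(df[['PLACE', 'TST']])
```

Rows with no value at all stay NaN, and value_counts skips NaN by default, so they are not counted later on.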
Group by PLACE and get their value_counts:
data = df.groupby('PLACE')['TST'].value_counts().reset_index(name='COUNT')
#   PLACE       TST  COUNT
# 0    L1        35      1
# 1    L1        50      1
# 2    L2    INCOMP      2
# 3    L2        50      1
# 4    L3    INCOMP      2
# 5    L3        40      1
# 6    L3  NOT_DONE      1
# 7    L4        30      1
# 8    L4  NOT_DONE      1
# 9    L4  UNTESTED      1
Then pass this data into catplot (use the order param to set your preferred tick order):
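import seaborn as sns

incompletes = ['NOT_DONE', 'INCOMP', 'UNTESTED']
# sorted() places the scores before the three labels, so dropping the last three leaves only the scores
ticks = incompletes + sorted(data.TST.unique())[:-len(incompletes)]
g = sns.catplot(x='TST', y='COUNT', col='PLACE', col_wrap=2,
                data=data, order=ticks, kind='bar')
g.set_xticklabels(rotation=90)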
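A tick order derived from the data only contains values that actually occur, so 45 (which is absent from the sample table) would not get a tick. As a minimal sketch, assuming the eight labels from the question are the complete set and that the scores are stored as strings, the order can be spelled out explicitly. The `sample` frame below is a hypothetical stand-in for the PLACE and TST columns:

```python
import pandas as pd

# All eight tick labels from the question, in the requested order.
# Assumption: the numeric scores are stored as strings.
ticks = ['NOT_DONE', 'INCOMP', 'UNTESTED', '30', '35', '40', '45', '50']

# Hypothetical right-most values for a few rows.
sample = pd.DataFrame({
    'PLACE': ['L1', 'L1', 'L2', 'L2', 'L2'],
    'TST': ['50', '35', 'INCOMP', 'INCOMP', '50'],
})

# Count each PLACE/TST pair, pivot TST into columns, then reindex
# so that every label appears for every place, with 0 where absent.
counts = (sample.groupby(['PLACE', 'TST']).size()
          .unstack(fill_value=0)
          .reindex(columns=ticks, fill_value=0))

print(counts)
```

Passing the same `ticks` list as the order argument of catplot should give all eight ticks in every panel, with empty slots where a value never occurs.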

Versions:
>>> sys.version
2.7.18 (default, Mar 15 2021, 14:29:03) \n[GCC 10.2.0]
>>> pandas.__version__
0.24.2
>>> matplotlib.__version__
2.2.5
>>> seaborn.__version__
0.9.1