Barplot from unorganised data - dataframe creation?

Question

From the table below, I need to create 4 different barplots, one for each of the 4 different places, using the values in columns TST1 TST2 TST3 TST4 TST5.

Each barplot should have 8 ticks, for NOT_DONE INCOMP UNTESTED 30 35 40 45 50, in that order if possible. The bar at each tick will show the number of times that value appears for the given place. (Each place is one of 4 options: L1 L2 L3 L4.)

However:

Only the right-most value in each row is to be considered: if no value is found in TST5, the program should check TST4, and so on until it finds a value. If no value is found in any of these 5 columns, then nothing is counted for that row. If a value is found, it does not matter what is to the left of it.

My thought process would be to create a new dataframe column holding the values I need (the right-most value of each row) alongside the corresponding place. I am new to all this and unsure how to do it, so any pointers on which direction to take would be greatly appreciated.

I am required to use python 2.7, I am also using seaborn for the plotting.

+-------+----------+----------+----------+--------+----------+
| PLACE | TST1     | TST2     | TST3     | TST4   | TST5     |
+-------+----------+----------+----------+--------+----------+
| L1    |          | NOT_DONE |          |        | 50       |
+-------+----------+----------+----------+--------+----------+
| L1    |          |          | 35       |        |          |
+-------+----------+----------+----------+--------+----------+
| L4    |          |          |          |        |          |
+-------+----------+----------+----------+--------+----------+
| L3    |          |          | INCOMP   |        |          |
+-------+----------+----------+----------+--------+----------+
| L2    | UNTESTED |          |          | INCOMP |          |
+-------+----------+----------+----------+--------+----------+
| L3    |          |          |          |        |          |
+-------+----------+----------+----------+--------+----------+
| L4    |          | 30       |          |        |          |
+-------+----------+----------+----------+--------+----------+
| L3    |          | INCOMP   | 40       |        |          |
+-------+----------+----------+----------+--------+----------+
| L4    |          |          |          |        | UNTESTED |
+-------+----------+----------+----------+--------+----------+
| L1    |          |          |          |        |          |
+-------+----------+----------+----------+--------+----------+
| L3    |          | INCOMP   |          |        |          |
+-------+----------+----------+----------+--------+----------+
| L2    |          |          |          |        |          |
+-------+----------+----------+----------+--------+----------+
| L2    |          | 50       |          |        |          |
+-------+----------+----------+----------+--------+----------+
| L3    |          |          | UNTESTED | 35     | NOT_DONE |
+-------+----------+----------+----------+--------+----------+
| L1    |          |          |          |        |          |
+-------+----------+----------+----------+--------+----------+
| L2    |          | 40       |          | INCOMP |          |
+-------+----------+----------+----------+--------+----------+
| L3    |          |          |          |        |          |
+-------+----------+----------+----------+--------+----------+
| L1    |          |          |          |        |          |
+-------+----------+----------+----------+--------+----------+
| L4    |          | NOT_DONE |          | 30     | NOT_DONE |
+-------+----------+----------+----------+--------+----------+


tdy · Accepted Answer

I am required to use python 2.7, I am also using seaborn for the plotting.

Tested on Python 2.7.18 and pandas 0.24.2 (it also works fine in Python 3):

Forward-fill along the columns (ignoring PLACE) so that the right-most value of each row ends up in the last column, then take that last column:

df['TST'] = df.drop(columns='PLACE').ffill(axis='columns').iloc[:, -1]
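One detail worth checking before relying on this line: ffill only fills missing values (NaN), so blank cells that were loaded as empty strings would not be filled. Below is a minimal sketch on a hypothetical four-row miniature of the question's table, assuming the blanks arrived as empty strings and the values as strings; the sample rows and the `tests` name are illustrative, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the question's table; blanks are empty strings here.
df = pd.DataFrame({
    'PLACE': ['L1', 'L1', 'L4', 'L2'],
    'TST1': ['', '', '', 'UNTESTED'],
    'TST2': ['NOT_DONE', '', '', ''],
    'TST3': ['', '35', '', ''],
    'TST4': ['', '', '', 'INCOMP'],
    'TST5': ['50', '', '', ''],
}, columns=['PLACE', 'TST1', 'TST2', 'TST3', 'TST4', 'TST5'])

# ffill only fills NaN, so convert empty strings to NaN first.
tests = df.drop(columns='PLACE').replace('', np.nan)

# Forward-fill left to right, then read the last column:
# it now holds the right-most non-missing value of each row.
df['TST'] = tests.ffill(axis='columns').iloc[:, -1]

print(df[['PLACE', 'TST']])
```

Rows with no value in any TST column keep NaN in the new column. If the table was read with a parser that already turns blanks into NaN, the replace step should be unnecessary.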

Group by PLACE and get their value_counts:

data = df.groupby('PLACE')['TST'].value_counts().reset_index(name='COUNT')

#   PLACE       TST  COUNT
# 0    L1        35      1
# 1    L1        50      1
# 2    L2    INCOMP      2
# 3    L2        50      1
# 4    L3    INCOMP      2
# 5    L3        40      1
# 6    L3  NOT_DONE      1
# 7    L4        30      1
# 8    L4  NOT_DONE      1
# 9    L4  UNTESTED      1
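Rows that had no value in any column are left out of these counts because value_counts skips NaN by default, which matches the "no value is counted" requirement. A small sketch with hypothetical right-most values (the sample rows are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical right-most values, as the forward-fill step might produce them.
df = pd.DataFrame({
    'PLACE': ['L1', 'L1', 'L2', 'L2', 'L2'],
    'TST': ['50', '35', 'INCOMP', 'INCOMP', np.nan],
}, columns=['PLACE', 'TST'])

# Count each value per place; the NaN row for L2 is dropped by default.
data = df.groupby('PLACE')['TST'].value_counts().reset_index(name='COUNT')

print(data)
```

Here L2 contributes only INCOMP with a count of 2; the row without a value does not appear.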

Then pass this data into catplot (use the order param to set your preferred tick order):

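import seaborn as sns

# Non-numeric statuses go first, in the order requested in the question
incompletes = ['NOT_DONE', 'INCOMP', 'UNTESTED']
# The statuses sort after the numbers, so drop them from the end of the sorted
# unique values (this assumes all three statuses occur in the data)
ticks = incompletes + sorted(data.TST.unique())[:-len(incompletes)]

# One bar chart per PLACE, two per row, with a shared tick order
g = sns.catplot(x='TST', y='COUNT', col='PLACE', col_wrap=2,
                data=data, order=ticks, kind='bar')
g.set_xticklabels(rotation=90)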

[Figure: catplot output]
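Building the tick order by slicing the sorted unique values depends on which statuses happen to occur in the data. A possible alternative, sketched below in plain Python, filters the statuses out instead; the `observed` list is a hypothetical stand-in for `data.TST.unique()`, and the values are assumed to be strings (or ints) that `int` can parse.

```python
incompletes = ['NOT_DONE', 'INCOMP', 'UNTESTED']

# Hypothetical stand-in for data.TST.unique()
observed = ['35', '50', 'INCOMP', '40', 'NOT_DONE', '30']

# Keep everything that is not a status and sort it numerically.
numeric = sorted((v for v in observed if v not in incompletes), key=int)

# Statuses first (in the requested order), then the numeric values.
ticks = incompletes + numeric

print(ticks)
```

To show a tick for a value that never occurs in the data (such as 45 in the question), the full list of eight labels could instead be written out by hand and passed as order.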

Versions:

>>> sys.version
2.7.18 (default, Mar 15 2021, 14:29:03) 
[GCC 10.2.0]
>>> pandas.__version__
0.24.2
>>> matplotlib.__version__
2.2.5
>>> seaborn.__version__
0.9.1
