I have the following example dataframe:
df = pd.DataFrame(data = {'RecordID' : [1,1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,5,5,5,5], 'DisplayLabel' : ['Source','Test','Value 1','Value 2','Value3','Source','Test','Value 1','Value 2','Source','Test','Value 1','Value 2','Source','Test','Value 1','Value 2','Source','Test','Value 1','Value 2'],
'Value' : ['Web','Logic','S','I','Complete','Person','Voice','>20','P','Mail','OCR','A','I','Dictation','Understandable','S','I','Web','Logic','R','S']})
which creates this dataframe:
+-------+----------+---------------+----------------+
| Index | RecordID | Display Label | Value |
+-------+----------+---------------+----------------+
| 0 | 1 | Source | Web |
| 1 | 1 | Test | Logic |
| 2 | 1 | Value 1 | S |
| 3 | 1 | Value 2 | I |
| 4 | 1 | Value 3 | Complete |
| 5 | 2 | Source | Person |
| 6 | 2 | Test | Voice |
| 7 | 2 | Value 1 | >20 |
| 8 | 2 | Value 2 | P |
| 9 | 3 | Source | Mail |
| 10 | 3 | Test | OCR |
| 11 | 3 | Value 1 | A |
| 12 | 3 | Value 2 | I |
| 13 | 4 | Source | Dictation |
| 14 | 4 | Test | Understandable |
| 15 | 4 | Value 1 | S |
| 16 | 4 | Value 2 | I |
| 17 | 5 | Source | Web |
| 18 | 5 | Test | Logic |
| 19 | 5 | Value 1 | R |
| 20 | 5 | Value 2 | S |
+-------+----------+---------------+----------------+
I am trying to "unmelt" though not exactly the source and test columns into new dataframe Columns such that it will look like this:
+-------+----------+-----------+----------------+---------------+----------+
| Index | RecordID | Source | Test | Result | Value |
+-------+----------+-----------+----------------+---------------+----------+
| 0 | 1 | Web | Logic | Value 1 | S |
| 1 | 1 | Web | Logic | Value 2 | I |
| 2 | 1 | Web | Logic | Value 3 | Complete |
| 3 | 2 | Person | Voice | Value 1 | >20 |
| 4 | 2 | Person | Voice | Value 2 | P |
| 5 | 3 | Mail | OCR | Value 1 | A |
| 6 | 3 | Mail | OCR | Value 2 | I |
| 7 | 4 | Dictation | Understandable | Value 1 | S |
| 8 | 4 | Dictation | Understandable | Value 2 | I |
| 9 | 5 | Web | Logic | Value 1 | R |
| 10 | 5 | Web | Logic | Value 2 | S |
+-------+----------+-----------+----------------+---------------+----------+
It's my understanding that pivot and melt will do the entire DisplayLabel column and not just some of the values.
Any help would be greatly appreciated as I have read the Pandas Melt and the Pandas Pivot as well as some references on stackoverflow and I can't seem to figure out a way to do this quickly.
Thanks!
We can achieve your result by applying logic and pivotting, we split your data by checking if DisplayLabel
contains Value
and then we join
them back together:
mask = df['DisplayLabel'].str.contains('Value')
df2 = df[~mask].pivot(index='RecordID', columns='DisplayLabel', values='Value')
dfpiv = (
df[mask].rename(columns={'DisplayLabel':'Result'})
.set_index('RecordID')
.join(df2)
.reset_index()
)
RecordID Result Value Source Test
0 1 Value 1 S Web Logic
1 1 Value 2 I Web Logic
2 1 Value3 Complete Web Logic
3 2 Value 1 >20 Person Voice
4 2 Value 2 P Person Voice
5 3 Value 1 A Mail OCR
6 3 Value 2 I Mail OCR
7 4 Value 1 S Dictation Understandable
8 4 Value 2 I Dictation Understandable
9 5 Value 1 R Web Logic
10 5 Value 2 S Web Logic
If you want the exact column order as your example, use DataFrame.reindex
:
dfpiv.reindex(columns=['RecordID', 'Source', 'Test', 'Result', 'Value'])
RecordID Source Test Result Value
0 1 Web Logic Value 1 S
1 1 Web Logic Value 2 I
2 1 Web Logic Value3 Complete
3 2 Person Voice Value 1 >20
4 2 Person Voice Value 2 P
5 3 Mail OCR Value 1 A
6 3 Mail OCR Value 2 I
7 4 Dictation Understandable Value 1 S
8 4 Dictation Understandable Value 2 I
9 5 Web Logic Value 1 R
10 5 Web Logic Value 2 S
# mask all rows where "Value" is in column DisplayLabel
mask = df['DisplayLabel'].str.contains('Value')
0 False
1 False
2 True
3 True
4 True
5 False
6 False
7 True
8 True
9 False
10 False
11 True
12 True
13 False
14 False
15 True
16 True
17 False
18 False
19 True
20 True
Name: DisplayLabel, dtype: bool
# select all rows which do NOT have "Value" in DisplayLabel
df[~mask]
RecordID DisplayLabel Value
0 1 Source Web
1 1 Test Logic
5 2 Source Person
6 2 Test Voice
9 3 Source Mail
10 3 Test OCR
13 4 Source Dictation
14 4 Test Understandable
17 5 Source Web
18 5 Test Logic
# pivot the values in DisplayLabel to columns
df2 = df[~mask].pivot(index='RecordID', columns='DisplayLabel', values='Value')
DisplayLabel Source Test
RecordID
1 Web Logic
2 Person Voice
3 Mail OCR
4 Dictation Understandable
5 Web Logic
df[mask].rename(columns={'DisplayLabel':'Result'}) # rename the column DisplayLabel to Result
.set_index('RecordID') # set RecordId as index so we can join df2
.join(df2) # join df2 back to our dataframe based RecordId
.reset_index() # reset index so we get RecordId back as column
RecordID Result Value Source Test
0 1 Value 1 S Web Logic
1 1 Value 2 I Web Logic
2 1 Value3 Complete Web Logic
3 2 Value 1 >20 Person Voice
4 2 Value 2 P Person Voice
5 3 Value 1 A Mail OCR
6 3 Value 2 I Mail OCR
7 4 Value 1 S Dictation Understandable
8 4 Value 2 I Dictation Understandable
9 5 Value 1 R Web Logic
10 5 Value 2 S Web Logic
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With