I have a DataFrame like the one below:
+----------+-------+-------+-------+-------+-------+
| Date | Loc 1 | Loc 2 | Loc 3 | Loc 4 | Loc 5 |
+----------+-------+-------+-------+-------+-------+
| 1-Jan-19 | 50 | 0 | 40 | 80 | 60 |
| 2-Jan-19 | 60 | 80 | 60 | 80 | 90 |
| 3-Jan-19 | 80 | 20 | 0 | 50 | 30 |
| 4-Jan-19 | 90 | 20 | 10 | 90 | 20 |
| 5-Jan-19 | 80 | 0 | 10 | 10 | 0 |
| 6-Jan-19 | 100 | 90 | 100 | 0 | 10 |
| 7-Jan-19 | 20 | 10 | 30 | 20 | 0 |
+----------+-------+-------+-------+-------+-------+
I want to extract every data point (row label and column label) where the value is zero and produce a new DataFrame.
My desired output is as below:
+--------------+----------------+
| Missing Date | Missing column |
+--------------+----------------+
| 1-Jan-19 | Loc 2 |
| 3-Jan-19 | Loc 3 |
| 5-Jan-19 | Loc 2 |
| 5-Jan-19 | Loc 5 |
| 6-Jan-19 | Loc 4 |
| 7-Jan-19 | Loc 5 |
+--------------+----------------+
Note: on 5-Jan-19, there are two entries, Loc 2 & Loc 5.
I know how to do this in Excel VBA, but I'm looking for a more scalable solution with python-pandas.
So far I have attempted the code below:
import pandas as pd
df = pd.read_csv('data.csv')
new_df = pd.DataFrame(columns=['Missing Date','Missing Column'])
for c in df.columns:
    if c != 'Date':
        if df[df[c] == 0]:
            new_df.append(df[c].index, c)
I'm new to pandas, so please guide me on how to solve this issue.
melt + query
(df.melt(id_vars='Date', var_name='Missing column')
   .query('value == 0')
   .drop(columns='value')
)
        Date Missing column
7   1-Jan-19          Loc 2
11  5-Jan-19          Loc 2
16  3-Jan-19          Loc 3
26  6-Jan-19          Loc 4
32  5-Jan-19          Loc 5
34  7-Jan-19          Loc 5
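The result above comes out grouped by column, whereas the desired output in the question is ordered by date and uses a "Missing Date" header. A minimal, self-contained sketch of one way to get there is shown below. It rebuilds the sample table inline instead of reading data.csv, and it assumes the dates follow the day-month-year pattern `%d-%b-%y` with English month abbreviations; the names `missing` and `order` are only illustrative.

```python
import pandas as pd

# Sample data copied from the table in the question
# (built inline here instead of reading data.csv).
df = pd.DataFrame({
    'Date': ['1-Jan-19', '2-Jan-19', '3-Jan-19', '4-Jan-19',
             '5-Jan-19', '6-Jan-19', '7-Jan-19'],
    'Loc 1': [50, 60, 80, 90, 80, 100, 20],
    'Loc 2': [0, 80, 20, 20, 0, 90, 10],
    'Loc 3': [40, 60, 0, 10, 10, 100, 30],
    'Loc 4': [80, 80, 50, 90, 10, 0, 20],
    'Loc 5': [60, 90, 30, 20, 0, 10, 0],
})

# Same melt + query chain as above, plus a rename to match
# the header used in the desired output.
missing = (
    df.melt(id_vars='Date', var_name='Missing column')
      .query('value == 0')
      .drop(columns='value')
      .rename(columns={'Date': 'Missing Date'})
)

# Parse the date strings (assumed format: day-month abbreviation-2-digit year)
# and reorder the rows chronologically. A stable sort keeps the column
# order (Loc 2 before Loc 5) for rows that share a date.
order = pd.to_datetime(missing['Missing Date'], format='%d-%b-%y')
missing = missing.loc[order.sort_values(kind='mergesort').index]
missing = missing.reset_index(drop=True)

print(missing)
```

The date strings themselves are left untouched; the parsed dates are used only as a sort key, so the output keeps the original 1-Jan-19 style.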