
Find out the percentage of missing values in each column in the given dataset

import pandas as pd
df = pd.read_csv('https://query.data.world/s/Hfu_PsEuD1Z_yJHmGaxWTxvkz7W_b0')
percent = 100 * (len(df.loc[:, df.isnull().sum(axis=0) >= 1].index) / len(df.index))
print(round(percent, 2))

The input is https://query.data.world/s/Hfu_PsEuD1Z_yJHmGaxWTxvkz7W_b0

and the output should be:

Ord_id                 0.00
Prod_id                0.00
Ship_id                0.00
Cust_id                0.00
Sales                  0.24
Discount               0.65
Order_Quantity         0.65
Profit                 0.65
Shipping_Cost          0.65
Product_Base_Margin    1.30
dtype: float64
Shaswata asked Jun 27 '18 20:06




1 Answer

How about this? I think I actually found something similar on here once before, but I'm not seeing it now...

percent_missing = df.isnull().sum() * 100 / len(df)
missing_value_df = pd.DataFrame({'column_name': df.columns,
                                 'percent_missing': percent_missing})

And if you want the missing percentages sorted, follow the above with:

missing_value_df.sort_values('percent_missing', inplace=True) 
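If you would rather not modify the frame in place, here is a minimal sketch of the same steps on a small made-up DataFrame. The name `toy` and its values are assumptions for illustration only, not the dataset from the question. `sort_values` returns a new DataFrame unless `inplace=True` is passed, and `ascending=False` puts the columns with the most missing data first.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not the dataset from the question.
toy = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, 4.0],
    "b": [np.nan, np.nan, 3.0, 4.0],
    "c": [1.0, 2.0, 3.0, 4.0],
})

# Same two steps as above: per-column percentage, then a summary frame.
percent_missing = toy.isnull().sum() * 100 / len(toy)
missing_value_df = pd.DataFrame({"column_name": toy.columns,
                                 "percent_missing": percent_missing})

# Without inplace=True, sort_values leaves missing_value_df untouched
# and returns a sorted copy.
sorted_df = missing_value_df.sort_values("percent_missing", ascending=False)
print(sorted_df)
```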

As mentioned in the comments, you may also be able to get by with just the first line in my code above, i.e.:

percent_missing = df.isnull().sum() * 100 / len(df) 
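To get output shaped like what the question asks for, you can round the resulting Series to two decimals. Below is a rough, self-contained sketch on a made-up frame. The name `sample` and its values are assumptions; it borrows a few column names from the question but is not the real CSV. It also shows `isnull().mean()`, which I believe gives the same numbers, because the mean of a boolean column is the fraction of `True` values.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's CSV: 5 rows, some NaNs.
sample = pd.DataFrame({
    "Ord_id": ["o1", "o2", "o3", "o4", "o5"],
    "Sales": [10.0, np.nan, 30.0, 40.0, 50.0],
    "Profit": [np.nan, np.nan, 3.0, 4.0, 5.0],
})

# Count of missing values per column, as a percentage of the row count.
percent_missing = (sample.isnull().sum() * 100 / len(sample)).round(2)

# Alternative form: fraction of missing values per column, times 100.
percent_via_mean = (sample.isnull().mean() * 100).round(2)

print(percent_missing)
```

Note that the attempt in the question divides the number of rows by the number of rows: selecting columns with `df.loc[:, mask]` does not change the index, so that expression always evaluates to 100.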
Engineero answered Sep 29 '22 22:09
