
Elegant way to report missing values in a data.frame

Here's a little piece of code I wrote to report variables with missing values from a data frame. I'm trying to think of a more elegant way to do this, one that perhaps returns a data.frame, but I'm stuck:

for (Var in names(airquality)) {
    missing <- sum(is.na(airquality[, Var]))
    if (missing > 0) {
        print(c(Var, missing))
    }
}

Edit: I'm dealing with data.frames with dozens to hundreds of variables, so it's key that we only report variables with missing values.

Zach asked Nov 29 '11 20:11




1 Answer

Just use sapply:

> sapply(airquality, function(x) sum(is.na(x)))
  Ozone Solar.R    Wind    Temp   Month     Day
     37       7       0       0       0       0

You could also use apply or colSums on the matrix created by is.na():

> apply(is.na(airquality), 2, sum)
  Ozone Solar.R    Wind    Temp   Month     Day
     37       7       0       0       0       0
> colSums(is.na(airquality))
  Ozone Solar.R    Wind    Temp   Month     Day
     37       7       0       0       0       0
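The question also asks for a data.frame that lists only the variables with missing values. The counts above include the zero columns, so one possible sketch is to filter the colSums result and wrap it in a data.frame. The helper name report_missing and the pct_missing column are assumptions added for illustration; they are not part of the original answer.

```r
# Hypothetical helper (not from the original answer): report only the
# columns of `df` that contain at least one NA, as a data.frame.
report_missing <- function(df) {
  # Per-column NA counts, as a named numeric vector
  n_missing <- colSums(is.na(df))

  # Keep only the variables that actually have missing values
  n_missing <- n_missing[n_missing > 0]

  data.frame(
    variable    = names(n_missing),
    missing     = as.integer(n_missing),
    # Share of rows that are NA, in percent (an added convenience column)
    pct_missing = round(100 * as.numeric(n_missing) / nrow(df), 1),
    row.names   = NULL,
    stringsAsFactors = FALSE
  )
}

report_missing(airquality)
```

For airquality this should give two rows, Ozone with 37 missing values and Solar.R with 7. A data frame with no missing values yields a zero-row data.frame, which keeps the output short when there are hundreds of variables.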
Joshua Ulrich answered Oct 08 '22 04:10
