This is a sample of my data from CSV. It contains ~10 columns.
Product_id Product_Weight Product_Name Shop_Name ...
[1] A 10 xxxx Walmart
[2] B 12 yyyy Target
[3] C 11 zzzz Target
[4] A NA xxxx Walmart
[5] C NA zzzz Target
I would like to fill NA's in row 4 and 5 with 10 and 11 respectively (since the product weight of A and C are already known from row 1 and 3). I want the final data frame to be like this
Product_id Product_Weight Product_Name Shop_Name ...
[1] A 10 xxxx Walmart
[2] B 12 yyyy Target
[3] C 11 zzzz Target
[4] A 10 xxxx Walmart
[5] C 11 zzzz Target
What is the best way to do this in R?
Although the question asked for the "previous occurrence" this would have the disadvantage that if the first Product_Weight
in any Product_id
were NA
then it could not be filled in even if we knew the Product_Weight
from a subsequent Product_id
so instead of using the previous occurrence we take the mean of all non-NAs with the same Product_id
. Since these should all be the same their mean is their common value.
If you really do want the previous occurrence use the Prev
function where:
Prev <- function(x) na.locf(x, na.rm = FALSE)
in place of na.aggregate
in (1) and (3) and do not use (2).
The following solutions have the advantages that they all:
preserve the order of the input
work even if the first Product_Weight
in any Product_id
is NA
do not modify the input
The first solution has the additional advantage of being only one line of code (plus a library
statement) and the second solution has the additional advantage of not using any packages.
1) zoo::na.aggregate We use na.aggregate
in the zoo package (which replaces all NAs with the average of the non-NAs) and we apply it to Product_Weight
separately for each Product_id
.
library(zoo)
transform(DF, Product_Weight = ave(Product_Weight, Product_id, FUN = na.aggregate))
giving:
Product_id Product_Weight Product_Name Shop_Name
1 A 10 xxxx Walmart
2 B 12 yyyy Target
3 C 11 zzzz Target
4 A 10 xxxx Walmart
5 C 11 zzzz Target
2) No packages Alternately use Mean
in place of na.aggregate
where Mean
is defined as:
Mean <- function(x) replace(x, is.na(x), mean(x, na.rm = TRUE))
3) dplyr/zoo Add row numbers, group by the Product_id
, fill in the NAs as in the prior solutions using either na.aggregate
as shown below or Mean
, arrange back to the original order and remove the row numbers:
library(dplyr)
library(zoo)
DF %>%
mutate(row = row_number()) %>%
group_by(Product_id) %>%
mutate(Product_Weight = na.aggregate(Product_Weight)) %>%
ungroup() %>%
arrange(row) %>%
select(-row)
Note: This was used for the input DF
:
Lines <- " Product_id Product_Weight Product_Name Shop_Name
A 10 xxxx Walmart
B 12 yyyy Target
C 11 zzzz Target
A NA xxxx Walmart
C NA zzzz Target"
DF <- read.table(text = Lines, header = TRUE)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With