Replace NA in column with value in adjacent column

Tags:

This question is related to a post with a similar title (replace NA in an R vector with adjacent values). I would like to scan a column in a data frame and replace NA's with the value in the adjacent cell. In the aforementioned post, the solution was to replace the NA not with the value from the adjacent vector (e.g. the adjacent element in the data matrix) but was a conditional replace for a fixed value. Below is a reproducible example of my problem:

UNIT <- c(NA,NA, 200, 200, 200, 200, 200, 300, 300, 300,300) STATUS <-c('ACTIVE','INACTIVE','ACTIVE','ACTIVE','INACTIVE','ACTIVE','INACTIVE','ACTIVE','ACTIVE',                     'ACTIVE','INACTIVE')  TERMINATED <- c('1999-07-06' , '2008-12-05' , '2000-08-18' , '2000-08-18' ,'2000-08-18' ,'2008-08-18',                         '2008-08-18','2006-09-19','2006-09-19' ,'2006-09-19' ,'1999-03-15')  START <- c('2007-04-23','2008-12-06','2004-06-01','2007-02-01','2008-04-19','2010-11-29','2010-12-30',                    '2007-10-29','2008-02-05','2008-06-30','2009-02-07') STOP <- c('2008-12-05','4712-12-31','2007-01-31','2008-04-18','2010-11-28','2010-12-29','4712-12-31',                   '2008-02-04','2008-06-29','2009-02-06','4712-12-31') #creating dataframe TEST <- data.frame(UNIT,STATUS,TERMINATED,START,STOP); TEST                       UNIT   STATUS TERMINATED      START       STOP 1    NA   ACTIVE 1999-07-06 2007-04-23 2008-12-05 2    NA INACTIVE 2008-12-05 2008-12-06 4712-12-31 3   200   ACTIVE 2000-08-18 2004-06-01 2007-01-31 4   200   ACTIVE 2000-08-18 2007-02-01 2008-04-18 5   200 INACTIVE 2000-08-18 2008-04-19 2010-11-28 6   200   ACTIVE 2008-08-18 2010-11-29 2010-12-29 7   200 INACTIVE 2008-08-18 2010-12-30 4712-12-31 8   300   ACTIVE 2006-09-19 2007-10-29 2008-02-04 9   300   ACTIVE 2006-09-19 2008-02-05 2008-06-29 10  300   ACTIVE 2006-09-19 2008-06-30 2009-02-06 11  300 INACTIVE 1999-03-15 2009-02-07 4712-12-31  #using the syntax for a conditional replace and hoping it works :/           TEST$UNIT[is.na(TEST$UNIT)] <- TEST$STATUS; TEST      UNIT   STATUS TERMINATED      START       STOP 1     1   ACTIVE 1999-07-06 2007-04-23 2008-12-05 2     2 INACTIVE 2008-12-05 2008-12-06 4712-12-31 3   200   ACTIVE 2000-08-18 2004-06-01 2007-01-31 4   200   ACTIVE 2000-08-18 2007-02-01 2008-04-18 5   200 INACTIVE 2000-08-18 2008-04-19 2010-11-28 6   200   ACTIVE 2008-08-18 2010-11-29 2010-12-29 7   200 INACTIVE 2008-08-18 2010-12-30 4712-12-31 8   300   ACTIVE 2006-09-19 2007-10-29 2008-02-04 9   300   ACTIVE 2006-09-19 2008-02-05 2008-06-29 10  300   ACTIVE 2006-09-19 2008-06-30 2009-02-06 11  300 INACTIVE 1999-03-15 2009-02-07 4712-12-31

The outcome should be:

      UNIT   STATUS TERMINATED      START       STOP 1   ACTIVE   ACTIVE 1999-07-06 2007-04-23 2008-12-05 2 INACTIVE INACTIVE 2008-12-05 2008-12-06 4712-12-31 3      200   ACTIVE 2000-08-18 2004-06-01 2007-01-31 4      200   ACTIVE 2000-08-18 2007-02-01 2008-04-18 5      200 INACTIVE 2000-08-18 2008-04-19 2010-11-28 6      200   ACTIVE 2008-08-18 2010-11-29 2010-12-29 7      200 INACTIVE 2008-08-18 2010-12-30 4712-12-31 8      300   ACTIVE 2006-09-19 2007-10-29 2008-02-04 9      300   ACTIVE 2006-09-19 2008-02-05 2008-06-29 10     300   ACTIVE 2006-09-19 2008-06-30 2009-02-06 11     300 INACTIVE 1999-03-15 2009-02-07 4712-12-31

830

asked Mar 26 '13 05:03

hubert_farnsworth

Video Answer

2 Answers

It didn't work because status was a factor. When you mix factor with numeric then numeric is the least restrictive. By forcing status to be character you get the results you're after and the column is now a character vector:

TEST$UNIT[is.na(TEST$UNIT)] <- as.character(TEST$STATUS[is.na(TEST$UNIT)])  ##        UNIT   STATUS TERMINATED      START       STOP ## 1    ACTIVE   ACTIVE 1999-07-06 2007-04-23 2008-12-05 ## 2  INACTIVE INACTIVE 2008-12-05 2008-12-06 4712-12-31 ## 3       200   ACTIVE 2000-08-18 2004-06-01 2007-01-31 ## 4       200   ACTIVE 2000-08-18 2007-02-01 2008-04-18 ## 5       200 INACTIVE 2000-08-18 2008-04-19 2010-11-28 ## 6       200   ACTIVE 2008-08-18 2010-11-29 2010-12-29 ## 7       200 INACTIVE 2008-08-18 2010-12-30 4712-12-31 ## 8       300   ACTIVE 2006-09-19 2007-10-29 2008-02-04 ## 9       300   ACTIVE 2006-09-19 2008-02-05 2008-06-29 ## 10      300   ACTIVE 2006-09-19 2008-06-30 2009-02-06 ## 11      300 INACTIVE 1999-03-15 2009-02-07 4712-12-31

111

answered Sep 18 '22 08:09

Tyler Rinker

You have to do

TEST$UNIT[is.na(TEST$UNIT)] <- TEST$STATUS[is.na(TEST$UNIT)]

so that the value will be replaced with the adjacent value. Otherwise there is a mismatch between the number of values to be replaced and the values to replace them with. This would result in the values being replaced in row order. It works in this case because the two values being replaced are the first two.

answered Sep 22 '22 08:09

Brennan

Related questions
                            
                                Error: vector memory exhausted (limit reached?) R 3.5.0 macOS
                            
                                Setting the default value in a function?
                            
                                Running R Code from Command Line (Windows)
                            
                                Missing legend with ggplot2 and geom_line
                            
                                Create a Data Frame of Unequal Lengths
                            
                                What is the difference between `%in%` and `==`?
                            
                                list all factor levels of a data.frame
                            
                                How to measure similarity between strings?
                            
                                Place title of multiplot panel with ggplot2
                            
                                Diagonal labels orientation on x-axis in heatmap(s)
                            
                                After I upgrade my R version, how can I easily reinstall all the packages that were installed in the old version? [duplicate]
                            
                                Determining UTM zone (to convert) from longitude/latitude
                            
                                Sentiment analysis using R [closed]
                            
                                R lubridate converting seconds to date
                            
                                inconsolata missing to build R vignette
                            
                                dplyr join warning: joining factors with different levels
                            
                                How can I add a subtitle and change the font size of ggplot plots in R?
                            
                                How to control the igraph plot layout with Fixed Positions?
                            
                                Memory efficient alternative to rbind - in-place rbind?
                            
                                Subset data.frame by date

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Replace NA in column with value in adjacent column

Tags:

replace

r

missing-data

na