Filling in values in a data frame in R?

Q: How do you find missing values in a data frame in R?

Cells in a data frame can contain missing values (NA), which can be detected with the is.na() function. Column values can also be tested against conditions, such as matching specific values or falling within a range, to filter and subset the data.
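As a minimal sketch of the is.na() check described above (the data frame `df` and its values are made up for illustration and are not from the question):

```r
# Hypothetical example data with two missing values in `vals`
df <- data.frame(times = 1:4, vals = c(2, NA, 4, NA))

# is.na() returns a logical vector marking the missing cells
missing <- is.na(df$vals)

# Rows where `vals` is missing
print(df[missing, ])

# Count of missing values per column
print(colSums(is.na(df)))
```

The logical vector can be used directly as a row index, which is why the same call serves both for finding and for filtering missing values.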

Q: What properties are preserved when subsetting data frame columns in R?

In R, conditions on data frame columns can be used to produce smaller subsets. When such conditions are applied, the following properties hold: the rows of the result are a subset of the input rows; they appear in the same order as in the original data frame; and the columns are left unmodified.
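These properties can be checked on a small example. The sketch below uses base R logical indexing rather than any particular package, and the data frame `df` is hypothetical:

```r
# Hypothetical data, deliberately not sorted by id
df <- data.frame(id = c(5, 2, 9, 1), score = c(10, 40, 25, 40))

# Keep only the rows whose score is at least 25
sub <- df[df$score >= 25, ]

# Rows are a subset of the input and keep their original relative order
print(sub$id)

# Columns are unchanged
print(names(sub))
```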

Tags: dataframe, r, dplyr

Suppose I have this data frame:

  times vals
1     1    2
2     3    4
3     7    6

set up with

foo <- data.frame(times=c(1,3,7), vals=c(2,4,6))

and I want this one:

  times vals
1     1    2
2     2    2
3     3    4
4     4    4
5     5    4
6     6    4
7     7    6

That is, I want to fill in all the times from 1 to 7, and fill in the vals from the latest time that is not greater than the given time.

I have some code that does it using dplyr, but it is ugly. Any suggestions for a better approach?

library(dplyr)

foo <- merge(foo, data.frame(times=1:max(foo$times)), all.y=TRUE)
foo2 <- merge(foo, foo, by=c(), suffixes=c('', '.1'))

foo2 <- foo2 %>% filter(is.na(vals) & !is.na(vals.1) & times.1 <= times) %>%
  group_by(times) %>% arrange(-times.1) %>% mutate(rn = row_number()) %>%
  filter(rn == 1) %>%
  mutate(vals = vals.1,
         rn = NULL,
         vals.1 = NULL,
         times.1 = NULL)

foo <- merge(foo, foo2, by=c('times'), all.x=TRUE, suffixes=c('', '.2'))
foo <- mutate(foo,
              vals = ifelse(is.na(vals), vals.2, vals),
              vals.2 = NULL)

asked May 11 '16 16:05

dfrankow

1 Answer

This is a standard rolling join problem:

library(data.table)

setDT(foo)[.(1:7), on = 'times', roll = TRUE]
#   times vals
#1:     1    2
#2:     2    2
#3:     3    4
#4:     4    4
#5:     5    4
#6:     6    4
#7:     7    6

The above works in the development version (1.9.7+), which is smarter about column matching during joins. In 1.9.6 you still need to specify the column name for the inner table:

setDT(foo)[.(times = 1:7), on = 'times', roll = TRUE]
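
For readers who want to see what the rolling join computes, the same "latest time not greater than the given time" lookup can be sketched in base R with findInterval(). This is an illustration under the assumption that foo$times is sorted in ascending order; it is not part of the data.table approach above, and the names `all_times`, `idx` and `filled` are made up for the example:

```r
# Input data frame from the question
foo <- data.frame(times = c(1, 3, 7), vals = c(2, 4, 6))

# All time points to fill in, from 1 up to the largest observed time
all_times <- seq_len(max(foo$times))

# For each target time, findInterval() gives the index of the last
# entry in foo$times that is <= that time (requires ascending times)
idx <- findInterval(all_times, foo$times)

# An index of 0 means the target time precedes every observation;
# turn those into NA so the lookup yields a missing value
idx[idx == 0] <- NA

filled <- data.frame(times = all_times, vals = foo$vals[idx])
print(filled)
```

Unlike the rolling join, this sketch handles a single value column at a time; with several columns the index `idx` would be applied to the rows of the original data frame instead.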

answered Oct 20 '22 23:10

eddi
