Fill in sequential values in a dataframe

Question

I have an R data processing problem. I have a large set of data in a .csv file that I can load in using readr. The problem is that there are missing values that I need to add to the dataframe.

Example data:

X1    X2    Value
Coal  1     300
Coal  3     100
Coal  5     150
NatG  2     175
NatG  4     150

This file loads correctly. However, I want to add processing so that the final dataframe contains an entry for every X1 type and the entire sequence 1:5 in X2, with the missing values set to zero. The final dataframe would look like this:

X1    X2    Value
Coal  1     300
Coal  2     0
Coal  3     100
Coal  4     0
Coal  5     150
NatG  1     0
NatG  2     175
NatG  3     0
NatG  4     150
NatG  5     0

I tried using readr to load in a file that had all entries equal to zero and then read in the real data to the same data frame, but it was a destructive overwrite and the missing rows were removed. I need to know how to add appropriate rows to the dataframe without knowing which values in the 1:5 sequence will be missing for each value under X1.

www · Accepted Answer

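We can use the complete function from the tidyr package. It expands the data frame to every combination of the values found in the columns you provide, adding rows for the combinations that are missing. The fill argument takes a named list giving the value to use in each column for those new rows, in place of NA.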

library(tidyr)

dt2 <- dt %>%
  complete(X1, X2, fill = list(Value = 0))
dt2
# # A tibble: 10 x 3
#       X1    X2 Value
#    <chr> <int> <dbl>
#  1  Coal     1   300
#  2  Coal     2     0
#  3  Coal     3   100
#  4  Coal     4     0
#  5  Coal     5   150
#  6  NatG     1     0
#  7  NatG     2   175
#  8  NatG     3     0
#  9  NatG     4   150
# 10  NatG     5     0

We can also supply the values for a column explicitly, which matters when a value does not appear anywhere in the data: complete(X1, X2) only combines values that are actually present. For example, the following makes sure every value from 1 to 5 in column X2 is included.

dt2 <- dt %>%
  complete(X1, X2 = 1:5, fill = list(Value = 0))

Or use full_seq, which will automatically create the sequence based on the minimum and maximum values in a column.

dt2 <- dt %>%
  complete(X1, X2 = full_seq(X2, period = 1), fill = list(Value = 0))
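The three variants only differ when the data does not already contain every value of X2. The sketch below illustrates this with a small made-up data frame (dt_gap is a hypothetical name, not part of the question's data) in which X2 only takes the values 2 and 5. It assumes tidyr is installed.

```r
library(tidyr)

# Hypothetical data: X2 only ever takes the values 2 and 5
dt_gap <- data.frame(
  X1 = c("Coal", "Coal", "NatG"),
  X2 = c(2, 5, 2),
  Value = c(300, 150, 175),
  stringsAsFactors = FALSE
)

# Only values present in the data are combined: X2 is 2 or 5
observed_only <- complete(dt_gap, X1, X2, fill = list(Value = 0))

# full_seq fills the gaps between the observed minimum and maximum: X2 is 2 to 5
min_to_max <- complete(dt_gap, X1, X2 = full_seq(X2, period = 1),
                       fill = list(Value = 0))

# An explicit sequence covers the whole range regardless of the data: X2 is 1 to 5
explicit <- complete(dt_gap, X1, X2 = 1:5, fill = list(Value = 0))

nrow(observed_only)  # 4
nrow(min_to_max)     # 8
nrow(explicit)       # 10
```

If the required range is known in advance, as with 1:5 in the question, the explicit form is probably the safest choice because it does not depend on which values happen to be present.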

DATA

dt <- read.table(text = "X1    X2    Value
Coal  1     300
Coal  3     100
Coal  5     150
NatG  2     175
NatG  4     150",
                 header = TRUE, stringsAsFactors = FALSE)

Mako212 · Answer

Great solution from @www with tidyr. This is a way to do it in base R.

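# Build every combination of X2 level and X1 type
levels <- 1:5
type <- c("Coal", "NatG")
df2 <- expand.grid(levels, type)  # columns are named Var1 (levels) and Var2 (type)

# Keep every row of the grid; combinations absent from dt get NA in Value
df3 <- merge(dt, df2, by.x = c("X1", "X2"), by.y = c("Var2", "Var1"), all.y = TRUE)

# Replace the NAs introduced by the join with zero
df3[is.na(df3)] <- 0
df3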

     X1 X2 Value
1  Coal  1   300
2  Coal  2     0
3  Coal  3   100
4  Coal  4     0
5  Coal  5   150
6  NatG  1     0
7  NatG  2   175
8  NatG  3     0
9  NatG  4   150
10 NatG  5     0
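The code above hard-codes the two X1 types. As a variation, here is a minimal base R sketch that reads the types from the data instead, so it keeps working on a larger file with more types. It rebuilds the example data inline so that it runs on its own; the names grid and filled are illustrative, and the 1:5 range is still assumed to be known in advance.

```r
# Example data from the question, rebuilt inline
dt <- data.frame(
  X1 = c("Coal", "Coal", "Coal", "NatG", "NatG"),
  X2 = c(1L, 3L, 5L, 2L, 4L),
  Value = c(300, 100, 150, 175, 150),
  stringsAsFactors = FALSE
)

# Every X1 type found in the data, crossed with the full 1:5 sequence.
# Naming the columns here lets merge() match on the same names in both frames.
grid <- expand.grid(X1 = unique(dt$X1), X2 = 1:5, stringsAsFactors = FALSE)

# Left join: keep every grid row, with NA in Value where dt has no match
filled <- merge(grid, dt, by = c("X1", "X2"), all.x = TRUE)

# Zero-fill only the Value column, leaving other columns untouched
filled$Value[is.na(filled$Value)] <- 0

# Order explicitly by type and sequence position
filled <- filled[order(filled$X1, filled$X2), ]
rownames(filled) <- NULL
filled
```

Restricting the replacement to filled$Value avoids accidentally overwriting genuine NAs in other columns, which df3[is.na(df3)] <- 0 would do if the real file has more columns.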

Tags: dataframe, r

doriengard
