Complete dataframe with missing combinations of values

Tags: r, tidyr

I have a data frame with a factor that has two levels (distance) and a year column (years). I would like to complete all years values for every factor level, filling the missing rows with 0.

i.e. from this:

    distance years area
1      NPR     3   10
2      NPR     4   20
3      NPR     7   30
4      100     1   40
5      100     5   50
6      100     6   60

get this:

   distance years area
1       NPR     1    0
2       NPR     2    0
3       NPR     3   10
4       NPR     4   20
5       NPR     5    0
6       NPR     6    0
7       NPR     7   30
8       100     1   40
9       100     2    0
10      100     3    0
11      100     4    0
12      100     5   50
13      100     6   60
14      100     7    0

I tried to apply the expand function:

library(tidyr)
library(dplyr, warn.conflicts = FALSE)

expand(df, years = 1:7)

but this just produces a one-column data frame and does not expand the original one:

# A tibble: 7 x 1
  years
  <int>
1     1
2     2
3     3
4     4
5     5
6     6
7     7

and expand.grid does not work either:

require(utils)    
expand.grid(df, years = 1:7)

Error in match.names(clabs, names(xi)) : 
  names do not match previous names
In addition: Warning message:
In format.data.frame(x, digits = digits, na.encode = FALSE) :
  corrupt data frame: columns will be truncated or padded with NAs

Is there a simple way to expand my data frame? And how can I expand it based on two categories: distance and uniqueLoc?

distance <- rep(c("NPR", "100"), each = 3)
years <-c(3,4,7, 1,5,6)
area <-seq(10,60,10)
uniqueLoc<-rep(c("a", "b"), 3)

df<-data.frame(uniqueLoc, distance, years, area)

> df
  uniqueLoc distance years area
1         a      NPR     3   10
2         b      NPR     4   20
3         a      NPR     7   30
4         b      100     1   40
5         a      100     5   50
6         b      100     6   60

asked Jun 25 '18 08:06

maycca

2 Answers

You can use the tidyr::complete function:

complete(df, distance, years = full_seq(years, period = 1), fill = list(area = 0))

# A tibble: 14 x 3
   distance years  area
   <fct>    <dbl> <dbl>
 1 100         1.   40.
 2 100         2.    0.
 3 100         3.    0.
 4 100         4.    0.
 5 100         5.   50.
 6 100         6.   60.
 7 100         7.    0.
 8 NPR         1.    0.
 9 NPR         2.    0.
10 NPR         3.   10.
11 NPR         4.   20.
12 NPR         5.    0.
13 NPR         6.    0.
14 NPR         7.   30.

or slightly shorter:

complete(df, distance, years = 1:7, fill = list(area = 0))
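For the second part of the question (expanding by both distance and uniqueLoc), one possible sketch is to wrap the two grouping columns in tidyr::nesting(). This assumes that only the uniqueLoc/distance pairs already present in the data should be completed, and that the year range 1 to 7 is known in advance; neither assumption is stated in the question.

```r
library(tidyr)

# Data from the question, including the second grouping column
df <- data.frame(
  uniqueLoc = rep(c("a", "b"), 3),
  distance  = rep(c("NPR", "100"), each = 3),
  years     = c(3, 4, 7, 1, 5, 6),
  area      = seq(10, 60, 10)
)

# nesting() keeps only the uniqueLoc/distance pairs that occur in the data;
# each of those pairs then gets every year from 1 to 7, with area filled by 0
res <- complete(df, nesting(uniqueLoc, distance), years = 1:7,
                fill = list(area = 0))

nrow(res)  # 4 observed pairs x 7 years = 28 rows
```

Listing the columns separately, as in complete(df, uniqueLoc, distance, years = 1:7, ...), would instead cross every level of uniqueLoc with every level of distance, including pairs that never occur together. In this particular data all four pairs are present, so both forms give the same rows.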

answered Sep 20 '22 02:09

talat

Combining tidyr::pivot_wider() and tidyr::pivot_longer() also makes implicit missing values explicit.

# Load packages
library(tidyverse)

# Your data
df <- tibble(distance = c(rep("NPR", 3), rep(100, 3)),
             years = c(3, 4, 7, 1, 5, 6),
             area = seq(10, 60, by = 10))

# Solution
df %>%
  # pivot_wider() makes your implicit missing values explicit
  pivot_wider(names_from = years, values_from = area) %>%
  # Turn back to your desired (long) format
  pivot_longer(2:7, names_to = "years", values_to = "area") %>%
  # Replace missing values (NA) with 0s
  mutate(area = replace_na(area, 0))
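One thing worth checking with the pivot approach: it can only fill years that appear for at least one distance, and pivot_longer() returns the former column names as character by default. The sketch below is a variation, not the original answer's code. It selects the year columns with -distance and uses the names_transform argument (available in tidyr 1.1.0 and later) to keep years numeric, then inspects the result.

```r
library(tidyr)
library(dplyr, warn.conflicts = FALSE)

# Same data as in the question
df <- tibble(distance = rep(c("NPR", "100"), each = 3),
             years = c(3, 4, 7, 1, 5, 6),
             area = seq(10, 60, by = 10))

wide_long <- df %>%
  pivot_wider(names_from = years, values_from = area) %>%
  # -distance selects every year column without hard-coding positions;
  # names_transform converts the year labels back to numbers
  pivot_longer(-distance, names_to = "years", values_to = "area",
               names_transform = list(years = as.numeric)) %>%
  mutate(area = replace_na(area, 0))

nrow(wide_long)          # 12: six distinct years for each of two distances
2 %in% wide_long$years   # FALSE: year 2 is absent from both groups
```

Because year 2 never occurs in the input, this gives 12 rows rather than the 14 in the desired output; complete() with an explicit years = 1:7 covers that case.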

answered Sep 19 '22 02:09

Jae
