Conditionally concatenate strings in vector based on a value in secondary vector in R

Question

I am working with reports in the form of Excel spreadsheets, and they have headings with varying indent levels (example results from tidyxl below). The indent levels can also be inconsistent between heading groups as in the example of Heading 1 versus Heading 2.

contents <- c('Heading 1', 'Subheading 1', 'Item 1', 'Item 2', 'Subheading 2', 'Item 1', 'Item 2', 'Heading 2','Subheading 1','Item 1','Item 2')
indents <- c(0,1,2,2,1,2,2,0,2,4,4)
df <- data.frame(indents,contents)
   indents     contents
1        0    Heading 1
2        1 Subheading 1
3        2       Item 1
4        2       Item 2
5        1 Subheading 2
6        2       Item 1
7        2       Item 2
8        0    Heading 2
9        2 Subheading 1
10       4       Item 1
11       4       Item 2

I would like to produce a vector of headings where headings are concatenated together, like is shown below under 'headings.'

headings <- c('Heading 1','Heading 1 | Subheading 1','Heading 1 | Subheading 1 | Item 1','Heading 1 | Subheading 1 | Item 2','Heading 1 | Subheading 2','Heading 1 | Subheading 2 | Item 1','Heading 1 | Subheading 2 | Item 2','Heading 2','Heading 2 | Subheading 1','Heading 2 | Subheading 1 | Item 1','Heading 2 | Subheading 1 | Item 2')
data.frame(indents,contents, headings)
   indents     contents                          headings
1        0    Heading 1                         Heading 1
2        1 Subheading 1          Heading 1 | Subheading 1
3        2       Item 1 Heading 1 | Subheading 1 | Item 1
4        2       Item 2 Heading 1 | Subheading 1 | Item 2
5        1 Subheading 2          Heading 1 | Subheading 2
6        2       Item 1 Heading 1 | Subheading 2 | Item 1
7        2       Item 2 Heading 1 | Subheading 2 | Item 2
8        0    Heading 2                         Heading 2
9        2 Subheading 1          Heading 2 | Subheading 1
10       4       Item 1 Heading 2 | Subheading 1 | Item 1
11       4       Item 2 Heading 2 | Subheading 1 | Item 2

I have searched for a solution but haven't found anything that does what I need. I'm imagining a paste or paste0 call within a loop or apply function may do it but I haven't had any luck thus far.

knitz3 · Accepted Answer

This doesn't feel very efficient. I'm trying to imagine a good nested list object, or maybe something else. Perhaps a better answer will come along.

That being said, something like this might work, though I'd probably test some edge cases. It just goes down the rows and stores previous headings that will need to be used. The indents 0 to 4 are stored in index positions 1 to 5 in the stored_headings vector.

contents <- c(
  "Heading 1", "Subheading 1", "Item 1", "Item 2", "Subheading 2", "Item 1",
  "Item 2", "Heading 2", "Subheading 1", "Item 1", "Item 2"
)
indents <- c(0, 1, 2, 2, 1, 2, 2, 0, 2, 4, 4)
df <- data.frame(indents, contents)

# Store up to 5 levels of headings in memory (in this case for 0 through 4)
# Use extra level at end to avoid error in assignment below,
# so we use a vector of length 6
stored_headings <- rep(NA, 6) |> as.character()

# Logic according to indent level, using +1 on everything to
# agree with R indexing starting at 1
for (i in seq_len(nrow(df))) {

  # Store heading at appropriate level in stored_headings
  stored_headings[df$indents[i] + 1] <- df$contents[i]

  # Remove all headings below (greater indent) than current
  stored_headings[(df$indents[i] + 1 + 1):length(stored_headings)] <- NA

  # Concatenate headings, removing NA headings between or below
  df$headings[i] <- stored_headings[!is.na(stored_headings)] |>
    paste(collapse = " | ")

}

df

   indents     contents                          headings
1        0    Heading 1                         Heading 1
2        1 Subheading 1          Heading 1 | Subheading 1
3        2       Item 1 Heading 1 | Subheading 1 | Item 1
4        2       Item 2 Heading 1 | Subheading 1 | Item 2
5        1 Subheading 2          Heading 1 | Subheading 2
6        2       Item 1 Heading 1 | Subheading 2 | Item 1
7        2       Item 2 Heading 1 | Subheading 2 | Item 2
8        0    Heading 2                         Heading 2
9        2 Subheading 1          Heading 2 | Subheading 1
10       4       Item 1 Heading 2 | Subheading 1 | Item 1
11       4       Item 2 Heading 2 | Subheading 1 | Item 2

AnilGoyal · Answer

I don't agree with another answer that the data is not tidy. Though the item levels are to be defined at least once. The below strategy (by adding one extra data.frame for definition of levels) will work for any number of levels (I am using purrr which is one of my fav libs in tidyverse/R).

Note that I am adding two sub-Items for demonstration and one extra line defining the level of those.

levels <- data.frame(
  content = c("Heading", "Subheading", "Item", "Subitem"),
  level = c(1:4)
)

library(tidyverse)

df %>% 
# Create level
  separate(contents, into = c("content", "number"), remove = FALSE) %>% 
  left_join(levels, by = "content") %>% 
  mutate(headings = accumulate2(contents, level, .init = first(contents), ~ {
    # If main heading
    if (..3 == 1){
      list(..2)
    } else {
    # Else remove last element and append in remaining
        c(..1[1:(..3 -1)], ..2)
    }
  })[-1],
  # Mutate again to paste as desired
  headings = map(headings, ~ paste(unlist(.x), collapse = " | "))) %>% 
  # Remove unwanted columns
  select(-c("content", "number", "level"))
#>    indents     contents                                      headings
#> 1        0    Heading 1                                     Heading 1
#> 2        1 Subheading 1                      Heading 1 | Subheading 1
#> 3        2       Item 1             Heading 1 | Subheading 1 | Item 1
#> 4        2       Item 2             Heading 1 | Subheading 1 | Item 2
#> 5        1 Subheading 2                      Heading 1 | Subheading 2
#> 6        2       Item 1             Heading 1 | Subheading 2 | Item 1
#> 7        2       Item 2             Heading 1 | Subheading 2 | Item 2
#> 8        0    Heading 2                                     Heading 2
#> 9        2 Subheading 1                      Heading 2 | Subheading 1
#> 10       4       Item 1             Heading 2 | Subheading 1 | Item 1
#> 11       4       Item 2             Heading 2 | Subheading 1 | Item 2
#> 12       8    Subitem 1 Heading 2 | Subheading 1 | Item 2 | Subitem 1
#> 13       8    Subitem 2 Heading 2 | Subheading 1 | Item 2 | Subitem 2

^{Created on 2024-02-07 with reprex v2.0.2}

Conditionally concatenate strings in vector based on a value in secondary vector in R

Tags:

iteration

r

excel

purrr

user23357415

2 Answers

knitz3

AnilGoyal

Recent Activity

Donate For Us

Conditionally concatenate strings in vector based on a value in secondary vector in R

Tags:

iteration

r

excel

purrr

user23357415

2 Answers

knitz3

AnilGoyal

Related questions

Recent Activity

Donate For Us