I am working with reports in the form of Excel spreadsheets, and they have headings with varying indent levels (example results from tidyxl below). The indent levels can also be inconsistent between heading groups as in the example of Heading 1 versus Heading 2.
contents <- c('Heading 1', 'Subheading 1', 'Item 1', 'Item 2', 'Subheading 2', 'Item 1', 'Item 2', 'Heading 2','Subheading 1','Item 1','Item 2')
indents <- c(0,1,2,2,1,2,2,0,2,4,4)
df <- data.frame(indents,contents)
indents contents
1 0 Heading 1
2 1 Subheading 1
3 2 Item 1
4 2 Item 2
5 1 Subheading 2
6 2 Item 1
7 2 Item 2
8 0 Heading 2
9 2 Subheading 1
10 4 Item 1
11 4 Item 2
I would like to produce a vector of headings where headings are concatenated together, like is shown below under 'headings.'
headings <- c('Heading 1','Heading 1 | Subheading 1','Heading 1 | Subheading 1 | Item 1','Heading 1 | Subheading 1 | Item 2','Heading 1 | Subheading 2','Heading 1 | Subheading 2 | Item 1','Heading 1 | Subheading 2 | Item 2','Heading 2','Heading 2 | Subheading 1','Heading 2 | Subheading 1 | Item 1','Heading 2 | Subheading 1 | Item 2')
data.frame(indents,contents, headings)
indents contents headings
1 0 Heading 1 Heading 1
2 1 Subheading 1 Heading 1 | Subheading 1
3 2 Item 1 Heading 1 | Subheading 1 | Item 1
4 2 Item 2 Heading 1 | Subheading 1 | Item 2
5 1 Subheading 2 Heading 1 | Subheading 2
6 2 Item 1 Heading 1 | Subheading 2 | Item 1
7 2 Item 2 Heading 1 | Subheading 2 | Item 2
8 0 Heading 2 Heading 2
9 2 Subheading 1 Heading 2 | Subheading 1
10 4 Item 1 Heading 2 | Subheading 1 | Item 1
11 4 Item 2 Heading 2 | Subheading 1 | Item 2
I have searched for a solution but haven't found anything that does what I need. I'm imagining a paste or paste0 call within a loop or apply function may do it but I haven't had any luck thus far.
This doesn't feel very efficient. I'm trying to imagine a good nested list object, or maybe something else. Perhaps a better answer will come along.
That being said, something like this might work, though I'd probably test some edge cases. It just goes down the rows and stores previous headings that will need to be used. The indents 0 to 4 are stored in index positions 1 to 5 in the stored_headings vector.
contents <- c(
"Heading 1", "Subheading 1", "Item 1", "Item 2", "Subheading 2", "Item 1",
"Item 2", "Heading 2", "Subheading 1", "Item 1", "Item 2"
)
indents <- c(0, 1, 2, 2, 1, 2, 2, 0, 2, 4, 4)
df <- data.frame(indents, contents)
# Store up to 5 levels of headings in memory (in this case for 0 through 4)
# Use extra level at end to avoid error in assignment below,
# so we use a vector of length 6
stored_headings <- rep(NA, 6) |> as.character()
# Logic according to indent level, using +1 on everything to
# agree with R indexing starting at 1
for (i in seq_len(nrow(df))) {
# Store heading at appropriate level in stored_headings
stored_headings[df$indents[i] + 1] <- df$contents[i]
# Remove all headings below (greater indent) than current
stored_headings[(df$indents[i] + 1 + 1):length(stored_headings)] <- NA
# Concatenate headings, removing NA headings between or below
df$headings[i] <- stored_headings[!is.na(stored_headings)] |>
paste(collapse = " | ")
}
df
indents contents headings
1 0 Heading 1 Heading 1
2 1 Subheading 1 Heading 1 | Subheading 1
3 2 Item 1 Heading 1 | Subheading 1 | Item 1
4 2 Item 2 Heading 1 | Subheading 1 | Item 2
5 1 Subheading 2 Heading 1 | Subheading 2
6 2 Item 1 Heading 1 | Subheading 2 | Item 1
7 2 Item 2 Heading 1 | Subheading 2 | Item 2
8 0 Heading 2 Heading 2
9 2 Subheading 1 Heading 2 | Subheading 1
10 4 Item 1 Heading 2 | Subheading 1 | Item 1
11 4 Item 2 Heading 2 | Subheading 1 | Item 2
I don't agree with another answer that the data is not tidy. Though the item levels are to be defined at least once. The below strategy (by adding one extra data.frame for definition of levels) will work for any number of levels (I am using purrr which is one of my fav libs in tidyverse/R).
Note that I am adding two sub-Items for demonstration and one extra line defining the level of those.
levels <- data.frame(
content = c("Heading", "Subheading", "Item", "Subitem"),
level = c(1:4)
)
library(tidyverse)
df %>%
# Create level
separate(contents, into = c("content", "number"), remove = FALSE) %>%
left_join(levels, by = "content") %>%
mutate(headings = accumulate2(contents, level, .init = first(contents), ~ {
# If main heading
if (..3 == 1){
list(..2)
} else {
# Else remove last element and append in remaining
c(..1[1:(..3 -1)], ..2)
}
})[-1],
# Mutate again to paste as desired
headings = map(headings, ~ paste(unlist(.x), collapse = " | "))) %>%
# Remove unwanted columns
select(-c("content", "number", "level"))
#> indents contents headings
#> 1 0 Heading 1 Heading 1
#> 2 1 Subheading 1 Heading 1 | Subheading 1
#> 3 2 Item 1 Heading 1 | Subheading 1 | Item 1
#> 4 2 Item 2 Heading 1 | Subheading 1 | Item 2
#> 5 1 Subheading 2 Heading 1 | Subheading 2
#> 6 2 Item 1 Heading 1 | Subheading 2 | Item 1
#> 7 2 Item 2 Heading 1 | Subheading 2 | Item 2
#> 8 0 Heading 2 Heading 2
#> 9 2 Subheading 1 Heading 2 | Subheading 1
#> 10 4 Item 1 Heading 2 | Subheading 1 | Item 1
#> 11 4 Item 2 Heading 2 | Subheading 1 | Item 2
#> 12 8 Subitem 1 Heading 2 | Subheading 1 | Item 2 | Subitem 1
#> 13 8 Subitem 2 Heading 2 | Subheading 1 | Item 2 | Subitem 2
Created on 2024-02-07 with reprex v2.0.2
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With