I am trying to extract column names from a data frame based on the values in its cells. My data has a couple hundred category columns, each holding a binary 0 or 1, where a 1 marks the column whose name I want in my new df.
To illustrate my point:
year cat1 cat2 cat3 ... catN
2000 0 0 1 0
2001 1 0 0 0
2002 0 0 0 1
....
2018 0 1 0 0
I am trying to get a df like:
year category
2000 cat3
2001 cat1
2002 catN
....
2018 cat2
My code:
newdf <- as.data.frame(colnames(mydf)[which(mydf == "1", arr.ind = TRUE)[2]])
But alas this only returns one category name!
Any help would be greatly appreciated!
A base R solution:
Use sapply over the rows to find which cell holds the 1 and look up the matching column name (this assumes each row has exactly one 1).
out <- data.frame(year = df1$year, category = sapply(seq_len(nrow(df1)), function(i) names(df1)[-1][which(df1[i, -1] == 1)]))
out
year category
1 2000 cat3
2 2001 cat1
3 2002 catN
4 2018 cat2
data:
df1 <- structure(list(year = c(2000L, 2001L, 2002L, 2018L), cat1 = c(0L,
1L, 0L, 0L), cat2 = c(0L, 0L, 0L, 1L), cat3 = c(1L, 0L, 0L, 0L
), catN = c(0L, 0L, 1L, 0L)), class = "data.frame", row.names = c(NA,
-4L))
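The attempt in the question can also be repaired directly. `which(..., arr.ind = TRUE)` returns a matrix with one `row` and one `col` entry per match, so indexing it with `[2]` picks out a single number, which is why only one name came back. A minimal sketch, assuming every row contains exactly one 1 (the `mydf` below is made-up data mirroring the example, and `cats` and `hits` are just illustrative names):

```r
# Hypothetical data mirroring the question: exactly one 1 per row.
mydf <- data.frame(
  year = c(2000L, 2001L, 2002L, 2018L),
  cat1 = c(0L, 1L, 0L, 0L),
  cat2 = c(0L, 0L, 0L, 1L),
  cat3 = c(1L, 0L, 0L, 0L),
  catN = c(0L, 0L, 1L, 0L)
)

cats <- mydf[-1]                                    # category columns only
hits <- which(cats == 1, arr.ind = TRUE)            # matrix with "row" and "col" columns
hits <- hits[order(hits[, "row"]), , drop = FALSE]  # matches come back column by column; sort by row

newdf <- data.frame(
  year     = mydf$year[hits[, "row"]],
  category = names(cats)[hits[, "col"]]
)
newdf
```

If a row could hold several 1s, this would return one output row per match rather than one per year, which may or may not be what you want.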
A possible solution is this:
library(tidyverse)
df <- data.frame(year = 2000:2002,
                 cat1 = c(0, 0, 1),
                 cat2 = c(1, 0, 0),
                 cat3 = c(0, 1, 0))
df %>%
  gather(category, value, -year) %>% # reshape data to long format
  filter(value == 1) %>%             # keep rows with 1s
  select(-value) %>%                 # remove that column
  arrange(year)                      # sort by year (if needed)
# year category
# 1 2000 cat2
# 2 2001 cat3
# 3 2002 cat1
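In recent versions of tidyr, `gather()` is marked as superseded in favour of `pivot_longer()`. The same idea could be sketched with it as follows, assuming tidyr 1.0.0 or later is installed and reusing the small example data from above (the name `out` is only for illustration):

```r
library(dplyr)
library(tidyr)

# Same toy data as in the answer above.
df <- data.frame(year = 2000:2002,
                 cat1 = c(0, 0, 1),
                 cat2 = c(1, 0, 0),
                 cat3 = c(0, 1, 0))

out <- df %>%
  pivot_longer(-year, names_to = "category", values_to = "value") %>% # one row per year/category pair
  filter(value == 1) %>%                                              # keep the cells holding a 1
  select(-value)                                                      # drop the helper column
```

`pivot_longer()` keeps the original row order, so an extra `arrange(year)` is only needed when the input is not already sorted by year.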