 

Count how many times a value appears and add the result to a column

I have this data frame:

   Generacion 1 2 3 4 5 6 NP1 NP2 NP3 NP4 NP5 NP6
1:          1 0 0 0 0 0 0   4   4   4   4   5   5
2:          2 0 0 0 0 0 0   4   4   4   4   4   4
3:          3 0 0 0 0 0 0   5   5   5   5   5   5
4:          4 0 0 0 0 0 0   4   5   5   5   4   4
5:          5 0 0 0 0 0 0   5   4   4   4   4   4
6:          6 0 0 0 0 0 0   5   5   5   5   4   4

I want to modify columns 1 through 6 so that each one counts how many times its own value occurs in the columns on the right (NP1 to NP6). That is, column 4 should count the number of times 4 occurs. I want to repeat this for every number. The numbers can take values between 0 and 5. The final result should look like this:

head(t2 %>% select(1, 2, 3, 4, 5, 6, 7, NP1, NP2, NP3, NP4, NP5, NP6))
   Generacion 1 2 3 4 5 6 NP1 NP2 NP3 NP4 NP5 NP6
1:          1 0 0 0 4 2 0   4   4   4   4   5   5
2:          2 0 0 0 6 0 0   4   4   4   4   4   4
3:          3 0 0 0 0 6 0   5   5   5   5   5   5
4:          4 0 0 0 3 3 0   4   5   5   5   4   4
5:          5 0 0 0 5 1 0   5   4   4   4   4   4
6:          6 0 0 0 2 4 0   5   5   5   5   4   4
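For a single row, the counting described above amounts to tabulating the six NP values into bins 1 to 6. A minimal base R sketch using row 1 of the example data (the names `np_row1` and `np` are just for illustration):

```r
# NP1..NP6 values from row 1 of the example data
np_row1 <- c(4, 4, 4, 4, 5, 5)

# tabulate() counts occurrences of each integer 1..nbins; nbins = 6 keeps a
# slot for every target column 1..6 even when its count is 0.
# Values outside 1..6 (such as 0) are silently ignored.
counts <- tabulate(np_row1, nbins = 6)
counts
#> [1] 0 0 0 4 2 0

# Row-wise on a matrix of NP values: apply() returns one column per row,
# so transpose to get one row per Generacion
np <- rbind(c(4, 4, 4, 4, 5, 5),
            c(4, 4, 4, 4, 4, 4))
m <- t(apply(np, 1, tabulate, nbins = 6))
m
```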

I have tried using the data.table package with the following:

 t2[NP1 == 4]$`4` <- t2[NP1 == 4]$`4` + 1

But I got the following error:

Error in `[<-.data.table`(`*tmp*`, NP1 == 4, value = c(1, 1, 1, 1)) : Can't assign to the same column twice in the same query (duplicates detected).

So I have 2 questions:

  • Why do I get this error?
  • Is there an easier, more intuitive way to do it?
asked Sep 18 '21 at 17:09 by Qiyao



4 Answers

With data.table:

library(data.table)

setDT(t2)

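# For each value n in 1:6, count how many of NP1..NP6 equal n in every row
# and write the counts into the columns named "1".."6"; the trailing [] prints t2
t2[, as.character(1:6) := lapply(1:6, function(n) rowSums(.SD == n)), .SDcols = NP1:NP6][]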

#   Generacion 1 2 3 4 5 6 NP1 NP2 NP3 NP4 NP5 NP6
#1:          1 0 0 0 4 2 0   4   4   4   4   5   5
#2:          2 0 0 0 6 0 0   4   4   4   4   4   4
#3:          3 0 0 0 0 6 0   5   5   5   5   5   5
#4:          4 0 0 0 3 3 0   4   5   5   5   4   4
#5:          5 0 0 0 5 1 0   5   4   4   4   4   4
#6:          6 0 0 0 2 4 0   5   5   5   5   4   4
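The one-liner relies on a simple mechanism: comparing a data frame (or `.SD`) with a single number yields a logical matrix, and `rowSums()` adds up the TRUE values row by row. A small base R sketch on two rows of the example data (the object names are illustrative only):

```r
# Two rows of NP1..NP6 from the example data (rows 1 and 6)
np <- data.frame(NP1 = c(4, 5), NP2 = c(4, 5), NP3 = c(4, 5),
                 NP4 = c(4, 5), NP5 = c(5, 4), NP6 = c(5, 4))

# Comparing the whole data frame with 4 gives a logical matrix
# (TRUE where the cell equals 4) ...
is_four <- np == 4

# ... and rowSums() counts the TRUE values in each row (4 and 2 here)
rowSums(is_four)

# Looping over the target values 1..6 builds one count column per value,
# which is what lapply(1:6, ...) feeds to := in the answer
counts <- sapply(1:6, function(n) rowSums(np == n))
colnames(counts) <- as.character(1:6)
counts
```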

Data:

t2 <- read.table(text=
"Generacion 1 2 3 4 5 6 NP1 NP2 NP3 NP4 NP5 NP6
          1 0 0 0 0 0 0   4   4   4   4   5   5
          2 0 0 0 0 0 0   4   4   4   4   4   4
          3 0 0 0 0 0 0   5   5   5   5   5   5
          4 0 0 0 0 0 0   4   5   5   5   4   4
          5 0 0 0 0 0 0   5   4   4   4   4   4
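          6 0 0 0 0 0 0   5   5   5   5   4   4", header = TRUE)

# read.table() turns the numeric headers into X1..X6 (check.names = TRUE),
# so restore the original names
colnames(t2) <- c('Generacion','1','2','3','4','5','6','NP1','NP2','NP3','NP4','NP5','NP6')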
answered Oct 16 '22 at 16:10 by Waldi


One option using dplyr could be (data imported with the column names made syntactic by read.table(), i.e. X1 to X6):

library(dplyr)

df %>%
    mutate(across(X1:X6, ~ rowSums(across(NP1:NP6) == as.numeric(sub("\\D+", "", cur_column())))))

   Generacion X1 X2 X3 X4 X5 X6 NP1 NP2 NP3 NP4 NP5 NP6
1:          1  0  0  0  4  2  0   4   4   4   4   5   5
2:          2  0  0  0  6  0  0   4   4   4   4   4   4
3:          3  0  0  0  0  6  0   5   5   5   5   5   5
4:          4  0  0  0  3  3  0   4   5   5   5   4   4
5:          5  0  0  0  5  1  0   5   4   4   4   4   4
6:          6  0  0  0  2  4  0   5   5   5   5   4   4
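The version above recovers the target value from each column name: `cur_column()` returns the name of the column currently being modified (for example "X4"), and `sub("\\D+", "", ...)` deletes the first run of non-digit characters. A quick check in plain R, with a character vector standing in for successive `cur_column()` results (the vector names are illustrative):

```r
# make.names() prefixes an "X" to names that start with a digit,
# which is how read.table() ends up with X1..X6
col_names <- make.names(as.character(1:6))
col_names
#> [1] "X1" "X2" "X3" "X4" "X5" "X6"

# "\\D+" matches the leading run of non-digits; removing it leaves the number
targets <- as.numeric(sub("\\D+", "", col_names))
targets
#> [1] 1 2 3 4 5 6
```

With purely numeric names such as "4" there is nothing to remove, so the same expression still works; the second version simply skips the `sub()` call.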

If you want to use column names containing only numbers:

df %>%
    mutate(across(`1`:`6`, ~ rowSums(across(NP1:NP6) == as.numeric(cur_column()))))

 Generacion 1 2 3 4 5 6 NP1 NP2 NP3 NP4 NP5 NP6
1          1 0 0 0 4 2 0   4   4   4   4   5   5
2          2 0 0 0 6 0 0   4   4   4   4   4   4
3          3 0 0 0 0 6 0   5   5   5   5   5   5
4          4 0 0 0 3 3 0   4   5   5   5   4   4
5          5 0 0 0 5 1 0   5   4   4   4   4   4
6          6 0 0 0 2 4 0   5   5   5   5   4   4
answered Oct 16 '22 at 17:10 by tmfmnk


First, get the columns whose values must be compared with an integer, and the corresponding columns that have those integers as names.

This part of the code is common to both solutions below.

cols_to_add <- grep("^NP", names(t2), value = TRUE)
cols_to_change <- match(gsub("[^[:digit:]]", "", cols_to_add), names(t2)[-1])
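To see what these two helper vectors hold, here is the same code run on just the column names of the example data (`nms` is a hypothetical stand-in for `names(t2)` after the numeric headers have been restored):

```r
# Column names of t2 once the headers are '1'..'6' again
nms <- c("Generacion", as.character(1:6), paste0("NP", 1:6))

cols_to_add <- grep("^NP", nms, value = TRUE)
cols_to_add
#> [1] "NP1" "NP2" "NP3" "NP4" "NP5" "NP6"

# Keep only the digits ("NP4" -> "4") and look them up among the names after
# "Generacion"; the positions found happen to equal the values 1..6
cols_to_change <- match(gsub("[^[:digit:]]", "", cols_to_add), nms[-1])
cols_to_change
#> [1] 1 2 3 4 5 6
```

Because the positions coincide with the values, each `x` in the `lapply()` calls serves both as the value to count and, through `as.character()`, as the name of the column to overwrite.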

Base R

The simplest option, in my opinion, is the base R function rowSums().

t2[as.character(cols_to_change)] <- lapply(cols_to_change, \(x) rowSums(t2[cols_to_add] == x))
t2
#  Generacion 1 2 3 4 5 6 NP1 NP2 NP3 NP4 NP5 NP6
#1          1 0 0 0 4 2 0   4   4   4   4   5   5
#2          2 0 0 0 6 0 0   4   4   4   4   4   4
#3          3 0 0 0 0 6 0   5   5   5   5   5   5
#4          4 0 0 0 3 3 0   4   5   5   5   4   4
#5          5 0 0 0 5 1 0   5   4   4   4   4   4
#6          6 0 0 0 2 4 0   5   5   5   5   4   4

Package data.table.

Here is a data.table solution, also with an lapply loop.

library(data.table)

setDT(t2)
t2[, as.character(cols_to_change) := lapply(
  cols_to_change, \(x) rowSums(.SD == x)), 
  .SDcols = cols_to_add]
t2
#   Generacion 1 2 3 4 5 6 NP1 NP2 NP3 NP4 NP5 NP6
#1:          1 0 0 0 4 2 0   4   4   4   4   5   5
#2:          2 0 0 0 6 0 0   4   4   4   4   4   4
#3:          3 0 0 0 0 6 0   5   5   5   5   5   5
#4:          4 0 0 0 3 3 0   4   5   5   5   4   4
#5:          5 0 0 0 5 1 0   5   4   4   4   4   4
#6:          6 0 0 0 2 4 0   5   5   5   5   4   4
answered Oct 16 '22 at 16:10 by Rui Barradas


A tidyverse solution:

library(dplyr)
library(tidyr)

df %>% 
  pivot_longer(starts_with("NP")) %>% 
  count(Generacion, value) %>% 
  rbind(expand.grid(Generacion = 1:nrow(df), value = 1:6, n = 0)) %>%
  group_by(Generacion, value) %>% summarise(n = sum(n)) %>%
  pivot_wider(id_cols = Generacion, names_from = value, values_from = n) %>%
  bind_cols(df %>% select(NP1:NP6))

# A tibble: 6 x 13
# Groups:   Generacion [6]
  Generacion   `1`   `2`   `3`   `4`   `5`   `6`   NP1   NP2   NP3   NP4   NP5   NP6
       <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <int> <int> <int> <int> <int>
1          1     0     0     0     4     2     0     4     4     4     4     5     5
2          2     0     0     0     6     0     0     4     4     4     4     4     4
3          3     0     0     0     0     6     0     5     5     5     5     5     5
4          4     0     0     0     3     3     0     4     5     5     5     4     4
5          5     0     0     0     5     1     0     5     4     4     4     4     4
6          6     0     0     0     2     4     0     5     5     5     5     4     4
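The `rbind(expand.grid(...))` step makes sure that values which never occur in a row (here 1, 2, 3 and 6) still get a count of 0 after `pivot_wider()`. The same long-format idea can be sketched in base R with `table()`: turning the values into a factor with fixed levels 1 to 6 makes `table()` report zero counts as well. This is only an illustration of the technique, with hypothetical object names, not part of the answer above:

```r
# Two rows of NP1..NP6 from the example data, one row per Generacion
np <- rbind(c(4, 4, 4, 4, 5, 5),
            c(4, 4, 4, 4, 4, 4))

# Long format: one (Generacion, value) pair per cell, like pivot_longer().
# as.vector() reads the matrix column by column, so the row index repeats
# once per column.
generacion <- rep(seq_len(nrow(np)), times = ncol(np))
value <- as.vector(np)

# factor() with levels 1:6 keeps values that never occur, so table()
# reports 0 for them (the job expand.grid() does in the dplyr pipeline)
counts <- table(generacion, factor(value, levels = 1:6))
counts
```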
answered Oct 16 '22 at 16:10 by Daniel