I have this data frame:
Generacion 1 2 3 4 5 6 NP1 NP2 NP3 NP4 NP5 NP6
1: 1 0 0 0 0 0 0 4 4 4 4 5 5
2: 2 0 0 0 0 0 0 4 4 4 4 4 4
3: 3 0 0 0 0 0 0 5 5 5 5 5 5
4: 4 0 0 0 0 0 0 4 5 5 5 4 4
5: 5 0 0 0 0 0 0 5 4 4 4 4 4
6: 6 0 0 0 0 0 0 5 5 5 5 4 4
I want to modify columns 1 through 6 such that each column counts the occurrences of that value in the columns on the right (NP1 - NP6). That is, the 4 column should count the number of times 4 occurs in that row. I wish to repeat this process for every number. The numbers can take values between 0 and 5. The final result should look like this:
head(t2 %>% select(1, 2, 3, 4, 5, 6, 7, NP1, NP2, NP3, NP4, NP5, NP6))
Generacion 1 2 3 4 5 6 NP1 NP2 NP3 NP4 NP5 NP6
1: 1 0 0 0 4 2 0 4 4 4 4 5 5
2: 2 0 0 0 6 0 0 4 4 4 4 4 4
3: 3 0 0 0 0 6 0 5 5 5 5 5 5
4: 4 0 0 0 3 3 0 4 5 5 5 4 4
5: 5 0 0 0 5 1 0 5 4 4 4 4 4
6: 6 0 0 0 2 4 0 5 5 5 5 4 4
I have tried using the package data.table, and I have done the following:
t2[NP1 == 4]$`4` <- t2[NP1 == 4]$`4` + 1
But I got the following error:
Error in `[<-.data.table`(`*tmp*`, NP1 == 4, value = c(1, 1, 1, 1)) : Can't assign to the same column twice in the same query (duplicates detected).
So I have 2 questions:
With data.table:
library(data.table)
setDT(t2)
t2[, as.character(1:6) := lapply(1:6, function(n) rowSums(.SD == n)), .SDcols = NP1:NP6][]
# Generacion 1 2 3 4 5 6 NP1 NP2 NP3 NP4 NP5 NP6
#1: 1 0 0 0 4 2 0 4 4 4 4 5 5
#2: 2 0 0 0 6 0 0 4 4 4 4 4 4
#3: 3 0 0 0 0 6 0 5 5 5 5 5 5
#4: 4 0 0 0 3 3 0 4 5 5 5 4 4
#5: 5 0 0 0 5 1 0 5 4 4 4 4 4
#6: 6 0 0 0 2 4 0 5 5 5 5 4 4
Data:
t2 <- read.table(text=
"Generacion 1 2 3 4 5 6 NP1 NP2 NP3 NP4 NP5 NP6
1 0 0 0 0 0 0 4 4 4 4 5 5
2 0 0 0 0 0 0 4 4 4 4 4 4
3 0 0 0 0 0 0 5 5 5 5 5 5
4 0 0 0 0 0 0 4 5 5 5 4 4
5 0 0 0 0 0 0 5 4 4 4 4 4
6 0 0 0 0 0 0 5 5 5 5 4 4", header = TRUE)
colnames(t2) <- c('Generacion','1','2','3','4','5','6','NP1','NP2','NP3','NP4','NP5','NP6')
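The rowSums(.SD == n) idiom in the answer above can be seen in isolation with plain base R. This is only a minimal sketch: the small data frame np is a made-up stand-in for the NP1 to NP6 columns (first two rows of the question's data), and is_four and count_four are names chosen for illustration.

```r
# Toy data mirroring the first two rows of NP1..NP6 from the question
np <- data.frame(NP1 = c(4, 4), NP2 = c(4, 4), NP3 = c(4, 4),
                 NP4 = c(4, 4), NP5 = c(5, 4), NP6 = c(5, 4))

# Comparing a data frame with a single value gives a logical matrix
is_four <- np == 4

# rowSums() treats TRUE as 1, so this counts the 4s in each row
count_four <- rowSums(is_four)
count_four
```

Repeating the comparison for each value 1 to 6, as the lapply() call does, yields one count column per value.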
One option using dplyr could be (data imported with corrected column names):
library(dplyr)

df %>%
  mutate(across(X1:X6, ~ rowSums(across(NP1:NP6) == as.numeric(sub("\\D+", "", cur_column())))))
Generacion X1 X2 X3 X4 X5 X6 NP1 NP2 NP3 NP4 NP5 NP6
1: 1 0 0 0 4 2 0 4 4 4 4 5 5
2: 2 0 0 0 6 0 0 4 4 4 4 4 4
3: 3 0 0 0 0 6 0 5 5 5 5 5 5
4: 4 0 0 0 3 3 0 4 5 5 5 4 4
5: 5 0 0 0 5 1 0 5 4 4 4 4 4
6: 6 0 0 0 2 4 0 5 5 5 5 4 4
If you want to use column names containing only numbers:
df %>%
  mutate(across(`1`:`6`, ~ rowSums(across(NP1:NP6) == as.numeric(cur_column()))))
Generacion 1 2 3 4 5 6 NP1 NP2 NP3 NP4 NP5 NP6
1 1 0 0 0 4 2 0 4 4 4 4 5 5
2 2 0 0 0 6 0 0 4 4 4 4 4 4
3 3 0 0 0 0 6 0 5 5 5 5 5 5
4 4 0 0 0 3 3 0 4 5 5 5 4 4
5 5 0 0 0 5 1 0 5 4 4 4 4 4
6 6 0 0 0 2 4 0 5 5 5 5 4 4
First, get the columns that must be compared to an integer and the corresponding columns that have those integers as names.
This part of the code is common to both solutions below.
cols_to_add <- grep("^NP", names(t2), value = TRUE)
cols_to_change <- match(gsub("[^[:digit:]]", "", cols_to_add), names(t2)[-1])
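To make the two helper vectors concrete, here is a small check that runs the same two lines on the column names alone. The vector nms is a stand-in I introduce for names(t2), assuming the names given in the question.

```r
# Column names as in the question (the rows are irrelevant for this check)
nms <- c("Generacion", as.character(1:6), paste0("NP", 1:6))

# Same logic as above, applied to nms instead of names(t2)
cols_to_add <- grep("^NP", nms, value = TRUE)
cols_to_change <- match(gsub("[^[:digit:]]", "", cols_to_add), nms[-1])

cols_to_add     # "NP1" "NP2" "NP3" "NP4" "NP5" "NP6"
cols_to_change  # 1 2 3 4 5 6
```

So cols_to_add holds the names of the columns to scan, and cols_to_change holds the integers 1 to 6, which serve both as the values to count and, via as.character(), as the names of the columns to fill.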
The simplest is, in my opinion, the base R function rowSums.
t2[as.character(cols_to_change)] <- lapply(cols_to_change, \(x) rowSums(t2[cols_to_add] == x))
t2
# Generacion 1 2 3 4 5 6 NP1 NP2 NP3 NP4 NP5 NP6
#1 1 0 0 0 4 2 0 4 4 4 4 5 5
#2 2 0 0 0 6 0 0 4 4 4 4 4 4
#3 3 0 0 0 0 6 0 5 5 5 5 5 5
#4 4 0 0 0 3 3 0 4 5 5 5 4 4
#5 5 0 0 0 5 1 0 5 4 4 4 4 4
#6 6 0 0 0 2 4 0 5 5 5 5 4 4
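As a variation on the rowSums approach (not part of the answer itself), the counts for all six values can also be produced in one pass per row with tabulate(). This is a sketch under the assumption that the values are whole numbers; np is a made-up stand-in for t2[cols_to_add] holding the first three rows of the question's data.

```r
# Toy NP columns: the first three rows of the question's data
np <- data.frame(NP1 = c(4, 4, 5), NP2 = c(4, 4, 5), NP3 = c(4, 4, 5),
                 NP4 = c(4, 4, 5), NP5 = c(5, 4, 5), NP6 = c(5, 4, 5))

# tabulate() counts how often each of 1..nbins occurs in one row;
# apply() returns one column per row of np, so transpose the result
counts <- t(apply(as.matrix(np), 1, tabulate, nbins = 6))
colnames(counts) <- as.character(1:6)
counts
```

Note that tabulate() ignores values below 1, so a 0 in the NP columns is simply not counted, which matches having count columns only for 1 to 6.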
data.table
Here is a data.table solution, also with a lapply loop.
library(data.table)
setDT(t2)
t2[, as.character(cols_to_change) := lapply(
  cols_to_change, \(x) rowSums(.SD == x)),
  .SDcols = cols_to_add]
t2
# Generacion 1 2 3 4 5 6 NP1 NP2 NP3 NP4 NP5 NP6
#1: 1 0 0 0 4 2 0 4 4 4 4 5 5
#2: 2 0 0 0 6 0 0 4 4 4 4 4 4
#3: 3 0 0 0 0 6 0 5 5 5 5 5 5
#4: 4 0 0 0 3 3 0 4 5 5 5 4 4
#5: 5 0 0 0 5 1 0 5 4 4 4 4 4
#6: 6 0 0 0 2 4 0 5 5 5 5 4 4
A tidyverse solution:
library(dplyr)
library(tidyr)
df %>%
  pivot_longer(starts_with("NP")) %>%
  count(Generacion, value) %>%
  rbind(expand.grid(Generacion = 1:nrow(df), value = 1:6, n = 0)) %>%
  group_by(Generacion, value) %>%
  summarise(n = sum(n)) %>%
  pivot_wider(id_cols = Generacion, names_from = value, values_from = n) %>%
  bind_cols(df %>% select(NP1:NP6))
# A tibble: 6 x 13
# Groups: Generacion [6]
Generacion `1` `2` `3` `4` `5` `6` NP1 NP2 NP3 NP4 NP5 NP6
<int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <int> <int> <int> <int> <int>
1 1 0 0 0 4 2 0 4 4 4 4 5 5
2 2 0 0 0 6 0 0 4 4 4 4 4 4
3 3 0 0 0 0 6 0 5 5 5 5 5 5
4 4 0 0 0 3 3 0 4 5 5 5 4 4
5 5 0 0 0 5 1 0 5 4 4 4 4 4
6 6 0 0 0 2 4 0 5 5 5 5 4 4