I have a protein sequence with 100 amino acids (AA) that can be handled as a data.frame. Each AA has a position, and for now the position is all that matters:
Protein <- data.frame(AA = 1:100)
Then I have a data.frame of peptides from the protein (after digestion / sequence breakdown), with the initial and final AA position of each peptide relative to the protein:
df <- data.frame(
Peptides = c("Peptide_A", "Peptide_B", "Peptide_C", "Peptide_D"),
Initial.AA = c(1, 23, 59, 77),
Final.AA = c(18, 58, 70, 100)
)
Output:
   Peptides Initial.AA Final.AA
1 Peptide_A          1       18
2 Peptide_B         23       58
3 Peptide_C         59       70
4 Peptide_D         77      100
Inspecting df, it's clear that some AA were not mapped (19:22 and 71:76, a total of 10 unmapped AA).
I would like the output to be the total percentage of mapped AA, which in this example is 90% (90 AA mapped across all peptides / 100 protein AA).
All answers are welcome as always, but tidyverse ones are preferred.
This solution should work even when df$Initial.AA does not start at 1:
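library(dplyr)
library(tidyr)
# Expand each peptide into one row per covered position (note: this overwrites df)
df <-
  rowwise(df) |>
  mutate(seq = list(seq(Initial.AA, Final.AA, by = 1))) |>
  unnest(seq)
# Fraction of protein positions covered by at least one peptide
1 - sum(!Protein$AA %in% df$seq)/length(Protein$AA)
#> [1] 0.9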
Created on 2024-04-16 with reprex v2.1.0
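As a possible alternative, here is a minimal base R sketch of the same idea that leaves df unchanged. The helper name mapped_fraction is hypothetical (it is not part of the answer above or of any package), and the sketch assumes integer positions. Because it checks membership rather than summing peptide lengths, overlapping peptides should not be counted twice.

```r
# Hypothetical helper (name is an assumption, not from the original answer):
# fraction of protein positions covered by at least one peptide.
mapped_fraction <- function(initial, final, protein_positions) {
  # Expand every peptide to its positions, then pool them into one vector
  covered <- unique(unlist(Map(seq, initial, final)))
  mean(protein_positions %in% covered)
}

Protein <- data.frame(AA = 1:100)
df <- data.frame(
  Peptides = c("Peptide_A", "Peptide_B", "Peptide_C", "Peptide_D"),
  Initial.AA = c(1, 23, 59, 77),
  Final.AA = c(18, 58, 70, 100)
)

# Coverage as a fraction; multiply by 100 for a percentage
coverage <- mapped_fraction(df$Initial.AA, df$Final.AA, Protein$AA)

# Positions not covered by any peptide
unmapped <- setdiff(Protein$AA, unlist(Map(seq, df$Initial.AA, df$Final.AA)))
```

With the example data, coverage corresponds to the 90% from the question and unmapped lists the gaps 19:22 and 71:76.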