
Generating substrings and random strings in R

Please bear with me: I come from a Python background and am still learning string manipulation in R.

OK, so let's say I have a string of length 100 made up of random A, B, C, and D letters:

> df <- c("ABCBDBDBCBABABDBCBCBDBDBCBDBACDBCCADCDBCDACDDCDACBCDACABACDACABBBCCCBDBDDCACDDACADDDDACCADACBCBDCACD")
> df
[1] "ABCBDBDBCBABABDBCBCBDBDBCBDBACDBCCADCDBCDACDDCDACBCDACABACDACABBBCCCBDBDDCACDDACADDDDACCADACBCBDCACD"
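
For anyone who wants to reproduce this with different input, a string like the one above could be generated along the following lines. This is only a sketch: `random_string` is a hypothetical helper name that does not appear in the original post, and the seed value is an arbitrary choice made so that reruns give the same string.

```r
# Hypothetical helper: a random string of `len` letters drawn
# uniformly, with replacement, from `alphabet`.
random_string <- function(len, alphabet = LETTERS[1:4]) {
  paste(sample(alphabet, len, replace = TRUE), collapse = "")
}

set.seed(42)  # assumption: fixed seed, only for reproducibility
s <- random_string(100)
nchar(s)      # length of the generated string
```

Passing a different `alphabet` vector would change the set of letters drawn, so the same helper covers other alphabets too.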

I would like to do the following two things:

1) Generate a '.txt' file made up of 20-character substrings of the string above, each starting one letter after the previous one, with a unique name on the line above each substring, like this:

NAME1
ABCBDBDBCBABABDBCBCB
NAME2
BCBDBDBCBABABDBCBCBD
NAME3
CBDBDBCBABABDBCBCBDB
NAME4
BDBDBCBABABDBCBCBDBD

... and so forth

2) Take that generated list and build a second list from it containing the same substrings, except that one or two of the letters in each substring are changed to another of A, B, C, or D (only those four letters).

So, this:

NAME1
ABCBDBDBCBABABDBCBCB

Would become this:

NAME1.1
ABBBDBDBCBDBABDBCBCB

As you can see, the "C" in the third position became a "B" and the "A" in position 11 became a "D", with no implied relationship between those changed letters. Purely random.

I know this is a convoluted question, but like I said, I am still learning basic text and string manipulation in R.

Thanks in advance.

tomathon, asked Jun 16 '26 14:06


1 Answer

  1. Create a text file of substrings

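    n <- 20 # length of each substring
    
    # start position of every window; use n rather than a hard-coded 20
    starts <- seq_len(nchar(df) - n + 1)
    
    # extract the n-character substring beginning at each start position
    v1 <- mapply(substr, starts, starts + n - 1, MoreArgs = list(x = df))
    
    # plain names (no embedded "\n"); the line break comes from sep below
    names(v1) <- paste0("NAME", seq_along(v1))
    
    # the names are written as row names, then sep, then the substring
    write.table(v1, file = "filename.txt", quote = FALSE, sep = "\n",
                col.names = FALSE)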
    
  2. Randomly replace one or two letters (A-D):

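    # draw one or two positions and a replacement letter for each;
    # note that a replacement can equal the original letter, leaving it unchanged
    myfun <- function() {
      idx <- sample(seq_len(n), sample(1:2, 1))
      repl <- sample(LETTERS[1:4], length(idx), replace = TRUE)
      list(idx = idx, repl = repl)
    }
    
    changes <- replicate(length(v1), myfun(), simplify = FALSE)
    
    # split each substring into letters, swap in the replacements, re-join
    v2 <- mapply(function(x, y, z) paste(replace(x, y, z), collapse = ""),
                 strsplit(v1, ""),
                 lapply(changes, "[[", "idx"),
                 lapply(changes, "[[", "repl"))
    
    # build the names from scratch so they do not depend on names(v1)
    names(v2) <- paste0("NAME", seq_along(v2), ".1")
    
    write.table(v2, file = "filename2.txt", quote = FALSE, sep = "\n",
                col.names = FALSE)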
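
The two steps above could also be sketched with the vectorised `substring()` and a per-string mutation that always picks a letter different from the one it replaces, so every output string really differs from its source in one or two positions. This is an alternative sketch rather than the approach used in the answer: `sliding_substrings` and `mutate_string` are hypothetical helper names, the short input string is made up for illustration, and the code assumes the input is at least `n` characters long and each substring has at least two characters.

```r
# Hypothetical helper: all substrings of width n, each starting one
# character after the previous one. substring() is vectorised over
# its start and stop arguments.
sliding_substrings <- function(x, n) {
  starts <- seq_len(nchar(x) - n + 1)
  substring(x, starts, starts + n - 1)
}

# Hypothetical helper: change one or two letters of s, each to a
# different letter drawn from alphabet.
mutate_string <- function(s, alphabet = LETTERS[1:4]) {
  chars <- strsplit(s, "")[[1]]
  idx <- sample.int(length(chars), sample(1:2, 1))  # distinct positions
  for (i in idx) {
    # exclude the current letter so the position always changes
    chars[i] <- sample(setdiff(alphabet, chars[i]), 1)
  }
  paste(chars, collapse = "")
}

set.seed(1)  # assumption: fixed seed, only so reruns give the same output
subs <- sliding_substrings("ABCBDBDBCBAB", 5)  # short made-up input
mutated <- vapply(subs, mutate_string, character(1), USE.NAMES = FALSE)

# Interleave name lines and sequence lines, then write them out.
out <- c(rbind(paste0("NAME", seq_along(mutated), ".1"), mutated))
writeLines(out, "filename2.txt")
```

Using `writeLines()` on an interleaved vector avoids relying on how `write.table()` prints row names, at the cost of building the name lines by hand.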
    
Sven Hohenstein, answered Jun 19 '26 05:06


