Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do I find a substring within a string in F#?

Tags:

substring

f#

I found a "fun" project online for f# and the idea behind it is to find the number of substrings within a given string.

Here's the prompt:

Description:
You are given a DNA sequence:
a string that contains only characters 'A', 'C', 'G', and 'T'.
Your task is to calculate the number of substrings of sequence,
in which each of the symbols appears the same number of times.

Example 1:
For sequence = "ACGTACGT", the output should be 6
All substrings of length 4 contain each symbol exactly once (+5),
and the whole sequence contains each symbol twice (+1).

Example 2:
For sequence = "AAACCGGTTT", the output should be 1
Only substring "AACCGGTT" satisfies the criterion above: it contains each symbol twice.


Input: String, a sequence that consists only of symbols 'A', 'C', 'G', and 'T'.
Length constraint: 0 < sequence.length < 100000.

Output: Integer, the number of substrings where each symbol appears equally many times.

I'm not exactly sure where to go with this, or more specifically what to do. I've looked around on the internet to try and find what I'm supposed to do and I've only found the following code (I added the input variable, var variable, and changed the show "things" to input then the substring to search for (i hope that makes sense)):

open System

let countSubstring (where :string) (what : string) =
match what with
| "" -> 0
| _ -> (where.Length - where.Replace(what, @"").Length) / what.Length


[<EntryPoint>]
let main argv =

let input = System.Console.ReadLine();
let var = input.Length;
Console.WriteLine(var);
let show where what =
    printfn @"countSubstring(""%s"", ""%s"") = %d" where what (countSubstring where what)
show input "ACGT"
show input "CGTA"
show input "GTAC"
show input "TACG"
0

Anyways, if anyone can help me with this, it would be greatly appreciated.

Thanks in advance

like image 923
Marc Karam Avatar asked Oct 13 '16 01:10

Marc Karam


People also ask

How do you find a specific substring in a string?

To locate a substring in a string, use the indexOf() method. Let's say the following is our string. String str = "testdemo"; Find a substring 'demo' in a string and get the index.

How do you search for a part of a string in Python?

Python String find() method returns the lowest index or first occurrence of the substring if it is found in a given string. If it is not found, then it returns -1. Parameters: sub: Substring that needs to be searched in the given string.

Which string function is used to find the substring from a string?

(Search String for Substring) In the C Programming Language, the strstr function searches within the string pointed to by s1 for the string pointed to by s2. It returns a pointer to the first occurrence in s1 of s2.


1 Answers

First declare a function numberACGT that from a string returns 1 if the number of characters A is the same as C, G and T and 0 otherwise. For this, declare an array N of 4 integers initialized to 0 and run throw the string, incrementing the corresponding counter. In late compare array elements between them.

Then for each sub-string (fixed length multiple of 4) call numberACGT and add to counter count (initialized to 0 at the beginning)

let numberACGT (aString:string) =
    let N = Array.create 4 (0:int)
    let last = aString.Length - 1 
    for i = 0 to last do
        match aString.[i] with
        | 'A' -> N.[0] <- N.[0] + 1
        | 'C' -> N.[1] <- N.[1] + 1
        | 'G' -> N.[2] <- N.[2] + 1
        | _ -> N.[3] <- N.[3] + 1
    if (N.[0] = N.[1]) && (N.[1] = N.[2]) && (N.[2] = N.[3]) then 1 else 0 

let numberSubStrings (aString:string) =
    let mutable count = 0
    let len = aString.Length 
    for k = 1 to len / 4 do //only multiple of 4
        for pos = 0 to len - 4*k do
            count <- count + numberACGT (aString.[pos..pos+4*k-1])
    count

I hope that it is fast enough.

[<EntryPoint>]
let main argv = 
  let stopWatch = System.Diagnostics.Stopwatch.StartNew()
  let input =  Console.ReadLine() in
    printf "%i  " (numberSubStrings input)
  stopWatch.Stop()
  let g =  Console.ReadLine()
  0

Result:

62    4.542700

An new version in O(n²):

let numberSubStringsBis (aString:string) =
    let mutable count = 0 
    let len = aString.Length 
    for pos = 0 to len - 1 do
        let mutable a = 0 
        let mutable  c = 0 
        let mutable g = 0 
        let mutable t = 0 
        let mutable k = pos 
        while k + 3 <= len - 1 do
            for i in [k..k+3] do
                match aString.[i] with
                | 'A' -> a <- a + 1
                | 'C' -> c <- c + 1
                | 'G' -> g <- g + 1
                | _ -> t <- t + 1
            k <- k + 4 
            if a=c && c=g && g=t then count <- count + 1               
    count
like image 110
Jean-Claude Colette Avatar answered Sep 19 '22 15:09

Jean-Claude Colette