Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

java codility training Genomic-range-query

Tags:

java

algorithm

The task is:

A non-empty zero-indexed string S is given. String S consists of N characters from the set of upper-case English letters A, C, G, T.

This string actually represents a DNA sequence, and the upper-case letters represent single nucleotides.

You are also given non-empty zero-indexed arrays P and Q consisting of M integers. These arrays represent queries about minimal nucleotides. We represent the letters of string S as integers 1, 2, 3, 4 in arrays P and Q, where A = 1, C = 2, G = 3, T = 4, and we assume that A < C < G < T.

Query K requires you to find the minimal nucleotide from the range (P[K], Q[K]), 0 ≤ P[i] ≤ Q[i] < N.

For example, consider string S = GACACCATA and arrays P, Q such that:

P[0] = 0    Q[0] = 8 P[1] = 0    Q[1] = 2 P[2] = 4    Q[2] = 5 P[3] = 7    Q[3] = 7 

The minimal nucleotides from these ranges are as follows:

    (0, 8) is A identified by 1,     (0, 2) is A identified by 1,     (4, 5) is C identified by 2,     (7, 7) is T identified by 4. 

Write a function:

class Solution { public int[] solution(String S, int[] P, int[] Q); }  

that, given a non-empty zero-indexed string S consisting of N characters and two non-empty zero-indexed arrays P and Q consisting of M integers, returns an array consisting of M characters specifying the consecutive answers to all queries.

The sequence should be returned as:

    a Results structure (in C), or     a vector of integers (in C++), or     a Results record (in Pascal), or     an array of integers (in any other programming language). 

For example, given the string S = GACACCATA and arrays P, Q such that:

P[0] = 0    Q[0] = 8 P[1] = 0    Q[1] = 2 P[2] = 4    Q[2] = 5 P[3] = 7    Q[3] = 7 

the function should return the values [1, 1, 2, 4], as explained above.

Assume that:

    N is an integer within the range [1..100,000];     M is an integer within the range [1..50,000];     each element of array P, Q is an integer within the range [0..N − 1];     P[i] ≤ Q[i];     string S consists only of upper-case English letters A, C, G, T. 

Complexity:

    expected worst-case time complexity is O(N+M);     expected worst-case space complexity is O(N),           beyond input storage           (not counting the storage required for input arguments). 

Elements of input arrays can be modified.

My solution is:

class Solution {     public int[] solution(String S, int[] P, int[] Q) {         final  char c[] = S.toCharArray();         final int answer[] = new int[P.length];         int tempAnswer;         char tempC;          for (int iii = 0; iii < P.length; iii++) {             tempAnswer = 4;             for (int zzz = P[iii]; zzz <= Q[iii]; zzz++) {                 tempC = c[zzz];                 if (tempC == 'A') {                     tempAnswer = 1;                     break;                 } else if (tempC == 'C') {                     if (tempAnswer > 2) {                         tempAnswer = 2;                     }                 } else if (tempC == 'G') {                     if (tempAnswer > 3) {                         tempAnswer = 3;                     }                  }             }             answer[iii] = tempAnswer;         }          return answer;     } } 

It is not optimal, I believe it's supposed to be done within one loop, any hint how can I achieve it?

You can check quality of your solution here https://codility.com/train/ test name is Genomic-range-query.

like image 362
pshemek Avatar asked Oct 23 '13 21:10

pshemek


1 Answers

Here is the solution that got 100 out of 100 in codility.com. Please read about prefix sums to understand the solution:

public static int[] solveGenomicRange(String S, int[] P, int[] Q) {         //used jagged array to hold the prefix sums of each A, C and G genoms         //we don't need to get prefix sums of T, you will see why.         int[][] genoms = new int[3][S.length()+1];         //if the char is found in the index i, then we set it to be 1 else they are 0         //3 short values are needed for this reason         short a, c, g;         for (int i=0; i<S.length(); i++) {             a = 0; c = 0; g = 0;             if ('A' == (S.charAt(i))) {                 a=1;             }             if ('C' == (S.charAt(i))) {                 c=1;             }             if ('G' == (S.charAt(i))) {                 g=1;             }             //here we calculate prefix sums. To learn what's prefix sums look at here https://codility.com/media/train/3-PrefixSums.pdf             genoms[0][i+1] = genoms[0][i] + a;             genoms[1][i+1] = genoms[1][i] + c;             genoms[2][i+1] = genoms[2][i] + g;         }          int[] result = new int[P.length];         //here we go through the provided P[] and Q[] arrays as intervals         for (int i=0; i<P.length; i++) {             int fromIndex = P[i];             //we need to add 1 to Q[i],              //because our genoms[0][0], genoms[1][0] and genoms[2][0]             //have 0 values by default, look above genoms[0][i+1] = genoms[0][i] + a;              int toIndex = Q[i]+1;             if (genoms[0][toIndex] - genoms[0][fromIndex] > 0) {                 result[i] = 1;             } else if (genoms[1][toIndex] - genoms[1][fromIndex] > 0) {                 result[i] = 2;             } else if (genoms[2][toIndex] - genoms[2][fromIndex] > 0) {                 result[i] = 3;             } else {                 result[i] = 4;             }         }          return result;     } 
like image 193
codebusta Avatar answered Oct 17 '22 00:10

codebusta