
Storing sum of chunks of array through one pass

Tags: c#, algorithm

Let's say I have the array

1,2,3,4,5,6,7,8,9,10,11,12

If my chunk size is 4,

then I want a method that outputs an array of ints, int[] a:

a[0] = 1
a[1] = 3
a[2] = 6
a[3] = 10
a[4] = 14
a[5] = 18
a[6] = 22
a[7] = 26
a[8] = 30
a[9] = 34
a[10] = 38
a[11] = 42

Note that each output a[n] is the sum of the input elements at positions n, n-1, n-2 and n-3 (as far as they exist), because the chunk size is 4, so I sum the last 4 items.

I need the method to work without a nested loop.

 for (int i = 0; i < 12; i++)
 {
     int counter = 0; // reset for every window
     for (int k = i; k >= 0; k--)
     {
         // do summation
         counter++;
         if (counter == 4)
             break;
     }
 }

For example, I don't want something like that, because I want the code to be more efficient.

Also, the chunk size may change, so I cannot do:

a[3] = a[0] + a[1] + a[2] + a[3]

Edit

The reason I asked this question is that I need to implement a rolling checksum for my data structures class. I open a file for reading, which gives me a byte array, and then I compute a hash function over parts of the file. Let's say the file is 100 bytes and I split it into chunks of 10 bytes. I compute the hash of each chunk, so I get 10 hashes. Then I need to compare those hashes with a second file that is similar. Let's say the second file has the same 100 bytes plus an additional 5, so it contains 105 bytes in total. Because those extra bytes may be in the middle of the file, running the same fixed-chunk algorithm on the second file is not going to work: every chunk after the insertion is shifted. I hope I have explained myself correctly. And because some files are large, it is not efficient to have a nested loop in my algorithm.

Also, the real rolling hash functions are very complex. Most of them are in C++ and I have a hard time understanding them. That's why I want to create my own very simple hashing function, just to demonstrate how a rolling checksum works.
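To make the fixed-chunk versus rolling comparison concrete, here is a minimal sketch. It assumes a plain byte sum as the checksum, and the names `RollingDemo`, `ChunkSums` and `FindOffsets` are hypothetical, not part of the original post. It computes the sums of fixed chunks of one file, then slides a window over a second file to find where a chunk with a given sum may reappear.

```csharp
using System;
using System.Collections.Generic;

static class RollingDemo
{
    // Sums of consecutive, non-overlapping chunks (a trailing partial chunk is ignored).
    public static List<int> ChunkSums(byte[] data, int chunkSize)
    {
        var sums = new List<int>();
        int current = 0;
        for (int i = 0; i < data.Length; i++)
        {
            current += data[i];
            if ((i + 1) % chunkSize == 0)   // a chunk just ended
            {
                sums.Add(current);
                current = 0;
            }
        }
        return sums;
    }

    // Start offsets in data where a window of chunkSize bytes sums to target.
    public static List<int> FindOffsets(byte[] data, int chunkSize, int target)
    {
        var offsets = new List<int>();
        int window = 0;
        for (int i = 0; i < data.Length; i++)
        {
            window += data[i];                                   // byte entering the window
            if (i >= chunkSize) window -= data[i - chunkSize];   // byte leaving the window
            if (i >= chunkSize - 1 && window == target)
                offsets.Add(i - chunkSize + 1);
        }
        return offsets;
    }
}
```

With fileA = {1,2,3,4,5,6,7,8} and a chunk size of 4, `ChunkSums` gives 10 and 26. If fileB is {1,2,3,4,9,9,5,6,7,8} (two bytes inserted in the middle), `FindOffsets(fileB, 4, 26)` reports offset 6, whereas fixed chunks of fileB would sum to 10 and 29. A plain sum is a weak checksum, so different windows can share a sum and a match would still need confirming with a stronger comparison.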

Edit 2

        int chunckSize = 4;

        int[] a = new int[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 }; // the bytes of the file
        int[] b = new int[a.Length]; // array where we will place the checksums
        int[] sum = new int[a.Length]; // array needed to avoid nested loop

        for (int i = 0; i < a.Length; i++)
        {
            // running prefix sum: sum[i] = a[0] + ... + a[i]
            int previous = (i == 0) ? 0 : sum[i - 1];
            sum[i] = a[i] + previous;

            if (i < chunckSize)
            {
                b[i] = sum[i];
            }
            else
            {
                b[i] = sum[i] - sum[i - chunckSize];
            }

        }

The problem with this algorithm is that with large files the sum will at some point exceed int.MaxValue, so it is not going to work.

But at least now it is more efficient. Getting rid of that nested loop helped a lot!
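On the overflow concern, one possible workaround (a sketch, not the asker's code; `WrappingSums` and `WindowSums` are hypothetical names) is to keep the prefix sums in `uint` and let them wrap modulo 2^32. The subtraction wraps the same way, so each window sum comes out exact as long as a true window sum fits in a `uint`, which holds for byte data whenever the chunk size is below about 16 million.

```csharp
using System;

static class WrappingSums
{
    // Window sums computed from prefix sums stored as uint.
    // prefix[i] may wrap around modulo 2^32; the difference of two wrapped
    // prefix sums is still the true window sum modulo 2^32.
    public static uint[] WindowSums(byte[] data, int chunkSize)
    {
        var prefix = new uint[data.Length];
        var result = new uint[data.Length];
        for (int i = 0; i < data.Length; i++)
        {
            unchecked   // make the wraparound explicit even if the project compiles with /checked
            {
                uint previous = (i == 0) ? 0u : prefix[i - 1];
                prefix[i] = previous + data[i];
                result[i] = (i < chunkSize)
                    ? prefix[i]
                    : prefix[i] - prefix[i - chunkSize];
            }
        }
        return result;
    }
}
```

This still allocates two arrays as long as the input, so it addresses only the overflow, not the memory use.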

Edit 3

Based on Edit 2 I have worked this out. It does not work with large files, and the checksum algorithm is very bad, but at least I think it illustrates the rolling hash that I am trying to explain.

    Part1(@"A:\fileA.txt");
    Part2(@"A:\fileB.txt", null);

.....

    // split the file into chunks and return the checksums of the chunks
    private static UInt64[] Part1(string file)
    {
        UInt64[] hashes = new UInt64[(int)Math.Pow(2, 20)];

        var stream = File.OpenRead(file);


        int chunckSize = (int)Math.Pow(2, 22); // exponent 10 => kilobyte, 20 => megabyte, 30 => gigabyte, etc.
        byte[] buffer = new byte[chunckSize];

        int bytesRead;    // how many bytes were read
        int counter = 0;  // index of the current chunk

        while ( // while bytesRead > 0
                    (bytesRead =
                        (stream.Read(buffer, 0, buffer.Length)) // returns the number of bytes read or 0 if no bytes read
                    ) > 0)
        {                
            hashes[counter] = 0;

            for (int i = 0; i < bytesRead; i++)
            {
                hashes[counter] = hashes[counter] + buffer[i]; // simple byte sum, not a realistic file checksum
            }
            counter++;

        }// end while loop     

        return hashes;
    }



    // compute rolling checksums over the file. In reality this file will be on a different computer.
    private static void Part2(string file, UInt64[] hash)
    {            

        UInt64[] hashes = new UInt64[(int)Math.Pow(2, 20)];

        var stream = File.OpenRead(file);

        int chunckSize = (int)Math.Pow(2, 22); // chunks must be as big as in the previous method
        byte[] buffer = new byte[chunckSize];

        int bytesRead;    // how many bytes were read
        int counter = 0;  // index of the current byte across the whole file

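        // sum (like hashes above) gets one entry per byte, so files larger than 2^20 bytes run past the end of both arrays
        UInt64[] sum = new UInt64[(int)Math.Pow(2, 20)];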

        while ( // while bytesRead > 0
                    (bytesRead =
                        (stream.Read(buffer, 0, buffer.Length)) // returns the number of bytes read or 0 if no bytes read
                    ) > 0)
        {

            for (int i = 0; i < bytesRead; i++)
            {
                int temp = 0;
                if (counter == 0)
                    temp = 1;

                sum[counter] += (UInt64)buffer[i] + sum[counter - 1 + temp];

                if (counter < chunckSize)
                {
                    hashes[counter] = (UInt64)sum[counter];
                }else
                {
                    hashes[counter] = (UInt64)sum[counter] - (UInt64)sum[counter - chunckSize];
                }
                counter++;                    
            }



        }// end while loop

        // still missing: compare the two hashes arrays
    }
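Part2 above stores one prefix sum per byte, which is why it cannot handle large files. A minimal sketch of an alternative follows, assuming the same plain byte sum; `StreamRolling` and `RollingSums` are hypothetical names. It keeps only the current window in a circular buffer, so memory use depends on the window size rather than the file size.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class StreamRolling
{
    // Yields, for every byte position in the stream, the sum of the last
    // windowSize bytes (fewer while the window is still filling up).
    public static IEnumerable<ulong> RollingSums(Stream stream, int windowSize)
    {
        var window = new byte[windowSize];   // circular buffer holding the current window
        var buffer = new byte[4096];         // read buffer; the size is arbitrary
        ulong sum = 0;
        long position = 0;
        int bytesRead;

        while ((bytesRead = stream.Read(buffer, 0, buffer.Length)) > 0)
        {
            for (int i = 0; i < bytesRead; i++)
            {
                int slot = (int)(position % windowSize);
                sum -= window[slot];         // byte leaving the window (0 while filling up)
                sum += buffer[i];            // byte entering the window
                window[slot] = buffer[i];
                position++;
                yield return sum;
            }
        }
    }
}
```

The caller can compare each yielded value against the chunk checksums from Part1 as it goes, instead of collecting every rolling sum in an array first.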
Tono Nam asked Nov 25 '25 14:11


1 Answer

Add an array r for the result, and initialize its first chunk members (indexes 0 to chunk-1) with running sums in a loop. Now observe that to get r[i+1] you can add a[i+1] to r[i] and subtract a[i-chunk+1], the element that just left the window. Now you can do the rest of the items in a single non-nested loop, starting at i = chunk-1 so that r[chunk] is the first entry it fills:

for (int i = chunk - 1; i < N - 1; i++) {
    r[i+1] = a[i+1] + r[i] - a[i-chunk+1];
}
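Put together, the two loops described above might look like the following sketch. The wrapper names `ChunkSum` and `RollingSums` are hypothetical, and it assumes chunk is at least 1.

```csharp
using System;

static class ChunkSum
{
    // r[i] = sum of the last `chunk` elements of a ending at index i
    // (fewer elements for the first chunk-1 positions).
    public static int[] RollingSums(int[] a, int chunk)
    {
        int n = a.Length;
        var r = new int[n];
        if (n == 0) return r;

        // First loop: plain running sums for indexes 0 .. chunk-1.
        r[0] = a[0];
        for (int i = 1; i < chunk && i < n; i++)
            r[i] = r[i - 1] + a[i];

        // Second loop: slide the window, adding the new element and
        // subtracting the one that fell out.
        for (int i = chunk - 1; i < n - 1; i++)
            r[i + 1] = a[i + 1] + r[i] - a[i - chunk + 1];

        return r;
    }
}
```

For the array 1 to 12 with chunk = 4 this produces 1, 3, 6, 10, 14, 18, 22, 26, 30, 34, 38, 42, matching the output listed in the question.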
Sergey Kalinichenko answered Nov 27 '25 03:11


