Let's say I have the array
1,2,3,4,5,6,7,8,9,10,11,12
if my chunck size = 4
then I want to be able to have a method that will output an array of ints int[] a =
a[0] = 1
a[1] = 3
a[2] = 6
a[3] = 10
a[4] = 14
a[5] = 18
a[6] = 22
a[7] = 26
a[8] = 30
a[9] = 34
a[10] = 38
a[11] = 42
note that a[n] = a[n] + a[n-1] + a[n-2] + a[n-3] because the chunk size is 4 thus I sum the last 4 items
I need to have the method without a nested loop
for(int i=0; i<12; i++)
{
for(int k = i; k>=0 ;k--)
{
// do sumation
counter++;
if(counter==4)
break;
}
}
for example i don't want to have something like that... in order to make code more efficient
also the chunck size may change so I cannot do:
a[3] = a[0] + a[1] + a[2] + a[3]
The reason why I asked this question is because I need to implement check sum rolling for my data structures class. I basically open a file for reading. I then have a byte array. then I will perform a hash function on parts of the file. lets say the file is 100 bytes. I split it in chunks of 10 bytes. I perform a hash function in each chunck thus I get 10 hashes. then I need to compare those hashes with a second file that is similar. let's say the second file has the same 100 bytes but with an additional 5 so it contains a total of 105 bytes. becasuse those extra bytes may have been in the middle of the file if I perform the same algorithm that I did on the first file it is not going to work. Hope I explain my self correctly. and because some files are large. it is not efficient to have a nested loop in my algorithm.
also the real rolling hashing functions are very complex. Most of them are in c++ and I have a hard time understanding them. That's why I want to create my own hashing function very simple just to demonstrate how check sum rolling works...
int chunckSize = 4;
int[] a = new int[] { 1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12 }; // the bytes of the file
int[] b = new int[a.Length]; // array where we will place the checksums
int[] sum = new int[a.Length]; // array needed to avoid nested loop
for (int i = 0; i < a.Length; i++)
{
int temp = 0;
if (i == 0)
{
temp = 1;
}
sum[i] += a[i] + sum[i-1+temp];
if (i < chunckSize)
{
b[i] = sum[i];
}
else
{
b[i] = sum[i] - sum[i - chunckSize];
}
}
the problem with this algorithm is that with large files the sum will at some point be larger than int.Max thus it is not going to work....
but at least know it is more efficient. getting rid of that nested loop helped a lot!
Based on edit two I have worked this out. It does not work with large files and also the checksum algorithm is very bad. but at least I think it explains the hashing rolling that I am trying to explain...
Part1(@"A:\fileA.txt");
Part2(@"A:\fileB.txt", null);
.....
// split the file in chuncks and return the checksums of the chuncks
private static UInt64[] Part1(string file)
{
UInt64[] hashes = new UInt64[(int)Math.Pow(2, 20)];
var stream = File.OpenRead(file);
int chunckSize = (int)Math.Pow(2, 22); // 10 => kilobite 20 => megabite 30 => gigabite etc..
byte[] buffer = new byte[chunckSize];
int bytesRead; // how many bytes where read
int counter = 0; // counter
while ( // while bytesRead > 0
(bytesRead =
(stream.Read(buffer, 0, buffer.Length)) // returns the number of bytes read or 0 if no bytes read
) > 0)
{
hashes[counter] = 0;
for (int i = 0; i < bytesRead; i++)
{
hashes[counter] = hashes[counter] + buffer[i]; // simple algorithm not realistic to perform check sum of file
}
counter++;
}// end while loop
return hashes;
}
// split the file in chuncks rolling it. In reallity this file will be on a different computer..
private static void Part2(string file, UInt64[] hash)
{
UInt64[] hashes = new UInt64[(int)Math.Pow(2, 20)];
var stream = File.OpenRead(file);
int chunckSize = (int)Math.Pow(2, 22); // chunks must be as big as in pervious method
byte[] buffer = new byte[chunckSize];
int bytesRead; // how many bytes where read
int counter = 0; // counter
UInt64[] sum = new UInt64[(int)Math.Pow(2, 20)];
while ( // while bytesRead > 0
(bytesRead =
(stream.Read(buffer, 0, buffer.Length)) // returns the number of bytes read or 0 if no bytes read
) > 0)
{
for (int i = 0; i < bytesRead; i++)
{
int temp = 0;
if (counter == 0)
temp = 1;
sum[counter] += (UInt64)buffer[i] + sum[counter - 1 + temp];
if (counter < chunckSize)
{
hashes[counter] = (UInt64)sum[counter];
}else
{
hashes[counter] = (UInt64)sum[counter] - (UInt64)sum[counter - chunckSize];
}
counter++;
}
}// end while loop
// mising to compare hashes arrays
}
Add an array r for the result, and initialize its first chunk members using a loop from 0 to chunk-1. Now observe that to get r[i+1] you can add a[i+1] to r[i], and subtract a[i-chunk+1]. Now you can do the rest of the items in a single non-nested loop:
for (int i=chunk+1 ; i < N-1 ; i++) {
r[i+1] = a[i+1] + r[i] - a[i-chunk+1];
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With