I have a question about dynamic memory allocation.
Context: I'm writing a program that reads a text file of words and counts the frequency with which each word occurs (one word per line).
This particular function reads the file, counts the lines and characters, then dynamically allocates memory to the array of string pointers, an array storing the count of characters for each line and the strings themselves. (The other parts are less directly relevant to my question).
Question: How often should I reallocate memory if I run out of space? I set a constant ("memstart") for setting the initial memory allocation value. In the below code snippet I realloc for every line over the value of "memstart". Would the program process faster if a reallocated a larger block of memory instead of increasing the memory space by 1 "variable type" each time?
What would be best practice for something like this?
Code Snip:
int read_alloc(FILE* fin, FILE *tmp, char **wdp, int *sz){
int line_cnt= 0, chr, let=1;
do{
chr=getc(fin);
let++;
//count characters
if(chr!=EOF){
chr=tolower(chr);
fputc(chr, tmp);
}
//convert to lcase and write to temp file
if ('\n' == chr || chr==EOF){
sz[(line_cnt)]=((let)*sizeof(char)); //save size needed to store string in array
*(wdp+(line_cnt))=malloc((let)*sizeof(char)); //allocate space for the string
if ((line_cnt-1) >= memstart){
realloc(wdp, (sizeof(wdp)*(memstart+line_cnt))); //if more space needed increase size
realloc(sz, (sizeof(sz)*(memstart+line_cnt)));
}
line_cnt++;
let=1;
}
} while (EOF != chr);
return (line_cnt);
}
The point to note is that realloc() should only be used for dynamically allocated memory. If the memory is not dynamically allocated, then behavior is undefined. For example, program 1 demonstrates incorrect use of realloc() and program 2 demonstrates correct use of realloc().
It's perfectly safe to use realloc . It is the way to reallocate memory in a C program.
Will I lose my data? No, the data will be copied for you into the new block that the returned p points at, before the old block is freed. This all happens before realloc returns, so the new p points to your data still.
"There is no reason for realloc()ing to a smaller size will fail." is an assertion without evidence. As the spec for the Standard C library does not require a reduction to never fail, robust code would not assume an error is not possible, even if unlikely.
The classic answer is to double each time, but a factor of 1.5 might be better. The important bit is that you multiply your array size by some factor each time, rather than adding additional space each time.
Each re-allocation might need to copy the previous array into a new one. We'd like to minimize these copies. If we will be adding n
items, and we start with an array of size a
, increase by a factor of r
each re-allocation, to end with a value of n
, the sequence of (re-)allocations will be a, ar, ar^2, ar^3, ..., n. The sum of that sequence is (nr-a)/(r-1). Thus the total space is of order O(n).
Suppose instead we start with a, and this time add r each time. The sequence is a, a+r, a+2r, a+3r, ..., n. The sum of that sequence will be 0.5*((n^2-a^2)/r + a + n). In this case the total space of order O(n^2). Much worse!
With a constant factor of 2, the array will be in the worse case 1/2 empty. That's probably ok. You can always shrink the allocation when you're done and know the final size.
As pointed out in another answer, there are several bugs in the manner in which you call realloc()
, but that wasn't the question.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With