I am trying to read a text file of unknown size into an array of characters. This is what I have so far.
#include<stdio.h>
#include<string.h>
int main()
{
    FILE *ptr_file;
    char buf[1000];
    char output[];
    ptr_file =fopen("CodeSV.txt","r");
    if (!ptr_file)
        return 1;
    while (fgets(buf,1000, ptr_file)!=NULL)
        strcat(output, buf);
    printf("%s",output);
    fclose(ptr_file);
    printf("%s",output);
    return 0;
}
But I do not know how to allocate a size for the output array when I am reading in a file of unknown size. Also, when I give the output array a fixed size, say n=1000, I get a segmentation fault. I am a very inexperienced programmer, so any guidance is appreciated :)
The text file itself is technically a .csv file, so the contents look like the following: "0,0,0,1,0,1,0,1,1,0,1..."
The standard way to do this is to use malloc to allocate an array of some size and start reading into it. If you run out of array before you run out of characters (that is, if you don't reach EOF before filling up the array), pick a bigger size for the array and use realloc to make it bigger.
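Here's how the read-and-allocate loop might look. I've chosen to read input a character at a time using getchar (rather than a line at a time using fgets). Note that getchar reads from standard input; to read from the file opened in the question, call getc(ptr_file) instead.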
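#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int c;
    int nch = 0;        /* number of characters stored so far */
    int size = 10;      /* current capacity of buf */
    char *buf = malloc(size);
    if(buf == NULL)
    {
        fprintf(stderr, "out of memory\n");
        exit(1);
    }

    while((c = getchar()) != EOF)
    {
        if(nch >= size-1)
        {
            /* time to make it bigger */
            size += 10;
            buf = realloc(buf, size);
            if(buf == NULL)
            {
                fprintf(stderr, "out of memory\n");
                exit(1);
            }
        }

        buf[nch++] = c;
    }

    buf[nch++] = '\0';  /* terminate so buf can be used as a string */

    printf("\"%s\"", buf);
    free(buf);
    return 0;
}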
A note about the getchar loop above: the -1 in if(nch >= size-1) leaves room in the array for the terminating '\0'.
I would be remiss if I didn't add to the answers what is probably one of the most standard ways of reading an unknown number of lines of unknown length from a text file. In C you have two primary methods of character input: (1) character-oriented input (i.e. getchar, getc, etc.) and (2) line-oriented input (i.e. fgets, getline).
From that mix of functions, the POSIX function getline, when handed a NULL buffer pointer and a size of 0, will allocate sufficient space to read a line of any length (up to the exhaustion of system memory), and it will grow that buffer on later calls as needed. Further, when reading lines of input, line-oriented input is generally the proper choice.
To read an unknown number of lines, the general approach is to allocate an anticipated number of pointers (in an array of pointers-to-char) and then reallocate as necessary if you end up needing more. If you want to work with the complexities of stringing pointers-to-struct together in a linked list, that's fine, but it is far simpler to handle an array of strings. (A linked list is more appropriate when you have a struct with multiple members, rather than a single line.)
The process is straightforward: (1) allocate memory for some initial number of pointers (LMAX below, at 255), and then, as each line is read, (2) allocate memory to hold the line and copy the line into it. strdup is used below, which both (a) allocates memory to hold the string and (b) copies the string to the new memory block, returning a pointer to its address. You assign the returned pointer to your array of strings as array[x].
As with any dynamic allocation of memory, you are responsible for keeping track of the memory allocated, preserving a pointer to the start of each allocated block of memory (so you can free it later), and then freeing the memory when it is no longer needed. (Use valgrind or some similar memory checker to confirm you have no memory errors and have freed all memory you have created.)
Below is an example of the approach which simply reads any text file and prints its lines back to stdout before freeing the memory allocated to hold the file. Once you have read all lines (or while you are reading all lines), you can easily parse your csv input into individual values.
Note: below, each time the array of pointers fills up (first after LMAX lines), the array is reallocated to hold twice as many pointers as before and the read continues. (You can set LMAX as low as 1, but a tiny starting size just forces a run of extra reallocations early on, which is an inefficient way to handle memory allocation.) Choosing some reasonable anticipated starting value and then reallocating to 2X the current size is a standard reallocation approach, but you are free to allocate additional blocks in any size you choose.
Look over the code and let me know if you have any questions.
#define _POSIX_C_SOURCE 200809L /* getline/strdup under strict -std= modes */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define LMAX 255

int main (int argc, char **argv) {

    if (argc < 2 ) {
        fprintf (stderr, "error: insufficient input, usage: %s <filename>\n",
                 argv[0]);
        return 1;
    }

    char **array = NULL;        /* array of pointers to char        */
    char *ln = NULL;            /* NULL forces getline to allocate  */
    size_t n = 0;               /* buf size, 0 use getline default  */
    ssize_t nchr = 0;           /* number of chars actually read    */
    size_t idx = 0;             /* array index for number of lines  */
    size_t it = 0;              /* general iterator variable        */
    size_t lmax = LMAX;         /* current array pointer allocation */
    FILE *fp = NULL;            /* file pointer                     */

    if (!(fp = fopen (argv[1], "r"))) { /* open file for reading */
        fprintf (stderr, "error: file open failed '%s'.\n", argv[1]);
        return 1;
    }

    /* allocate LMAX pointers and set to NULL. Each of the LMAX pointers will
       point to (hold the address of) the beginning of each string read from
       the file below. This will allow access to each string with array[x].
    */
    if (!(array = calloc (LMAX, sizeof *array))) {
        fprintf (stderr, "error: memory allocation failed.\n");
        fclose (fp);
        return 1;
    }

    /* prototype - ssize_t getline (char **ln, size_t *n, FILE *fp)
       above we declared: char *ln and size_t n. Why don't they match? Simple,
       we will be passing the address of each to getline, so we simply precede
       the variable with the unary '&' (address-of) operator, which adds a
       level of indirection, making char* a char** and size_t a size_t*. Now
       the arguments match the prototype.
    */
    while ((nchr = getline (&ln, &n, fp)) != -1)    /* read line */
    {
        while (nchr > 0 && (ln[nchr-1] == '\n' || ln[nchr-1] == '\r'))
            ln[--nchr] = 0;     /* strip newline or carriage rtn */

        /* allocate & copy ln to array - strdup creates a block of memory
           to hold each character in ln and copies the characters in ln to
           that memory address. The address is then stored in array[idx],
           and idx is increased by 1 so it is ready for the next address.
           strdup returns NULL on failure, so check it before moving on.
        */
        if (!(array[idx] = strdup (ln))) {
            fprintf (stderr, "error: memory allocation failed.\n");
            break;              /* keep the lines read so far */
        }
        idx++;

        if (idx == lmax) {      /* if lmax lines reached, realloc */
            char **tmp = realloc (array, lmax * 2 * sizeof *array);
            if (!tmp) {         /* array is still valid, stop reading */
                fprintf (stderr, "error: memory reallocation failed.\n");
                break;
            }
            array = tmp;
            lmax *= 2;
        }
    }

    if (fp) fclose (fp);        /* close file */
    if (ln) free (ln);          /* free memory allocated to ln */

    /*
        process/use lines in array as needed
        (simple print all lines example below)
    */

    printf ("\nLines in file:\n\n");    /* print lines in file */
    for (it = 0; it < idx; it++)
        printf (" array [%3zu] %s\n", it, array[it]);
    printf ("\n");

    for (it = 0; it < idx; it++)        /* free array memory */
        free (array[it]);
    free (array);

    return 0;
}
Use/Output
$ ./bin/getline_rdfile dat/damages.txt
Lines in file:
array [ 0] Personal injury damage awards are unliquidated
array [ 1] and are not capable of certain measurement; thus, the
array [ 2] jury has broad discretion in assessing the amount of
array [ 3] damages in a personal injury case. Yet, at the same
array [ 4] time, a factual sufficiency review insures that the
array [ 5] evidence supports the jury's award; and, although
array [ 6] difficult, the law requires appellate courts to conduct
array [ 7] factual sufficiency reviews on damage awards in
array [ 8] personal injury cases. Thus, while a jury has latitude in
array [ 9] assessing intangible damages in personal injury cases,
array [ 10] a jury's damage award does not escape the scrutiny of
array [ 11] appellate review.
array [ 12]
array [ 13] Because Texas law applies no physical manifestation
array [ 14] rule to restrict wrongful death recoveries, a
array [ 15] trial court in a death case is prudent when it chooses
array [ 16] to submit the issues of mental anguish and loss of
array [ 17] society and companionship. While there is a
array [ 18] presumption of mental anguish for the wrongful death
array [ 19] beneficiary, the Texas Supreme Court has not indicated
array [ 20] that reviewing courts should presume that the mental
array [ 21] anguish is sufficient to support a large award. Testimony
array [ 22] that proves the beneficiary suffered severe mental
array [ 23] anguish or severe grief should be a significant and
array [ 24] sometimes determining factor in a factual sufficiency
array [ 25] analysis of large non-pecuniary damage awards.
Memory Check
$ valgrind ./bin/getline_rdfile dat/damages.txt
==14321== Memcheck, a memory error detector
==14321== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==14321== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==14321== Command: ./bin/getline_rdfile dat/damages.txt
==14321==
Lines in file:
array [ 0] Personal injury damage awards are unliquidated
<snip>
...
array [ 25] analysis of large non-pecuniary damage awards.
==14321==
==14321== HEAP SUMMARY:
==14321== in use at exit: 0 bytes in 0 blocks
==14321== total heap usage: 29 allocs, 29 frees, 3,997 bytes allocated
==14321==
==14321== All heap blocks were freed -- no leaks are possible
==14321==
==14321== For counts of detected and suppressed errors, rerun with: -v
==14321== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 2 from 2)