
How to copy text file to string in C?

Tags: c, file, copy

I need to copy the contents of a text file to a dynamically-allocated character array.

My problem is getting the size of the contents of the file; Google reveals that I need to use fseek and ftell, but for that the file apparently needs to be opened in binary mode, and that gives only garbage.

EDIT: I tried opening in text mode, but I get weird numbers. Here's the code (I've omitted simple error checking for clarity):

long f_size;
char* code;
size_t code_s, result;
FILE* fp = fopen(argv[0], "r");
fseek(fp, 0, SEEK_END);
f_size = ftell(fp); /* This returns 29696, but file is 85 bytes */
fseek(fp, 0, SEEK_SET);
code_s = sizeof(char) * f_size;
code = malloc(code_s);
result = fread(code, 1, f_size, fp); /* This returns 1045, it should be the same as f_size */
Javier asked Nov 28 '22 23:11


2 Answers

The root of the problem is here:

FILE* fp = fopen(argv[0], "r");

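argv[0] is the name of your executable program, NOT the first command-line argument, so it certainly won't be a text file. That would explain why ftell reports 29696 rather than 85: you are measuring the program itself. Try argv[1], and see what happens then.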
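
A minimal sketch of reading a whole file into a dynamically allocated string, assuming the path comes from argv[1]. The helper name read_file and its out_len parameter are hypothetical, not from the question. It opens the file in binary mode so that ftell returns a byte count, and it relies on fseek to SEEK_END working on binary streams, which holds on mainstream platforms but is not guaranteed by the C standard.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads the whole file at `path` into a freshly
 * allocated, NUL-terminated buffer. Returns NULL on failure to open,
 * seek, or allocate. The caller owns the buffer and must free() it.
 *
 * Usage (e.g. from main):
 *     size_t len;
 *     char *code = read_file(argv[1], &len);
 */
char *read_file(const char *path, size_t *out_len)
{
    FILE *fp = fopen(path, "rb");   /* binary mode: ftell() is a byte offset */
    if (fp == NULL)
        return NULL;

    char *buf = NULL;
    long size;
    if (fseek(fp, 0, SEEK_END) == 0 && (size = ftell(fp)) >= 0 &&
        fseek(fp, 0, SEEK_SET) == 0) {
        buf = malloc((size_t)size + 1);      /* +1 for the terminator */
        if (buf != NULL) {
            size_t n = fread(buf, 1, (size_t)size, fp);
            buf[n] = '\0';                   /* terminate what was actually read */
            if (out_len != NULL)
                *out_len = n;
        }
    }
    fclose(fp);
    return buf;
}
```

Because the file is read in binary mode, line endings are not translated, so on Windows a "\r\n" pair stays as two bytes in the buffer.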

Roddy answered Dec 06 '22 10:12


You cannot determine the size of a file in characters without reading the data, unless you're using a fixed-width encoding.

For example, a file in UTF-8 which is 8 bytes long could be anything from 2 to 8 characters in length.
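
To illustrate the point, here is a rough sketch that counts characters in a UTF-8 buffer only after the bytes have been read. The function name utf8_count is hypothetical, "character" is approximated as "code point", and the input is assumed to be valid UTF-8.

```c
#include <stddef.h>

/* Counts code points in a UTF-8 buffer by counting every byte that is
 * not a continuation byte (continuation bytes have the form 10xxxxxx).
 * Assumes the input is valid UTF-8; no validation is performed. */
size_t utf8_count(const char *s, size_t n_bytes)
{
    size_t count = 0;
    for (size_t i = 0; i < n_bytes; i++) {
        if (((unsigned char)s[i] & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

Eight bytes of plain ASCII count as 8 characters, while eight bytes holding two 4-byte sequences count as 2, so the byte length alone cannot tell you which case you have.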

That's not a limitation of the file APIs. It follows naturally from there being no direct mapping from "size of binary data" to "number of characters."

If you have a fixed-width encoding then you can just divide the size of the file in bytes by the number of bytes per character. ASCII is the most obvious example of this, but if your file is encoded in UTF-16 and you happen to be on a system which treats UTF-16 code units as the "native" internal character type (which includes Java, .NET and Windows) then you can predict the number of "characters" to allocate as if UTF-16 were fixed width. (UTF-16 is actually variable width, because Unicode characters above U+FFFF are encoded as two code units, a surrogate pair, but a lot of the time developers ignore this.)
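
As a sketch of why UTF-16 is only approximately fixed width, the following counts code points in a buffer of UTF-16 code units. The name utf16_count is hypothetical, and the units are assumed to be already decoded into host byte order.

```c
#include <stddef.h>
#include <stdint.h>

/* Counts code points in a buffer of UTF-16 code units. A high surrogate
 * (0xD800..0xDBFF) followed by a low surrogate (0xDC00..0xDFFF) forms one
 * code point; every other unit, including an unpaired surrogate, counts
 * as one by itself. */
size_t utf16_count(const uint16_t *units, size_t n_units)
{
    size_t count = 0;
    for (size_t i = 0; i < n_units; i++) {
        count++;
        if (units[i] >= 0xD800 && units[i] <= 0xDBFF &&
            i + 1 < n_units &&
            units[i + 1] >= 0xDC00 && units[i + 1] <= 0xDFFF)
            i++;   /* skip the low half of the surrogate pair */
    }
    return count;
}
```

Allocating one slot per code unit (file size in bytes divided by 2) is therefore always enough, but it may overestimate the number of code points when surrogate pairs are present.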

Jon Skeet answered Dec 06 '22 10:12