This is the problem that will never end. The task is to parse a request line in a web server -- of indeterminate length -- in C. I pulled the following off of the web as an example with which to work.
GET /path/script.cgi?field1=value1&field2=value2 HTTP/1.1
I must extract the absolute path: /path/script.cgi
and the query: ?field1=value1&field2=value2
. I'm told the following functions hold the key: strchr
, strcpy
, strncmp
, strncpy
, and/or strstr
.
Here's what has happened so far: I've learned that using functions like strchr
and strstr
will absolutely allow me to truncate the request line at certain points, but will never allow me to get rid of portions of the request line I do not want, and it doesn't matter how I layer them.
For example, here's some code that get's me close to isolating the query, but I can't eliminate the http version.
bool parse(const char* line)
{
// request line w/o method
const char ch = '/';
char* lineptr = strchr(line, ch);
// request line w/ query and HTTP version
char ch_1 = '?';
char* lineptr_1 = strchr(lineptr, ch_1);
// request line w/o query
char ch_2 = ' ';
char* lineptr_2 = strchr(lineptr_1, ch_2);
printf("%s\n", lineptr_2);
if (lineptr_2 != NULL)
return true;
else
return false;
}
Needless to say, I have a similar issue trying to isolate the absolute path (I can ditch the method, but not the ? or anything thereafter), and I see no occasion on which I can use the functions that require me to know a priori how many chars I'd like to copy from one location (usually an array) to another because, when this is run in real time, I will have no clue what the request line will look like in advance. If someone sees something that I am missing and could point me in the right direction, I would be most grateful!
A more elegant solution.
#include <stdio.h>
#include <string.h>
int parse(const char* line)
{
/* Find out where everything is */
const char *start_of_path = strchr(line, ' ') + 1;
const char *start_of_query = strchr(start_of_path, '?');
const char *end_of_query = strchr(start_of_query, ' ');
/* Get the right amount of memory */
char path[start_of_query - start_of_path];
char query[end_of_query - start_of_query];
/* Copy the strings into our memory */
strncpy(path, start_of_path, start_of_query - start_of_path);
strncpy(query, start_of_query, end_of_query - start_of_query);
/* Null terminators (because strncpy does not provide them) */
path[sizeof(path)] = 0;
query[sizeof(query)] = 0;
/*Print */
printf("%s\n", query, sizeof(query));
printf("%s\n", path, sizeof(path));
}
int main(void)
{
parse("GET /path/script.cgi?field1=value1&field2=value2 HTTP/1.1");
return 0;
}
I wrote some functions in C a while back that manually parse c-strings up to a delimiter, similar to getline in C++.
// Trims all leading whitespace along with consecutive whitespace from provided cstring into destination char*. WARNING: ensure size <= sizeof(destination)
void Trim(char* destination, char* source, int size)
{
bool trim = true;
int index = 0;
int i;
for (i = 0; i < size; ++i)
{
if (source[i] == '\n' || source[i] == '\0')
{
destination[index++] = '\0';
break;
}
else if (source[i] != ' ' && source[i] != '\t')
{
destination[index++] = source[i];
trim = false;
}
else if (trim)
continue;
else
{
if (index > 0 && destination[index - 1] != ' ')
destination[index++] = ' ';
}
}
}
// Parses text up to the provided delimiter (or newline) into the destination char*. WARNING: ensure size <= sizeof(destination)
void ParseUpToSymbol(char* destination, char* source, int size, char delimiter)
{
int index = 0;
int i;
for (i = 0; i < size; ++i)
{
if (source[i] != delimiter && source[i] != '\n' && source[i] != '\0' && source[i] != ' '))
{
destination[index++] = source[i];
}
else
{
destination[i] = '\0';
break;
}
}
Trim(destination, destination, size);
}
Then you could parse your c-string with something along these lines:
char* buffer = (char*)malloc(64);
char* temp = (char*)malloc(256);
strcpy(temp, "GET /path/script.cgi?field1=value1&field2=value2 HTTP/1.1");
Trim(temp, temp, 256);
ParseUpToSymbol(buffer, cstr, 64, '?');
temp = temp + strlen(buffer) + 1;
Trim(temp, temp, 256);
The code above trims any leading and trailing whitespace from the target string, in this case "GET /path/script.cgi?field1=value1&field2=value2 HTTP/1.1", and then stores the parsed value into the variable buffer. Running this the first time should put the word "GET" inside of buffer. When you do the "temp = temp + strlen(buffer) + 1" you are readjusting the temp char-pointer so you can call ParseUpToSymbol again with the remaining part of the string. If you were to call it again, you should get the absolute path leading up to the first question mark. You could repeat this to get each individual query string or change the delimiter to a space and get the entire query string portion of the URL. I think you get the idea. This is just one of many solutions of course.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With