
Search for string in text file C

Tags: c, string, search

The following code reads a text file one character at a time and prints it to stdout:

#include <stdio.h>

int main()
{
    char file_to_open[] = "text_file.txt", ch;
    FILE *file_ptr;

    if((file_ptr = fopen(file_to_open, "r")) != NULL)
    {
        while((ch = fgetc(file_ptr)) != EOF)
        {
            putchar(ch);
        }
    }
    else
    {
        printf("Could not open %s\n", file_to_open);
        return 1;
    }
    return(0);
}

Instead of printing to stdout with putchar(ch), I want to search the file for specific strings listed in another text file, e.g. strings.txt, and write each matching line to out.txt.

text_file.txt:

1993 - 1999 Pentium
1997 - 1999 Pentium II
1999 - 2003 Pentium III
1998 - 2009 Xeon
2006 - 2009 Intel Core 2

strings.txt:

Nehalem
AMD Athlon
Pentium

In this case the first three lines of text_file.txt would match. I have done some research on file operations in C, and it seems that I can read one character at a time with fgetc (as in my code), one line with fgets, and one block with fread, but there seems to be no function for reading one word, which I guess would be ideal in my situation?

CHR_1980 asked Oct 26 '09 22:10



3 Answers

I am assuming this is a learning exercise and you are simply looking for a place to start. Otherwise, you should not reinvent the wheel.

The code below should give you an idea of what is involved. The program takes two arguments: the name of the file to search and a single string to search for. You should be able to modify it so that the phrases to search for are stored in an array of strings and each line read is checked against every string in that array.

The key function you are looking for is strstr.

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

#ifdef DEBUG
#define INITIAL_ALLOC 2
#else
#define INITIAL_ALLOC 512
#endif

char *
read_line(FILE *fin) {
    char *buffer;
    char *tmp;
    int read_chars = 0;
    int bufsize = INITIAL_ALLOC;
    char *line = malloc(bufsize);

    if ( !line ) {
        return NULL;
    }

    buffer = line;

    while ( fgets(buffer, bufsize - read_chars, fin) ) {
        read_chars = strlen(line);

        if ( line[read_chars - 1] == '\n' ) {
            line[read_chars - 1] = '\0';
            return line;
        }

        else {
            bufsize = 2 * bufsize;
            tmp = realloc(line, bufsize);
            if ( tmp ) {
                line = tmp;
                buffer = line + read_chars;
            }
            else {
                free(line);
                return NULL;
            }
        }
    }
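    /* fgets() stopped at end of file or on an error. If part of a line
       was already stored (the last line had no trailing newline),
       return it instead of dropping it. */
    if ( buffer != line ) {
        return line;
    }

    free(line);
    return NULL;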
}

int
main(int argc, char *argv[]) {
    FILE *fin;
    char *line;

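    if ( argc != 3 ) {
        fprintf(stderr, "Usage: %s file string\n", argv[0]);
        return EXIT_FAILURE;
    }

    fin = fopen(argv[1], "r");

    /* Bail out early: calling fclose() on a NULL pointer is undefined. */
    if ( !fin ) {
        perror(argv[1]);
        return EXIT_FAILURE;
    }

    while ( (line = read_line(fin)) != NULL ) {
        if ( strstr(line, argv[2]) ) {
            fprintf(stdout, "%s\n", line);
        }
        free(line);
    }

    fclose(fin);
    return 0;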
}

Sample output:

E:\Temp> searcher.exe searcher.c char
char *
    char *buffer;
    char *tmp;
    int read_chars = 0;
    char *line = malloc(bufsize);
    while ( fgets(buffer, bufsize - read_chars, fin) ) {
        read_chars = strlen(line);
        if ( line[read_chars - 1] == '\n' ) {
            line[read_chars - 1] = '\0';
                buffer = line + read_chars;
main(int argc, char *argv[]) {
    char *line;

Sinan Ünür answered Sep 27 '22 17:09


Remember: fgetc(), getc() and getchar() all return an int, not a char. The value is either EOF or a valid character, which is one more distinct value than the char type can represent.
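
To illustrate the point about the return type, here is a minimal sketch. The helper name count_bytes is made up for this example; the only thing that matters is that ch is declared as int.

```c
#include <stdio.h>

/* Hypothetical helper: counts the bytes remaining in an open stream.
   ch is an int so that EOF stays distinguishable from every valid
   byte value, including 0xFF. */
long count_bytes(FILE *fp) {
    int ch;
    long n = 0;

    while ((ch = fgetc(fp)) != EOF) {
        n++;
    }
    return n;
}
```

With `char ch` the same loop can stop early at a 0xFF byte where char is signed, or never terminate where char is unsigned.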

You're writing a surrogate for the 'fgrep' command:

fgrep -f strings.txt text_file.txt > out.txt

Instead of reading characters, you are going to need to read lines - using fgets(). (Forget that the gets() function exists!)
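
A minimal sketch of line-by-line reading with fgets(). The helper name next_line and the MAX_LINE limit are assumptions made for this example, not something from the question.

```c
#include <stdio.h>
#include <string.h>

#define MAX_LINE 1024  /* assumed upper bound on line length */

/* Hypothetical helper: reads the next line into buf (capacity MAX_LINE)
   and removes the trailing newline, if there is one.
   Returns 1 if a line was read, 0 at end of file or on error. */
int next_line(FILE *fp, char *buf) {
    if (fgets(buf, MAX_LINE, fp) == NULL) {
        return 0;
    }
    buf[strcspn(buf, "\n")] = '\0';
    return 1;
}
```

A line longer than MAX_LINE - 1 characters is returned in pieces over several calls, so pick the limit generously or grow the buffer dynamically.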

I indented your code and inserted a return 0; at the end for you (though C99 does an implicit 'return 0;' if you fall off the end of main()). However, C99 also demands an explicit return type for every function - and I added the 'int' to 'int main()' for you (but you can't use the C99-compliant excuse for not returning 0 at the end). Error messages should be written to standard error rather than standard output.

You'll probably need to use dynamic allocation for the list of strings. A simple-minded search will simply apply 'strstr()' searching for each of the required strings in each line of input (making sure to break the loop once you've found a match so a line is not repeated if there are multiple matches on a single line).
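
The simple-minded approach could be sketched roughly as follows. The names string_list, load_strings, filter_lines and free_strings, as well as the MAX_LINE limit, are assumptions for illustration only.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LINE 1024  /* assumed maximum line length in both files */

struct string_list {
    char **items;
    size_t count;
};

/* Releases everything owned by the list. */
void free_strings(struct string_list *sl) {
    while (sl->count > 0) {
        free(sl->items[--sl->count]);
    }
    free(sl->items);
    sl->items = NULL;
}

/* Loads the non-empty lines of fp into a dynamically grown list.
   Returns 0 on success, -1 if an allocation fails. */
int load_strings(FILE *fp, struct string_list *sl) {
    char buf[MAX_LINE];
    size_t cap = 0;

    sl->items = NULL;
    sl->count = 0;
    while (fgets(buf, sizeof buf, fp)) {
        size_t len;
        buf[strcspn(buf, "\r\n")] = '\0';
        len = strlen(buf);
        if (len == 0) {
            continue;  /* an empty string would match every line */
        }
        if (sl->count == cap) {
            size_t new_cap = cap ? 2 * cap : 8;
            char **tmp = realloc(sl->items, new_cap * sizeof *tmp);
            if (!tmp) {
                free_strings(sl);
                return -1;
            }
            sl->items = tmp;
            cap = new_cap;
        }
        sl->items[sl->count] = malloc(len + 1);
        if (!sl->items[sl->count]) {
            free_strings(sl);
            return -1;
        }
        memcpy(sl->items[sl->count], buf, len + 1);
        sl->count++;
    }
    return 0;
}

/* Copies each line of in that contains at least one listed string
   to out. Returns the number of lines written. */
size_t filter_lines(FILE *in, FILE *out, const struct string_list *sl) {
    char line[MAX_LINE];
    size_t written = 0, i;

    while (fgets(line, sizeof line, in)) {
        for (i = 0; i < sl->count; i++) {
            if (strstr(line, sl->items[i])) {
                fputs(line, out);
                written++;
                break;  /* one match is enough; do not repeat the line */
            }
        }
    }
    return written;
}
```

A main() would open strings.txt, text_file.txt and out.txt with fopen(), call load_strings() and then filter_lines(), and finish with free_strings() and fclose().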

A more sophisticated search would precompute which characters can be ignored so that you can search for all the strings in parallel, skipping through the text faster than the loop-in-a-loop. This might be a modification of a search algorithm such as Boyer-Moore or Knuth-Morris-Pratt (added: or Rabin-Karp which is designed for parallel searching for multiple strings).
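
As a rough illustration of the Rabin-Karp idea applied to several patterns at once, here is a minimal sketch, not a tuned implementation. It hashes only the first m bytes of each pattern, where m is the length of the shortest pattern, rolls one hash over the text, and confirms every hash hit with strncmp(). The function name contains_any, the BASE constant and the MAX_PATTERNS limit are all assumptions made for this example.

```c
#include <stdio.h>
#include <string.h>

#define BASE 257UL
#define MAX_PATTERNS 16  /* assumed limit, to keep the sketch short */

/* Hash of the first len bytes of s; unsigned arithmetic simply wraps. */
static unsigned long hash_prefix(const char *s, size_t len) {
    unsigned long h = 0;
    size_t i;

    for (i = 0; i < len; i++) {
        h = h * BASE + (unsigned char)s[i];
    }
    return h;
}

/* Returns 1 if any of the n patterns occurs in text, 0 if none does,
   and -1 if n is 0, n exceeds MAX_PATTERNS, or a pattern is empty. */
int contains_any(const char *text, const char *const pats[], size_t n) {
    unsigned long pat_hash[MAX_PATTERNS];
    unsigned long high = 1, h;
    size_t text_len = strlen(text);
    size_t m = (size_t)-1, i, pos;

    if (n == 0 || n > MAX_PATTERNS) {
        return -1;
    }
    for (i = 0; i < n; i++) {          /* m = length of shortest pattern */
        size_t len = strlen(pats[i]);
        if (len < m) {
            m = len;
        }
    }
    if (m == 0) {
        return -1;
    }
    if (text_len < m) {
        return 0;                      /* text too short for any pattern */
    }
    for (i = 0; i < n; i++) {
        pat_hash[i] = hash_prefix(pats[i], m);
    }
    for (i = 1; i < m; i++) {
        high *= BASE;                  /* BASE^(m-1), weight of oldest byte */
    }

    h = hash_prefix(text, m);
    for (pos = 0; ; pos++) {
        for (i = 0; i < n; i++) {
            /* A hash hit is only a candidate; compare the full pattern. */
            if (pat_hash[i] == h &&
                strncmp(text + pos, pats[i], strlen(pats[i])) == 0) {
                return 1;
            }
        }
        if (pos + m >= text_len) {
            break;                     /* no further window fits */
        }
        /* Slide the window: drop text[pos], append text[pos + m]. */
        h = (h - (unsigned char)text[pos] * high) * BASE
            + (unsigned char)text[pos + m];
    }
    return 0;
}
```

The inner loop still compares the window hash against each pattern hash in turn; a real implementation would look the hash up in a hash table or sorted array so that the cost per text position does not grow with the number of patterns.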

Jonathan Leffler answered Sep 27 '22 19:09


while IFS= read -r x; do grep -F -- "$x" text_file.txt; done < strings.txt > out.txt

Ewan Todd answered Sep 27 '22 17:09