
Intensive file I/O and data processing in C#


I'm writing an app which needs to process a large text file (comma-separated with several different types of records - I do not have the power or inclination to change the data storage format). It reads in records (often all the records in the file sequentially, but not always), then the data for each record is passed off for some processing.

Right now this part of the application is single-threaded (read a record, process it, read the next record, etc.). I'm thinking it might be more efficient to read records into a queue on one thread and process them on another thread, either in small blocks or as they become available.

I have no idea how to start programming something like that, including the data structure that would be necessary or how to implement the multithreading properly. Can anyone give any pointers, or offer other suggestions about how I might improve performance here?

We Are All Monica asked Jan 20 '10 21:01





2 Answers

You might get a benefit if you can balance the time spent processing records against the time spent reading them; in that case you could use a producer/consumer setup, for example a synchronized queue with a worker (or a few) dequeueing and processing. I might also be tempted to investigate Parallel Extensions; it is pretty easy to write an IEnumerable<T> version of your reading code, after which Parallel.ForEach (or one of the other Parallel methods) should do everything you want; for example:

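// Lazily reads one record per line; Person is assumed to be defined elsewhere.
static IEnumerable<Person> ReadPeople(string path) {
    using(var reader = File.OpenText(path)) {
        string line;
        while((line = reader.ReadLine()) != null) {
            // Naive split: assumes no quoted or embedded commas.
            string[] parts = line.Split(',');
            yield return new Person(parts[0], int.Parse(parts[1]));
        }
    }
}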
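To show how the iterator above could feed Parallel.ForEach, here is a minimal sketch. The `Person` class, the `Process` method, and `SumAges` are hypothetical stand-ins that do not come from the answer, and the sketch assumes .NET 4 or later, where the Task Parallel Library is available.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical record type; the answer does not show how Person is defined.
public sealed class Person
{
    public Person(string name, int age) { Name = name; Age = age; }
    public string Name { get; }
    public int Age { get; }
}

public static class ParallelReadDemo
{
    // Same shape as ReadPeople above: lazily yields one record per line.
    public static IEnumerable<Person> ReadPeople(string path)
    {
        using (var reader = File.OpenText(path))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                string[] parts = line.Split(',');
                yield return new Person(parts[0], int.Parse(parts[1]));
            }
        }
    }

    // Hypothetical placeholder for the real per-record work.
    public static int Process(Person person) => person.Age;

    public static long SumAges(string path)
    {
        long total = 0;
        // Parallel.ForEach pulls records from the single enumerator and runs
        // the body on multiple threads, so shared state must be updated atomically.
        Parallel.ForEach(ReadPeople(path), person =>
        {
            Interlocked.Add(ref total, Process(person));
        });
        return total;
    }

    public static void Main()
    {
        // Write a small sample file so the sketch runs on its own.
        string path = Path.GetTempFileName();
        File.WriteAllLines(path, new[] { "Ann,30", "Bob,25", "Cy,41" });
        Console.WriteLine(SumAges(path)); // 30 + 25 + 41
        File.Delete(path);
    }
}
```

Records are not processed in file order here, so this only fits work where ordering does not matter. Whether it is faster than the single-threaded loop depends on how expensive the per-record processing is relative to the read.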
Marc Gravell answered Oct 13 '22 00:10



Take a look at these tutorials; they contain all you need. They are Microsoft tutorials with code samples for a case similar to the one you describe. Your producer fills the queue, while the consumer pops records off.

Creating, starting, and interacting between threads

Synchronizing two threads: a producer and a consumer
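As a rough illustration of the producer/consumer shape described above, here is a minimal sketch. It uses `BlockingCollection<string>` (available from .NET 4) rather than whatever synchronization the linked tutorials use, so treat it as one possible approach, not a reproduction of them. `ProcessRecord`, `Run`, the capacity of 1000, and the default of two consumers are all hypothetical choices.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

public static class ProducerConsumerDemo
{
    // Hypothetical stand-in for the real per-record work: counts fields.
    static int ProcessRecord(string line) => line.Split(',').Length;

    public static int Run(string path, int consumerCount = 2)
    {
        // Bounded so the reader cannot run arbitrarily far ahead of the workers.
        using (var queue = new BlockingCollection<string>(boundedCapacity: 1000))
        {
            // Producer: reads lines and blocks in Add when the queue is full.
            var producer = Task.Run(() =>
            {
                try
                {
                    foreach (string line in File.ReadLines(path))
                        queue.Add(line);
                }
                finally
                {
                    // Signals consumers that no more items will arrive.
                    queue.CompleteAdding();
                }
            });

            // Consumers: each drains the queue until adding is completed.
            var consumers = new Task<int>[consumerCount];
            for (int i = 0; i < consumerCount; i++)
            {
                consumers[i] = Task.Run(() =>
                {
                    int fields = 0;
                    foreach (string line in queue.GetConsumingEnumerable())
                        fields += ProcessRecord(line);
                    return fields;
                });
            }

            Task.WaitAll(consumers);
            producer.Wait(); // surfaces any read error from the producer

            int total = 0;
            foreach (var consumer in consumers) total += consumer.Result;
            return total;
        }
    }

    public static void Main()
    {
        // Write a small sample file so the sketch runs on its own.
        string path = Path.GetTempFileName();
        File.WriteAllLines(path, new[] { "a,b,c", "d,e" });
        Console.WriteLine(Run(path)); // 3 fields + 2 fields
        File.Delete(path);
    }
}
```

The bounded capacity gives simple back-pressure: if processing is slower than reading, the producer waits instead of loading the whole file into memory.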

Chris Kannon answered Oct 13 '22 00:10
