I'm writing an app which needs to process a large text file (comma-separated with several different types of records - I do not have the power or inclination to change the data storage format). It reads in records (often all the records in the file sequentially, but not always), then the data for each record is passed off for some processing.
Right now this part of the application is single-threaded (read a record, process it, read the next record, etc.). I'm thinking it might be more efficient to read records into a queue in one thread, and process them in another thread in small blocks or as they become available.
I have no idea how to start programming something like that, including the data structure that would be necessary or how to implement the multithreading properly. Can anyone give any pointers, or offer other suggestions about how I might improve performance here?
You might get a benefit if you can balance the time spent processing records against the time spent reading them; in that case you could use a producer/consumer setup, for example a synchronized queue with a worker (or a few) dequeuing and processing. I might also be tempted to investigate Parallel Extensions; it is pretty easy to write an IEnumerable<T> version of your reading code, after which Parallel.ForEach (or one of the other Parallel methods) should actually do everything you want; for example:
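using System.Collections.Generic;
using System.IO;

// Lazily yields one Person per line, so the whole file is never held in memory.
static IEnumerable<Person> ReadPeople(string path) {
    using(var reader = File.OpenText(path)) {
        string line;
        while((line = reader.ReadLine()) != null) {
            string[] parts = line.Split(',');
            // Assumes at least two fields per line, the second an integer.
            yield return new Person(parts[0], int.Parse(parts[1]));
        }
    }
}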
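To make the Parallel.ForEach suggestion concrete, here is a minimal sketch of feeding that iterator to the loop. The Person class, the ParallelDemo wrapper, and the SumAges method are assumptions made up for illustration (the original only shows the constructor call), and summing ages is just a placeholder for the real per-record processing.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical record type matching the constructor call in ReadPeople.
public sealed class Person
{
    public string Name { get; }
    public int Age { get; }
    public Person(string name, int age) { Name = name; Age = age; }
}

public static class ParallelDemo
{
    // Same lazy reader as above, repeated so the sketch compiles on its own.
    public static IEnumerable<Person> ReadPeople(string path)
    {
        using (var reader = File.OpenText(path))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                string[] parts = line.Split(',');
                yield return new Person(parts[0], int.Parse(parts[1]));
            }
        }
    }

    // Placeholder "processing": adds up ages across worker threads.
    public static long SumAges(string path)
    {
        long total = 0;
        Parallel.ForEach(ReadPeople(path), person =>
        {
            // The lambda runs concurrently, so shared state needs an atomic update.
            Interlocked.Add(ref total, person.Age);
        });
        return total;
    }
}
```

Parallel.ForEach pulls items from the iterator under its own synchronization, so the file is still read sequentially while the per-record work is spread over several threads. Records are not guaranteed to be processed in file order, which matters if the processing depends on it.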
Take a look at this tutorial, it contains all you need... These are the Microsoft tutorials, including code samples for a case similar to the one you describe. Your producer fills the queue, while the consumer pops records off.
Creating, starting, and interacting between threads
Synchronizing two threads: a producer and a consumer
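The producer/consumer arrangement described above could look roughly like the following sketch. It swaps in BlockingCollection<T> (available from .NET 4) for the hand-synchronized queue the tutorials build, and assumes .NET 4.5 or later for Task.Run. The class name ProducerConsumerDemo, the Run method, the workerCount parameter, the process callback, and the capacity of 1000 are all hypothetical choices for illustration; error handling is left out.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public static class ProducerConsumerDemo
{
    // Reads lines on one task and hands them to workerCount consumer tasks.
    // 'process' stands in for the real per-record work.
    public static void Run(string path, int workerCount, Action<string> process)
    {
        // Bounded, so the reader blocks instead of racing far ahead of the workers.
        using (var queue = new BlockingCollection<string>(boundedCapacity: 1000))
        {
            var producer = Task.Run(() =>
            {
                try
                {
                    foreach (string line in File.ReadLines(path))
                        queue.Add(line);
                }
                finally
                {
                    // Tells the consumers that no more items are coming.
                    queue.CompleteAdding();
                }
            });

            var consumers = new Task[workerCount];
            for (int i = 0; i < workerCount; i++)
            {
                consumers[i] = Task.Run(() =>
                {
                    // Blocks while the queue is empty; ends once adding is
                    // complete and the queue has been drained.
                    foreach (string line in queue.GetConsumingEnumerable())
                        process(line);
                });
            }

            Task.WaitAll(consumers);
            producer.Wait();
        }
    }
}
```

A call such as ProducerConsumerDemo.Run(path, 2, line => Handle(line)), where Handle is your own method, would run one reader and two workers. Whether this beats the single-threaded loop depends on how the read time compares with the processing time, so it is worth measuring both before committing to it.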