Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Parallel programming in C#

I'm interested in learning about parallel programming in C#.NET (not like everything there is to know, but the basics and maybe some good-practices), therefore I've decided to reprogram an old program of mine which is called ImageSyncer. ImageSyncer is a really simple program, all it does is to scan trough a folder and find all files ending with .jpg, then it calculates the new position of the files based on the date they were taken (parsing of xif-data, or whatever it's called). After a location has been generated the program checks for any existing files at that location, and if one exist it looks at the last write-time of both the file to copy, and the file "in its way". If those are equal the file is skipped. If not a md5 checksum of both files is created and matched. If there is no match the file to be copied is given a new location to be copied to (for instance, if it was to be copied to "C:\test.jpg" it's copied to "C:\test(1).jpg" instead). The result of this operation is populated into a queue of a struct-type that contains two strings, the original file and the position to copy it to. Then that queue is iterated over untill it is empty and the files are copied.

In other words there are 4 operations:

1. Scan directory for jpegs  
2. Parse files for xif and generate copy-location  
3. Check for file existence and if needed generate new path  
4. Copy files

And so I want to rewrite this program to make it paralell and be able to perform several of the operations at the same time, and I was wondering what the best way to achieve that would be. I've came up with two different models I can think of, but neither one of them might be any good at all. The first one is to parallelize the 4 steps of the old program, so that when step one is to be executed it's done on several threads, and when the entire of step 1 is finished step 2 is began. The other one (which I find more interesting because I have no idea of how to do that) is to create a sort of worker and consumer model, so when a thread is finished with step 1 another one takes over and performs step 2 at that object (or something like that). But as said, I don't know if any of these are any good solutions. Also, I don't know much about parallel programming at all. I know how to make a thread, and how to make it perform a function taking in an object as its only parameter, and I've also used the BackgroundWorker-class on one occasion, but I'm not that familiar with any of them.

Any input would be appreciated.

like image 746
Alxandr Avatar asked Feb 15 '10 02:02

Alxandr


People also ask

What is an example of parallel programming?

With parallel programming, a developer writes code with specialized software to make it easy for them to run their program across on multiple nodes or processors. A simple example of where parallel programming could be used to speed up processing is recoloring an image.

What is a parallel program?

Parallel programming is all about breaking down tasks into smaller tasks and executing them simultaneously.

What is parallel programming techniques?

August 26, 2022. Parallel processing is a computing technique when multiple streams of calculations or data processing tasks co-occur through numerous central processing units (CPUs) working concurrently.


2 Answers

There are few a options:

  • Parallel LINQ: Running Queries On Multi-Core Processors

  • Task Parallel Library (TPL): Optimize Managed Code For Multi-Core Machines

  • If you are interested in basic threading primitives and concepts: Threading in C#

[But as @John Knoeller pointed out, the example you gave is likely to be sequential I/O bound]

like image 151
Mitch Wheat Avatar answered Nov 15 '22 23:11

Mitch Wheat


This is the reference I use for C# thread: http://www.albahari.com/threading/

As a single PDF: http://www.albahari.com/threading/threading.pdf

For your second approach:

I've worked on some producer/consumer multithreaded apps where each task is some code that loops for ever. An external "initializer" starts a separate thread for each task and initializes an EventWaitHandle for each task. For each task is a global queue that can be used to produce/consume input.

In your case, your external program would add each directory to the queue for Task1, and Set the EventWaitHandler for Task1. Task 1 would "wake up" from its EventWaitHandler, get the count of directories in its queue, and then while the count is greater than 0, get the directory from the queue, scan for all the .jpgs, and add each .jpg location to a second queue, and set the EventWaitHandle for task 2. Task 2 reads its input, processes it, forwards it to a queue for Task 3...

It can be a bit of a pain getting all the locking to work right (I basically lock any access to the queue, even something as simple as getting its count). .NET 4.0 is supposed to have data structures that will automatically support a producer/consumer queue with no locks.

like image 39
chocojosh Avatar answered Nov 15 '22 23:11

chocojosh