I've read some answers here on SO (for example) where some say that parallelism is not going to increase performance (except maybe for read IO).
But I've created a few tests which show that WRITE operations are also much faster.
— READ TEST:
I've created 6000 random files with dummy data.
Let's try to read 1000 of them with and without parallelism:
    var files = Directory.GetFiles("c:\\temp\\2\\", "*.*", SearchOption.TopDirectoryOnly).Take(1000).ToList();

    var sw = Stopwatch.StartNew();
    files.ForEach(f => File.ReadAllBytes(f).GetHashCode());
    sw.ElapsedMilliseconds.Dump("Run READ- Serial");    // .Dump() is LINQPad's output helper
    sw.Stop();

    sw.Restart();
    files.AsParallel().ForAll(f => File.ReadAllBytes(f).GetHashCode());
    sw.ElapsedMilliseconds.Dump("Run READ- Parallel");
    sw.Stop();
Result 1:

    Run READ- Serial    595
    Run READ- Parallel  193

Result 2:

    Run READ- Serial    316
    Run READ- Parallel  192
— WRITE TEST:
Going to create 1000 random files, each 300K in size. (I've emptied the directory from the previous test.)
    var bytes = new byte[300000];
    Random r = new Random();
    r.NextBytes(bytes);
    var list = Enumerable.Range(1, 1000).ToList();

    sw.Restart();
    list.ForEach(f => File.WriteAllBytes(@"c:\temp\2\" + Path.GetRandomFileName(), bytes));
    sw.ElapsedMilliseconds.Dump("Run WRITE serial");
    sw.Stop();

    sw.Restart();
    list.AsParallel().ForAll(f => File.WriteAllBytes(@"c:\temp\2\" + Path.GetRandomFileName(), bytes));
    sw.ElapsedMilliseconds.Dump("Run WRITE Parallel");
    sw.Stop();
Result 1:

    Run WRITE serial    2028
    Run WRITE Parallel  368

Result 2:

    Run WRITE serial    784
    Run WRITE Parallel  426
Question:
The results surprised me. Against all expectations (especially with WRITE operations), performance is better with parallelism, even though these are IO operations.
How/why does parallelism give better results here? It seems that an SSD can work with multiple threads and that there is little or no bottleneck when running more than one job at a time on the IO device.
N.B. I didn't test this with an HDD (I'd be happy if someone who has an HDD would run the tests).
Benchmarking is a tricky art; you are just not measuring what you think you are. That it is not actually I/O overhead is fairly obvious from the test results: why is the single-threaded code faster the second time you run it?
What you are not counting on is the behavior of the file system cache, which keeps a copy of the disk content in RAM. This has a particularly big impact on the multi-threaded measurement: it is not using any I/O at all. In a nutshell:
Reads come from RAM if the file system cache has a copy of the data. This operates at memory bus speeds, typically around 35 gigabytes/second. If it does not have a copy then the read is delayed until the disk supplies the data. It does not just read the requested cluster but an entire cylinder worth of data off the disk.
Writes go straight to RAM and complete very quickly. The data is written to the disk lazily in the background while the program keeps executing, optimized to minimize write-head movement in cylinder order. Only if no more RAM is available will a write ever stall.
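To see that lazy-write behavior directly, here is a minimal sketch of my own (the c:\temp file names are made up for illustration). It compares a normal cached write against a FileOptions.WriteThrough write, which asks Windows to push the data to the device before the call returns:

    using System;
    using System.Diagnostics;
    using System.IO;

    var data = new byte[300000];
    new Random().NextBytes(data);

    var sw = Stopwatch.StartNew();
    File.WriteAllBytes(@"c:\temp\cached.bin", data);    // returns once the data is in the cache
    Console.WriteLine($"Cached write:        {sw.ElapsedMilliseconds} ms");

    sw.Restart();
    using (var fs = new FileStream(@"c:\temp\through.bin", FileMode.Create,
               FileAccess.Write, FileShare.None, 4096, FileOptions.WriteThrough))
    {
        fs.Write(data, 0, data.Length);                 // waits for the device
    }
    Console.WriteLine($"Write-through write: {sw.ElapsedMilliseconds} ms");

The write-through number should be much closer to what the disk can actually sustain.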
Actual cache size depends on the installed amount of RAM and the need for RAM imposed by running processes. A very rough guideline is that you can count on 1GB on a machine with 4GB of RAM, 3GB on a machine with 8GB of RAM. It is visible in Resource Monitor, Memory tab, displayed as the "Cached" value. Keep in mind that it is highly variable.
So, enough to make sense of what you see: the Parallel test benefits greatly from the Serial test having already read all the data. If you had written the test so that the Parallel test ran first, you would have gotten very different results. Only with a cold cache can you see the loss of perf due to threading. You'd have to restart your machine to ensure that condition, or first read another very large file, large enough to evict useful data from the cache.
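For example, a minimal sketch of that eviction approach, assuming a throwaway filler file larger than the cache exists (the filler path and chunk size are made up for illustration):

    using System;
    using System.Diagnostics;
    using System.IO;
    using System.Linq;

    // Crude cache eviction: stream several GB of unrelated data so the files
    // under test are (very likely) pushed out of the file system cache.
    using (var fs = File.OpenRead(@"c:\temp\huge-filler.bin"))
    {
        var chunk = new byte[1 << 20];
        while (fs.Read(chunk, 0, chunk.Length) > 0) { }
    }

    var files = Directory.GetFiles("c:\\temp\\2\\").Take(1000).ToList();
    var sw = Stopwatch.StartNew();
    files.AsParallel().ForAll(f => File.ReadAllBytes(f).GetHashCode());
    Console.WriteLine($"Cold parallel read: {sw.ElapsedMilliseconds} ms");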
Only if you have a priori knowledge that your program only ever reads data that was just written can you safely use threads without risking a perf loss. That guarantee is normally pretty hard to come by. It does exist; a good example is Visual Studio building your project. The compiler writes the build result to the obj\Debug directory, then MSBuild copies it to bin\Debug. That looks very wasteful, but it is not: the copy will always complete very quickly since the file is hot in the cache. The cache also explains the difference between a cold and a hot start of a .NET program, and why using NGen is not always best.
It is a very interesting topic! I'm sorry that I can't explain the technical details, but there are some concerns that need to be raised. It is a bit long, so I can't fit it into a comment. Please forgive me for posting it as an "answer".
I think you need to test with both large and small files; also, the test must be run several times and the times averaged to make sure the result is reproducible. A general guideline is to run it 25 times, as a paper in evolutionary computing suggests.
Another concern is system caching. You created only one bytes buffer and always write the same content; I don't know how the system handles the buffer, but to minimise the difference, I would suggest creating a different buffer for each file, along the lines of the sketch below.
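A minimal sketch of what I mean, mirroring the sizes from the question (pre-generating the buffers keeps the RNG cost outside the timed region):

    using System;
    using System.IO;
    using System.Linq;

    var r = new Random();
    // One distinct random buffer per file, so no two files share content.
    var buffers = Enumerable.Range(0, 1000)
        .Select(_ => { var b = new byte[300000]; r.NextBytes(b); return b; })
        .ToList();

    buffers.AsParallel()
           .ForAll(b => File.WriteAllBytes(@"c:\temp\2\" + Path.GetRandomFileName(), b));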
(Update: maybe GC also affects the performance, so I revised the test again to keep the GC out of the picture as much as I could.)
I luckily have both an SSD and an HDD on my computer, so I revised the test code. I executed it with different configurations and got the following results. Hopefully I can inspire someone to give a better explanation.
1KB, 256 Files
    Avg Write Parallel SSD: 46.88
    Avg Write Serial SSD:   94.32
    Avg Read Parallel SSD:  4.28
    Avg Read Serial SSD:    15.48
    Avg Write Parallel HDD: 35.4
    Avg Write Serial HDD:   71.52
    Avg Read Parallel HDD:  4.52
    Avg Read Serial HDD:    14.68
512KB, 256 Files
    Avg Write Parallel SSD: 86.84
    Avg Write Serial SSD:   210.84
    Avg Read Parallel SSD:  65.64
    Avg Read Serial SSD:    80.84
    Avg Write Parallel HDD: 85.52
    Avg Write Serial HDD:   186.76
    Avg Read Parallel HDD:  63.24
    Avg Read Serial HDD:    82.12

(Note: GC seems to have still kicked in on the parallel reads in this test.)
My machine is: i7-6820HQ / 32G / Windows 7 Enterprise x64 / VS2017 Professional / Target .NET 4.6 / Running in debug mode.
The two hard drives are:
C drive: IDE\Crucial_CT275MX300SSD4___________________M0CR021
D drive: IDE\ST2000LM003_HN-M201RAD__________________2BE10001
The revised code is as follows:
    // Needs: System, System.Collections.Generic, System.Diagnostics, System.IO, System.Linq.
    Stopwatch sw = new Stopwatch();
    string path;
    int fileSize = 1024 * 1024 * 1024;   // bytes per file (adjusted per scenario)
    int numFiles = 2;                    // files per pass (adjusted per scenario)
    byte[] bytes = new byte[fileSize];
    Random r = new Random(DateTimeOffset.UtcNow.Millisecond);
    List<int> list = Enumerable.Range(0, numFiles).ToList();
    List<List<byte>> allBytes = new List<List<byte>>(numFiles);
    List<string> files;
    int numTests = 1;                    // repetitions (25 for the averaged runs)

    // Timings: w/r = write/read, s/p = serial/parallel, trailing s/h = SSD/HDD.
    List<long> wss = new List<long>(numTests); List<long> wps = new List<long>(numTests);
    List<long> rss = new List<long>(numTests); List<long> rps = new List<long>(numTests);
    List<long> wsh = new List<long>(numTests); List<long> wph = new List<long>(numTests);
    List<long> rsh = new List<long>(numTests); List<long> rph = new List<long>(numTests);

    Enumerable.Range(1, numTests).ToList().ForEach((i) =>
    {
        // ---------- SSD (C:) ----------
        path = @"C:\SeqParTest\";

        // Parallel write, SSD
        allBytes.Clear();
        GC.Collect();
        GC.WaitForFullGCComplete();
        list.ForEach((x) => { r.NextBytes(bytes); allBytes.Add(new List<byte>(bytes)); });
        try { GC.TryStartNoGCRegion(0, true); } catch (Exception) { }   // best effort; may throw
        sw.Restart();
        list.AsParallel().ForAll((x) => File.WriteAllBytes(path + Path.GetRandomFileName(), allBytes[x].ToArray()));
        wps.Add(sw.ElapsedMilliseconds);
        sw.Stop();
        try { GC.EndNoGCRegion(); } catch (Exception) { }
        Debug.Print($"Write parallel SSD #{i}: {wps[i - 1]}");

        // Serial write, SSD
        allBytes.Clear();
        GC.Collect();
        GC.WaitForFullGCComplete();
        list.ForEach((x) => { r.NextBytes(bytes); allBytes.Add(new List<byte>(bytes)); });
        try { GC.TryStartNoGCRegion(0, true); } catch (Exception) { }
        sw.Restart();
        list.ForEach((x) => File.WriteAllBytes(path + Path.GetRandomFileName(), allBytes[x].ToArray()));
        wss.Add(sw.ElapsedMilliseconds);
        sw.Stop();
        try { GC.EndNoGCRegion(); } catch (Exception) { }
        Debug.Print($"Write serial SSD #{i}: {wss[i - 1]}");

        // Parallel read, SSD
        files = Directory.GetFiles(path, "*.*", SearchOption.TopDirectoryOnly).Take(numFiles).ToList();
        try { GC.TryStartNoGCRegion(0, true); } catch (Exception) { }
        sw.Restart();
        files.AsParallel().ForAll(f => File.ReadAllBytes(f).GetHashCode());
        rps.Add(sw.ElapsedMilliseconds);
        sw.Stop();
        try { GC.EndNoGCRegion(); } catch (Exception) { }
        files.ForEach(f => File.Delete(f));
        Debug.Print($"Read parallel SSD #{i}: {rps[i - 1]}");
        GC.Collect();
        GC.WaitForFullGCComplete();

        // Serial read, SSD
        files = Directory.GetFiles(path, "*.*", SearchOption.TopDirectoryOnly).Take(numFiles).ToList();
        try { GC.TryStartNoGCRegion(0, true); } catch (Exception) { }
        sw.Restart();
        files.ForEach(f => File.ReadAllBytes(f).GetHashCode());
        rss.Add(sw.ElapsedMilliseconds);
        sw.Stop();
        try { GC.EndNoGCRegion(); } catch (Exception) { }
        files.ForEach(f => File.Delete(f));
        Debug.Print($"Read serial SSD #{i}: {rss[i - 1]}");
        GC.Collect();
        GC.WaitForFullGCComplete();

        // ---------- HDD (D:) ----------
        path = @"D:\SeqParTest\";

        // Parallel write, HDD
        allBytes.Clear();
        GC.Collect();
        GC.WaitForFullGCComplete();
        list.ForEach((x) => { r.NextBytes(bytes); allBytes.Add(new List<byte>(bytes)); });
        try { GC.TryStartNoGCRegion(0, true); } catch (Exception) { }
        sw.Restart();
        list.AsParallel().ForAll((x) => File.WriteAllBytes(path + Path.GetRandomFileName(), allBytes[x].ToArray()));
        wph.Add(sw.ElapsedMilliseconds);
        sw.Stop();
        try { GC.EndNoGCRegion(); } catch (Exception) { }
        Debug.Print($"Write parallel HDD #{i}: {wph[i - 1]}");

        // Serial write, HDD
        allBytes.Clear();
        GC.Collect();
        GC.WaitForFullGCComplete();
        list.ForEach((x) => { r.NextBytes(bytes); allBytes.Add(new List<byte>(bytes)); });
        try { GC.TryStartNoGCRegion(0, true); } catch (Exception) { }
        sw.Restart();
        list.ForEach((x) => File.WriteAllBytes(path + Path.GetRandomFileName(), allBytes[x].ToArray()));
        wsh.Add(sw.ElapsedMilliseconds);
        sw.Stop();
        try { GC.EndNoGCRegion(); } catch (Exception) { }
        Debug.Print($"Write serial HDD #{i}: {wsh[i - 1]}");

        // Parallel read, HDD
        files = Directory.GetFiles(path, "*.*", SearchOption.TopDirectoryOnly).Take(numFiles).ToList();
        try { GC.TryStartNoGCRegion(0, true); } catch (Exception) { }
        sw.Restart();
        files.AsParallel().ForAll(f => File.ReadAllBytes(f).GetHashCode());
        rph.Add(sw.ElapsedMilliseconds);
        sw.Stop();
        try { GC.EndNoGCRegion(); } catch (Exception) { }
        files.ForEach(f => File.Delete(f));
        Debug.Print($"Read parallel HDD #{i}: {rph[i - 1]}");
        GC.Collect();
        GC.WaitForFullGCComplete();

        // Serial read, HDD
        files = Directory.GetFiles(path, "*.*", SearchOption.TopDirectoryOnly).Take(numFiles).ToList();
        try { GC.TryStartNoGCRegion(0, true); } catch (Exception) { }
        sw.Restart();
        files.ForEach(f => File.ReadAllBytes(f).GetHashCode());
        rsh.Add(sw.ElapsedMilliseconds);
        sw.Stop();
        try { GC.EndNoGCRegion(); } catch (Exception) { }
        files.ForEach(f => File.Delete(f));
        Debug.Print($"Read serial HDD #{i}: {rsh[i - 1]}");
        GC.Collect();
        GC.WaitForFullGCComplete();
    });

    Debug.Print($"Avg Write Parallel SSD: {wps.Average()}");
    Debug.Print($"Avg Write Serial SSD: {wss.Average()}");
    Debug.Print($"Avg Read Parallel SSD: {rps.Average()}");
    Debug.Print($"Avg Read Serial SSD: {rss.Average()}");
    Debug.Print($"Avg Write Parallel HDD: {wph.Average()}");
    Debug.Print($"Avg Write Serial HDD: {wsh.Average()}");
    Debug.Print($"Avg Read Parallel HDD: {rph.Average()}");
    Debug.Print($"Avg Read Serial HDD: {rsh.Average()}");
Well, I haven't fully tested the code, so it may be buggy. I realised it sometimes stops on the parallel read; I assume that's because the deletion of files from the sequential read completed AFTER listing the existing files in the next step, so it complains with a file-not-found error. One way around it is sketched below.
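A minimal sketch of that fix, as a standalone illustration rather than the exact test code: remember which files each phase wrote instead of re-listing the directory, so a late delete can't leak into the next pass:

    using System.Collections.Concurrent;
    using System.IO;
    using System.Linq;

    var written = new ConcurrentBag<string>();

    // Writers record every file they create; this list is authoritative.
    Enumerable.Range(0, 256).AsParallel().ForAll(_ =>
    {
        var name = @"D:\SeqParTest\" + Path.GetRandomFileName();
        File.WriteAllBytes(name, new byte[1024]);
        written.Add(name);
    });

    // Read back and delete exactly the files this run created.
    written.AsParallel().ForAll(f => File.ReadAllBytes(f).GetHashCode());
    foreach (var f in written) File.Delete(f);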
Another problem is that I used the newly created files for the read test. Theoretically it would be better not to (or even to restart the computer / fill the empty space on the SSD to avoid caching), but I didn't bother because the intended comparison is between sequential and parallel performance.
Update:
I don't know how to explain the reason, but I think it may be because the IO resource is fairly idle? I'll try two things next:
Update 2:
Some results from large files (512M, 32 files):
All times in milliseconds (W = write, R = read, par = parallel, ser = serial); run #17 has no HDD numbers:

    Run | W par SSD | W ser SSD | R par SSD | R ser SSD | W par HDD | W ser HDD | R par HDD | R ser HDD
    #1  |    140935 |    133656 |     62150 |     43355 |    172448 |    138381 |    173436 |    142248
    #2  |    122286 |    119564 |     53227 |     43022 |    175922 |    137572 |    204972 |    142174
    #3  |    121700 |    117730 |    107546 |     42872 |    171914 |    145923 |    193097 |    142211
    #4  |    125805 |    118252 |    113385 |     42951 |    176920 |    137520 |    208123 |    142273
    #5  |    116394 |    116592 |     61273 |     43315 |    172259 |    138554 |    275791 |    142311
    #6  |    107839 |    135071 |     79846 |     43328 |    176034 |    138671 |    218533 |    142481
    #7  |    120438 |    118032 |     45375 |     42978 |    173151 |    140579 |    176492 |    142153
    #8  |    108862 |    123556 |    120162 |     42983 |    174699 |    137619 |    204069 |    142480
    #9  |    111618 |    117854 |     51224 |     42970 |    173069 |    136936 |    159978 |    143401
    #10 |    115381 |    118545 |     79509 |     43818 |    179545 |    138556 |    167978 |    143033
    #11 |    113105 |    116849 |     84309 |     42620 |    179432 |    139014 |    219161 |    142515
    #12 |    124901 |    121769 |    137192 |     43144 |    176091 |    139042 |    214205 |    142576
    #13 |    110896 |    123152 |     56633 |     42665 |    173123 |    138514 |    210003 |    142215
    #14 |    117762 |    126865 |     90005 |     44089 |    172958 |    139908 |    217826 |    142216
    #15 |    109912 |    121276 |     72285 |     42827 |    176255 |    139084 |    183926 |    142111
    #16 |    122476 |    126283 |     47875 |     43799 |    173436 |    137203 |    294374 |    142387
    #17 |    112168 |    121079 |     79001 |     43207 |       n/a |       n/a |       n/a |       n/a
I regret that I didn't have time to complete all 25 runs, but the results show that on large files sequential R/W can be faster than parallel once the disk is fully utilised. I think this may be the reason behind the other discussions on SO.