
C# I/O: does parallelism increase performance with an SSD?


I've read some answers (for example) here on SO where some say that parallelism is not going to increase performance (except maybe for read I/O).

But I've created a few tests which show that WRITE operations are also much faster.

READ TEST:

I've created 6000 random files with dummy data:


Let's try to read 1000 of them with and without parallelism:

    var files = Directory.GetFiles("c:\\temp\\2\\", "*.*", SearchOption.TopDirectoryOnly)
                         .Take(1000)
                         .ToList();

    var sw = Stopwatch.StartNew();
    files.ForEach(f => File.ReadAllBytes(f).GetHashCode());
    sw.ElapsedMilliseconds.Dump("Run READ- Serial");    // .Dump() is a LINQPad extension
    sw.Stop();

    sw.Restart();
    files.AsParallel().ForAll(f => File.ReadAllBytes(f).GetHashCode());
    sw.ElapsedMilliseconds.Dump("Run READ- Parallel");
    sw.Stop();

Result1:

Run READ- Serial 595

Run READ- Parallel 193

Result2:

Run READ- Serial 316

Run READ- Parallel 192

WRITE TEST:

Now I'm going to create 1000 random files, each 300K. (I've emptied the directory from the previous test.)


    var bytes = new byte[300000];
    Random r = new Random();
    r.NextBytes(bytes);
    var list = Enumerable.Range(1, 1000).ToList();

    sw.Restart();
    list.ForEach(f => File.WriteAllBytes(@"c:\temp\2\" + Path.GetRandomFileName(), bytes));
    sw.ElapsedMilliseconds.Dump("Run WRITE serial");
    sw.Stop();

    sw.Restart();
    list.AsParallel().ForAll(f => File.WriteAllBytes(@"c:\temp\2\" + Path.GetRandomFileName(), bytes));
    sw.ElapsedMilliseconds.Dump("Run WRITE Parallel");
    sw.Stop();

Result 1:

Run WRITE serial 2028

Run WRITE Parallel 368

Result 2:

Run WRITE serial 784

Run WRITE Parallel 426

Question:

The results surprised me. Against all expectations (especially for WRITE operations), performance is better with parallelism, even though these are I/O operations.

How/why does the parallel version perform better? It seems the SSD can work with multiple threads and that there is little or no bottleneck when running more than one job at a time against the I/O device.

NB: I didn't test this with an HDD (I'd be happy if someone who has an HDD would run the tests).

asked Jun 06 '17 by Royi Namir



2 Answers

Benchmarking is a tricky art; you are just not measuring what you think you are. That it is not actually I/O overhead is somewhat obvious from the test results: why is the single-threaded code faster the second time you run it?

What you are not counting on is the behavior of the file system cache. It keeps a copy of the disk content in RAM. This has a particularly big impact on the multi-threaded code measurement: it is not using any I/O at all. In a nutshell:

  • Reads come from RAM if the file system cache has a copy of the data. This operates at memory bus speeds, typically around 35 gigabytes/second. If it does not have a copy, then the read is delayed until the disk supplies the data; the drive does not just read the requested cluster but an entire cylinder's worth of data.

  • Writes go straight to RAM and complete very quickly. The data is written to the disk lazily in the background while the program keeps executing, optimized to minimize write-head movement in cylinder order. Only if no more RAM is available will a write ever stall.

Actual cache size depends on the installed amount of RAM and the need for RAM imposed by running processes. A very rough guideline is that you can count on 1GB on a machine with 4GB of RAM, 3GB on a machine with 8GB of RAM. It is visible in Resource Monitor, Memory tab, displayed as the "Cached" value. Keep in mind that it is highly variable.
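
To make the cache's effect visible, here is a minimal sketch in the style of the question's LINQPad code (the file name is a hypothetical stand-in): read the same file twice and time both reads. Unless the file was touched recently, the second read should be dramatically faster because it never leaves RAM:

    var sw = Stopwatch.StartNew();
    File.ReadAllBytes(@"c:\temp\2\somefile.bin");    // first read: may have to go to the disk (cold)
    sw.ElapsedMilliseconds.Dump("1st read");

    sw.Restart();
    File.ReadAllBytes(@"c:\temp\2\somefile.bin");    // second read: served from the file system cache (warm)
    sw.ElapsedMilliseconds.Dump("2nd read");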

So, enough to make sense of what you see: the Parallel test benefits greatly from the Serial test having already read all the data. If you had written the test so that the Parallel test ran first, you'd have gotten very different results. Only with a cold cache could you see the perf loss due to threading. You'd have to restart your machine to reliably ensure that condition. Or read another very large file first, one large enough to evict useful data from the cache.
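
A rough sketch of that last suggestion, assuming a pre-created junk file larger than the expected cache size (the helper name and path are hypothetical):

    // Stream through a huge junk file so the cache fills with its data and
    // evicts whatever the benchmark read or wrote; the bytes themselves are discarded.
    static void EvictFileSystemCache(string largeFilePath)
    {
        var buffer = new byte[1 << 20];    // read in 1 MB chunks
        using (var fs = new FileStream(largeFilePath, FileMode.Open, FileAccess.Read))
        {
            while (fs.Read(buffer, 0, buffer.Length) > 0) { }
        }
    }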

Only if you have a priori knowledge that your program only ever reads data that was just written can you safely use threads without risking a perf loss. That guarantee is normally pretty hard to come by. It does exist; a good example is Visual Studio building your project. The compiler writes the build result to the obj\Debug directory, then MSBuild copies it to bin\Debug. That looks very wasteful, but it is not: the copy always completes very quickly since the file is hot in the cache. The cache also explains the difference between a cold and a hot start of a .NET program, and why using NGen is not always best.

answered Oct 22 '22 by Hans Passant


It is a very interesting topic! I'm sorry that I can't explain the technical details, but there are some concerns that need to be raised. It is a bit long, so I can't fit it into a comment; please forgive me for posting it as an "answer".

I think you need to test with both large and small files; also, the test must be run several times and the times averaged to make sure the result is verifiable. A general guideline is to run it 25 times, as a paper in evolutionary computing suggests.
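
As a minimal sketch of that idea (RunBenchmarkOnce is a hypothetical stand-in for one serial or parallel pass):

    // Run the measurement numTests times and report the mean,
    // so a single outlier doesn't dominate the result.
    const int numTests = 25;
    var samples = new List<long>(numTests);
    for (int i = 0; i < numTests; i++)
    {
        var sw = Stopwatch.StartNew();
        RunBenchmarkOnce();
        sw.Stop();
        samples.Add(sw.ElapsedMilliseconds);
    }
    Debug.Print($"Avg: {samples.Average()} ms over {numTests} runs");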

Another concern is system caching. You created only one bytes buffer and always write the same content; I don't know exactly how the system handles that buffer, but to minimize the difference I would suggest creating a different buffer for each file.
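
For example (sizes taken from the question; this is essentially what the revised code below does with its allBytes list):

    // Give every file its own randomized content instead of
    // writing the same 300K block a thousand times.
    var r = new Random();
    var buffers = Enumerable.Range(0, 1000)
                            .Select(_ => { var b = new byte[300000]; r.NextBytes(b); return b; })
                            .ToList();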

(Update: maybe the GC also affects performance, so I revised the code again to keep the GC out of the way as much as I could.)

I luckily have both an SSD and an HDD in my computer, so I revised the test code. I executed it with different configurations and got the following results. Hopefully this can inspire someone to give a better explanation.

1KB, 256 Files

    Avg Write Parallel SSD: 46.88
    Avg Write Serial   SSD: 94.32
    Avg Read  Parallel SSD: 4.28
    Avg Read  Serial   SSD: 15.48
    Avg Write Parallel HDD: 35.4
    Avg Write Serial   HDD: 71.52
    Avg Read  Parallel HDD: 4.52
    Avg Read  Serial   HDD: 14.68

512KB, 256 Files

    Avg Write Parallel SSD: 86.84
    Avg Write Serial   SSD: 210.84
    Avg Read  Parallel SSD: 65.64
    Avg Read  Serial   SSD: 80.84
    Avg Write Parallel HDD: 85.52
    Avg Write Serial   HDD: 186.76
    Avg Read  Parallel HDD: 63.24
    Avg Read  Serial   HDD: 82.12

    // Note: the GC seems to have still kicked in on the parallel reads in this test

My machine is: i7-6820HQ / 32G / Windows 7 Enterprise x64 / VS2017 Professional / Target .NET 4.6 / Running in debug mode.

The two drives are:

C drive: IDE\Crucial_CT275MX300SSD4___________________M0CR021

D drive: IDE\ST2000LM003_HN-M201RAD__________________2BE10001

The revised code is as follows:

    Stopwatch sw = new Stopwatch();
    string path;
    int fileSize = 1024 * 1024 * 1024;
    int numFiles = 2;

    byte[] bytes = new byte[fileSize];
    Random r = new Random(DateTimeOffset.UtcNow.Millisecond);
    List<int> list = Enumerable.Range(0, numFiles).ToList();
    List<List<byte>> allBytes = new List<List<byte>>(numFiles);

    List<string> files;

    int numTests = 1;

    // Sample lists: w/r = write/read, s/p = serial/parallel, trailing s/h = SSD/HDD.
    List<long> wss = new List<long>(numTests);
    List<long> wps = new List<long>(numTests);
    List<long> rss = new List<long>(numTests);
    List<long> rps = new List<long>(numTests);

    List<long> wsh = new List<long>(numTests);
    List<long> wph = new List<long>(numTests);
    List<long> rsh = new List<long>(numTests);
    List<long> rph = new List<long>(numTests);

    Enumerable.Range(1, numTests).ToList().ForEach((i) =>
    {
        path = @"C:\SeqParTest\";

        // Write parallel SSD
        allBytes.Clear();
        GC.Collect();
        GC.WaitForFullGCComplete();
        list.ForEach((x) => { r.NextBytes(bytes); allBytes.Add(new List<byte>(bytes)); });
        try { GC.TryStartNoGCRegion(0, true); } catch (Exception) { }
        sw.Restart();
        list.AsParallel().ForAll((x) => File.WriteAllBytes(path + Path.GetRandomFileName(), allBytes[x].ToArray()));
        wps.Add(sw.ElapsedMilliseconds);
        sw.Stop();
        try { GC.EndNoGCRegion(); } catch (Exception) { }
        Debug.Print($"Write parallel SSD #{i}: {wps[i - 1]}");

        // Write serial SSD
        allBytes.Clear();
        GC.Collect();
        GC.WaitForFullGCComplete();
        list.ForEach((x) => { r.NextBytes(bytes); allBytes.Add(new List<byte>(bytes)); });
        try { GC.TryStartNoGCRegion(0, true); } catch (Exception) { }
        sw.Restart();
        list.ForEach((x) => File.WriteAllBytes(path + Path.GetRandomFileName(), allBytes[x].ToArray()));
        wss.Add(sw.ElapsedMilliseconds);
        sw.Stop();
        try { GC.EndNoGCRegion(); } catch (Exception) { }
        Debug.Print($"Write serial   SSD #{i}: {wss[i - 1]}");

        // Read parallel SSD
        files = Directory.GetFiles(path, "*.*", SearchOption.TopDirectoryOnly).Take(numFiles).ToList();
        try { GC.TryStartNoGCRegion(0, true); } catch (Exception) { }
        sw.Restart();
        files.AsParallel().ForAll(f => File.ReadAllBytes(f).GetHashCode());
        rps.Add(sw.ElapsedMilliseconds);
        sw.Stop();
        try { GC.EndNoGCRegion(); } catch (Exception) { }
        files.ForEach(f => File.Delete(f));
        Debug.Print($"Read  parallel SSD #{i}: {rps[i - 1]}");
        GC.Collect();
        GC.WaitForFullGCComplete();

        // Read serial SSD
        files = Directory.GetFiles(path, "*.*", SearchOption.TopDirectoryOnly).Take(numFiles).ToList();
        try { GC.TryStartNoGCRegion(0, true); } catch (Exception) { }
        sw.Restart();
        files.ForEach(f => File.ReadAllBytes(f).GetHashCode());
        rss.Add(sw.ElapsedMilliseconds);
        sw.Stop();
        try { GC.EndNoGCRegion(); } catch (Exception) { }
        files.ForEach(f => File.Delete(f));
        Debug.Print($"Read  serial   SSD #{i}: {rss[i - 1]}");
        GC.Collect();
        GC.WaitForFullGCComplete();

        path = @"D:\SeqParTest\";

        // Write parallel HDD
        allBytes.Clear();
        GC.Collect();
        GC.WaitForFullGCComplete();
        list.ForEach((x) => { r.NextBytes(bytes); allBytes.Add(new List<byte>(bytes)); });
        try { GC.TryStartNoGCRegion(0, true); } catch (Exception) { }
        sw.Restart();
        list.AsParallel().ForAll((x) => File.WriteAllBytes(path + Path.GetRandomFileName(), allBytes[x].ToArray()));
        wph.Add(sw.ElapsedMilliseconds);
        sw.Stop();
        try { GC.EndNoGCRegion(); } catch (Exception) { }
        Debug.Print($"Write parallel HDD #{i}: {wph[i - 1]}");

        // Write serial HDD
        allBytes.Clear();
        GC.Collect();
        GC.WaitForFullGCComplete();
        list.ForEach((x) => { r.NextBytes(bytes); allBytes.Add(new List<byte>(bytes)); });
        try { GC.TryStartNoGCRegion(0, true); } catch (Exception) { }
        sw.Restart();
        list.ForEach((x) => File.WriteAllBytes(path + Path.GetRandomFileName(), allBytes[x].ToArray()));
        wsh.Add(sw.ElapsedMilliseconds);
        sw.Stop();
        try { GC.EndNoGCRegion(); } catch (Exception) { }
        Debug.Print($"Write serial   HDD #{i}: {wsh[i - 1]}");

        // Read parallel HDD
        files = Directory.GetFiles(path, "*.*", SearchOption.TopDirectoryOnly).Take(numFiles).ToList();
        try { GC.TryStartNoGCRegion(0, true); } catch (Exception) { }
        sw.Restart();
        files.AsParallel().ForAll(f => File.ReadAllBytes(f).GetHashCode());
        rph.Add(sw.ElapsedMilliseconds);
        sw.Stop();
        try { GC.EndNoGCRegion(); } catch (Exception) { }
        files.ForEach(f => File.Delete(f));
        Debug.Print($"Read  parallel HDD #{i}: {rph[i - 1]}");
        GC.Collect();
        GC.WaitForFullGCComplete();

        // Read serial HDD
        files = Directory.GetFiles(path, "*.*", SearchOption.TopDirectoryOnly).Take(numFiles).ToList();
        try { GC.TryStartNoGCRegion(0, true); } catch (Exception) { }
        sw.Restart();
        files.ForEach(f => File.ReadAllBytes(f).GetHashCode());
        rsh.Add(sw.ElapsedMilliseconds);
        sw.Stop();
        try { GC.EndNoGCRegion(); } catch (Exception) { }
        files.ForEach(f => File.Delete(f));
        Debug.Print($"Read  serial   HDD #{i}: {rsh[i - 1]}");
        GC.Collect();
        GC.WaitForFullGCComplete();
    });

    Debug.Print($"Avg Write Parallel SSD: {wps.Average()}");
    Debug.Print($"Avg Write Serial   SSD: {wss.Average()}");
    Debug.Print($"Avg Read  Parallel SSD: {rps.Average()}");
    Debug.Print($"Avg Read  Serial   SSD: {rss.Average()}");

    Debug.Print($"Avg Write Parallel HDD: {wph.Average()}");
    Debug.Print($"Avg Write Serial   HDD: {wsh.Average()}");
    Debug.Print($"Avg Read  Parallel HDD: {rph.Average()}");
    Debug.Print($"Avg Read  Serial   HDD: {rsh.Average()}");

Well, I haven't fully tested the code, so it may be buggy. I realised it sometimes stops on the parallel read; I assume that's because the deletion of files from the sequential read completed AFTER the list of existing files was read in the next step, so it complains with a file-not-found error.
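
One possible workaround, sketched here but not tested, is to tolerate files that disappear between Directory.GetFiles and the actual read:

    files.AsParallel().ForAll(f =>
    {
        try { File.ReadAllBytes(f).GetHashCode(); }
        catch (FileNotFoundException) { }    // deleted after enumeration; skip it
    });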

Another problem is that I used the newly created files for the read test. Theoretically it would be better not to (or even to restart the computer / fill the empty space on the SSD, to avoid caching), but I didn't bother because the intended comparison is between sequential and parallel performance.

Update:

I don't know how to explain the reason, but I think it may be because the I/O resource is fairly idle? I'll try two things next:

  1. Large files (1GB) in serial / parallel
  2. With other background activities using disk I/O.

Update 2:

Some results from large files (512MB, 32 files); all times in milliseconds:

    Run   Write par SSD   Write ser SSD   Read par SSD   Read ser SSD   Write par HDD   Write ser HDD   Read par HDD   Read ser HDD
    #1    140935          133656          62150          43355          172448          138381          173436         142248
    #2    122286          119564          53227          43022          175922          137572          204972         142174
    #3    121700          117730          107546         42872          171914          145923          193097         142211
    #4    125805          118252          113385         42951          176920          137520          208123         142273
    #5    116394          116592          61273          43315          172259          138554          275791         142311
    #6    107839          135071          79846          43328          176034          138671          218533         142481
    #7    120438          118032          45375          42978          173151          140579          176492         142153
    #8    108862          123556          120162         42983          174699          137619          204069         142480
    #9    111618          117854          51224          42970          173069          136936          159978         143401
    #10   115381          118545          79509          43818          179545          138556          167978         143033
    #11   113105          116849          84309          42620          179432          139014          219161         142515
    #12   124901          121769          137192         43144          176091          139042          214205         142576
    #13   110896          123152          56633          42665          173123          138514          210003         142215
    #14   117762          126865          90005          44089          172958          139908          217826         142216
    #15   109912          121276          72285          42827          176255          139084          183926         142111
    #16   122476          126283          47875          43799          173436          137203          294374         142387
    #17   112168          121079          79001          43207          -               -               -              -

I regret that I didn't have time to complete all 25 runs, but the results show that with large files, sequential R/W can be faster than parallel when the disk is fully utilized. I think this may be the reason behind the other discussions on SO.

answered Oct 22 '22 by Tide Gu