Windows: Does accessing data through "localhost" incur network stack overhead?

I have a large number of audio files that I am running through a processing algorithm to extract certain bits of data from them (e.g., the average volume of the entire clip). I have a number of build scripts that previously pulled the input data from a Samba network share, to which I had mapped a network drive via net use (i.e., M: ==> \\server\share0).

Now that I have a massive new 1TB SSD, I can store the files locally and process them very quickly. To avoid a massive rewrite of my processing scripts, I removed my network drive mapping and re-created it using the localhost host name, i.e., M: ==> \\localhost\mydata.

When I use such a mapping, do I risk incurring significant overhead, for example from the data having to travel through part of Windows' network stack? Or does the OS use shortcuts so that it amounts to more or less direct disk access (i.e., does the machine know it's just pulling files from its own hard drive)? Increased latency isn't much of a concern for me, but maximum sustained average throughput is critical.

I ask because I'm deciding whether or not to modify all of my processing scripts to work with a different path style.

Extra question: does the same apply to Linux hosts? Are they smart enough to know they are pulling from a local disk?

Cloud asked Nov 18 '15 at 21:11


2 Answers

When I make use of such a mapping, do I risk incurring significant overhead,

Yes. By using a UNC path (\\hostname\sharename\filename) instead of a local path ([\\?\]driveletter:\directoryname\filename), you're routing all file access through the Server Message Block protocol (SMB, the protocol Samba implements). This adds significant overhead to disk access and to access times in general.
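To make the distinction concrete, here is a minimal Python sketch that tells the two path styles apart by looking only at the path text. It relies on the standard-library pathlib.PureWindowsPath, which reports the drive of a UNC path as \\host\share. The function names and the LOOPBACK_HOSTS set are illustrative assumptions; the sketch rejects \\?\ extended-length and \\.\ device paths, and it cannot see that a mapped drive letter such as M: is itself backed by an SMB share.

```python
from pathlib import PureWindowsPath
from typing import Optional

# Hypothetical set of host names treated as "this machine". A fuller check might
# also compare against socket.gethostname() and the machine's own IP addresses.
LOOPBACK_HOSTS = {"localhost", "127.0.0.1"}


def unc_host(path: str) -> Optional[str]:
    """Return the server name of a UNC path (\\\\host\\share\\...), or None otherwise."""
    normalized = path.replace("/", "\\")
    if normalized.startswith(("\\\\?\\", "\\\\.\\")):
        # Extended-length (\\?\) and device (\\.\) paths are out of scope here.
        raise ValueError(f"extended-length/device path not handled: {path}")
    drive = PureWindowsPath(normalized).drive
    # pathlib reports a UNC drive as "\\host\share"; a local drive is "C:",
    # and a relative path has an empty drive.
    if not drive.startswith("\\\\"):
        return None
    return drive[2:].split("\\")[0]


def is_unc_path(path: str) -> bool:
    """True if the path text names an SMB share. Note: a mapped drive letter
    such as M: also goes through SMB, but that is invisible in the path text."""
    return unc_host(path) is not None


def is_loopback_share(path: str) -> bool:
    """True for UNC paths that point back at this machine, e.g. \\\\localhost\\mydata."""
    host = unc_host(path)
    return host is not None and host.lower() in LOOPBACK_HOSTS


if __name__ == "__main__":
    for p in (r"\\server\share0\clip.wav", r"\\localhost\mydata\clip.wav", r"D:\mydata\clip.wav"):
        print(f"{p:32} UNC={is_unc_path(p)!s:5} loopback={is_loopback_share(p)}")
```

Both \\server\share0 and \\localhost\mydata are reported as UNC paths: pointing the share at localhost changes the host, not the SMB route the file access takes.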

The flow over a network is like this:

Application -> SMB Client -> Network -> SMB Server -> Target file system

Now by moving your files to your local machine, but still using UNC to access them, the flow is like this:

Application -> SMB Client -> localhost -> SMB Server -> Target file system

The only thing you have reduced is physical network traffic, and even that is not eliminated: SMB traffic to localhost still passes through the network layers, with all the associated processing and traffic.

Also, because SMB is tailored specifically for network traffic, its reads may not make optimal use of your disk's and OS's caches. For example, it may perform its reads in blocks of one size, while your disk performs better when reading blocks of another size.
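One way to check the block-size effect on your own hardware is to read the same file with different chunk sizes and compare throughput. The following is a minimal Python sketch; the function names and default sizes are arbitrary choices, and it only measures sequential reads from a single process. Running it on the same clip once through \\localhost\mydata and once through its local path gives a rough SMB-versus-direct comparison. Repeated runs are likely to be served from the OS file cache, so use a file larger than RAM (or a fresh file per run) if you want to measure the disk itself.

```python
import sys
import time


def read_throughput(path: str, block_size: int) -> tuple[int, float]:
    """Read the whole file sequentially in chunks of block_size bytes.

    Returns (bytes_read, megabytes_per_second).
    """
    total = 0
    start = time.perf_counter()
    # buffering=0 gives an unbuffered binary file, so each read() call maps to
    # roughly one OS-level read request of up to block_size bytes.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero timer delta
    return total, total / elapsed / 1e6


def compare_block_sizes(path: str, sizes=(4 * 1024, 64 * 1024, 1024 * 1024)) -> dict:
    """Return {block_size: MB/s}. Later runs may be served from the OS file cache."""
    return {size: read_throughput(path, size)[1] for size in sizes}


if __name__ == "__main__" and len(sys.argv) > 1:
    for size, mbps in compare_block_sizes(sys.argv[1]).items():
        print(f"{size:>8} B blocks: {mbps:8.1f} MB/s")
```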

If you want optimal throughput and minimal access times, use as few layers in between as possible, in this case by accessing the file system directly:

Application -> Target file system
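If the scripts already pass their paths through a common place, one way to get direct access without rewriting them is to translate share paths to local paths at that single point. The sketch below does this with pathlib.PureWindowsPath; the SHARE_ROOTS mapping, including the D:\mydata folder behind the share, is a hypothetical placeholder, since the question does not say which local folder \\localhost\mydata exports.

```python
from pathlib import PureWindowsPath

# Hypothetical mapping from share roots (UNC or mapped drive letter) to the
# local folder that actually holds the files. D:\mydata is a placeholder:
# the question does not say which folder \\localhost\mydata exports.
SHARE_ROOTS = {
    r"\\localhost\mydata": r"D:\mydata",
    "M:": r"D:\mydata",
}


def to_local_path(path: str, share_roots: dict = SHARE_ROOTS) -> PureWindowsPath:
    """Rewrite a path under a known share root to the equivalent local path.

    Paths that don't start with a known root are returned unchanged.
    """
    p = PureWindowsPath(path)
    for share_root, local_root in share_roots.items():
        # Windows drive letters and share names are case-insensitive.
        if p.drive and p.drive.lower() == PureWindowsPath(share_root).drive.lower():
            # parts[0] is the anchor (e.g. "\\localhost\mydata\" or "M:\"); keep the rest.
            return PureWindowsPath(local_root, *p.parts[1:])
    return p


if __name__ == "__main__":
    print(to_local_path(r"\\localhost\mydata\clips\take01.wav"))  # D:\mydata\clips\take01.wav
```

Calling to_local_path just before each open() leaves the rest of the scripts untouched while the reads themselves bypass SMB.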
CodeCaster answered Oct 01 '22 at 09:10


Using TCP instead of direct file access, even over loopback, certainly has overhead, such as routing and memory allocations, on both Linux and Windows. Yes, the loopback device is a virtual (non-physical) kernel device and is faster than the other network devices, but it is not faster than direct file access. As far as I know, Windows has additional loopback optimizations such as NetDMA and "Fast TCP Loopback".

I assume the bottleneck with the loopback device will be memory copying. So directly accessing a file, rather than going over the loopback device, will always be faster (and less resource-intensive) on both Linux and Windows.
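The extra copying can be seen with a rough Python sketch that reads a file once directly and once through a TCP connection on 127.0.0.1, with a helper thread playing the part of the file server. This is plain TCP, not SMB, so it only illustrates the additional socket round trip; the function names and the 64 KiB chunk size are arbitrary, and Python's own overhead and the OS file cache will affect the absolute numbers.

```python
import socket
import threading
import time

CHUNK = 64 * 1024  # arbitrary read/send size


def read_direct(path: str) -> int:
    """Application -> file system: read the whole file, return the byte count."""
    total = 0
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK):
            total += len(chunk)
    return total


def read_via_loopback(path: str) -> int:
    """Application -> TCP on 127.0.0.1 -> helper thread -> file system.

    A plain-TCP stand-in for a local file server; this is not SMB.
    """
    listener = socket.create_server(("127.0.0.1", 0))  # port 0: OS picks a free port

    def serve() -> None:
        conn, _ = listener.accept()
        with conn, open(path, "rb") as f:
            while chunk := f.read(CHUNK):
                conn.sendall(chunk)  # copy: user buffer -> kernel socket buffer

    server = threading.Thread(target=serve, daemon=True)
    server.start()
    total = 0
    with socket.create_connection(listener.getsockname()) as client:
        while data := client.recv(CHUNK):  # copy: kernel socket buffer -> user buffer
            total += len(data)
    server.join()
    listener.close()
    return total


def timed(read_fn, path: str) -> tuple[int, float]:
    """Run read_fn(path) and return (bytes_read, elapsed_seconds)."""
    start = time.perf_counter()
    n = read_fn(path)
    return n, time.perf_counter() - start
```

For example, timed(read_direct, path) and timed(read_via_loopback, path) on the same large file return the byte count and elapsed seconds for each route; both routes deliver the same bytes, but the loopback one copies every chunk through the socket layer twice more.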

Additionally, both operating systems avoid this protocol overhead for IPC through "named pipes" on Windows and "Unix domain sockets" on Linux; using these will also be faster than using the loopback device whenever applicable.
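The difference between a Unix domain socket and the loopback device can be tried with a minimal sketch that pushes the same number of bytes through an AF_UNIX socket pair and through a TCP connection on 127.0.0.1. It assumes a Unix-like system where socket.AF_UNIX is available; the Windows named-pipe counterpart is not shown, and the helper names and sizes are illustrative only.

```python
import socket
import threading
import time

PAYLOAD = b"x" * (64 * 1024)  # arbitrary 64 KiB chunk


def pump(sender: socket.socket, receiver: socket.socket, total_bytes: int) -> tuple[int, float]:
    """Stream total_bytes from sender to receiver; return (bytes_received, seconds)."""

    def send_all() -> None:
        sent = 0
        while sent < total_bytes:
            chunk = PAYLOAD[: total_bytes - sent]
            sender.sendall(chunk)
            sent += len(chunk)
        sender.shutdown(socket.SHUT_WR)  # tell the receiver the stream is finished

    start = time.perf_counter()
    writer = threading.Thread(target=send_all)
    writer.start()
    received = 0
    while data := receiver.recv(len(PAYLOAD)):
        received += len(data)
    writer.join()
    return received, time.perf_counter() - start


def unix_socket_pair() -> tuple[socket.socket, socket.socket]:
    """Connected Unix domain sockets: data never enters the TCP/IP stack."""
    return socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)


def tcp_loopback_pair() -> tuple[socket.socket, socket.socket]:
    """Connected TCP sockets over 127.0.0.1: data goes through the TCP/IP stack."""
    with socket.create_server(("127.0.0.1", 0)) as listener:  # port 0: OS picks a free port
        client = socket.create_connection(listener.getsockname())
        server_side, _ = listener.accept()
    return client, server_side


if __name__ == "__main__":
    size = 100 * 1024 * 1024
    for name, make_pair in (("AF_UNIX", unix_socket_pair), ("TCP loopback", tcp_loopback_pair)):
        a, b = make_pair()
        with a, b:
            n, secs = pump(a, b, size)
        print(f"{name:12} {n / secs / 1e6:8.1f} MB/s")
```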

mow answered Oct 01 '22 at 11:10