How does the web server locate a file on server through URL?

Tags:

webserver

Has anyone ever tried to implement a web server? Or does anyone know something about the internals of a working web server program? I am wondering what exactly happens between the moment a URL is received by the web server and the moment a file on the server is located and sent back as the response.

Does the server just keep an internal table that records the mapping between the URLs it supports and the corresponding local paths? Or is there something trickier?

Thanks!

Update

Thanks for your replies. Here's my understanding for now.

I checked Microsoft IIS (Internet Information Services) and noticed that IIS can host multiple sites, and for each site it remembers the root path on the local file system. Different sites on the same host can share the same host name or IP address and be differentiated by separate ports. For example:

http://www.myServer.com:1111/folderA/pageA.htm

The web server uses the www.myServer.com:1111 part of the URL to determine which path on its local file system to use; within that local path, it then looks for the subfolder folderA and then for the file pageA.htm.

The web server only needs to remember the following mapping between two plain strings:

"http://www.myServer.com:1111/" <---> "D:\myWebRoot"

I don't know where this mapping is stored; probably in some configuration file of the web server program in question.

The consequence of this mapping granularity, however, is that we can only access content within the mapped local folder. We cannot do arbitrary mappings.
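The lookup described above can be sketched in a few lines of Python. This is a minimal illustration, not how IIS is implemented: it assumes a hypothetical in-memory table `SITE_ROOTS` keyed by (host, port), assumes plain HTTP (port 80 when the URL names none), and the function name `map_url_to_file` is made up. It joins the URL path onto the mapped root and refuses any path that would climb out of it, which is why only content inside the mapped folder is reachable.

```python
from pathlib import PureWindowsPath
from urllib.parse import unquote, urlsplit

# Hypothetical site table mirroring "http://www.myServer.com:1111/" <---> "D:\myWebRoot".
SITE_ROOTS = {
    ("www.myserver.com", 1111): PureWindowsPath(r"D:\myWebRoot"),
}


def map_url_to_file(url: str) -> PureWindowsPath:
    """Map a URL to a file path under its site's document root.

    Raises LookupError for an unknown site and PermissionError if the
    URL path would escape the document root.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""  # urlsplit already lowercases the host name
    port = parts.port or 80      # assumption: plain HTTP default port
    root = SITE_ROOTS.get((host, port))
    if root is None:
        raise LookupError(f"no site bound to {host}:{port}")

    segments: list[str] = []
    for seg in unquote(parts.path).split("/"):
        if seg in ("", "."):
            continue
        if "\\" in seg or ":" in seg:
            # Reject Windows separators and drive letters smuggled into a segment.
            raise PermissionError("invalid path segment")
        if seg == "..":
            if not segments:
                raise PermissionError("path escapes the document root")
            segments.pop()
        else:
            segments.append(seg)
    return root.joinpath(*segments)
```

For example, `map_url_to_file("http://www.myServer.com:1111/folderA/pageA.htm")` yields `D:\myWebRoot\folderA\pageA.htm`. `PureWindowsPath` is used so the sketch runs on any OS without touching the disk; a real server would then check that the file exists and stream it.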

Update - 2 -

I found where IIS keeps the mapping. Here is an excerpt from applicationHost.config:

<sites>
    <site name="Default Web Site" id="1" serverAutoStart="false">
        <application path="/">
            <virtualDirectory path="/" physicalPath="%SystemDrive%\inetpub\wwwroot" />
        </application>
        <bindings>
            <binding protocol="http" bindingInformation="*:80:" />
            <binding protocol="net.tcp" bindingInformation="808:*" />
            <binding protocol="net.pipe" bindingInformation="*" />
            <binding protocol="net.msmq" bindingInformation="localhost" />
            <binding protocol="msmq.formatname" bindingInformation="localhost" />
        </bindings>
    </site>
    <site name="myIISService" id="2" serverAutoStart="true">
        <application path="/" applicationPool="myIISService">
            <virtualDirectory path="/" physicalPath="D:\MySites\MyIISService" />
        </application>
        <bindings>
            <binding protocol="http" bindingInformation="*:8022:" />
        </bindings>
    </site>
    <siteDefaults>
        <logFile logFormat="W3C" directory="%SystemDrive%\inetpub\logs\LogFiles" />
        <traceFailedRequestsLogging directory="%SystemDrive%\inetpub\logs\FailedReqLogFiles" />
    </siteDefaults>
    <applicationDefaults applicationPool="DefaultAppPool" />
    <virtualDirectoryDefaults allowSubDirConfig="true" />
</sites>
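The mapping in this file can also be read programmatically. Below is a minimal Python sketch using only the standard library that extracts, per site, which HTTP port maps to which physical folder. It assumes a trimmed copy of the excerpt above, only looks at the root virtual directory of the root application, and does not expand environment variables such as `%SystemDrive%`; the function name `http_port_map` is hypothetical, and this is not how IIS itself loads its configuration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the <sites> excerpt above (only the parts this sketch reads).
CONFIG = r"""
<sites>
    <site name="Default Web Site" id="1">
        <application path="/">
            <virtualDirectory path="/" physicalPath="%SystemDrive%\inetpub\wwwroot" />
        </application>
        <bindings>
            <binding protocol="http" bindingInformation="*:80:" />
            <binding protocol="net.tcp" bindingInformation="808:*" />
        </bindings>
    </site>
    <site name="myIISService" id="2">
        <application path="/" applicationPool="myIISService">
            <virtualDirectory path="/" physicalPath="D:\MySites\MyIISService" />
        </application>
        <bindings>
            <binding protocol="http" bindingInformation="*:8022:" />
        </bindings>
    </site>
</sites>
"""


def http_port_map(xml_text: str) -> dict[int, str]:
    """Return {port: physicalPath of the site's root virtual directory} for HTTP bindings."""
    result: dict[int, str] = {}
    root = ET.fromstring(xml_text.strip())
    for site in root.iter("site"):
        vdir = site.find("./application[@path='/']/virtualDirectory[@path='/']")
        if vdir is None:
            continue
        for binding in site.iter("binding"):
            if binding.get("protocol") != "http":
                continue  # net.tcp, net.pipe, ... use other formats
            # HTTP bindingInformation has the form "IP:port:hostHeader";
            # rsplit keeps bracketed IPv6 addresses intact.
            _ip, port, _host = binding.get("bindingInformation").rsplit(":", 2)
            result[int(port)] = vdir.get("physicalPath")
    return result
```

On the excerpt this gives port 80 for the default site's `wwwroot` folder and port 8022 for `D:\MySites\MyIISService`, which is exactly the "port selects the root folder" mapping described in the first update.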

Update - 3 -

After reading foo's reply, my understanding of what a "server" is has broadened. I want to add some comments based on what I have recently learned about WCF.

No matter what kind of server it is, we can always send messages to it by specifying the protocol, host name, and port. For example:

[http://www.myserver.com:1111/]page.htm

[net.tcp://www.myserver.com/]someService.svc/someMethod

[net.msmq://www.myserver.com/]someService.svc

[net.pipe://localhost/]

Once a message reaches the server program identified by the bracketed part of each URL above, the rest of the URL is passed to that program as input for further processing. What happens next can be as simple as serving static content or as complex as generating dynamic content.
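The split described above (the bracketed part selects the listening program, the remainder is its input) can be illustrated with Python's `urllib.parse.urlsplit`. This is a sketch under the assumption that scheme, host, and port are enough to identify the server program; `split_endpoint` is a hypothetical name, and WCF and other frameworks do their own, more involved address matching.

```python
from urllib.parse import urlsplit


def split_endpoint(url: str) -> tuple[str, str]:
    """Split a URL into the endpoint part (scheme + host[:port], the bracketed
    part above) and the remainder that is handed to the server program."""
    parts = urlsplit(url)
    endpoint = f"{parts.scheme}://{parts.netloc}/"
    rest = parts.path.lstrip("/")
    if parts.query:
        rest += "?" + parts.query
    return endpoint, rest
```

For instance, `split_endpoint("net.tcp://www.myserver.com/someService.svc/someMethod")` returns `("net.tcp://www.myserver.com/", "someService.svc/someMethod")`.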

asked Jan 22 '11 04:01 by smwikipedia


1 Answer

It depends on the web server and what its focus is.

(In all cases, steps such as checking access rights and remapping apply, of course.)

  • General-purpose web servers like Apache start out with files and directories: they split the URL into a hierarchical path, try to find a file at that location, and serve it if it exists. (This gets more complex with modules and file types; some file types imply processing the file as a script and returning the script's output rather than just piping out the file contents, and so on.)

  • Application servers like Tomcat map URLs to servlets; once they have found a servlet that handles the URL, they call it and pass any leftover URL parts and parameters to it for further handling.

  • Embedded web servers may even use hardcoded lookup tables of the available URL patterns, mapping them directly to functions to be called.

  • Special-purpose web servers do whatever is required; some don't even parse the URL and look only at the other headers (as some streaming servers do).
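The first three strategies in the list can be contrasted in one small Python sketch. Combining them in a single dispatcher is purely for illustration (no particular server works exactly like this); the routes, handlers, and the `dispatch` function are hypothetical, and handlers return plain strings instead of real HTTP responses.

```python
from typing import Callable

Handler = Callable[[str], str]  # takes the leftover URL part, returns a body


def status_handler(rest: str) -> str:
    return "OK"


def orders_handler(rest: str) -> str:
    return f"orders handler got {rest!r}"


# Embedded-server style: exact URL -> function lookup table (hypothetical routes).
EXACT_ROUTES: dict[str, Handler] = {"/status": status_handler}

# Application-server style: URL prefix -> handler that receives the rest of the path.
PREFIX_ROUTES: dict[str, Handler] = {"/app/orders/": orders_handler}


def dispatch(path: str, serve_file: Callable[[str], str]) -> str:
    # 1. Exact match in a hardcoded table.
    handler = EXACT_ROUTES.get(path)
    if handler is not None:
        return handler("")
    # 2. Longest matching prefix wins; pass on what is left of the path.
    for prefix in sorted(PREFIX_ROUTES, key=len, reverse=True):
        if path.startswith(prefix):
            return PREFIX_ROUTES[prefix](path[len(prefix):])
    # 3. Otherwise behave like a general-purpose server and look for a file.
    return serve_file(path)
```

For example, `dispatch("/app/orders/42", serve_file)` calls `orders_handler("42")`, while `/index.htm` falls through to `serve_file`. The prefix step is loosely in the spirit of servlet path mapping; the actual servlet mapping rules are more detailed.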

It all depends on what you want to achieve. In most cases, you will be best off with nginx or Apache, perhaps with some modules and/or fine-tuning.

Be aware that any part of the request, including any HTTP header, can be used to map it to whatever means of producing output you have. Host name, port, and URL path are used most often, but you may just as well use the preferred language (the Accept-Language header), the client IP address, or other header data in the mapping.
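As a sketch of mapping on headers other than the URL, the Python below picks a document root from the `Host` header and a language subfolder from `Accept-Language`. Everything here is hypothetical: the `HOST_DIRS` table, the set of available languages, the `/srv/<site>/<lang>` layout, and the function names. The `Accept-Language` parsing is deliberately simplified (primary subtag and `q` values only).

```python
# Hypothetical table: Host header value -> site directory name.
HOST_DIRS = {"www.myserver.com": "myserver", "shop.example.com": "shop"}
AVAILABLE_LANGS = {"en", "de"}  # hypothetical translated copies of each site


def pick_language(accept_language: str, available: set[str], default: str) -> str:
    """Pick the highest-q primary language tag from Accept-Language that we have."""
    candidates = []
    for item in accept_language.split(","):
        tag, _, params = item.strip().partition(";")
        if not tag:
            continue
        q = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0
        # Reduce "de-DE" to its primary subtag "de".
        candidates.append((q, tag.strip().lower().split("-")[0]))
    # sorted() is stable, so equal q-values keep their header order.
    for q, lang in sorted(candidates, key=lambda c: -c[0]):
        if q > 0 and lang in available:
            return lang
    return default


def content_root(headers: dict[str, str]) -> str:
    """Choose a document root from the Host and Accept-Language headers."""
    # Drop an optional ":port" (IPv6 literals are not handled in this sketch).
    host = headers.get("Host", "").split(":")[0].lower()
    site = HOST_DIRS.get(host, "default")
    lang = pick_language(headers.get("Accept-Language", ""), AVAILABLE_LANGS, "en")
    return f"/srv/{site}/{lang}"
```

A request with `Host: www.myserver.com:1111` and `Accept-Language: de-DE,de;q=0.9,en;q=0.8` would be served from `/srv/myserver/de`; the URL path would then be resolved under that root as in a plain file server.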

So, to answer your question: yes, it can be as simple as that, and yes, it can be substantially trickier (with mapping, rewriting, and complex processing).

answered Dec 29 '22 17:12 by foo