
What is a socket, physically?

I always prefer the physical meaning of a programming concept to its logical meaning, hence this question.

While reviewing the socket programming paradigm, I noticed that bind() and connect() seem to do nothing more than tune the socket created by socket(). So I guess that socket() just creates a data structure (possibly in kernel space) that holds the details of the end-to-end communication settings between the client and the server, and that bind() and connect() just fill in that data structure.
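One way to poke at this guess from user space is to ask the kernel what it has recorded for a socket before and after bind(). This is only a minimal sketch, assuming Linux and Python's standard socket module (a thin wrapper over the same calls); the helper name describe is made up for illustration, and an unbound socket may report its address differently on other platforms.

```python
import socket

def describe(sock):
    """Return the local (address, port) the kernel currently records for sock."""
    return sock.getsockname()

# socket() asks the kernel to allocate a socket object and hands back
# a small integer (a file descriptor) that refers to it.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
fd = s.fileno()
before = describe(s)          # local address not filled in yet

# bind() fills in the local half of the settings.
# Port 0 asks the kernel to pick any free port.
s.bind(("127.0.0.1", 0))
after = describe(s)

print("file descriptor:", fd)
print("before bind():", before)
print("after bind(): ", after)
s.close()
```

The only thing the program holds is the descriptor; the address and port live on the kernel side and are read back with getsockname().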

I am not familiar with the implementation of the socket API, so I hope someone could address my concern.

smwikipedia asked Feb 14 '11 15:02



2 Answers

This is highly platform-dependent. The point of the API is so that you DON'T need to know these details.

If you're really interested in learning this (which you don't need to be for ordinary application or system programming), you can download a Linux kernel source archive from kernel.org and examine Linux's TCP/IP implementation under net/ipv4.

To add some clarity,

To transport data across a network, we usually adhere to standards defined by the International Organization for Standardization (ISO). One of them is the OSI, or Open Systems Interconnection, model.

This model defines 7 layers of abstraction for applications to move data across a network. I'll only talk about the first 4, as they are the pertinent ones for your question.

Physical Layer:

This layer defines how the data is actually transmitted over the media. Hardware vendors adhere to defined standards on how to move the data. The standards agree on electrical signals and the electronic aspects of the data moving.

How it fits into the system:

Hopefully, there's very little software support required for this layer. Whatever programming is done here is likely to be done on-module and not in the kernel or application.

Data Link Layer:

This is the first layer that arguably involves some sort of programming. This layer defines the line-level protocols that operate on the physical links. Ethernet is one protocol. Frame Relay is another. Token Ring is another. Each end of the link must be running the same data link protocol. This layer combines with a compatible physical layer standard to give a means to actually transfer data from one host to another. In many regards it can be thought of more as an appendix to the physical layer than as its own layer, but because link-level protocols are defined here that's not a great analogy. This layer gives physical addresses to nodes on a network.

How it fits into the system:

You'd need to write a driver to talk to the interface module that runs these data-link protocols. Depending on the module and the system, the module may have all that it needs to actually work, or it may need some system-level help. Ideally, you just create a set of code interfaces (perhaps implemented as structs that contain function pointers for the appropriate handling of I/O; I don't really know), and when you install a new physical module, its driver need only implement those code interfaces for the module to be usable.
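To make the "set of code interfaces" idea concrete, here is a toy sketch. Everything in it (LinkDriver, LoopbackDriver, FakeEthernetDriver, transmit, the 14-byte header of zeros) is hypothetical and invented for illustration; it is not how any real kernel lays out its driver interface, only the shape of the idea.

```python
from typing import Protocol

class LinkDriver(Protocol):
    """Hypothetical interface that every data-link driver must implement."""
    def send_frame(self, payload: bytes) -> int: ...

class LoopbackDriver:
    """Toy driver that keeps frames in memory unchanged."""
    def __init__(self) -> None:
        self.frames: list[bytes] = []

    def send_frame(self, payload: bytes) -> int:
        self.frames.append(payload)
        return len(payload)

class FakeEthernetDriver:
    """Toy driver that prepends a made-up 14-byte header to each frame."""
    HEADER = b"\x00" * 14

    def __init__(self) -> None:
        self.frames: list[bytes] = []

    def send_frame(self, payload: bytes) -> int:
        frame = self.HEADER + payload
        self.frames.append(frame)
        return len(frame)

def transmit(driver: LinkDriver, packet: bytes) -> int:
    # The caller only knows the interface, not which driver is behind it.
    return driver.send_frame(packet)

print(transmit(LoopbackDriver(), b"abc"))
print(transmit(FakeEthernetDriver(), b"abc"))
```

The caller of transmit() never changes when a new driver class is added, which is the property the paragraph above is after.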

Network Layer:

This is the layer that provides the ability to move data between networks (in the case of TCP/IP). The Internet Protocol is defined at this layer. This layer gives logical addresses to nodes so that they can be grouped into networks. By knowing what network (also called a subnet, determined programmatically using the subnet mask) the host is on, we run algorithms that correctly move data from one network to another. If one host is on network A in China and one host is on network B in Australia, algorithms at this level are in charge of providing a path that links these networks and therefore these hosts.
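As a small illustration of "determined programmatically using the subnet mask", here is a sketch using Python's standard ipaddress module. The addresses and the helper name network_of are made up for the example.

```python
import ipaddress

def network_of(addr: str, mask: str) -> ipaddress.IPv4Network:
    """Apply the subnet mask to a host address to get the network it is on."""
    # strict=False allows a host address where a network address is expected.
    return ipaddress.ip_network(f"{addr}/{mask}", strict=False)

a = network_of("192.168.1.10", "255.255.255.0")
b = network_of("192.168.1.200", "255.255.255.0")
c = network_of("192.168.2.10", "255.255.255.0")

same_ab = a == b   # both hosts sit on the same network
same_ac = a == c   # different networks, so traffic has to be routed

print(a, b, c)
print("a and b on same network:", same_ab)
print("a and c on same network:", same_ac)
```

Two hosts whose masked addresses match can talk directly over the data link; otherwise the packet is handed to a router.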

An important thing about programming for this layer is that you should be able to just "plug in" any data link layer to transmit over. This means that once you create code on your system to transmit over Ethernet, Token Ring, 3G, or Frame Relay, you should be able to use all of them without the network layer needing to know what data link technology it is using. The logic of moving data between networks should not depend on the actual physical link it is operating on.

This layer puts your data into packets, and packets are what are routed over the internet.

How it fits into the system:

All of this layer must be coded as part of the system. It is entirely a software construct and should be isolated as much as possible from the data link layer. I am not enough of an expert to say in practice how well this is accomplished. Because the functionality of this layer is system-defined, we have total control over what the software must support. This makes the construction of the code interfaces that allow using this layer by higher-layer protocols rather simple compared to the ones in the data link layer.

Transport Layer:

This layer defines segmentation of data (because if you just sent giant pieces of data all at once, hardly anything would make it in order). This layer also defines TCP, which provides handshaking, checksums, packet ordering, variable data window sizes, and guaranteed reliability. TCP gives you the ability to create multiple logical channels of communication over the same physical link. It differentiates one conversation on a link from another conversation on the same link. UDP is also defined at this level, and can be thought of as an extremely lightweight TCP. UDP provides almost none of the benefits of TCP but still provides multiplexing over the physical channel.
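The "multiple logical channels over the same link" part can be seen on a single machine. A minimal sketch, assuming the loopback interface is available; the variable names are illustrative. Two connections go to the same server address, and what tells them apart is the client-side port the kernel assigns to each.

```python
import socket

# A listening socket on loopback; port 0 lets the kernel choose a free port.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(2)
server_addr = server.getsockname()

# Two separate conversations with the same server address.
c1 = socket.create_connection(server_addr)
c2 = socket.create_connection(server_addr)
local1, local2 = c1.getsockname(), c2.getsockname()

# accept() hands back one new socket per conversation, plus the peer address.
s1, peer1 = server.accept()
s2, peer2 = server.accept()

c1.sendall(b"first")
c2.sendall(b"second")

# Key what arrived by the client address it came from.
received = {peer1: s1.recv(16), peer2: s2.recv(16)}

print("client 1 local address:", local1, "->", received[local1])
print("client 2 local address:", local2, "->", received[local2])

for sock in (c1, c2, s1, s2, server):
    sock.close()
```

Both conversations share the same interface and the same server port, yet the data never gets mixed up, because each one is identified by its own pair of endpoints.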

If your transport layer is written well, your applications don't need (speaking from a code architecture standpoint) to worry about whether the transport layer is using TCP or UDP (I mention just these two because they are the most popular on IP). While you may pick one or the other based on timing performance needs or reliability needs (and in practice, applications often make an assumption about which one they are running), your application doesn't need to have exact knowledge of which one is running.

Because this layer is built on top of the network layer, we don't need to worry about how our data will get from one host to another if they are on different networks. If a router is running a standard routing protocol, augmented by some statically-defined routes, we don't need to worry about that. It's all taken care of for us by the network layer. If the network-layer configuration changes on the host that we are running on, it doesn't matter. We don't need to change our entire application to account for this.

How it fits into the system:

Very similar to the network layer, except that it provides different functionality. Additionally, these interfaces are used more in user space than are the network layer interfaces. This is the layer that actually defines the sockets that you use in TCP/IP networking.


Hope this helps and you can understand why your question is a little confusing to most of us.

San Jacinto answered Sep 22 '22 10:09


Are you familiar with the OSI model? bind() specifies the local IP address (layer 3) and port (layer 4) to use, so when a packet is physically sent out, it carries that IP address and port as the sender. connect() specifies the remote IP address and port to physically place in those packets.
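Here is a rough sketch of that division of labour, assuming the loopback interface and Python's standard socket module; the variable names are illustrative. UDP is used because connect() on a UDP socket sends nothing at all: it only records the remote address that later sends will use.

```python
import socket

# A UDP socket to act as the remote end, so there is a real address to point at.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
remote = receiver.getsockname()

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.bind(("127.0.0.1", 0))       # local address and port
local = sender.getsockname()
sender.connect(remote)              # remote address and port
recorded_peer = sender.getpeername()

# send() needs no address argument: the socket already knows both ends.
sender.send(b"hello")
data, source = receiver.recvfrom(16)

print("sender local address:", local)
print("sender recorded peer:", recorded_peer)
print("receiver saw", data, "from", source)

sender.close()
receiver.close()
```

The source address the receiver reports is the one given to bind(), and the destination is the one given to connect().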

As an aside, a lot of programming is pure "logic", and doesn't really have a "physical" meaning, unless by "physical" you actually mean "implementation detail", which will vary from platform to platform. If you're actually asking about the physical implementation meaning how "meaning" is transformed into electrical signals, you would probably be happier as a computer engineer than as a programmer.

Sam Skuce answered Sep 20 '22 10:09
