6.1.1. Sockets

The most traditional interface of the network subsystem is the Berkeley socket interface. It was developed at the University of California, Berkeley, as a part of 4.2BSD between 1981 and 1983. Today, it is present in virtually all flavors of Unix and in Windows.

The Berkeley socket interface centers around the concept of a socket as an object that facilitates communication. The socket can be bound to a local address and connected to a remote address. Data can be sent and received over a socket.

int socket (int domain, int type, int protocol);

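Domain specifies the socket protocol class, for example AF_UNIX for local communication, AF_INET for IPv4, AF_INET6 for IPv6 or AF_NETLINK for communication with the Linux kernel.

Type specifies the socket semantics, for example SOCK_STREAM for a reliable ordered byte stream, SOCK_DGRAM for unreliable unordered messages, SOCK_SEQPACKET for reliable ordered messages or SOCK_RDM for reliable unordered messages.

Protocol specifies the socket protocol within the class. Zero selects the default protocol for the given class and type.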

The socket call creates the socket object. An error is returned if the combination of class, type and protocol is not supported.
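
The behavior described above can be probed with a short sketch. The helper name combination_supported is hypothetical and not part of the socket interface. It only reports whether the socket call succeeds for a given combination.

```c
#include <netinet/in.h>     // IPPROTO_UDP, used in the usage note
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper: returns 1 if the socket call accepts the
// combination of domain, type and protocol, 0 otherwise.
int combination_supported (int domain, int type, int protocol)
{
  int fd = socket (domain, type, protocol);
  if (fd < 0) return 0;     // errno holds the reason
  close (fd);
  return 1;
}
```

On a typical Linux system, combination_supported (AF_INET, SOCK_STREAM, 0) is expected to succeed, while combination_supported (AF_INET, SOCK_STREAM, IPPROTO_UDP) is expected to fail because UDP does not provide stream semantics.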

int bind (int sockfd, const struct sockaddr *my_addr, socklen_t addrlen);

#define __SOCKADDR_COMMON(sa_prefix) \
  sa_family_t sa_prefix##family

struct sockaddr_in
{
  __SOCKADDR_COMMON (sin_);
  in_port_t      sin_port;
  struct in_addr sin_addr;
  unsigned char  sin_zero [sizeof (struct sockaddr) -
                           __SOCKADDR_COMMON_SIZE -
                           sizeof (in_port_t) -
                           sizeof (struct in_addr)];
};

struct sockaddr_in6
{
  __SOCKADDR_COMMON (sin6_);
  in_port_t       sin6_port;
  uint32_t        sin6_flowinfo;
  struct in6_addr sin6_addr;
  uint32_t        sin6_scope_id;
};

The bind call binds the socket to a given local address. The binding is typically necessary to tell the socket what local address to listen on for incoming connections.
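
A minimal sketch of binding, assuming an IPv4 TCP socket on the loopback address. The helper name bind_loopback is hypothetical. Port 0 asks the kernel to choose a free port, which the sketch then reads back with getsockname.

```c
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper: binds a new TCP socket to 127.0.0.1 on a port
// chosen by the kernel. Returns the socket, or -1 on failure.
int bind_loopback (in_port_t *port_out)
{
  int fd = socket (AF_INET, SOCK_STREAM, 0);
  if (fd < 0) return -1;

  struct sockaddr_in addr;
  memset (&addr, 0, sizeof (addr));
  addr.sin_family = AF_INET;
  addr.sin_port = htons (0);                       // 0 = any free port
  addr.sin_addr.s_addr = htonl (INADDR_LOOPBACK);  // 127.0.0.1

  if (bind (fd, (struct sockaddr *) &addr, sizeof (addr)) < 0) {
    close (fd);
    return -1;
  }

  // Ask which port the kernel picked.
  socklen_t len = sizeof (addr);
  if (getsockname (fd, (struct sockaddr *) &addr, &len) < 0) {
    close (fd);
    return -1;
  }
  *port_out = ntohs (addr.sin_port);
  return fd;
}
```

Note that the port and the address are stored in network byte order, hence the htons and htonl conversions.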

int listen (int sockfd, int backlog);

The listen call tells the socket to listen for incoming connections and sets the length of the incoming connection queue.

int accept (int sockfd, struct sockaddr *addr, socklen_t *addrlen);

The accept call accepts an incoming connection on a listening socket of a connection-oriented type (SOCK_SEQPACKET, SOCK_STREAM or SOCK_RDM). The function returns a new socket together with the address of the peer that the new socket is connected to. The original socket is left untouched and keeps listening.
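
The interplay of bind, listen and accept can be sketched within a single process, assuming TCP over the loopback address. The function name accept_demo is hypothetical. The client side uses the connect call only to produce an incoming connection.

```c
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical demo: returns 1 if accept yields a new socket whose peer
// is on the loopback address, 0 if not, -1 on error.
int accept_demo (void)
{
  struct sockaddr_in addr, peer;
  socklen_t len = sizeof (addr), peer_len = sizeof (peer);
  int result = -1, conn = -1;

  int lsn = socket (AF_INET, SOCK_STREAM, 0);
  int cli = socket (AF_INET, SOCK_STREAM, 0);
  if (lsn < 0 || cli < 0) goto out;

  memset (&addr, 0, sizeof (addr));
  addr.sin_family = AF_INET;
  addr.sin_addr.s_addr = htonl (INADDR_LOOPBACK);
  addr.sin_port = htons (0);                  // kernel picks the port
  if (bind (lsn, (struct sockaddr *) &addr, sizeof (addr)) < 0) goto out;
  if (listen (lsn, 4) < 0) goto out;          // queue up to 4 connections
  if (getsockname (lsn, (struct sockaddr *) &addr, &len) < 0) goto out;

  // The kernel completes the handshake and queues the connection,
  // so a blocking connect returns before accept is called.
  if (connect (cli, (struct sockaddr *) &addr, sizeof (addr)) < 0) goto out;

  conn = accept (lsn, (struct sockaddr *) &peer, &peer_len);
  if (conn < 0) goto out;

  result = (conn != lsn
            && peer.sin_addr.s_addr == htonl (INADDR_LOOPBACK));

out:
  if (conn >= 0) close (conn);
  if (cli >= 0) close (cli);
  if (lsn >= 0) close (lsn);
  return result;
}
```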

int connect (int sockfd,
             const struct sockaddr *serv_addr,
             socklen_t addrlen);

The connect call connects a socket that is SOCK_SEQPACKET, SOCK_STREAM or SOCK_RDM to a remote address. For other socket types, it sets the default remote address of the socket.
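
The second case can be illustrated with a UDP socket. This is a sketch only. The function name udp_connect_demo is hypothetical and the port is an arbitrary value supplied by the caller. No listener is needed, because connecting a datagram socket only records the address.

```c
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical demo: connects a UDP socket to 127.0.0.1:port and returns
// the peer port reported by getpeername, or -1 on error.
int udp_connect_demo (in_port_t port)
{
  int fd = socket (AF_INET, SOCK_DGRAM, 0);
  if (fd < 0) return -1;

  struct sockaddr_in addr;
  memset (&addr, 0, sizeof (addr));
  addr.sin_family = AF_INET;
  addr.sin_port = htons (port);
  addr.sin_addr.s_addr = htonl (INADDR_LOOPBACK);

  // No packet is sent, the kernel only records the default destination.
  int result = -1;
  if (connect (fd, (struct sockaddr *) &addr, sizeof (addr)) == 0) {
    struct sockaddr_in peer;
    socklen_t len = sizeof (peer);
    if (getpeername (fd, (struct sockaddr *) &peer, &len) == 0)
      result = ntohs (peer.sin_port);
  }
  close (fd);
  return result;
}
```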

ssize_t send (int sockfd, const void *buf, size_t len, int flags);
ssize_t sendto (int sockfd, const void *buf, size_t len, int flags,
                const struct sockaddr *to, socklen_t tolen);
ssize_t sendmsg (int sockfd, const struct msghdr *msg, int flags);

struct msghdr
{
  void         *msg_name;       // optional address
  socklen_t    msg_namelen;     // optional address length
  struct iovec *msg_iov;        // array for scatter gather
  size_t       msg_iovlen;      // array for scatter gather length
  void         *msg_control;    // additional control data
  socklen_t    msg_controllen;  // additional control data length
  int          msg_flags;
};

The send family of calls sends data over a socket. Either the socket is connected or the remote address is specified. The write call can also be used but the flags cannot be specified in that case.
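
The scatter gather fields of msghdr can be sketched as follows. The function name gather_send_demo is hypothetical. To stay self contained, the sketch uses a connected pair of local datagram sockets created by socketpair, so no address needs to be filled in.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

// Hypothetical demo: sends "hello, " and "world" as one message gathered
// from two buffers and receives it into out (of size out_len).
// Returns the number of bytes received, or -1 on error.
ssize_t gather_send_demo (char *out, size_t out_len)
{
  int sv [2];
  if (socketpair (AF_UNIX, SOCK_DGRAM, 0, sv) < 0) return -1;

  char part1 [] = "hello, ";
  char part2 [] = "world";
  struct iovec iov [2] = {
    { .iov_base = part1, .iov_len = strlen (part1) },
    { .iov_base = part2, .iov_len = strlen (part2) },
  };

  struct msghdr msg;
  memset (&msg, 0, sizeof (msg));   // connected socket, no address
  msg.msg_iov = iov;
  msg.msg_iovlen = 2;

  ssize_t n = -1;
  if (sendmsg (sv [0], &msg, 0) >= 0)
    n = recv (sv [1], out, out_len, 0);

  close (sv [0]);
  close (sv [1]);
  return n;
}
```

The receiver sees a single 12 byte message, the two buffers are concatenated by the kernel rather than by the application.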

ssize_t recv (int sockfd, void *buf, size_t len, int flags);
ssize_t recvfrom (int sockfd, void *buf, size_t len, int flags,
                  struct sockaddr *from, socklen_t *fromlen);
ssize_t recvmsg (int sockfd, struct msghdr *msg, int flags);

The recv family of calls receives data over a socket. The read call can also be used but the flags cannot be specified in that case.
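
As an illustration of a flag that read cannot express, the following sketch uses MSG_PEEK to look at queued data without consuming it. The function name peek_demo is hypothetical, and the connected socket pair is used only to keep the sketch self contained.

```c
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical demo: peeks at queued data, then reads it.
// Returns 1 if both calls saw the same data, 0 if not, -1 on error.
int peek_demo (void)
{
  int sv [2];
  if (socketpair (AF_UNIX, SOCK_STREAM, 0, sv) < 0) return -1;

  int result = -1;
  char first [4], second [4];
  if (send (sv [0], "ping", 4, 0) == 4) {
    // MSG_PEEK returns the data but leaves it in the receive queue.
    ssize_t a = recv (sv [1], first, sizeof (first), MSG_PEEK);
    // A plain read then consumes the same bytes.
    ssize_t b = read (sv [1], second, sizeof (second));
    if (a == 4 && b == 4)
      result = (memcmp (first, second, 4) == 0);
  }

  close (sv [0]);
  close (sv [1]);
  return result;
}
```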

The additional control data can provide data such as a list of queued errors or additional protocol and transport information. The additional control data is structured as a list of entries, each with a header and a protocol-specific payload.

int select (int setsize,
            fd_set *readfds,
            fd_set *writefds,
            fd_set *exceptfds,
            struct timeval *timeout);

int poll (struct pollfd *ufds,
          nfds_t nfds,
          int timeout);

struct pollfd
{
  int fd;
  short events;         // requested events
  short revents;        // returned events
};

The select call is used to wait on several sockets at the same time. The arguments are sets of file descriptors, usually implemented as bitmaps, and setsize is the highest descriptor number in any of the sets plus one. The call waits until a read would not block on a descriptor in readfds, a write would not block on a descriptor in writefds, or an exceptional condition occurs on a descriptor in exceptfds. The call returns the number of file descriptors that meet the condition of the wait.
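
A minimal sketch of waiting on two sockets, only one of which has data queued. The function name select_demo is hypothetical, and local socket pairs stand in for real network connections.

```c
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical demo: waits on two sockets, only one of which has data.
// Returns the select result and stores in ready_a and ready_b whether
// each socket was reported readable. Returns -1 on error.
int select_demo (int *ready_a, int *ready_b)
{
  int a [2], b [2];
  if (socketpair (AF_UNIX, SOCK_STREAM, 0, a) < 0) return -1;
  if (socketpair (AF_UNIX, SOCK_STREAM, 0, b) < 0) {
    close (a [0]);
    close (a [1]);
    return -1;
  }

  int n = -1;
  if (write (b [0], "x", 1) == 1) {     // makes b [1] readable
    fd_set readfds;
    FD_ZERO (&readfds);
    FD_SET (a [1], &readfds);
    FD_SET (b [1], &readfds);
    int maxfd = a [1] > b [1] ? a [1] : b [1];

    struct timeval timeout = { .tv_sec = 1, .tv_usec = 0 };
    // The first argument is the highest descriptor plus one.
    n = select (maxfd + 1, &readfds, NULL, NULL, &timeout);
    if (n >= 0) {
      // select overwrites the set with the descriptors that are ready.
      *ready_a = FD_ISSET (a [1], &readfds) != 0;
      *ready_b = FD_ISSET (b [1], &readfds) != 0;
    }
  }

  close (a [0]);
  close (a [1]);
  close (b [0]);
  close (b [1]);
  return n;
}
```

Because select modifies the sets in place, a loop that calls it repeatedly has to rebuild the sets before every call.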

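The poll call makes it possible to distinguish more precisely what events to wait for. Each pollfd entry names one file descriptor, the events field selects the events to wait for, such as POLLIN or POLLOUT, and the revents field reports the events that occurred.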
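
A sketch of polling a single socket for both input and output. The function name poll_demo and its with_data parameter are hypothetical. The returned mask shows which of the requested events were reported.

```c
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical demo: polls one end of a socket pair for both input and
// output and returns the revents mask, or -1 on error. If with_data is
// nonzero, a byte is queued for reading first.
int poll_demo (int with_data)
{
  int sv [2];
  if (socketpair (AF_UNIX, SOCK_STREAM, 0, sv) < 0) return -1;

  int result = -1;
  if (!with_data || write (sv [0], "x", 1) == 1) {
    struct pollfd pfd = { .fd = sv [1], .events = POLLIN | POLLOUT };
    if (poll (&pfd, 1, 1000) >= 0)      // timeout in milliseconds
      result = pfd.revents;
  }

  close (sv [0]);
  close (sv [1]);
  return result;
}
```

An idle socket is expected to report POLLOUT only, and a socket with queued data is expected to report both POLLIN and POLLOUT.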

int getsockopt (int sockfd, int level,
                int optname, void *optval, socklen_t *optlen);

int setsockopt (int sockfd, int level,
                int optname, const void *optval, socklen_t optlen);
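
The getsockopt and setsockopt calls read and modify socket options. A minimal sketch, assuming the generic SOL_SOCKET level and the SO_REUSEADDR option, follows. The function name reuseaddr_demo is hypothetical.

```c
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical demo: enables SO_REUSEADDR on a TCP socket and reads the
// option back. Returns the value read (nonzero when enabled), or -1 on
// error.
int reuseaddr_demo (void)
{
  int fd = socket (AF_INET, SOCK_STREAM, 0);
  if (fd < 0) return -1;

  int result = -1;
  int on = 1;
  if (setsockopt (fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof (on)) == 0) {
    int value = 0;
    socklen_t len = sizeof (value);     // in: buffer size, out: used size
    if (getsockopt (fd, SOL_SOCKET, SO_REUSEADDR, &value, &len) == 0)
      result = value;
  }
  close (fd);
  return result;
}
```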

References. 

  1. Hewlett Packard: BSD Sockets Interface Programmers Guide

6.1.1.1. Example: Unix Sockets

Unix sockets represent a class of sockets used for local communication between processes. The sockets are represented by a file name or an abstract socket name.

struct sockaddr_un
{
  sa_family_t  sun_family;              // set to AF_UNIX
  char         sun_path [108];          // socket name, 108 bytes on Linux
};
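
The following sketch shows how an abstract socket name can be filled in, assuming Linux, where a name that starts with a zero byte is not backed by any file. The function name abstract_socket_demo and the socket name itself are made up for the illustration.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

// Hypothetical demo (Linux only): listens on an abstract socket name and
// connects to it. Returns 1 on success, -1 on error.
int abstract_socket_demo (void)
{
  struct sockaddr_un addr;
  memset (&addr, 0, sizeof (addr));
  addr.sun_family = AF_UNIX;

  // The abstract name follows the leading zero byte. The process ID
  // keeps the made up name unique.
  int n = snprintf (addr.sun_path + 1, sizeof (addr.sun_path) - 1,
                    "example-socket-%ld", (long) getpid ());
  // The address length covers the family, the zero byte and the name.
  socklen_t len = offsetof (struct sockaddr_un, sun_path) + 1 + n;

  int result = -1;
  int lsn = socket (AF_UNIX, SOCK_STREAM, 0);
  int cli = socket (AF_UNIX, SOCK_STREAM, 0);
  if (lsn >= 0 && cli >= 0
      && bind (lsn, (struct sockaddr *) &addr, len) == 0
      && listen (lsn, 1) == 0
      && connect (cli, (struct sockaddr *) &addr, len) == 0)
    result = 1;

  if (cli >= 0) close (cli);
  if (lsn >= 0) close (lsn);
  return result;
}
```

With a file name instead of an abstract name, sun_path would hold a zero terminated path and the file would have to be removed with unlink when no longer needed.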

It is also possible to use sockets without names. The socketpair function creates a pair of connected sockets that can be inherited by child processes and used for communication.

int socketpair (int domain,
                int type,
                int protocol,
                int sockets [2]);
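
A minimal sketch of a socket pair shared with a child process. The function name socketpair_fork_demo is hypothetical. Each process closes the end it does not use.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical demo: the child sends one byte to the parent over an
// inherited socket pair. Returns the byte received, or -1 on error.
int socketpair_fork_demo (void)
{
  int sv [2];
  if (socketpair (AF_UNIX, SOCK_STREAM, 0, sv) < 0) return -1;

  pid_t pid = fork ();
  if (pid < 0) {
    close (sv [0]);
    close (sv [1]);
    return -1;
  }

  if (pid == 0) {
    // Child: keeps sv [1], which it inherited across fork.
    close (sv [0]);
    char c = 'k';
    ssize_t w = write (sv [1], &c, 1);
    _exit (w == 1 ? 0 : 1);
  }

  // Parent: keeps sv [0].
  close (sv [1]);
  char c = 0;
  ssize_t r = read (sv [0], &c, 1);
  close (sv [0]);
  waitpid (pid, NULL, 0);
  return r == 1 ? c : -1;
}
```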

Unix sockets can use additional control data to send file descriptors or to send process credentials (PID, UID, GID) whose correctness is verified by the kernel.
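
Sending a file descriptor can be sketched with the SCM_RIGHTS control message. The helper names send_fd, recv_fd and fd_passing_demo are hypothetical. For brevity, both ends live in one process, where the received descriptor is a new descriptor referring to the same open file.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

// Control buffer for one descriptor, aligned for struct cmsghdr.
union fd_control {
  char buf [CMSG_SPACE (sizeof (int))];
  struct cmsghdr align;
};

// Hypothetical helper: sends descriptor fd over the Unix socket sock.
static int send_fd (int sock, int fd)
{
  char byte = 0;                        // one data byte carries the message
  struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
  union fd_control ctl;
  memset (&ctl, 0, sizeof (ctl));

  struct msghdr msg;
  memset (&msg, 0, sizeof (msg));
  msg.msg_iov = &iov;
  msg.msg_iovlen = 1;
  msg.msg_control = ctl.buf;
  msg.msg_controllen = sizeof (ctl.buf);

  // Fill in the header and the payload of the single control entry.
  struct cmsghdr *cmsg = CMSG_FIRSTHDR (&msg);
  cmsg->cmsg_level = SOL_SOCKET;
  cmsg->cmsg_type = SCM_RIGHTS;
  cmsg->cmsg_len = CMSG_LEN (sizeof (int));
  memcpy (CMSG_DATA (cmsg), &fd, sizeof (int));

  return sendmsg (sock, &msg, 0) == 1 ? 0 : -1;
}

// Hypothetical helper: receives a descriptor from sock, or -1 on error.
static int recv_fd (int sock)
{
  char byte;
  struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
  union fd_control ctl;

  struct msghdr msg;
  memset (&msg, 0, sizeof (msg));
  msg.msg_iov = &iov;
  msg.msg_iovlen = 1;
  msg.msg_control = ctl.buf;
  msg.msg_controllen = sizeof (ctl.buf);

  if (recvmsg (sock, &msg, 0) != 1) return -1;

  struct cmsghdr *cmsg = CMSG_FIRSTHDR (&msg);
  if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET
      || cmsg->cmsg_type != SCM_RIGHTS)
    return -1;

  int fd;
  memcpy (&fd, CMSG_DATA (cmsg), sizeof (int));
  return fd;
}

// Hypothetical demo: passes the read end of a pipe across a socket pair
// and reads from the received copy. Returns the byte read, -1 on error.
int fd_passing_demo (void)
{
  int sv [2], p [2];
  if (socketpair (AF_UNIX, SOCK_STREAM, 0, sv) < 0) return -1;
  if (pipe (p) < 0) {
    close (sv [0]);
    close (sv [1]);
    return -1;
  }

  int result = -1;
  char c = 0;
  if (write (p [1], "z", 1) == 1 && send_fd (sv [0], p [0]) == 0) {
    int copy = recv_fd (sv [1]);
    if (copy >= 0) {
      if (read (copy, &c, 1) == 1) result = c;
      close (copy);
    }
  }

  close (p [0]);
  close (p [1]);
  close (sv [0]);
  close (sv [1]);
  return result;
}
```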

Important uses of the Unix sockets include the X protocol.

> netstat --unix --all
Active UNIX domain sockets (servers and established)
Proto RefCnt Flags   Type   State     Path
unix  2      [ ACC ] STREAM LISTENING /var/run/acpid.socket
unix  2      [ ACC ] STREAM LISTENING /tmp/.font-unix/fs7100
unix  2      [ ACC ] STREAM LISTENING /tmp/.gdm_socket
unix  2      [ ACC ] STREAM LISTENING /tmp/.X11-unix/X0
unix  2      [ ACC ] STREAM LISTENING /tmp/.ICE-unix/4088
unix  2      [ ACC ] STREAM LISTENING /var/run/dbus/system_bus_socket
unix  3      [ ]     STREAM CONNECTED /var/run/dbus/system_bus_socket
unix  2      [ ]     DGRAM            @/var/run/hal/hotplug_socket
unix  2      [ ]     DGRAM            @udevd
unix  2      [ ACC ] STREAM LISTENING /tmp/xmms_ceres.0
unix  3      [ ]     STREAM CONNECTED /tmp/.X11-unix/X0
unix  3      [ ]     STREAM CONNECTED /tmp/.ICE-unix/4088

6.1.1.2. Example: Linux Netlink Sockets

Netlink sockets represent a class of sockets used for communication between processes and the kernel. The sockets are distinguished by a netlink family, which is specified in place of the protocol when creating the socket.

  • NETLINK_ARPD - ARP table

  • NETLINK_ROUTE - routing updates and modifications of IPv4 routing table

  • NETLINK_ROUTE6 - routing updates and modifications of IPv6 routing table

  • NETLINK_FIREWALL - IPv4 firewall

  • ...

Messages sent over the netlink socket have a standardized format. Macros and libraries are provided for handling messages of specific netlink families.
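
The message format and the accompanying macros can be sketched without talking to the kernel at all. The function name netlink_walk_demo is hypothetical. The sketch lays out two messages without payload in a buffer and walks them with the NLMSG_OK and NLMSG_NEXT macros from the Linux headers, the way a receiver would walk the data returned by recv.

```c
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>

// Hypothetical demo (Linux only): lays out two netlink messages without
// payload in a buffer and walks them the way a receiver would.
// Returns the number of messages found.
int netlink_walk_demo (void)
{
  // Buffer for two messages, aligned for struct nlmsghdr.
  union {
    char bytes [2 * NLMSG_SPACE (0)];
    struct nlmsghdr align;
  } buf;
  memset (&buf, 0, sizeof (buf));

  struct nlmsghdr *nlh = (struct nlmsghdr *) buf.bytes;
  nlh->nlmsg_len = NLMSG_LENGTH (0);      // header only, no payload
  nlh->nlmsg_type = NLMSG_NOOP;
  nlh->nlmsg_flags = NLM_F_MULTI;         // part of a multipart message

  struct nlmsghdr *second =
    (struct nlmsghdr *) (buf.bytes + NLMSG_SPACE (0));
  second->nlmsg_len = NLMSG_LENGTH (0);
  second->nlmsg_type = NLMSG_DONE;        // ends a multipart message

  int count = 0;
  int remaining = sizeof (buf.bytes);
  for (nlh = (struct nlmsghdr *) buf.bytes;
       NLMSG_OK (nlh, remaining);         // header fits in what is left
       nlh = NLMSG_NEXT (nlh, remaining)) {
    count ++;
    if (nlh->nlmsg_type == NLMSG_DONE) break;
  }
  return count;
}
```

In a real exchange, the buffer would be filled by a recv call on a socket created with the AF_NETLINK domain and the chosen netlink family as the protocol.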

6.1.1.3. Example: Windows Winsock Sockets

From the application programmer's perspective, Winsock sockets offer an interface that is, in principle, based on that of the Berkeley sockets. From the service programmer's perspective, Winsock offers an interface that allows service providers to install multiple protocol libraries underneath the unified API. The interface, called SPI (Service Provider Interface), distinguishes two types of services, transport and naming, and allows layering of protocol libraries.