6. TCP ZERO-COPY

6.1 Overview

This section documents an optional extension to the InterNiche Sockets layer, the TCP Zero-Copy API. This extension is only present if the stack has been built with the TCP_ZEROCOPY package option defined in ipport.h_h. See the Package Options section (2.2.7) for information about how to enable this option.

The TCP Zero-Copy API is intended to assist the development of higher-performance embedded network applications by allowing the application direct access to the InterNiche TCP/IP stack's packet buffers. This feature can be used to avoid the overhead of having the stack copy data between application-owned buffers and stack-owned buffers in t_send() and t_recv(), but it comes at the cost that the application will have to fit its data into, and accept its data from, the stack's buffers.

The TCP Zero-Copy API comprises two functions for allocation and freeing of packet buffers, a third function for sending a packet buffer on an open socket, an application-supplied callback function for accepting received packets, and an extension to the Sockets t_setsockopt() function for registration of the callback function. The TCP Zero-Copy API can be this small because it is simply an extension to the existing Sockets API that provides an alternate mechanism for sending and receiving data on a socket, and the Sockets API is used for all other operations on the socket.

The two functions for allocating and freeing packet buffers are straightforward: tcp_pktalloc() allocates a packet buffer from the stack's pool of packet buffers, and tcp_pktfree() returns a packet buffer to that pool. Applications using the TCP Zero-Copy API are responsible for allocating packet buffers for use in sending data, as well as for freeing buffers that have been used to receive data and those that the application has allocated but decided not to use for sending data. As these packet buffers are a limited resource, it is important that applications free them promptly when they are no longer of use.

The function for sending data, tcp_xout(), sends a packet buffer of data via a socket. If successful, it is considered to have consumed the supplied buffer and so there is no need for the application to free the buffer via tcp_pktfree().

Applications that use the TCP Zero-Copy API for receiving data must include a callback function for acceptance of received packets, and must register the callback function with the socket using the t_setsockopt() Sockets function with the SO_CALLBACK option name. The callback function, once registered, receives not only received data packets, but also connection events that result in socket errors.

6.2 Sending Data with the TCP Zero-Copy API

6.2.1 Allocate a Packet Buffer

The first step in using the TCP Zero-Copy API to send data is to allocate a packet buffer from the stack using the tcp_pktalloc() function. This function takes a single argument, the maximum length of the data you intend to send in the buffer, and returns a PACKET, a pointer to a network buffer structure as described in the section titled "The netbuf Structure and the Packet Queues".

PACKET pkt;   /* pointer to netbuf structure for packet buffer */
int datalen;   /* amount of data to send */

datalen = 512;   /* should indicate amount of data to send */
pkt = tcp_pktalloc(datalen);
if (pkt == NULL)
{
   /* error, could not allocate packet buffer */
}

Note that this limits how much data you can send in one call using the TCP Zero-Copy API: the data sent in one call to tcp_xout() must fit in a single packet buffer, along with the TCP, IP, and lower-layer headers that the stack will need to add in order to send the packet. The actual limit is determined by the big packet buffer size, bigbufsiz (also described in "The netbuf Structure and the Packet Queues"), less the HDRSLEN definition in tcpport.h. If you request a larger buffer than this, tcp_pktalloc() will return NULL to indicate that it could not allocate a sufficiently large buffer.
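As a rough illustration of this limit, the following sketch computes the largest payload that fits in one packet buffer and the number of tcp_pktalloc()/tcp_xout() pairs a longer message would need. The numeric values given to bigbufsiz and HDRSLEN here are placeholders chosen for the example, not values from any particular port, and max_zc_payload() and chunks_needed() are hypothetical helper names.

```c
/* Placeholder values for illustration only: a real build takes
 * bigbufsiz from the port's configuration and HDRSLEN from tcpport.h. */
unsigned bigbufsiz = 1536;
#define HDRSLEN 54u

/* Largest amount of TCP data that fits in one packet buffer. */
unsigned max_zc_payload(void)
{
   return (bigbufsiz > HDRSLEN) ? (bigbufsiz - HDRSLEN) : 0;
}

/* Number of tcp_pktalloc()/tcp_xout() pairs needed to send total bytes. */
unsigned long chunks_needed(unsigned long total)
{
   unsigned long max = max_zc_payload();

   if (max == 0)
      return 0;   /* no room for data at all */
   return (total + max - 1) / max;   /* round up */
}
```

An application sending more than max_zc_payload() bytes would allocate, fill, and send one buffer per chunk, rather than asking tcp_pktalloc() for a single oversized buffer.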

6.2.2 Fill the Allocated Buffer with Data

Having allocated the packet buffer, you should now fill it with the data to send. tcp_pktalloc() will have initialized the returned PACKET and so pkt->nb_prot will point to where you can start depositing data.

When you have filled the buffer, you must set pkt->nb_plen to the number of bytes of data that you have placed in the buffer.
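The two steps above (deposit the data at pkt->nb_prot, then record its length in pkt->nb_plen) can be sketched as a small helper. The struct netbuf shown here is a reduced stand-in with only the two fields this section uses, so that the fragment compiles on its own; zc_fill() and its maxlen parameter are hypothetical and are not part of the TCP Zero-Copy API.

```c
#include <string.h>

/* Reduced stand-in for the stack's netbuf structure; the real
 * structure has more members than the two used here. */
struct netbuf
{
   char *   nb_prot;   /* where the application data starts */
   unsigned nb_plen;   /* number of bytes of application data */
};
typedef struct netbuf * PACKET;

/* Copy len bytes into a packet buffer that was allocated for at most
 * maxlen bytes (the value passed to tcp_pktalloc()).
 * Returns 0 on success, -1 if the arguments are bad or the data
 * would not fit. */
int zc_fill(PACKET pkt, const void * data, unsigned len, unsigned maxlen)
{
   if (pkt == NULL || data == NULL || len > maxlen)
      return -1;

   memcpy(pkt->nb_prot, data, len);   /* deposit the data */
   pkt->nb_plen = len;                /* tell the stack how much there is */
   return 0;
}
```

Checking len against the size requested from tcp_pktalloc() keeps the application from writing past the space that was reserved for its data.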

6.2.3 Send the Packet

Finally, you send the packet by giving it back to the stack via tcp_xout(), which will send the packet via TCP, or return an error. If its return value is less than zero, then it has not accepted the packet and the application must either free it or retain it for sending later.

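int e;   /* return value from tcp_xout() */

e = tcp_xout(s, pkt);   /* s is the socket, pkt the filled packet buffer */
if (e < 0)
{
   /* not accepted: the application still owns pkt, so free it here
    * (or keep it and pass it to tcp_xout() again later) */
   tcp_pktfree(pkt);
}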
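The ownership rule above can be made explicit in code. The sketch below wraps tcp_xout() so that the caller's packet pointer is cleared whenever the application no longer owns the buffer. The struct netbuf, tcp_xout(), and tcp_pktfree() definitions are stand-ins that let the fragment compile outside the stack (the stub tcp_xout() simply returns a preset value); zc_send() and its keep_on_error flag are hypothetical.

```c
#include <stddef.h>

/* Reduced stand-in for the stack's netbuf structure. */
struct netbuf { char * nb_prot; unsigned nb_plen; };
typedef struct netbuf * PACKET;

/* Stand-ins for the stack's functions, for illustration only. */
int stub_xout_result = 0;   /* value the stub tcp_xout() returns */
int stub_free_calls = 0;    /* how many times tcp_pktfree() was called */

int tcp_xout(long s, PACKET pkt)
{
   (void)s; (void)pkt;
   return stub_xout_result;
}

void tcp_pktfree(PACKET p)
{
   (void)p;
   stub_free_calls++;
}

/* Send *pktp on socket s. On return, *pktp is NULL whenever the
 * application no longer owns the buffer: either the stack accepted
 * it (result >= 0) or it was refused and freed here. If the stack
 * refuses it and keep_on_error is non-zero, *pktp is left intact so
 * the caller can try again later. Returns the tcp_xout() result. */
int zc_send(long s, PACKET * pktp, int keep_on_error)
{
   int e = tcp_xout(s, *pktp);

   if (e >= 0)
   {
      *pktp = NULL;          /* consumed by the stack */
      return e;
   }
   if (!keep_on_error)
   {
      tcp_pktfree(*pktp);    /* refused and not wanted for a retry */
      *pktp = NULL;
   }
   return e;
}
```

Clearing the pointer is a design choice for the example: it makes it harder to free or reuse a buffer that the stack has already taken.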

6.3 Receiving Data with the TCP Zero-Copy API

6.3.1 Writing a Callback Function

Using the TCP Zero-Copy API for receiving data requires the application developer to write a callback function that the stack can use to inform the application of received data packets and other socket events. This function is expected to conform to the following prototype:

int rx_callback(struct socket * so, PACKET pkt, int code);

The stack will call this function when it has a received data packet or other event to report for a socket. It will identify the socket with so, pass a pointer to the packet buffer (if there is a packet buffer) in pkt, and pass an error event (if there is an error to report) in code.

If the application is using the same callback function for several sockets, it can use so to identify the socket for which the callback has occurred. For example, the following code fragment walks a list of data structures to find one with a matching socket, and illustrates a way to compare the so argument with a socket returned by t_socket().

for (ftps = ftplist; ftps; ftps = ftps->next)
   if((long)ftps->datasock == SO2LONG(so))
      break;

Once the callback function has identified the socket, it should examine the pkt and code parameters as these contain the information about the socket.

If pkt is not NULL, it is a pointer to a packet buffer containing received data for the socket. pkt->nb_prot points to the start of the received data, and pkt->nb_plen indicates the number of bytes of received data in this buffer. If the callback function returns 0, it indicates that it has accepted responsibility for the packet buffer and will return it to the stack (via the tcp_pktfree() function) when it no longer requires the buffer. If the callback function returns any non-zero value, it indicates to the stack that it has not accepted responsibility for the packet buffer. The stack will keep the packet buffer queued and will call the callback function again at a later time.

If code is not 0, it is a socket error indicating that an error or other event has occurred on the socket. Typical non-zero values will be ESHUTDOWN, indicating that the connected peer has closed its end of the connection and will send no more data; and ECONNRESET, indicating that the connected peer has abruptly closed its end of the connection and will neither send nor receive more data.
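One way a callback might record these events is to update a small per-connection state structure that the application examines later. This is only a sketch: struct conn_state and note_event() are hypothetical, and the numeric values given to ESHUTDOWN and ECONNRESET below are placeholders so that the fragment compiles on its own; a real application takes these definitions from the stack's headers.

```c
/* Placeholder values for illustration only. */
#ifndef ESHUTDOWN
#define ESHUTDOWN  110
#endif
#ifndef ECONNRESET
#define ECONNRESET 108
#endif

/* Hypothetical per-connection state kept by the application. */
struct conn_state
{
   int peer_closed;   /* peer will send no more data */
   int reset;         /* connection can neither send nor receive */
   int last_code;     /* most recent non-zero code reported */
};

/* Record the code argument of a callback in cs. */
void note_event(struct conn_state * cs, int code)
{
   if (code == 0)
      return;   /* no event to report */

   cs->last_code = code;
   if (code == ESHUTDOWN)
   {
      cs->peer_closed = 1;   /* the application may still send */
   }
   else if (code == ECONNRESET)
   {
      cs->peer_closed = 1;
      cs->reset = 1;         /* nothing more can be sent either */
   }
}
```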

Note that the callback function is called from the stack and is expected to return promptly. Some of the places where the stack calls the callback function require that the stack's data structures remain consistent through the callback, so the callback function should not call back into the stack except to call tcp_pktfree().
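A callback that follows these rules can do little more than take note of what it was given and return. The sketch below queues accepted packets in a small fixed-size ring, refuses a packet (non-zero return) when the ring is full so that the stack keeps it queued, and leaves the real work to a separate function that the application calls from its own context. The struct netbuf and tcp_pktfree() definitions are stand-ins so that the fragment compiles outside the stack, and the data length is read from nb_plen, the same field used when sending. RXQ_SLOTS, rx_event, and rx_drain() are hypothetical, and no locking is shown: a real port would need to protect the ring if the callback and the application can run concurrently.

```c
#include <stddef.h>

/* Reduced stand-in for the stack's netbuf structure. */
struct netbuf { char * nb_prot; unsigned nb_plen; };
typedef struct netbuf * PACKET;
struct socket;   /* opaque to the application */

/* Stand-in for the stack's tcp_pktfree(), for illustration only. */
int stub_free_calls = 0;
void tcp_pktfree(PACKET p) { (void)p; stub_free_calls++; }

#define RXQ_SLOTS 4
static PACKET   rxq[RXQ_SLOTS];   /* packets accepted but not yet processed */
static unsigned rxq_head, rxq_tail, rxq_count;
int rx_event = 0;                 /* last non-zero code seen */

/* Called by the stack: record the event, queue the packet, return. */
int rx_callback(struct socket * so, PACKET pkt, int code)
{
   (void)so;
   if (code != 0)
      rx_event = code;
   if (pkt == NULL)
      return 0;
   if (rxq_count == RXQ_SLOTS)
      return 1;   /* not accepted; the stack keeps the packet queued */

   rxq[rxq_tail] = pkt;
   rxq_tail = (rxq_tail + 1) % RXQ_SLOTS;
   rxq_count++;
   return 0;      /* accepted; the application must free it later */
}

/* Called by the application, outside the callback: process and free
 * every queued packet. Returns the number of data bytes seen. */
unsigned long rx_drain(void)
{
   unsigned long total = 0;

   while (rxq_count > 0)
   {
      PACKET pkt = rxq[rxq_head];

      rxq_head = (rxq_head + 1) % RXQ_SLOTS;
      rxq_count--;
      total += pkt->nb_plen;   /* real code would consume pkt->nb_prot here */
      tcp_pktfree(pkt);        /* return the buffer to the stack promptly */
   }
   return total;
}
```

Keeping the ring small bounds the number of stack buffers the application can hold at once, which matters because those buffers are a limited resource.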

6.3.2 Registering the Callback Function

The application also needs to inform the stack of the callback function. If the stack has been built with the TCP_ZEROCOPY option enabled, the t_setsockopt() function provides an additional socket option, SO_CALLBACK, which should be used for this purpose once the socket has been created. The following code fragment illustrates the use of this option to register a callback function named rxupcall() on the socket sock:

t_setsockopt(sock, SO_CALLBACK, (void *)rxupcall);

t_setsockopt() is described in section 5.5, "Sockets API Calls Reference".

6.4 TCP Zero-Copy API Reference

Name

tcp_pktalloc()

Syntax

PACKET tcp_pktalloc(int datasize);

Parameters

int datasize /* size of TCP data for packet */

Description

tcp_pktalloc() allocates a packet buffer large enough to hold datasize bytes of TCP data, plus TCP, IP, and MAC headers. It is a small wrapper around the internal pk_alloc() function that provides the necessary synchronization and calculation of header length.

tcp_pktalloc() should be called to allocate a buffer for sending data via tcp_xout(). It will return the allocated packet buffer with its pkt->nb_prot field set to where the application should deposit the data to be sent.

Returns

Returns a PACKET (pointer to struct netbuf) if OK, else NULL if a big enough packet was not available.

See Also

tcp_pktfree(), tcp_xout()

Name

tcp_pktfree()

Syntax

void tcp_pktfree(PACKET p);

Parameters

PACKET p /* pointer to the packet to be returned to the protocol stack */

Description

tcp_pktfree() frees a packet buffer that was allocated by tcp_pktalloc() or passed to the application by a callback. It is a simple wrapper around pk_free() that locks and unlocks the free-queue resource.

Returns

No value is returned. If the passed packet is already in a free queue, has been corrupted, or does not appear to be a valid packet, a dtrap() may be generated by the debugging logic.

See Also

tcp_pktalloc()

Name

tcp_xout()

Syntax

int tcp_xout(long s, PACKET pkt);

Parameters

long s /* socket on which packet is to be sent */

PACKET pkt /* pointer to packet to be sent */

Description

The tcp_xout() call sends a packet buffer on a socket. The packet buffer must be initialized with pkt->nb_prot pointing to the start of the application data to be sent (this will have been set properly by tcp_pktalloc()), and with pkt->nb_plen set to the number of bytes of data to be sent.

Returns

An integer indicating the success or failure of the function. A returned value of zero indicates that the packet was sent successfully. Returned values less than zero indicate errors, and that the packet was not accepted by the stack (so the application must either re-send the packet via a later call to tcp_xout() or free the packet via tcp_pktfree()). Returned values greater than zero indicate that the packet has been accepted and queued on the socket but has not yet been transmitted.

See Also

tcp_pktalloc(), tcp_pktfree()