Comparison of CNP and HTTP

Overview

Most CNP features are modeled on their equivalents in HTTP. However, the main goals of CNP are simplicity, consistency and being well-defined, so CNP has fewer features than HTTP, and the features they share may differ slightly in CNP.

This document also provides a few example headers; for more examples, see doc/cnp-examples.

Syntax

The most obvious difference between CNP and HTTP is the syntax.

CNP's header is (seemingly) a single line of lowercase text, while HTTP headers consist of multiple lines of case-insensitive parameters and end with a blank line.

HTTP headers resemble SMTP (email) headers. While the general syntax resembles Parameter-Key: parameter value, parsing them is more complex than merely splitting the string on the first occurrence of : (see the article HTTP for servers). Additionally, HTTP header values use the ISO-8859-1 text encoding by default, which is often overlooked by implementations (usually either by restricting it to ASCII or by using UTF-8). HTTP header parameter keys are also case-insensitive, which means that they often need to be normalized.

On the other hand, the CNP header is a simple string of space-separated values that ends with the first line feed byte. The CNP header is defined as a bytestring, though all predefined values consist of ASCII characters. To parse it, splitting by space and then splitting all parameters by the equals sign is sufficient. Arbitrary binary values may be used as both parameter keys and values, with only a few specific bytes needing to be escaped into a specific two-byte escape sequence. Because the protocol header is a case-sensitive bytestring and a specific string can only be escaped into exactly one bytestring, a parameter key or value can be matched without unescaping it first, simply by comparing it with a previously escaped search string.
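A minimal Python sketch of this matching, assuming the escape table `\\` for a backslash, `\0` for a null byte, `\n` for a line feed, `\_` for a space and `\-` for an equals sign. The helpers `escape` and `find_param` are hypothetical names. The search key is escaped once, and header parameters are then compared while still escaped.

```python
# Assumed CNP escape table: raw byte -> two-byte escape sequence
ESCAPES = {b'\\': b'\\\\', b'\0': b'\\0', b'\n': b'\\n', b' ': b'\\_', b'=': b'\\-'}

def escape(value: bytes) -> bytes:
    # Escape byte by byte, so every value has exactly one escaped form
    return b''.join(ESCAPES.get(bytes([b]), bytes([b])) for b in value)

def find_param(header: bytes, escaped_key: bytes):
    # Compare raw (still escaped) keys with a key escaped in advance;
    # no unescaping is needed to decide whether a parameter matches
    for field in header.rstrip(b'\n').split(b' ')[2:]:
        key, _, value = field.partition(b'=')
        if key == escaped_key:
            return value  # returned still escaped
    return None

TYPE_KEY = escape(b'type')  # computed once, reused for every message

header = b'cnp/0.3 ok type=text/plain name=a\\_b.txt\n'
print(find_param(header, TYPE_KEY))         # b'text/plain'
print(find_param(header, escape(b'name')))  # b'a\\_b.txt'
```

Only the value that is actually needed has to be unescaped afterwards.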

The predefined CNP parameter keys consist only of lowercase ASCII letters and underscores. These are often easier to represent as identifiers in programming languages than the mixed-case, dash-separated HTTP header names.

Reading the entire CNP header line can often be accomplished using a single function call, such as fgets() from C (assuming a sufficiently large buffer is provided), rather than reading lines until a blank one is encountered. This also simplifies handling header size limiting, since only a limit for the entire line has to be used. In HTTP, both the maximum size of each individual header and the total number of headers must be limited.
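Python's binary file objects offer a similar single call. `readline(limit)` stops at the first line feed or after `limit` bytes, whichever comes first, so one check enforces the whole-header size limit. The sketch below uses a hypothetical `read_header` helper and an arbitrary 4 KiB limit chosen only for illustration, not a value taken from the CNP specification.

```python
import io

MAX_HEADER = 4096  # illustrative limit for this sketch, not from the CNP spec

def read_header(conn, limit=MAX_HEADER):
    # One call reads up to and including the first line feed,
    # but never more than `limit` bytes
    line = conn.readline(limit)
    if not line.endswith(b'\n'):
        # The connection ended early or the header exceeded the limit
        raise ValueError('header missing, truncated or too long')
    return line  # the stream is now positioned at the start of the body

conn = io.BytesIO(b'cnp/0.3 example.com/path\nbody bytes')
print(read_header(conn))  # b'cnp/0.3 example.com/path\n'
print(conn.read())        # b'body bytes'
```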

All lines in HTTP headers end with a carriage return and line feed pair for historic reasons. The CNP header ends with a line feed only, which is often easier to parse. Only a single byte has to be matched, and exceptions such as a missing carriage return or line feed do not need to be handled.

Because parsing is this simple, protocols such as CGI are not necessary, since any application can implement a simple parser for correct requests and be put behind a CNP reverse proxy that only lets through valid requests. For example, the following trivial Python 3 program parses all valid CNP messages from a file-like object conn:

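import re
# CNP escape sequences and the characters they represent
ESCAPES = {'\\0': '\0', '\\n': '\n', '\\_': ' ', '\\-': '=', '\\\\': '\\'}
def unescape(s):
	# Decode all escape sequences in a single left-to-right pass, so that an
	# escaped backslash followed by "n" is not mistaken for an escaped line feed
	return re.sub(r'\\[-0n_\\]', lambda m: ESCAPES[m.group()], s)
# conn is a text-mode file-like object
header = conn.readline()
version, intent, *params = header.rstrip('\n').split(' ')
intent = unescape(intent)
params = {unescape(k): unescape(v) for k, v in (p.split('=') for p in params)}
body = conn.read()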

In C, the CNP header line can be parsed without copying its values into separate allocations. Every space, equals sign and the final line feed is replaced with a \0 character, and values are unescaped in place. Every escape sequence is two bytes long and represents a single byte, so an unescaped value is never longer than its escaped form. An escaped value is at most twice the length of the unescaped one, which happens when it is composed entirely of bytes that have to be escaped. As a result, unescaping always fits within the space occupied by the escaped value. Because the \0 character cannot appear in the header as itself (it must be escaped), these terminators cannot be confused with header content. This results in a single chunk of memory containing zero-delimited C strings representing each header field.
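The in-place technique can be illustrated in Python, with a `bytearray` standing in for the C buffer. A write index trails a read index. Each two-byte escape sequence shrinks to one byte, so the write index never overtakes the read index and no second buffer is needed. This is a sketch that assumes well-formed input and the escape sequences `\0`, `\n`, `\_` (space), `\-` (equals sign) and `\\`. `unescape_in_place` is a hypothetical name.

```python
# Assumed CNP escape sequences: byte after the backslash -> decoded byte
UNESCAPE = {ord('0'): 0, ord('n'): ord('\n'), ord('_'): ord(' '),
            ord('-'): ord('='), ord('\\'): ord('\\')}

def unescape_in_place(buf: bytearray, start: int, end: int) -> int:
    """Unescape buf[start:end] in place and return the new end index."""
    read = write = start
    while read < end:
        byte = buf[read]
        if byte == ord('\\'):
            # Two escaped bytes become one, so write never overtakes read
            byte = UNESCAPE[buf[read + 1]]
            read += 2
        else:
            read += 1
        buf[write] = byte
        write += 1
    return write

buf = bytearray(b'file\\_name\\-1.txt')
n = unescape_in_place(buf, 0, len(buf))
print(bytes(buf[:n]))  # b'file name=1.txt'
```

In C, the returned end index is where the terminating \0 would be written.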

Additionally, since the CNP header is just a single line, it is easier to write in fields where multi-line content is not allowed, such as parameters of a command-line utility. util/cnp-req uses this to provide CNP parameters as command-line arguments.

Request

Example HTTP request header:

GET /path/file%20name.txt HTTP/1.1
Host: example.com
If-Modified-Since: Thu, 01 Jan 1970 00:00:00 GMT
Connection: close

The HTTP request is composed of the request line, followed by lines with header entries. The request line contains the method (GET in the above example), the path (/path/file name.txt) and the HTTP version (1.1).

HTTP values, such as the path, can be escaped in several different ways, since percent-encoding can be used to replace any byte with its hexadecimal numeric value.
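For example, Python's standard `urllib.parse.unquote` decodes every path in the following list to the same value. A server therefore has to decode or normalize HTTP paths before comparing them with known ones.

```python
from urllib.parse import unquote

# Distinct byte strings a client may send for the same path
spellings = [
    '/path/file%20name.txt',
    '/path/file%20name%2Etxt',   # '.' percent-encoded
    '/path/file%20name%2etxt',   # same, with lowercase hex digits
    '/path/%66ile%20name.txt',   # 'f' percent-encoded
    '/%70ath/file%20name.txt',   # 'p' percent-encoded
]

decoded = {unquote(s) for s in spellings}
print(decoded)  # {'/path/file name.txt'}
```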

The Host header is not part of the HTTP/1.0 specification, although many HTTP/1.0 clients send it, and HTTP/1.1 makes it mandatory. This parameter provides the hostname of the server that the client tried to access, so that multiple domains hosted on one IP address can be distinguished.

Example CNP request header:

cnp/0.3 example.com/path/file\_name.txt if_modified=1970-01-01T00:00:00Z

Unlike HTTP, CNP combines the hostname and the path into the "intent" field of the CNP message, which follows the CNP version, since the hostname is an important part of the request.

Each value has exactly one escaped representation, and that byte sequence can only be unescaped back into the same value. This can make path matching faster, since the server can precompute escaped paths to match against incoming requests.

Unlike HTTP, CNP has no request verbs. Every request without body data is equivalent to an idempotent HTTP GET request, and a request with data corresponds to an HTTP POST request. Semantic HTTP request methods, such as PUT and DELETE, are functionally equivalent to these two and can instead be implemented as request parameters. A HEAD equivalent is available using a CNP info selector.

Response

Example HTTP response header:

HTTP/1.1 404 Not Found
Content-Type: text/plain
Date: Thu, 01 Jan 1970 00:01:23 GMT
Connection: close

The status line in an HTTP response differs from the request line. It contains the HTTP version, a numeric status code and a short textual description of that status code.

HTTP defines many status codes, grouped by purpose based on the first digit. Many of these codes have similar meanings and are often misused by implementations.

Example CNP response header:

cnp/0.3 error reason=not_found type=text/plain time=1970-01-01T00:01:23Z

CNP responses, on the other hand, use the same message syntax as requests and differ only in the intent field and the defined parameters. The intent can only be one of a few defined choices: ok, error, not_modified and redirect. Each of these intents changes the interpretation of the response, and finer-grained meanings (such as the error reason) can be provided as parameters instead. This way, the client can choose the general action (e.g. following a redirect, showing an error or outputting content) based on the intent and then handle it more specifically based on the parameters.
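A minimal client-side sketch of this two-level dispatch, assuming only the four intents listed above. `handle_response` is a hypothetical helper. The returned strings are placeholders for real client behavior, and parameter values are left escaped for brevity.

```python
def handle_response(header: bytes) -> str:
    # Split into version, intent and space-separated parameters
    _version, intent, *fields = header.rstrip(b'\n').split(b' ')
    # Parameter values are left escaped here for brevity
    params = dict(field.split(b'=', 1) for field in fields)
    if intent == b'ok':
        return 'output content'
    if intent == b'not_modified':
        return 'use cached copy'
    if intent == b'redirect':
        return 'follow redirect'
    if intent == b'error':
        # The intent selects the general action; the reason refines it
        return 'show error: ' + params.get(b'reason', b'unknown').decode()
    raise ValueError('unknown intent')

print(handle_response(b'cnp/0.3 error reason=not_found type=text/plain time=1970-01-01T00:01:23Z\n'))
# show error: not_found
```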

Parameters

CNP defines significantly fewer header parameters than HTTP. This is largely intentional, to avoid overcomplicating the protocol and to simplify implementations.

Caching

CNP defines the if_modified request parameter and modified response parameter, which correspond to HTTP If-Modified-Since and Last-Modified headers, respectively. Additionally, a server may send a time parameter containing the current server time in the header; among other uses, this can be used to set the next if_modified parameter to an arbitrary timestamp between modified and time to reduce the impact of fingerprinting based on caching parameters.
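The cached copy was current as of the server's `time`, so any value between `modified` and `time` leads to the same decision by the server. A copy changed later is newer than any such value, and an unchanged one is not. The client-side sketch below picks such a value at random. It assumes the whole-second UTC timestamp form shown in this document's examples (`1970-01-01T00:00:00Z`), and `next_if_modified` is a hypothetical helper.

```python
import random
from datetime import datetime, timedelta, timezone

FMT = '%Y-%m-%dT%H:%M:%SZ'  # UTC timestamp form used in the examples above

def parse_time(s: str) -> datetime:
    return datetime.strptime(s, FMT).replace(tzinfo=timezone.utc)

def next_if_modified(modified: str, time: str) -> str:
    """Pick a random if_modified value between the modified and time parameters."""
    start, end = parse_time(modified), parse_time(time)
    # Guard against clock skew: never go below the modification time
    span = max(0, int((end - start).total_seconds()))
    return (start + timedelta(seconds=random.randint(0, span))).strftime(FMT)

# modified/time values as they might appear in a previous response
print(next_if_modified('1970-01-01T00:00:00Z', '1970-01-01T00:01:23Z'))
```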

Currently, there is no equivalent to the HTTP If-None-Match and ETag headers. These are often unnecessary, since caching based on modification time covers the majority of cases where they would be useful.

CNP 0.3 also lacks an equivalent of the HTTP Cache-Control header, which allows caching content without issuing a request to check whether it has been modified. While this may be added in the future, it is also partially covered by the planned CNP batch requests. With these, a page load will essentially consist of two requests: one for the page and the other for all embedded content, each of which can use if_modified. Since CNM, the primary document format used on ContNet pages, does not permit scripting, these two requests are often all that is issued for one user interaction (following a hyperlink).

Errors

The CNP error response intent covers all HTTP 4XX and 5XX status codes. The specific meaning of the error is provided in the reason response parameter.

In most cases (such as an interactive web browser), all HTTP error responses are handled by simply showing the attached page and possibly displaying the error reason. CNP simplifies this by having one error response intent and fewer defined error reason values, which should cover most situations.

A notable case is the HTTP 400 Bad Request error. While it was originally intended to be sent as a response to invalid HTTP syntax sent by the client, it is often used to signify an application-level error, such as not providing a required GET parameter. CNP defines a specific reason (reason=rejected) for that to avoid confusing it with syntax errors (reason=syntax).

Redirect

HTTP provides several different 3XX status codes for redirects, but not all 3XX codes are redirects. This means that a client must know every new redirect code in order to follow it. The main differences between these redirects are whether they should be cached and whether non-idempotent requests, such as POST, should be allowed to be redirected.
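To illustrate how HTTP status codes line up with CNP intents, here is a simplified mapping that a hypothetical HTTP-to-CNP gateway might use. It is a sketch, not part of either specification. In particular, treating every 2XX code as `ok` glosses over codes such as 206 Partial Content. The mapping also shows how a client has to list redirect codes individually, because 304 and 300 share the 3XX class without being plain redirects.

```python
def cnp_intent(status: int):
    """Map an HTTP status code to a CNP response intent (simplified sketch)."""
    if 200 <= status < 300:
        return 'ok'
    if status == 304:
        return 'not_modified'
    if status in (301, 302, 303, 307, 308):
        # Each redirect code must be known individually
        return 'redirect'
    if 400 <= status < 600:
        return 'error'  # the specific meaning would go into the reason parameter
    # 1XX and the remaining 3XX codes (e.g. 300) have no direct equivalent here
    return None

print([cnp_intent(s) for s in (200, 304, 308, 404, 503, 300)])
# ['ok', 'not_modified', 'redirect', 'error', 'error', None]
```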

CNP only has one redirect response intent. If an equivalent of the HTTP Cache-Control header is added in a later version of CNP, it may be used to allow redirect caching; otherwise, redirects have to be followed every time. In practice, this usually means one extra request/response pair when the user follows a hyperlink that gets redirected.

A CNP redirect must always be followed with a blank (body-less) request, equivalent to an HTTP GET. Requests with a body often carry data of significant size, so uploading it again for every redirect would be wasteful. It would also be unsafe, since every hop would receive the entire request body.

If redirecting requests with body data is supported by a future version of CNP, it can be done using a response parameter. Modifying the intent itself is unnecessary, since it's still a redirect.

User-agent and server identification

The HTTP User-Agent request header and Server response header are used to provide information about the implementations of transfer endpoints. The User-Agent header is often used to provide different pages or layouts based on the client device, such as a desktop computer or a device with a small screen.

CNP does not include these parameters, since they are not relevant to the semantic content of the requested page and CNM does not include layout, which means that it's the job of the user-agent to provide a decent rendering of the content, not the server.

Additionally, these parameters are often used to track usage and deanonymize users. Since this usage is also unrelated to a protocol focused on semantic content, it is not considered a reason to implement these header parameters.

Cookies

HTTP cookies have several glaring problems in both privacy and usability. CNP 0.3 does not offer an alternative, but possible approaches for a future version of the protocol are discussed in draft/cnp-session.

Content negotiation

CNP does not currently support content negotiation. Content negotiation would likely add significant header overhead, since the relevant parameters would have to be added to every response. However, it could be semantically useful (e.g. choosing a preferred language at the protocol level), so it may be added to a future version of CNP.

A significant downside of content negotiation is that it uses data external to the URL to select the response to a content request. This means that a single URL no longer points to a specific file, but only to a resource that may be served in an arbitrary representation.

For now, application-level content negotiation (e.g. language selection using a different path prefix) can be done instead.

Query strings

CNM 0.3 does not support forms and CNP 0.3 makes no attempt to integrate form data into the protocol so far. When forms are added to CNM, CNP will also get support for them. For now, CNP is primarily a content retrieval protocol.

Optimizations

CNP aims to include optimizations that are important for non-interactive hypertext pages while avoiding the complexity of optimizations that a general-purpose data transfer protocol used for applications can include.

Keep-alive and pipelining

Unlike HTTP/1.1, CNP does not implement keep-alive. Since CNM does not support scripting, all page contents are retrieved at once when the document is first loaded. While keep-alive could improve the loading speed there, batch requests also solve that problem without requiring keep-alive connections to be handled. The number of batched requests is known ahead of time, and they are all issued before the server starts responding with content. This means that most pages will load in at most two requests: a normal one for the page and a batch one for embedded content, if any.

Header compression

In most cases, CNP message headers are reasonably short, containing a few hundred bytes for the host, path, filename, modification timestamp and current time. In HTTP, requests and responses often contain several kilobytes of header data (caching parameters, cookies, referers, user-agent identification strings, etc.). CNP messages, by contrast, should generally not contain so much data that compression would be necessary or would result in significant performance improvements.

Content compression

CNP 0.3 does not offer content compression, but draft/cnp-compression is a plan to add it to a future version of the protocol.