cnp/0.4 contnet.org/doc/cnp-and-http/ cnp/0.4 ok length=15568 modified=2021-02-05T08:26:28Z name=index.cnm time=2024-04-24T13:39:28Z type=text/cnm title Comparison of CNP and HTTP content section Overview text fmt Most CNP features are modeled on their equivalents in HTTP. However, the main goals of CNP are simplicity, consistency and being well-defined, so CNP has fewer features than HTTP, and the ones they share may differ slightly in CNP. This document also provides a few example headers; for more examples, see @@/doc/cnp-examples/ doc/cnp-examples@@. section Syntax text fmt The most obvious difference between CNP and HTTP is the syntax. CNP's header is (seemingly) a single line of lowercase text, while HTTP headers consist of multiple lines of case-insensitive parameters and end with a blank line. HTTP headers resemble SMTP (email) headers. While the general syntax resembles ``Parameter-Key: parameter value``, parsing them is more complex than merely splitting the string on the first occurrence of ``: `` (see the article @@http://www.and.org/texts/server-http HTTP for servers@@). Additionally, HTTP header values use the ISO-8859-1 text encoding by default, which is often overlooked by implementations (usually either by restricting it to ASCII or by using UTF-8). HTTP header parameter keys are also case-insensitive, which means that they often need to be normalized. text fmt On the other hand, the CNP header is a simple string of space-separated values that ends with the first line feed byte. The CNP header is defined as a bytestring, though all predefined values consist of ASCII characters. To parse it, it is sufficient to split the line on spaces and then split each parameter on the equals sign. Arbitrary binary values may be used as both parameter keys and values, with only a few specific bytes needing to be escaped into a specific two-byte escape sequence.
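The escaping rule can be illustrated with a short Python sketch. The ``escape`` helper and the ``ESCAPES`` table are hypothetical names introduced here; the five sequences are the ones handled by the parser example in this document, and reading ``\\-`` as the escape for the equals sign is an assumption based on the CNP specification rather than something stated in this text.

```python
# Bytes that must be escaped, each mapped to its two-byte escape sequence.
# Assumption: "\-" stands for the equals sign, as in the CNP specification.
ESCAPES = {
    0x00: b"\\0",   # zero byte
    0x0A: b"\\n",   # line feed
    0x20: b"\\_",   # space
    0x3D: b"\\-",   # equals sign
    0x5C: b"\\\\",  # backslash
}

def escape(value: bytes) -> bytes:
    """Return the escaped form of value (hypothetical helper)."""
    # Every other byte is copied through unchanged.
    return b"".join(ESCAPES.get(byte, bytes([byte])) for byte in value)

escaped_name = escape(b"file name.txt")
```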
Because the protocol header is a case-sensitive bytestring and a specific string can only be escaped into exactly one bytestring, parameter key or value matching can be performed without needing to unescape the received value first, by simply comparing it with a previously escaped search string. text fmt The predefined CNP parameter keys are all lowercase ASCII letters and underscores, which are simpler to represent in most programming languages than the mixed-case letters with dashes from HTTP header names. Reading the entire CNP header line can often be accomplished using a single function call, such as ``fgets()`` from C (assuming a sufficiently large buffer is provided), rather than reading lines until a blank one is encountered. This also simplifies handling header size limiting, since only a limit for the entire line has to be used. In HTTP, both the maximum size of each individual header and the total number of headers must be limited. All lines in HTTP headers end with a carriage return and line feed pair for historical reasons. The CNP header only ends with a line feed, which is often easier to parse, since only a single byte has to be matched and handling exceptions, such as either the carriage return or the line feed being absent, is unnecessary. Because parsing is this simple, protocols such as CGI are not necessary, since any application can implement a simple parser for correct requests and be put behind a CNP reverse proxy that only lets through valid requests.
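As a rough illustration of reading the header with a single size limit, the sketch below uses ``readline()`` with a byte limit on a binary ``file``-like object. The ``read_header`` helper, the ``MAX_HEADER`` value of 4096 bytes and the in-memory ``conn`` object are assumptions made for this example, not values defined by CNP.

```python
import io

MAX_HEADER = 4096  # arbitrary limit chosen for this sketch

def read_header(conn, limit=MAX_HEADER):
    """Read one header line of at most `limit` bytes, line feed included."""
    line = conn.readline(limit)
    # readline() stops at the limit, so a missing line feed means the header
    # was too long or the connection ended before the header was complete.
    if not line.endswith(b"\n"):
        raise ValueError("header too long or incomplete")
    return line[:-1]

# An in-memory stand-in for a connection: one header line, then the body.
conn = io.BytesIO(b"cnp/0.3 example.com/path/file\\_name.txt\nbody")
header = read_header(conn)
```

A single check on the line length covers the whole header; everything after the line feed is left in ``conn`` for the body.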
The following trivial Python 3 program parses all valid CNP messages from a text-mode ``file``-like object ``conn``:
raw python
	import re
	# Escape sequences and the characters they stand for.
	UNESCAPES = {'\\0': '\0', '\\n': '\n', '\\_': ' ', '\\-': '=', '\\\\': '\\'}
	def unescape(s):
	    # Decode in one left-to-right pass so that an escaped backslash is
	    # never re-read as the start of another escape sequence.
	    return re.sub(r'\\[0n_\-\\]', lambda m: UNESCAPES[m.group()], s)
	header = conn.readline()
	version, intent, *params = header.rstrip('\n').split(' ')
	intent = unescape(intent)
	params = {unescape(k): unescape(v) for k, v in (p.split('=') for p in params)}
	body = conn.read()
text fmt In C, the CNP header line can be parsed without reallocating the values in it by replacing every space, equals sign and the final line feed with a ``\\0`` character and unescaping values in place. Since escaped values can at most be twice the length of unescaped ones (when composed entirely of bytes that have to be escaped), unescaping will always fit within the space allocated for an escaped value. The ``\\0`` character cannot appear in the header as itself either; it must be escaped. This results in a single chunk of memory containing zero-delimited C strings representing each header field. Additionally, since the CNP header is just a single line, it is easier to write in fields where multi-line content is not allowed, such as parameters of a command-line utility. @@/util/cnp-req/ util/cnp-req@@ uses this to provide CNP parameters as command-line arguments. section Request text fmt Example HTTP request header:
raw message/http
	GET /path/file%20name.txt HTTP/1.1
	Host: example.com
	If-Modified-Since: Thu, 01 Jan 1970 00:00:00 GMT
	Connection: close
text fmt The HTTP request is composed of the method line, containing the verb (``GET`` in the above example), path (``/path/file name.txt``) and HTTP version (``1.1``), and lines with header entries. HTTP values, such as the path, can be escaped in several different ways, since @@https://en.wikipedia.org/wiki/Percent-encoding percent-encoding@@ can be used to replace any byte with its hexadecimal numeric value.
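The effect of percent-encoding on path matching can be seen in a small Python sketch. The alternative spellings of the example path below are made up for illustration; ``unquote`` is the standard library decoder from ``urllib.parse``.

```python
from urllib.parse import unquote

# Three different percent-encoded spellings of the same path.
spellings = [
    "/path/file%20name.txt",
    "/%70ath/file%20name.txt",   # "p" written as %70
    "/path/file%20name%2Etxt",   # "." written as %2E
]

# All spellings decode to one value, so a server has to decode a path
# before it can compare it with the paths it knows.
decoded = {unquote(s) for s in spellings}
```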
HTTP/1.0 supports the ``Host`` header and HTTP/1.1 makes it mandatory. This parameter is used to provide the hostname of the server that the client tried to access in order to distinguish between multiple domains hosted on one IP address. Example CNP request header:
raw message/cnp
	cnp/0.3 example.com/path/file\_name.txt if_modified=1970-01-01T00:00:00Z
text fmt Unlike HTTP, CNP integrates the hostname with the path into the "intent" field of the CNP message, following the CNP version, since it is an important part of the request. Each value is always escaped into a specific byte sequence, which is in turn the sole and unique escaped representation of that value. This can make path matching faster, since the server can precompute escaped paths to match against incoming requests. Unlike HTTP, CNP has no request verbs. Every request without body data is equivalent to an idempotent HTTP ``GET`` request and a request with data corresponds to an HTTP ``POST`` request. Semantic HTTP request methods, such as ``PUT`` and ``DELETE``, are functionally equivalent to the previous ones and can instead be implemented as request parameters. A ``HEAD`` equivalent is available using a @@/spec/cnp0.4/#/Request/Parameters/select/info CNP info selector@@. section Response text fmt Example HTTP response header:
raw message/http
	HTTP/1.1 404 Not Found
	Content-Type: text/plain
	Date: Thu, 01 Jan 1970 00:01:23 GMT
	Connection: close
text fmt The status line in an HTTP response differs from the first line of a request. It contains the HTTP version, a numeric response code and a short textual description of the status code. HTTP defines several status codes, grouped by purpose based on the first digit. Many of these codes have similar meanings and are often misused by implementations.
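The structure of the status line and the grouping of codes by their first digit can be sketched in Python. The ``parse_status_line`` helper and the ``STATUS_CLASSES`` table are hypothetical names used only for illustration; the class labels paraphrase the categories of the HTTP specification.

```python
def parse_status_line(line: str):
    """Split an HTTP status line into version, numeric code and description."""
    # The description may itself contain spaces, so split at most twice.
    version, code, reason = line.split(" ", 2)
    return version, int(code), reason

# The first digit of the code selects the general class of the response.
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}

version, code, reason = parse_status_line("HTTP/1.1 404 Not Found")
status_class = STATUS_CLASSES[code // 100]
```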
Example CNP response header:
raw message/cnp
	cnp/0.3 error reason=not_found type=text/plain time=1970-01-01T00:01:23Z
text fmt On the other hand, CNP responses use the same message syntax as requests, differing only in the intent field and the defined parameters. The intent can only be one of a few defined choices: ``ok``, ``error``, ``not_modified`` and ``redirect``. Each of these intents changes the interpretation of the response; finer semantic distinctions (such as the error reason) can be provided as a parameter instead. This way, the client can choose the general action (e.g. following a redirect, showing an error, outputting content, ...) based on the intent and then handle it more specifically based on the parameters. section Parameters text fmt CNP defines significantly fewer header parameters than HTTP. This is largely intentional to avoid overcomplicating the protocol and simplify implementations. section Caching text fmt CNP defines the ``if_modified`` request parameter and ``modified`` response parameter, which correspond to the HTTP ``If-Modified-Since`` and ``Last-Modified`` headers, respectively. Additionally, a server may send a ``time`` parameter containing the current server time in the header; among other uses, this lets the client set the next ``if_modified`` parameter to an arbitrary timestamp between ``modified`` and ``time`` to reduce the impact of fingerprinting based on caching parameters. Currently, there is no equivalent to the HTTP ``If-None-Match`` and ``ETag`` headers. They are often unnecessary, since modification time based caching covers the majority of cases where they would be useful. CNP 0.3 also lacks an equivalent of the ``Cache-Control`` header to allow caching content without issuing a request to check whether it has been modified.
While this may be added in the future, it is also partially covered by the planned @@/draft/cnp-batch/ CNP batch requests@@, with which a page load will essentially contain two requests (one for the page and the other for all embedded content, each of which can use ``if_modified``). Since CNM, the primary document format used on ContNet pages, does not permit scripting, that is often the entirety of requests issued for one user interaction (following a hyperlink). section Errors text fmt The CNP ``error`` response intent covers all HTTP ``4XX`` and ``5XX`` status codes. The error meaning is provided in the ``reason`` response parameter. In most cases (such as an interactive web browser), all HTTP error responses are handled by simply showing the attached page and possibly displaying the error reason. CNP simplifies that by having one error response intent and fewer defined error reason values, which should cover most situations. A notable case is the HTTP ``400 Bad Request`` error. While it was originally intended to be sent as a response to invalid HTTP syntax sent by the client, it is often used to signify an application-level error, such as not providing a required GET parameter. CNP defines a specific reason (``reason=rejected``) for that to avoid confusing it with syntax errors (``reason=syntax``). section Redirect text fmt HTTP provides several different ``3XX`` status codes for redirects, but not all ``3XX`` codes are redirects. This means that the client must know every new redirect code in order for it to work. The main difference between these redirects is whether they should be cached and whether non-idempotent requests, such as ``POST``, should be allowed to be redirected. CNP only has one ``redirect`` response intent. If an equivalent to the ``Cache-Control`` HTTP header is implemented in a later version of CNP, it may be used to allow redirect caching; otherwise, redirects have to be followed every time.
In general, this means one extra request/response pair when the user follows a hyperlink that gets redirected. A request that follows a CNP redirect must always be blank (body-less), equivalent to an HTTP ``GET``. Since requests with a body often contain data of potentially significant size, uploading it again for every redirect would be wasteful, as well as unsafe, since every hop would receive the entire request body. If redirecting requests with body data is supported by a future version of CNP, it can be done using a response parameter. Modifying the intent itself is unnecessary, since it's still a redirect. section User-agent and server identification text fmt The HTTP ``User-Agent`` request header and ``Server`` response header are used to provide information about the implementations of transfer endpoints. The ``User-Agent`` header is often used to provide different pages or layouts based on the client device, such as a desktop computer or a device with a small screen. CNP does not include these parameters, since they are not relevant to the semantic content of the requested page and CNM does not include layout, which means that it's the job of the user-agent to provide a decent rendering of the content, not the server. Additionally, these parameters are often used to track usage and deanonymize users. Since this usage is also not related to a semantic content-focused protocol, it is not considered a reason for implementing these header parameters. section Cookies text fmt HTTP cookies have @@http://lcamtuf.blogspot.com/2010/10/http-cookies-or-how-not-to-design.html several glaring problems@@ in both privacy and usability. CNP 0.3 does not offer an alternative, but implementations for a future version of the protocol are mentioned in @@/draft/cnp-session/ draft/cnp-session@@. section Content negotiation text fmt CNP does not currently support content negotiation.
While content negotiation would likely add significant header overhead, since the relevant parameters would have to be added to every response, it could be semantically useful (e.g. choosing a preferred language on the protocol level), so it may be added to a future version of CNP. A significant downside of content negotiation is that it uses data external to the URL to select responses to content requests. This means that a single URL no longer points to a specific file but at most to a resource in an arbitrary representation. For now, application-level content negotiation (e.g. language selection using a different path prefix) can be done instead. section Query strings text fmt CNM 0.3 does not support forms and CNP 0.3 makes no attempt to integrate form data into the protocol so far. When forms are added to CNM, CNP will also get support for them. For now, CNP is primarily a content retrieval protocol. section Optimizations text fmt CNP aims to include optimizations that are important for non-interactive hypertext pages while avoiding the complexity of optimizations that a general-purpose data transfer protocol used for applications can include. section Keep-alive and pipelining text fmt Unlike HTTP/1.1, CNP does not implement keep-alive. Since CNM does not support scripting, all page contents are retrieved at once when the document is first loaded. While keep-alive could improve the loading speed there, @@/draft/cnp-batch/ batch requests@@ also solve that problem without requiring the handling of keep-alive connections, since the number of batch requests is known ahead of time and they're all issued before the server starts responding with content. This means that most pages will load in at most two requests: one normal for the page and one batch for embedded content, if any. section Header compression text fmt In most cases, CNP requests are reasonably short, containing a few hundred bytes for host, path, filename, modification timestamp and current time.
Unlike HTTP, where requests and responses often contain several kilobytes of data (caching parameters, cookies, referers, user-agent identification string, etc.), CNP messages should in general not contain so much data that compression would be necessary or result in significant performance improvements. section Content compression text fmt CNP 0.3 does not offer content compression, but @@/draft/cnp-compression/ draft/cnp-compression@@ is a plan to add it to a future version of the protocol.