CNP 0.1

This is an archived copy of the original draft of the ContNet Protocol from 2013. It has never been implemented and contains numerous flaws.

# The CN protocol
The protocol is simple for programs to generate and parse, and for humans to write and read, even in its raw form.

### request
A request is composed of the request header and the request body.

The request header consists of the requested path followed by any number of key=value parameters, each separated by a single space. The header ends with a newline.

The body can contain any amount of arbitrary data, including none. If a body is present, its length in bytes, encoded as an ASCII decimal number, must be given in the "length" parameter.

The path is a standard UNIX-style absolute filepath prefixed with the host address (example.com/path/to/file.txt). The filepath part is separated from the host by the first / in the path ("example.com/" is a valid path, "[1fff:8:a88:85a3::ac1f]:8001/foo/bar" is valid, "example.com" isn't). Parameter keys and values can be any bytestring. Blank keys and values are permissible. An empty request header is invalid; at least the path must be given.
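
The host/filepath rule can be sketched as a small function. This is only an illustration: the name `split_path` is hypothetical, the path is assumed to be already unescaped, and the draft does not say whether an empty host is acceptable, so that case is not checked here.

```python
def split_path(path: bytes) -> tuple[bytes, bytes]:
    """Split a request path into (host, filepath) at the first "/"."""
    host, slash, rest = path.partition(b"/")
    if not slash:
        # No "/" at all, e.g. b"example.com": not a valid path.
        raise ValueError("the path must contain a '/' after the host")
    # The separating "/" is also the leading "/" of the absolute filepath.
    return host, b"/" + rest


host, filepath = split_path(b"[1fff:8:a88:85a3::ac1f]:8001/foo/bar")
```

Splitting at the first "/" keeps a bracketed IPv6 address and its port together in the host part, since neither contains a slash.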

Certain parameter keys are reserved for specific functionality:

- length: the length of the body data in bytes; required if any data will be sent
- version: a dot-separated 2-tuple of the client version; if the version sent is x.y, the server may reply with response version x.<=y (the minor version can be any in the current major version up to the request version); if the version isn't sent, the server may reply with any version (such as the latest) under the assumption that the client is the user directly writing raw requests (so it can, for example, omit unnecessary data from the response, like the time-related headers)
- client: a string identifying the user agent; optional
- compression: the compression format of the request body, such as "gzip"; optional
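
The version rule can be read as a simple predicate. The sketch below assumes versions are exactly the dot-separated 2-tuples of decimal integers described in the list. The function names are hypothetical and not part of the draft.

```python
from typing import Optional


def parse_version(value: str) -> tuple[int, int]:
    """Parse "x.y" into (x, y); anything else raises ValueError."""
    major, minor = value.split(".")
    return int(major), int(minor)


def response_version_ok(request_version: Optional[str], response_version: str) -> bool:
    """Check a response version against the version sent in the request."""
    if request_version is None:
        # No version sent: the server may reply with any version.
        return True
    req_major, req_minor = parse_version(request_version)
    resp_major, resp_minor = parse_version(response_version)
    # Same major version, minor version no higher than the one requested.
    return resp_major == req_major and resp_minor <= req_minor
```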

Request format:

	example.com/request/path.ext param1=value1 another\_param=some\-value\0\n third-param= =without_name length=43
	data data data data
	data data data data
	...

Interpretation:

	{
		path: {
			host: "example.com",
			filepath: "/request/path.ext",
		},
		params: {
			"param1": "value1",
			"another param": "some=value\x00\x0a",
			"third-param": "",
			"": "without_name",
		},
		body: "data data data data\ndata data data data\n...",
	}
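
A minimal sketch of how a request header line could be turned into the interpretation above. It relies on the escaping rules of this draft: raw spaces and raw equals signs inside a field must be escaped, so a raw space always separates fields and the first raw "=" in a field separates key from value. The function names, the rejection of unknown escapes and the "last one wins" handling of repeated keys are assumptions. The draft does not specify them.

```python
# Byte after the backslash -> decoded byte (\n, \_, \-, \0, \\).
ESCAPES = {ord("n"): 0x0A, ord("_"): 0x20, ord("-"): 0x3D, ord("0"): 0x00, ord("\\"): 0x5C}


def unescape(data: bytes) -> bytes:
    """Decode the backslash escapes in one header field."""
    out = bytearray()
    it = iter(data)
    for byte in it:
        if byte != 0x5C:  # not a backslash: copy through unchanged
            out.append(byte)
            continue
        code = next(it, None)  # None if the backslash is the last byte
        if code not in ESCAPES:
            # Assumption: unknown or truncated escapes are rejected.
            raise ValueError("invalid escape sequence")
        out.append(ESCAPES[code])
    return bytes(out)


def parse_request_header(line: bytes) -> dict:
    """Parse one request header line into host, filepath and params."""
    path, *fields = line.removesuffix(b"\n").split(b" ")
    host, slash, rest = path.partition(b"/")
    if not slash:
        raise ValueError("the path must contain a '/' after the host")
    params = {}
    for field in fields:
        key, equals, value = field.partition(b"=")
        if not equals:
            raise ValueError("parameter without '='")
        params[unescape(key)] = unescape(value)  # assumption: a repeated key overwrites
    return {"host": unescape(host), "filepath": b"/" + unescape(rest), "params": params}
```

Unlike the interpretation shown above, this sketch keeps "length" in the params dictionary. A caller would use it to decide how many body bytes to read.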

Examples:

	example.com/ version=0.0.1 client=get/0.1

	example.org:8080/post-reply version=0.0.1 length=3
	Hi!

### response
A response is also composed of a header and the body.

Instead of the request path, a response has the response type as the first token. This can be one of the following:

- content: response with the requested content in the standard document format.
- data: response with arbitrary data
- redirect: redirects the client to a different URI provided in the body; equivalent of HTTP 301, 302, 303, 307 and 308
- error: indicates an error, with the error type in the "error" parameter and optionally a page in the body; equivalent of HTTP 4xx and 5xx

The parameters function the same as in request. Predefined parameters:

- length: the length of the body; optional since the body ends when the connection is closed, but recommended; any response type
- version: see request, except that the server must always send this; any response type
- server: a string identifying the server program; optional; any response type
- name: the name of the current file/page; optional; content or data response
- time: an RFC 3339 timestamp, preferably in the UTC timezone, representing the current time; any response type
- modified: an RFC 3339 timestamp, preferably in the UTC timezone, representing the last modification time of the requested resource; content or data response
- type: the MIME type of the resource; data response
- compression: the compression format of the response body, such as "gzip" or "none"; content or data response
- error: a string identifying the error on an error response; error response type only:
	- "syntax": invalid request header; equivalent of HTTP 400
	- "invalid": a parameter was rejected (usually for having an invalid value)
	- "denied": server does not want to provide this content; equivalent of HTTP 401 and 403
	- "not found": the requested path (either host or the filepath) does not exist on the server; equivalent of HTTP 404
	- "too large": the server doesn't want to accept so much data (either header or body); equivalent of HTTP 413
	- "server error": internal server error; equivalent of HTTP 500
	- "not supported": requested feature isn't supported; equivalent of HTTP 406, 501 and 505

The response body can contain one of two things:

- The content in the [CN Content document format](/cnm/CNM 0.1 specification.cnc); response type "content"
- Arbitrary data/files (images, videos, etc.); response type "data"

Out of these, the former will be displayed and the latter might be embedded in the page if the client supports that, otherwise presented as a downloadable file.

Response format:

	ok param1=value1 param2=value2
	datadatadatadata

Interpretation:

	{
		type: "ok",
		params: {
			"param1": "value1",
			"param2": "value2",
		},
		body: "datadatadatadata",
	}

Examples:

	ok version=0.0.1 server=cnd/0.1
	Hello!

	error version=0.0.1 error=not\_found length=23
	The file was not found.

	redirect version=0.0.1
	/some/page
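
Because "length" is optional in a response, a client has two ways to find the end of the body. The sketch below shows both, reading from any binary file-like object (for example the result of `socket.makefile("rb")`). The function name is hypothetical. For brevity it compares the raw key against "length" without unescaping, and it treats a body shorter than the announced length as an error, which the draft does not specify.

```python
import io


def read_response(stream) -> tuple[bytes, bytes]:
    """Read one response and return (raw header line, body)."""
    header = stream.readline()
    if not header.endswith(b"\n"):
        raise ValueError("header is not terminated by a newline")
    header = header[:-1]
    length = None
    for field in header.split(b" ")[1:]:  # skip the response type
        key, _, value = field.partition(b"=")
        if key == b"length":
            length = int(value.decode("ascii"))
    if length is None:
        # No length given: the body runs until the connection is closed.
        return header, stream.read()
    body = stream.read(length)
    if len(body) != length:
        raise ValueError("connection closed before the whole body arrived")
    return header, body


header, body = read_response(io.BytesIO(b"ok version=0.0.1 server=cnd/0.1\nHello!\n"))
```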

### escaping
The ASCII newline (0x0a), space (0x20), equals (0x3d), NUL (0x00) and backslash (0x5c) must be escaped in several contexts, such as the request/response headers and paths.

Escaping the equals sign is optional in paths, but can be done.

The tab (0x09) and carriage return (0x0d) characters do not have to be escaped, despite technically being whitespace.

- newline: \\n
- space: \\\_
- equals: \\-
- NUL: \\0
- backslash: \\\\
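
As an illustration, the table above maps directly onto a lookup from raw byte to escape sequence. The sketch below escapes every field and assembles a request. The names `escape` and `build_request` are hypothetical, and the automatic "length" parameter follows the rule that a length must be given whenever a body is sent.

```python
# Raw byte -> escape sequence, as listed above.
ESCAPE_TABLE = {
    0x0A: b"\\n",   # newline
    0x20: b"\\_",   # space
    0x3D: b"\\-",   # equals
    0x00: b"\\0",   # NUL
    0x5C: b"\\\\",  # backslash
}


def escape(data: bytes) -> bytes:
    """Escape one path, key or value for use in a header."""
    return b"".join(ESCAPE_TABLE.get(byte, bytes([byte])) for byte in data)


def build_request(path: bytes, params: dict, body: bytes = b"") -> bytes:
    """Serialize a request; "length" is set whenever there is a body."""
    if body:
        params = {**params, b"length": str(len(body)).encode("ascii")}
    fields = [escape(path)]  # also escapes "=" in the path, which is optional but allowed
    fields += [escape(key) + b"=" + escape(value) for key, value in params.items()]
    return b" ".join(fields) + b"\n" + body


request = build_request(b"example.org:8080/post-reply", {b"version": b"0.0.1"}, b"Hi!")
```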

## The default content format
TODO

This should be human-readable, machine-parsable and simple. The content should be obvious from the raw response. No layout, just content.

List of site links, a site tree and content?

Optional next/previous page, up?

Embed images or not?

Tag different parts of the file (like RSS or LaTeX?), such as links, related pages, sections, etc.

See gopher and markdown.

The current draft of the format can be seen [here](/cnm/CNM 0.1 specification.cnc).

## Comparison with HTTP
### cookies
TODO

Instead of cookies, CNP will have sessions. These are parameters bound to exact hosts and contain only a value (TODO: limit value size? exactly N bytes?).

To start a session, a response of the type "content" will have a "session" parameter. The client can (and possibly should) ask the user whether to accept the session. If the user had a session before, it is replaced with the new session. Setting an empty session key will end the session (the client should stop sending the session parameter). On every request while the session is active, the client will send a "session" parameter containing the session key. A session key is bound to the exact host (as specified in the host part of the request path). There is no session expiry and no way to set cross-host sessions. The client should show when a session is active (for example, display an indicator in a part of the UI) and let the user end it at any time without much trouble. The server should not rely on the client accepting the session.
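
A client-side sketch of these rules, under the assumption that parameters have already been parsed and unescaped into a dictionary of bytes. The class name, method names and the `ask_user` callback are all hypothetical.

```python
class SessionStore:
    """Keeps at most one session key per exact host string."""

    def __init__(self, ask_user=lambda host, key: True):
        self._sessions = {}
        self._ask_user = ask_user  # returns True if the user accepts the session

    def handle_response(self, host: bytes, response_type: bytes, params: dict) -> None:
        """Update the stored session from a received response."""
        if response_type != b"content" or b"session" not in params:
            return
        key = params[b"session"]
        if key == b"":
            self._sessions.pop(host, None)  # an empty key ends the session
        elif self._ask_user(host, key):
            self._sessions[host] = key  # replaces any previous session

    def request_params(self, host: bytes) -> dict:
        """Parameters to add to the next request to this host."""
        key = self._sessions.get(host)
        return {} if key is None else {b"session": key}

    def end(self, host: bytes) -> None:
        """Let the user end a session at any time."""
        self._sessions.pop(host, None)
```

Keying the store on the full host string means "example.com" and "example.com:8080" never share a session, which is one reading of "exact host".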

### POST
TODO

Basically, not required, as there are parameters and the request body. See cookies (esp. challenge/response part).

### HEAD
Unnecessary. The client can make a normal request, read the header and then close the connection without reading the body.
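
A sketch of that client behaviour on an already connected socket, after the request has been sent. Since newlines inside the header must be escaped, the first raw newline byte marks the end of the header. The function name is hypothetical.

```python
import socket


def read_header_only(sock: socket.socket) -> bytes:
    """Read the response header, then close the connection without the body."""
    buffer = bytearray()
    with sock:  # the socket is closed when this block exits
        while b"\n" not in buffer:
            chunk = sock.recv(4096)
            if not chunk:
                raise ConnectionError("connection closed before the header ended")
            buffer += chunk
    # Any body bytes that arrived together with the header are discarded.
    header, _, _ = bytes(buffer).partition(b"\n")
    return header
```

Note that the server may already have put part of the body on the wire before the client closes, so this avoids processing the body rather than guaranteeing that none of it is transferred.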

### forms
TODO

Prepend : (or another symbol) to form key parameter names when sending the request?

### REST and other APIs
Unnecessary. The protocol itself is an API. Information can be easily extracted from the default content format.

### range
TODO

Should be implemented. Possibly just a "range" parameter specifying a byte range: "4-5", "-120", "8-", etc. (: is a possible alternate delimiter).
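
The draft does not define what these forms mean. Purely as an illustration, the sketch below borrows the semantics of HTTP byte ranges: "4-5" is bytes 4 through 5 inclusive, "8-" is everything from byte 8 on, and "-120" is the last 120 bytes. Both that interpretation and the function name are assumptions.

```python
def resolve_range(spec: str, size: int) -> tuple[int, int]:
    """Turn a range value into a half-open (start, stop) pair for a body of `size` bytes."""
    first, dash, last = spec.partition("-")
    parts = [part for part in (first, last) if part]
    if not dash or not parts or not all(part.isdigit() for part in parts):
        raise ValueError("malformed range")
    if not first:
        # "-120": the last 120 bytes (or the whole body if it is shorter).
        return max(size - int(last), 0), size
    start = int(first)
    # "8-": open-ended; "4-5": inclusive end, clamped to the body size.
    stop = size if not last else min(int(last) + 1, size)
    if start >= stop:
        raise ValueError("range is empty or outside the body")
    return start, stop


start, stop = resolve_range("4-5", 10)
chunk = b"0123456789"[start:stop]
```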

### if-none-match, if-modified-since
TODO

There are the "time" and "modified" params at the moment. One option is to let the client handle it: check whether the "modified" time the server sends is no later than the time of the cached copy and, if so, close the connection without reading the body, since the resource is unchanged.

A "hash" param containing some hash of the content could be added. Which hash was used would not be important, as the value of the parameter would be compared to the hash parameter in the new response.
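
Both ideas can be combined in one client-side check. In this sketch the cached copy's date is taken to be the "time" parameter of the response it came from, "hash" is the proposed (not yet specified) parameter, and parameters are assumed to be already decoded to strings. The function names are hypothetical.

```python
from datetime import datetime


def parse_rfc3339(value: str) -> datetime:
    """Parse an RFC 3339 timestamp into an aware datetime."""
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on.
    if value.endswith(("Z", "z")):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def is_unchanged(cached: dict, fresh: dict) -> bool:
    """Decide from the new response header whether the cached body is still current."""
    if "hash" in cached and "hash" in fresh:
        # Only equality matters, so the hash algorithm is irrelevant to the client.
        return cached["hash"] == fresh["hash"]
    if "time" in cached and "modified" in fresh:
        # Unchanged if the resource was last modified no later than the cached response.
        return parse_rfc3339(fresh["modified"]) <= parse_rfc3339(cached["time"])
    return False  # not enough information: read the body
```

If `is_unchanged` returns true, the client can close the connection right after the header and keep using the cached body.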

### keep-alive
Considered unnecessary. HTTP keep-alive requires either the content-length to be known in advance (meaning that data can't be generated while the request is being sent) or something like chunked encoding (CNP body data is raw data; no CNP-specific decoding is necessary). Modern internet connections are fast enough that opening a TCP connection doesn't take too long. Keep-alive also has [other problems](https://en.wikipedia.org/wiki/HTTP_persistent_connection#Disadvantages).

### being bloated
Not implemented.

## Possible changes
- Let request/response header fields be separated by an arbitrary number of spaces instead of exactly one (less strict, but would require splitting on fields instead of on characters)
- Forbid NUL character in header altogether (better C string support, probably not)
- Permit key-only parameters (no =, but would require careful splitting)
- TODO