ContNet

Overview

ContNet is a specification of a web protocol and a document markup format that together represent a more content-focused alternative to the World Wide Web.

It focuses on simplicity for both computers and humans. One goal is that a complete client implementation should be feasible for a single developer, rather than requiring years of work by a large team. The document format favors simple semantic hypertext content at the cost of layout, styling and scripting.

Why ContNet?

The Web, specifically HTTP as the protocol and HTML (with CSS for styling and JavaScript for scripting) as the document format, is the de facto hypertext information system and the largest content distribution platform in the world.

Since its inception, the Web has been under constant development, with many new features added both in official specifications and as proprietary vendor extensions. By now, it is an application platform of vast complexity. Implementing it from scratch is so complicated that, despite the Web being the biggest and most visible part of the Internet, only a few distinct implementations with close to full support exist. In practice, usable features are defined by web browser support, not by their presence in the standards.

HTML integrates content and layout into a single markup. When it is mixed with CSS and JavaScript, and especially when the content is loaded dynamically by scripts, the page has to be fully rendered before its content can be shown, and that content is then available only in a form meant to be consumed by humans.

As a consequence, using the Web without one of the few available web browser engines is not viable, while running a full web browser often requires a relatively modern computer, since most browsers need hundreds of megabytes of memory and significant CPU power to work normally. General-purpose programmatic content extraction from arbitrary websites is almost impossible. In most cases, dedicated APIs have to be used to access clean content.

ContNet is an attempt to simplify both the implementation and the usage of content-based websites. It focuses on simple and unambiguous specification-defined behavior that is easy to understand and implement.

CNP, the application protocol part of ContNet, resembles a simplified version of HTTP with stricter and clearer syntax. It deliberately avoids specifying too many complex features, in the hope that most clients will implement the whole protocol instead of just a small subset. CNP currently lacks the advanced optimizations that may be present in HTTP/2, but some of them, or similar features, will be added in a simple form in the future.

The markup language CNM is meant for writing simple hypertext pages that are easy to parse while remaining readable and easy for humans to write. CNM focuses on the (mostly textual) semantic content, with a small amount of relevant metadata (such as navigation to related pages). The syntax is inspired by markup languages such as Markdown and LaTeX, but is simpler, unambiguous and easier to parse. Because the block structure is delimited by indentation, blocks can be nested without closing tags; any line with less indentation than the block contents ends the block. That should make it more human-friendly than the tag-based SGML syntax. CNM only includes semantic layout, such as sections, tables and lists. CNM also allows various parsing shortcuts that can make manipulating a document significantly faster than fully parsing it.
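The indentation rule described above can be illustrated with a minimal Python sketch. It is not a CNM parser: it assumes one tab per nesting level and treats every non-blank line as a block, and the names Block and parse_blocks are hypothetical. It only shows how a line with less indentation closes the deeper blocks without any closing tags.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Block:
    """One line of text plus the more-indented lines nested under it."""
    text: str
    children: list[Block] = field(default_factory=list)


def parse_blocks(source: str) -> list[Block]:
    """Build a tree from tab-indented lines (tab indentation is an assumption)."""
    root = Block("")
    # stack[0] is the root; stack[d + 1] is the latest open block at depth d.
    stack = [root]
    for raw in source.splitlines():
        if not raw.strip():
            continue  # blank lines carry no structure in this sketch
        depth = len(raw) - len(raw.lstrip("\t"))
        # Clamp so a line cannot skip indentation levels.
        depth = min(depth, len(stack) - 1)
        # Less (or equal) indentation closes every deeper open block.
        del stack[depth + 1:]
        block = Block(raw.strip())
        stack[depth].children.append(block)
        stack.append(block)
    return root.children
```

For example, parsing "section\n\tpara one\n\t\tnested\n\tpara two\nother" gives two top-level blocks, with "para one" and "para two" as children of "section".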

In addition to CNM, ContNet includes an inline text formatting syntax called CNMfmt. It provides simple semantic text markup, such as emphasis, quotations, code and hyperlinks, using a syntax that is somewhat similar to Markdown's. Unlike Markdown, it is not ambiguous and is significantly easier to parse. Each format is toggled by a simple two-symbol sequence, and formats do not have to be closed in LIFO order. All formats are also switched off at the end of every paragraph, so a misplaced or missing closing sequence cannot ruin the rest of the document.
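The toggling behavior can be sketched in Python as follows. The marker table and the HTML-like output are assumptions made for illustration and are not taken from the CNMfmt specification; escaping is ignored. The sketch shows the two properties named above: formats may be closed in any order, and anything still open is closed when the paragraph ends.

```python
# Hypothetical two-symbol markers; the real CNMfmt set may differ.
MARKERS = {"**": "em", "``": "code", '""': "q"}


def render_paragraph(text: str) -> str:
    """Render one paragraph to HTML-like tags using independent toggles."""
    active: list[str] = []  # formats currently on, in opening order
    out: list[str] = []
    i = 0
    while i < len(text):
        tag = MARKERS.get(text[i:i + 2])
        if tag is None:
            out.append(text[i])
            i += 1
            continue
        if tag in active:
            # Non-LIFO close: close everything opened after this format,
            # close the format itself, then reopen the others.
            idx = active.index(tag)
            reopen = active[idx + 1:]
            for t in reversed(active[idx:]):
                out.append(f"</{t}>")
            del active[idx:]
            for t in reopen:
                out.append(f"<{t}>")
                active.append(t)
        else:
            out.append(f"<{tag}>")
            active.append(tag)
        i += 2
    # End of paragraph: every format still on is switched off.
    for t in reversed(active):
        out.append(f"</{t}>")
    return "".join(out)
```

Because each marker is a fixed two-symbol sequence and toggles are independent, the scan needs no lookahead beyond two characters and no backtracking.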

Current status

CNP 0.4 and CNM 0.4 have recently been released. Specifications are available here:

CNP and CNM versions 0.1 were only defined as a rough draft of a specification. They were never implemented because of various problems and oversights in the design, but they still form the basis for later versions.

CNM version 0.2 has no specification and exists only as an attempted implementation. Version 0.3 is very similar, but is well-defined. CNP 0.2 also lacks a specification, and its implementation was incomplete.

Versions 0.3 of CNP and CNM are the first usable versions of ContNet. Specifications and reference implementations are available.

CNP 0.3 defines a simple request-response protocol. It is mainly meant for requesting pages that correspond to files on a server. It supports basic caching through an if_modified request parameter: the server does not send the document if it has not been modified since the provided timestamp. It lacks batch requests, sessions and well-defined form submission, which are scheduled for later inclusion.
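The if_modified check can be sketched in Python as below. Only the parameter name if_modified comes from the text above; the function respond, its arguments and the status strings "ok" and "not_modified" are placeholders for illustration and are not claimed to match the CNP specification.

```python
from datetime import datetime, timezone
from typing import Optional, Tuple


def respond(
    body: bytes,
    modified: datetime,
    if_modified: Optional[datetime],
) -> Tuple[str, bytes]:
    """Return (status, payload) for a single page request.

    The status strings are placeholders, not real CNP response values.
    """
    # The client already has a copy at least as new as the file:
    # skip the document body.
    if if_modified is not None and modified <= if_modified:
        return ("not_modified", b"")
    # No timestamp supplied, or the file changed since then: send it.
    return ("ok", body)


file_time = datetime(2018, 1, 1, tzinfo=timezone.utc)
cached_at = datetime(2018, 6, 1, tzinfo=timezone.utc)
first = respond(b"page", file_time, None)
repeat = respond(b"page", file_time, cached_at)
```

Here first carries the full page, while repeat has an empty payload because the file is older than the client's cached copy.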

CNM 0.3 defines the general CNM indentation-based block structure, as well as the CNMfmt inline text formatting syntax. It leaves the possibility for different or extended text formatting to be added in the future. Version 0.3 also lacks data submission (forms).

CNP 0.4 adds content selectors to the specification. They allow byte range serving (equivalent to HTTP Range: bytes requests) and provide an alternative to HTTP HEAD requests. They also allow serving parts of documents using semantic selector queries, which is not a feature of HTML+HTTP.

CNM 0.4 changes CNMfmt from specifying styles to being a semantic markup. With this, the entire CNM format is now purely semantic. Additionally, CNM 0.4 defines selectors, which are queries that select either sections or content by sections. Section selectors are meant for use in hash fragments, to scroll the page to a section. Content selectors filter document contents and, in combination with CNP selectors, can be used to lazily load parts of a very large document.

While CNP and CNM versions 0.4 lack several important features, their main purpose is to provide a solid base for future versions to improve upon. In their current state, they are already usable for delivering hypertext pages and for simple file transfer. Some future features are available as drafts.

Implementations and utilities

  • cnp-go: Go library for CNP message parsing and composition, includes a simple TCP CNP client and server

  • cnm-go: Go library for CNM parsing and composition

  • cnp-req: a very simple command-line CNP client

  • cn-fileserver: a simple file server using CNP; also lists directories as CNM pages

  • cn-http: CNP gateway/browser that translates CNP requests and responses into HTTP equivalents and converts CNM pages into HTML

  • vim-contnet: Vim syntax highlighting for CNP, CNM and CNMfmt