CNM parsing

The majority of the CNM syntax is a simple indentation-delimited block-based tree structure.

The general parsing strategy is to start in block mode, look for lines containing block names, and then parse each block according to its own syntax.
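
As a rough illustration of this strategy, the Python sketch below splits a document into its top-level blocks without parsing their bodies. It assumes, as a simplification, that a top-level block starts on any line whose first character is not whitespace, and that every other line (including blank lines) belongs to the body of the current block. `top_level_blocks` is a hypothetical helper name, not part of any CNM library.

```python
from typing import Iterable, Iterator


def top_level_blocks(lines: Iterable[str]) -> Iterator[tuple[str, list[str]]]:
    # Hypothetical helper: group lines into (header, body) pairs.
    # Assumption: a top-level block header is any line that does not start
    # with whitespace; bodies are collected as-is and never parsed here.
    header, body = None, []
    for line in lines:
        if line and not line[0].isspace():
            if header is not None:
                yield header, body
            header, body = line, []
        elif header is not None:
            body.append(line)  # indented or blank line: part of the body
    if header is not None:
        yield header, body
```

A caller interested only in, say, the title block can pick it out by name and never look inside the other bodies.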

In practice, however, if only specific blocks are of interest, all other blocks can be skipped without being fully parsed. Several parsing shortcuts are possible, among them the following.

Adding a site block

A CNP/CNM server may want to add a site top-level block containing the sitemap to any CNM document that doesn't already have one.

This can be done in a single pass, without any buffering, by matching every line sent to the client against the regular expression ^site(\s|$) (or an equivalent check). If no line has matched by the time EOF is reached, the document contains no site block, so one can be generated and appended to the end of the document. No actual CNM parsing is required.

The same approach works for the links block, matching ^links(\s|$) instead.
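
A minimal Python sketch of this streaming check, generalized to take the block name so that it covers both the site and the links block. The function name `ensure_block` and its `body` argument are hypothetical. The body lines are assumed to be already-formatted block contents, which the sketch indents by one tab as an assumption about CNM indentation; the actual syntax of a sitemap is not reproduced here.

```python
import re
from typing import Iterable, Iterator


def ensure_block(lines: Iterable[str], name: str, body: list[str]) -> Iterator[str]:
    # Hypothetical helper: forward every line unchanged and, if none of them
    # starts a top-level block called `name`, append such a block at EOF.
    pattern = re.compile(rf"^{re.escape(name)}(\s|$)")
    seen = False
    for line in lines:
        if not seen and pattern.match(line):
            seen = True
        yield line  # no buffering: each line is sent on immediately
    if not seen:
        yield name
        for entry in body:
            yield "\t" + entry  # assumed: one tab per indentation level
```

Because the pattern is anchored at the start of the line, indented lines and longer names such as "sitemap" do not count as a site block.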

Merging top-level blocks

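The CNM specification states that block-mode top-level blocks with the same name can be merged. However, their contents can't simply be concatenated: each top-level block implicitly closes any child blocks still open when it ends, so without that boundary the first lines of one block could be parsed as a continuation of a block left open at the end of the previous one.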

As a workaround, the initial lines of each block-mode top-level block that are indented by more than one tab character can be removed from the document. Such lines could only belong to a child block with an empty name, which no top-level block supports, so removing them does not change the meaning of the document. This ensures that each block starts with a new child block at one level of indentation, which closes any blocks left open by the previous one. This works for all block-mode top-level blocks.

Alternatively, since unknown child blocks of the content block are ignored, a line containing a nonexistent block name (e.g. __nonexistent__), indented by one level, can be inserted between each pair of merged content blocks. The parser tries to parse this block and skips it, which closes all other open blocks.
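
A Python sketch of this alternative, under the same assumptions as above (tab indentation, bodies given without their "content" header line). `merge_content_blocks` is a hypothetical helper, and __nonexistent__ is just the example name from the text; any name the parser does not recognize should behave the same way.

```python
def merge_content_blocks(bodies: list[list[str]]) -> list[str]:
    # Hypothetical helper: merge content-block bodies by separating them
    # with an unknown child block, which the parser parses and skips.
    merged = ["content"]
    for i, body in enumerate(bodies):
        if i > 0:
            # One level of indentation closes every block still open from
            # the previous body; any leading deeper-indented lines of the
            # next body become the (ignored) contents of this dummy block.
            merged.append("\t__nonexistent__")
        merged.extend(body)
    return merged
```

Unlike the stripping approach, this leaves every original line in place, at the cost of relying on the content block's rule for unknown children.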

The contents of title top-level blocks can simply be concatenated, since they are parsed as plain text.

Inserting content

Since multiple top-level blocks are allowed, content can be prepended or appended to a document simply by inserting another content block at the top or bottom. Because CNM blocks are delimited by indentation, and a non-top-level block ends as soon as a block with less indentation than its contents is encountered, the new content block does not interfere with the existing contents.
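
A minimal Python sketch of this, treating the document as a list of lines. `wrap_content` and its parameters are hypothetical; `before` and `after` are assumed to be bodies of new content blocks whose child lines are already indented by at least one tab.

```python
def wrap_content(doc: list[str], before: list[str], after: list[str]) -> list[str]:
    # Hypothetical helper: prepend and/or append extra content blocks
    # without touching the original document's lines.
    out: list[str] = []
    if before:
        out += ["content", *before]  # new block at the top of the document
    out += doc                       # original document, unchanged
    if after:
        # The unindented "content" header closes every block still open
        # at the end of the original document.
        out += ["content", *after]
    return out
```

No parsing of the original document is needed, so this can also be done while streaming it.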

CNM selectors

CNM selectors, used in the CNP select parameter, require removing all blocks within the content blocks other than the selected section, its parents, and its children.

This can be done quickly by retaining all top-level blocks as they are, without parsing their contents, except for the content blocks. Within a content block, skip non-container blocks (text, raw, embed) without parsing them, and recurse into any container blocks, pushing their header lines onto a stack on entry and popping them on exit. With a section index or section path selector, drop any section block whose title is non-empty and doesn't match the selector.

When a matching section block with a title is found, flush the stack to the output in FIFO order (outermost block first) and continue from there, skipping the remainder of the content block once the section ends. If no matching section is found, pop the top of the stack when leaving the recursion for the current container block. Once the last (or, for a section name selector, the only) matching section is found, output all of its content and then skip the rest of the content block.
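
The stack-based approach can be sketched in Python for the simplest case, a section name selector. This is an illustration under several assumptions, not a reference implementation: indentation uses tabs, a block name is the first word of its line and a section's title is the rest of that line, `section` is treated as the only container block type, and only the first matching section is kept. `select_section` is a hypothetical name.

```python
def select_section(lines: list[str], title: str) -> list[str]:
    # Hypothetical helper implementing only a section *name* selector.
    out: list[str] = []
    stack: list[tuple[int, str]] = []  # (depth, header) of open content/section blocks
    in_content = False   # inside a top-level "content" block
    skip_depth = None    # depth of a non-container block being skipped
    match_depth = None   # depth of the matched section while it is emitted
    done = False         # matched section already emitted in full

    for line in lines:
        if not line.strip():
            # Blank lines are kept only where the surrounding text is kept.
            if not in_content or match_depth is not None:
                out.append(line)
            continue
        depth = len(line) - len(line.lstrip("\t"))
        if depth == 0:
            # New top-level block: a running match ends here too.
            if match_depth is not None:
                done = True
            in_content = line.split()[0] == "content"
            stack, skip_depth, match_depth = [], None, None
            if in_content:
                stack.append((0, line))  # emitted only if a match is found
            else:
                out.append(line)         # other top-level blocks are retained
            continue
        if not in_content:
            out.append(line)             # copied without parsing
            continue
        if done:
            continue                     # rest of the content block is skipped
        if match_depth is not None:
            if depth > match_depth:
                out.append(line)         # child of the matched section
                continue
            done, match_depth = True, None  # the matched section has ended
            continue
        if skip_depth is not None and depth > skip_depth:
            continue                     # body of a skipped non-container block
        skip_depth = None
        while stack and stack[-1][0] >= depth:
            stack.pop()                  # leaving container blocks
        name, _, rest = line.strip().partition(" ")
        if name != "section":
            skip_depth = depth           # text, raw, embed...: skipped unparsed
        elif rest.strip() == title:
            out.extend(header for _, header in stack)  # flush, outermost first
            out.append(line)
            match_depth = depth
        else:
            stack.append((depth, line))  # recurse into a non-matching section
    return out
```

Index and path selectors would differ mainly in dropping non-matching titled sections instead of recursing into them, as described above.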