The majority of the CNM syntax is a simple indentation-delimited block-based tree structure.
The general parsing strategy is to start in block mode, look for lines that contain block names, and then parse each block according to its own syntax.
However, in practice, if we're only interested in specific blocks, all other blocks can be skipped without fully parsing them. Several parsing shortcuts are possible, among them:
Sitemaps
A CNP/CNM server may want to add a site top-level block containing the sitemap to any CNM document that doesn't already have one.
This can be done in a single pass without any buffering by matching every line sent to the client against the ^site(\s|$) regular expression (or equivalent). If no line has matched by the time EOF is reached, the document does not contain a site block, and one can be generated and appended to the end of the document. No actual parsing of CNM is required.
The same approach can be used for the links block.
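The single-pass check described above could look roughly like the following sketch. The function name `ensure_site_block` and the `site_block` argument are hypothetical; the sitemap text is treated as an opaque, pre-rendered string supplied by the caller, and lines are assumed to arrive with their trailing newline.

```python
import re
from typing import Iterable, Iterator

# Regular expression quoted in the text: a top-level "site" block header.
SITE_RE = re.compile(r"^site(\s|$)")


def ensure_site_block(lines: Iterable[str], site_block: str) -> Iterator[str]:
    """Stream lines to the client, appending site_block if no site block was seen.

    site_block is a pre-rendered block (hypothetical argument); this function
    does not generate or validate the sitemap itself.
    """
    seen = False
    last = None
    for line in lines:
        if not seen and SITE_RE.match(line):
            seen = True
        last = line
        yield line  # forwarded immediately, nothing is buffered
    if not seen:
        # Make sure the appended block starts on a fresh line.
        if last is not None and not last.endswith("\n"):
            yield "\n"
        yield site_block
```

Because the function is a generator, each line can be written to the client as soon as it is read; only a boolean flag and the last line are kept in memory.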
Merging top-level blocks
The CNM specification states that block-mode top-level blocks with the same name can be merged. However, because the contents of a top-level block, including any child blocks still open, end where that block ends, the contents of two blocks can't simply be concatenated: the leading lines of the second block could be read as a continuation of the last child block of the first.
As a workaround, the initial lines of each block-mode top-level block that are indented by more than one tab character can be removed from the document, since no top-level block supports child blocks with empty names. This guarantees that the first remaining line starts a new block at one level of indentation. The approach works for all block-mode top-level blocks.
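As an illustration of this workaround, the sketch below merges the contents of several same-named blocks after dropping their over-indented leading lines. The helper names are hypothetical, each body is assumed to be the list of lines that follow the block's header line, and blank lines at the start of a body are not given any special treatment.

```python
from itertools import dropwhile
from typing import List, Sequence


def strip_leading_overindented(body: Sequence[str]) -> List[str]:
    """Drop initial lines indented by more than one tab character."""
    return list(dropwhile(lambda line: line.startswith("\t\t"), body))


def merge_top_level(name: str, bodies: Sequence[Sequence[str]]) -> List[str]:
    """Merge the contents of several top-level blocks called name into one block."""
    merged = [name + "\n"]
    for body in bodies:
        # After stripping, the first line of each body opens a new child block
        # at one level of indentation, which closes whatever came before it.
        merged.extend(strip_leading_overindented(body))
    return merged
```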
Alternatively, for the content block, unknown child blocks are ignored, so a nonexistent block (e.g. __nonexistent__) can be inserted on a line between each pair of merged blocks, indented by one level, to force the parser to try to parse (and skip) it, thereby closing all other blocks.
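A minimal sketch of the separator approach, assuming each body is the list of lines that follow a content header line. The function name is hypothetical; the separator name __nonexistent__ is the example used in the text.

```python
from typing import List, Sequence

# One level of indentation followed by a block name no parser knows.
SEPARATOR = "\t__nonexistent__\n"


def merge_content_blocks(bodies: Sequence[Sequence[str]]) -> List[str]:
    """Merge several content block bodies, separated by a dummy child block."""
    merged = ["content\n"]
    for index, body in enumerate(bodies):
        if index > 0:
            # The unknown block is skipped by the parser, but it still
            # closes every child block left open by the previous body.
            merged.append(SEPARATOR)
        merged.extend(body)
    return merged
```

Unlike the previous workaround, this variant never deletes lines from the original blocks, at the cost of leaving a dummy block in the output.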
For title top-level blocks, the contents can simply be concatenated, since they are parsed as simple text.
Inserting content
Since multiple top-level blocks are allowed, content can be prepended or appended to the document simply by inserting another content block at the top or the bottom. Because CNM blocks are delimited by indentation, and non-top-level blocks end at the first block with less indentation than their contents, the new content block will not interfere with existing contents.
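For instance, a server that wants to add a banner or a footer could wrap the document as in the sketch below. The function name and its parameters are hypothetical, and the body lines are assumed to be given relative to the new content block (without the first level of indentation).

```python
from typing import Sequence


def wrap_with_content(document: str,
                      before: Sequence[str] = (),
                      after: Sequence[str] = ()) -> str:
    """Prepend and/or append a new top-level content block to a CNM document."""

    def block(body: Sequence[str]) -> str:
        # Indent the caller's lines by one tab under a fresh "content" header.
        return "content\n" + "".join("\t" + line + "\n" for line in body)

    parts = []
    if before:
        parts.append(block(before))
    parts.append(document)
    if after:
        # Make sure the appended block starts on a fresh line.
        if document and not document.endswith("\n"):
            parts.append("\n")
        parts.append(block(after))
    return "".join(parts)
```

The existing document is passed through as an opaque string; nothing in it needs to be parsed or re-indented.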
CNM selectors
CNM selectors, passed in the CNP select parameter, require removing every block inside the content blocks that is not the specified section or one of its parents or children.
This can be done quickly by passing all top-level blocks through unchanged, without parsing their contents, unless they are content blocks. Within a content block, leave all non-container blocks (text, raw, embed) untouched and recurse into any container blocks, pushing their lines onto a stack as they are entered and popping them as they are left. If using a section index or section path selector, drop any section blocks with a non-empty title that doesn't match the selector.
If a matching section block with a title is found, flush the stack to the output in FIFO order (oldest entry first) and continue from there, skipping the remainder of the content block once the section ends. Otherwise, if no matching section is found, pop the top of the stack when leaving the recursion for the current block. Once the last (or only, in the case of a section name selector) section is found, output all of its content and then skip the rest of the content block.
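The following is a deliberately simplified sketch of the title-matching case only, not a full selector implementation. It makes several assumptions that are not taken from the text: a section header is written as the word section optionally followed by its title on the same line, sections are nested only inside other sections, only the first matching section is wanted, and every block outside the match is dropped. The function name `select_section` is hypothetical, and the input is the list of lines inside a single content block.

```python
from typing import List, Optional, Sequence, Tuple


def select_section(body: Sequence[str], title: str) -> List[str]:
    """Keep the first section titled title, plus the headers of its parents."""
    out: List[str] = []
    stack: List[Tuple[int, str]] = []   # (depth, header line), outermost first
    skip_depth: Optional[int] = None    # depth of a block currently being dropped
    match_depth: Optional[int] = None   # depth of the matched section header

    for line in body:
        stripped = line.lstrip("\t")
        depth = len(line) - len(stripped)
        if not stripped.strip():        # blank line
            if match_depth is not None:
                out.append(line)
            continue
        if match_depth is not None:
            if depth > match_depth:     # still inside the matched section
                out.append(line)
                continue
            break                       # section ended: skip the rest
        if skip_depth is not None:
            if depth > skip_depth:      # contents of a dropped block
                continue
            skip_depth = None
        while stack and stack[-1][0] >= depth:
            stack.pop()                 # left one or more untitled sections
        parts = stripped.split(None, 1)
        name = parts[0]
        arg = parts[1].strip() if len(parts) > 1 else ""
        if name != "section":
            skip_depth = depth          # non-section block: drop without parsing
        elif arg == title:
            out.extend(header for _, header in stack)  # flush, oldest first
            out.append(line)
            match_depth = depth
        elif arg:
            skip_depth = depth          # titled section that does not match
        else:
            stack.append((depth, line))  # untitled section: recurse into it
    return out
```

Only block header lines are ever inspected; the contents of skipped blocks are discarded by comparing indentation depth, so text, raw and embed blocks never need to be parsed.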