title
	CNM parsing

content
	text fmt
		The majority of the CNM syntax is a simple, indentation-delimited, block-based tree structure. The general parsing strategy is to start in @@/spec/cnm0.4/#/Syntax/Block\ mode block mode@@, look for lines that begin with a block name, and then parse each block's contents according to that block's own syntax.

		In practice, however, if only specific blocks are of interest, all other blocks can be skipped without fully parsing them. Several parsing shortcuts are possible, among them:

	section Sitemaps
		text fmt
			A CNP/CNM server may want to add a ``site`` top-level block containing the sitemap to any CNM document that doesn't already have one. This can be done in a single pass, without any buffering, by matching every line sent to the client against the ``^site(\s|$)`` regular expression (or equivalent). If no line has matched by the time EOF is reached, the document does not contain a ``site`` block, and one can be generated and appended to the end of the document. No actual parsing of CNM is required.

			The same approach can be used for the ``links`` block.

	section Merging top-level blocks
		text fmt
			The @@/spec/cnm0.4/#/Structure CNM specification@@ specifies that block-mode top-level blocks with the same name can be merged. However, since the contents of a top-level block end when that block ends, the blocks can't simply be concatenated: lines at the start of the second block could be parsed as a continuation of the last child block of the first.

			As a workaround, all initial lines of each top-level block-mode block that are indented by more than one tab character can be removed from the document, since no top-level block supports child blocks with empty names. This ensures that the initial line of each merged block starts a new block with one level of indentation. This works for all block-mode top-level blocks.

			Alternatively, for the ``content`` block, unknown child blocks are ignored, so a nonexistent block (e.g. ``\_\_nonexistent\_\_``) can be inserted on a line between each pair of merged blocks, indented by one level, to force the parser to try to parse (and skip) it, thereby closing all other blocks.
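The two merging workarounds above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: each block body is given as a list of its indented lines without trailing newlines, and the helper names (``strip_overindented_prefix``, ``merge_bodies``, ``merge_content_bodies``) are hypothetical, not part of any CNM library.

```python
from typing import List


def strip_overindented_prefix(body: List[str]) -> List[str]:
    """Drop leading lines indented by more than one tab character.

    Assumption: `body` holds the lines that follow a top-level block name,
    each still carrying its tab indentation. Blank lines are not treated
    specially in this sketch.
    """
    i = 0
    while i < len(body) and body[i].startswith("\t\t"):
        i += 1
    return body[i:]


def merge_bodies(bodies: List[List[str]]) -> List[str]:
    """Merge same-named block-mode top-level blocks by stripping each
    body's over-indented initial lines before concatenating."""
    merged: List[str] = []
    for body in bodies:
        merged.extend(strip_overindented_prefix(body))
    return merged


def merge_content_bodies(bodies: List[List[str]],
                         placeholder: str = "__nonexistent__") -> List[str]:
    """Alternative for `content` blocks only: separate the bodies with an
    unknown one-level block so that any open child blocks are closed."""
    merged: List[str] = []
    for index, body in enumerate(bodies):
        if index > 0:
            # One tab: the placeholder is a direct child of `content`.
            merged.append("\t" + placeholder)
        merged.extend(body)
    return merged
```

The first variant changes nothing when a body already starts at one level of indentation, while the second leaves every original line in place and only adds separator lines.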

			For the ``title`` top-level blocks, the contents can simply be concatenated, since they are parsed as @@/spec/cnm0.4/#/Syntax/Simple\ text\ mode simple text@@.

	section Inserting content
		text fmt
			Since multiple top-level blocks are allowed, content can be prepended or appended to the document simply by inserting another ``content`` block at the top or the bottom. Since CNM blocks are delimited by indentation, and non-top-level blocks end at the first block with less indentation than their contents, the new ``content`` block will not interfere with the existing contents.

	section CNM selectors
		text fmt
			@@/spec/cnm0.4/#/Selectors CNM selectors@@ used in the @@/spec/cnp0.4/#/Request/Parameters/select CNP ``select`` parameter@@ require removing from the ``content`` blocks any block that is not the specified section or its parent or child. This can be done quickly by passing all other top-level blocks through unchanged, without parsing their contents; only the ``content`` blocks need to be parsed.

			Within a ``content`` block, leave all non-container blocks (``text``, ``raw``, ``embed``) untouched and recurse into any container blocks, pushing their lines onto a stack on entry and popping them on exit. If using a section index or section path selector, drop any ``section`` blocks with a non-empty title that doesn't match the selector. If a matching ``section`` block with a title is found, flush the stack to the output in FIFO order (oldest entry first) and continue from there, skipping the remainder of the ``content`` block once the section ends. Otherwise, if no matching section is found, pop the top of the stack when leaving the recursion for the current block.

			Once the last (or only, in the case of a section name selector) section is found, output all of its content and then skip the rest of the ``content`` block.

site
	doc
		cnm-parsing

links
	/spec/ Specifications
	/doc/ Documents
	/draft/ Drafts
	/lib/ Libraries
	/util/ Tools and utilities
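As a sketch of the single-pass ``site`` check described under Sitemaps, the following Python generator forwards each line as it is read and appends a generated block only if no line matched ``^site(\s|$)``. The function name ``ensure_site_block`` and the shape of the generated block passed in as ``site_block`` are assumptions made for illustration; how the sitemap itself is produced is outside the scope of this example.

```python
import re
from typing import Iterable, Iterator

# A top-level `site` block starts with "site" at column 0, followed by
# whitespace or the end of the line.
SITE_LINE = re.compile(r"^site(\s|$)")


def ensure_site_block(lines: Iterable[str], site_block: str) -> Iterator[str]:
    """Yield every input line unchanged; if no top-level `site` line was
    seen by EOF, yield `site_block` (hypothetical, pre-generated) at the end.
    """
    seen = False
    last = ""
    for line in lines:
        if not seen and SITE_LINE.match(line):
            seen = True
        last = line
        yield line  # forwarded immediately, so nothing is buffered
    if not seen:
        # Make sure the appended block starts on a line of its own.
        if last and not last.endswith("\n"):
            yield "\n"
        yield site_block
```

The same function could presumably serve the ``links`` block by swapping in a pattern such as ``^links(\s|$)``, since the check never looks beyond the start of each line.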