title
	ContNet Markup specification, version 0.4 (2017-09-04)
content
	section Overview
		text
			CNM is a lightweight markup language primarily meant to be used as the hypertext document markup format for ContNet. It is a line-based Unicode text markup format with indentation-delimited blocks. The primary goals of CNM are simple parsing and composition, as well as being readable and writable by humans.

			CNM contains the semantic content of hypertext pages. It does not include layout, styles or scripts, as all of that is supposed to be handled by the rendering application. As such, it aims to avoid obfuscating content behind presentation and supports responsive design, as every device can render the content to fit its screen and interface.
	section Syntax
		text fmt
			All parts of CNM use the UTF-8 encoding. Any invalid UTF-8 sequence is replaced with the ``U+FFFD`` replacement character.

			A CNM document is mainly composed of blocks defined by indentation. The core structure of the document consists of nested blocks containing other blocks, with the leaves being either blocks with no child blocks or some form of text that does not contain any blocks.

			Each line in the document ends in a line feed character. All raw (not provided as an escape sequence) carriage return or null characters in the document are ignored. If the document does not end with a line feed character, it is parsed as if it had ended with one.

			The parsing method for the contents of a block depends on which block it is. If the block is not known, it should be ignored and all of its contents skipped by advancing until the next nonempty line with less indentation than the unknown block's contents.
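The document-level rules above (UTF-8 decoding with ``U+FFFD`` substitution, ignored raw carriage returns and null characters, an implied final line feed) can be illustrated with a minimal Python sketch. The function name ``preprocess`` is hypothetical and not part of CNM. Python's ``errors="replace"`` handler decides how many bytes a single ``U+FFFD`` stands for, which this specification does not pin down.

```python
def preprocess(data: bytes) -> list:
    """Decode a CNM document into a list of lines (hypothetical helper)."""
    # Invalid UTF-8 sequences become U+FFFD. How many bytes one replacement
    # character covers is decided by Python's "replace" error handler.
    text = data.decode("utf-8", errors="replace")
    # Raw carriage returns and null characters are ignored everywhere.
    text = text.replace("\r", "").replace("\0", "")
    # A document without a final line feed is parsed as if it had one.
    if not text.endswith("\n"):
        text += "\n"
    # Every line ends in a line feed, so the last split element is always "".
    return text.split("\n")[:-1]
```

For instance, ``preprocess(b"title\r\n\tHello")`` returns ``["title", "\tHello"]``.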
			When whitespace is mentioned in the specification, it refers to the following ASCII whitespace characters: tab (``U+0009``), line feed (``U+000A``), form feed (``U+000C``) and space (``U+0020``) in their raw Unicode character form, not as an escape sequence. All other Unicode whitespace characters stand for themselves and are not collapsed or used to split fields.

			An empty line is a line consisting of at most as much indentation as the parent block's contents and nothing else. Such lines implicitly belong to the last parsed block regardless of the amount of indentation and act the same as if the indentation depth was the same as the block's contents.
		text fmt
			__**TL;DR:** Encoded in UTF-8, line-based. LF is the line terminator, CR is ignored. Unknown blocks' contents are skipped.
		text
			The following general syntactic contexts are commonly used:
		section Block mode
			text fmt
				In block mode, every nonempty line is parsed as a block name line.

				The block name line consists of a list of whitespace-delimited simple text tokens. The line is first split on each sequence of one or more whitespace characters that are not a part of a simple text escape sequence (specifically, not ``"\\ "``). If there's any leading or trailing whitespace, the first or last token is an empty string after splitting. If the splitting ends with a single empty token (the entire line was just whitespace), the line is treated the same as an empty line and is skipped.

				The first token in the block name line is the block name. It defines the meaning of the block and how its contents are parsed. The remaining tokens, if any, represent the block's arguments. All empty tokens in the arguments should be ignored. Some blocks might use the arguments as one single value; in that case, the arguments are joined together with spaces.

				Note that excess tab or space indentation will result in a block with an empty name. This will usually result in an unknown block, which will then be skipped.
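A block name line tokenizer following the splitting rules above might look like the Python sketch below. It assumes the block's own indentation has already been removed by the parent block, and it returns tokens unresolved, so simple text escape sequences would still have to be resolved afterwards. ``split_name_line`` is a hypothetical name.

```python
WHITESPACE = "\t\n\f "  # the only characters CNM treats as whitespace


def split_name_line(line: str):
    """Split a block name line into (name, arguments).

    Returns None for a whitespace-only line, which counts as an empty line.
    Tokens are returned unresolved; simple text escapes are handled later.
    """
    if all(c in WHITESPACE for c in line):
        return None
    tokens, current, i = [], [], 0
    while i < len(line):
        c = line[i]
        if c == "\\" and i + 1 < len(line) and line[i + 1] in " \\":
            # An escaped space must not split, and an escaped backslash is
            # consumed as a pair so that a space after it still splits.
            current.append(line[i:i + 2])
            i += 2
        elif c in WHITESPACE:
            tokens.append("".join(current))
            current = []
            while i < len(line) and line[i] in WHITESPACE:
                i += 1  # a whole run of whitespace is a single separator
        else:
            current.append(c)
            i += 1
    tokens.append("".join(current))
    # Leading whitespace yields an empty name; empty argument tokens are dropped.
    return tokens[0], [t for t in tokens[1:] if t]
```

``split_name_line("embed image/png  /a.png ")`` returns ``("embed", ["image/png", "/a.png"])``; blocks that take a single value would then join the arguments with single spaces.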
				All lines following the block name line that are indented at least one level more than the block name or are empty are parsed as the contents of the named block. For every such line, the initial indentation equal to one level more than the block name's is removed and the remainder of the line is parsed according to the named block's mode (the inner block keeps any tab characters in excess of the indentation). Block mode parsing in the current block resumes on the first nonempty line that has less indentation than the contents of the last named block.
			text fmt
				__**TL;DR:** Block mode contains blocks. Each block starts with a line containing a simple text name and optional arguments, split by non-escaped whitespace. All lines indented further than the block name line are contents of that block.
		section Simple text mode
			text
				Simple text is parsed by collapsing all raw (not provided as an escape sequence) whitespace into a single space and removing any leading or trailing spaces, then resolving escape sequences.

				Simple text can contain escape sequences. These are C-style sequences of two or more characters that begin with a backslash and are parsed as the single character they represent. The following escape sequences are currently defined (without quotes):
			raw
				"\b" -> U+0008 backspace
				"\t" -> U+0009 tab
				"\n" -> U+000A line feed
				"\v" -> U+000B vertical tab
				"\f" -> U+000C form feed
				"\r" -> U+000D carriage return
				"\ " -> U+0020 space
				"\\" -> U+005C backslash
				"\x##" -> U+00## 8-bit Unicode character
				"\u####" -> U+#### 16-bit Unicode character
				"\U########" -> U+######## 32-bit Unicode character
			text fmt
				The ``#`` characters in ``\\x##``, ``\\u####`` and ``\\U########`` escape sequences are arbitrary hexadecimal digits ``[0-9a-fA-F]``. In ``\\U########``, the first two digits should generally be zero, since Unicode only supports 21-bit characters. Invalid codepoints are unescaped into the ``U+FFFD`` replacement character.
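The simple text rules (collapse and trim raw whitespace, then resolve the escapes in the table) can be sketched in Python as a single pass. Doing both in one pass keeps an escaped space (a backslash followed by a space) from being trimmed or merged with neighboring raw whitespace. The sketch treats unknown or truncated escapes as literal text; mapping surrogate code points to ``U+FFFD`` along with values above ``U+10FFFF`` is an assumption. The helper names are hypothetical.

```python
HEX_DIGITS = set("0123456789abcdefABCDEF")
WHITESPACE = "\t\n\f "
SINGLE = {"b": "\b", "t": "\t", "n": "\n", "v": "\v", "f": "\f",
          "r": "\r", " ": " ", "\\": "\\"}
HEX_LENGTH = {"x": 2, "u": 4, "U": 8}


def _escape(s: str, i: int):
    """Resolve the escape at s[i] (a backslash); return (text, next index)."""
    e = s[i + 1] if i + 1 < len(s) else ""
    if e in SINGLE:
        return SINGLE[e], i + 2
    n = HEX_LENGTH.get(e)
    if n:
        digits = s[i + 2:i + 2 + n]
        if len(digits) == n and all(d in HEX_DIGITS for d in digits):
            cp = int(digits, 16)
            # Assumption: surrogates count as invalid code points, like values
            # above U+10FFFF; both become U+FFFD.
            valid = cp <= 0x10FFFF and not 0xD800 <= cp <= 0xDFFF
            return (chr(cp) if valid else "\ufffd"), i + 2 + n
    # Unknown or truncated sequence: the backslash stays as literal text.
    return "\\", i + 1


def simple_text(s: str) -> str:
    """Collapse and trim raw whitespace, then resolve escapes (one pass)."""
    parts = []  # resolved text; None marks one run of raw whitespace
    i = 0
    while i < len(s):
        if s[i] in WHITESPACE:
            while i < len(s) and s[i] in WHITESPACE:
                i += 1
            parts.append(None)
        elif s[i] == "\\":
            text, i = _escape(s, i)
            parts.append(text)
        else:
            parts.append(s[i])
            i += 1
    # Only raw whitespace is trimmed; escaped spaces survive.
    while parts and parts[0] is None:
        parts.pop(0)
    while parts and parts[-1] is None:
        parts.pop()
    return "".join(" " if p is None else p for p in parts)
```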
				Any other sequence starting with a backslash that is not in the above table, or one of the ``\\x``, ``\\u`` and ``\\U`` sequences with too few hex digits, is parsed the same as if the backslash itself was escaped: it is left in the text unchanged, with the backslash remaining present.

				Simple text mode is mostly used in block mode block names and arguments or as a part of other formats in specific blocks.
			text fmt
				__**TL;DR:** Collapse and trim whitespace. Handle C-style escape sequences. Invalid escape sequences are parsed as normal text.
		section Raw text mode
			text
				In raw text mode, all data is parsed as a literal text blob. Whitespace is preserved exactly as-is, including any leading tabs (tabs that are a part of the block's indentation do not count as a part of the block content in block mode) and empty lines inside the content, excluding any leading or trailing empty lines, which are removed. Global text parsing rules (ignoring carriage returns, UTF-8) still apply. Each raw text line also retains its line feed character.

				Raw mode is mostly used for the raw block and for the initial parsing of other blocks with their own syntax. In essence, every block could first be parsed in raw mode, and the result could then be parsed according to the block's own parsing mode.
			text fmt
				__**TL;DR:** Lines are kept unmodified for later processing.
	section Structure
		text fmt
			The top level of a CNM document is parsed in block mode. It contains blocks containing metadata and the content itself. None of the top-level blocks in CNM have any arguments. An empty top-level block is equivalent to an absent one.

			If the same top-level block appears multiple times in the document, the contents of all instances are merged together. The content merging happens after parsing, so all child blocks end with the end of each instance of a top-level block.
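The block-mode grouping rule (content lines are indented one level deeper than the block name line, or are empty) and the per-instance merging of repeated top-level blocks can be sketched in Python as follows. The sketch makes two assumptions: one tab per indentation level, and whitespace-only lines are treated as empty lines, which simplifies the exact definition given earlier. Each instance of a top-level block is kept as its own list of content lines, so a container's children are parsed per instance and only the results are concatenated. All function names are hypothetical.

```python
import re

_WS = "\t\n\f "


def is_blank(line: str) -> bool:
    # Whitespace-only lines are treated as empty lines (a simplification).
    return line.strip(_WS) == ""


def group_blocks(lines):
    """Block mode: group lines into (name_line, content_lines) pairs."""
    blocks = []
    for line in lines:
        if is_blank(line):
            if blocks:
                blocks[-1][1].append("")  # empty lines belong to the last block
        elif line.startswith("\t") and blocks:
            blocks[-1][1].append(line[1:])  # remove one level of indentation
        else:
            blocks.append((line, []))  # a new block name line
    return blocks


def trim_empty(lines):
    """Raw mode: drop leading and trailing empty lines."""
    start, end = 0, len(lines)
    while start < end and is_blank(lines[start]):
        start += 1
    while end > start and is_blank(lines[end - 1]):
        end -= 1
    return lines[start:end]


def top_level_instances(lines):
    """Collect every instance of each top-level block, in document order."""
    instances = {}
    for name_line, content in group_blocks(lines):
        name = re.split(r"[\t\n\f ]+", name_line)[0]
        instances.setdefault(name, []).append(trim_empty(content))
    return instances
```

For a document with two ``content`` blocks, ``[b for part in top_level_instances(lines)["content"] for b in group_blocks(part)]`` yields the children of both instances in order. No child block can span the boundary between the two instances.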
			This means that a child block of one of multiple instances of container blocks (``content``, ``site`` and ``links``) is fully contained in its parent top-level block and cannot extend into the next one. Simple text blocks (``title``) can just merge their contents as if all of their lines belonged to a single block, since simple text collapses whitespace anyway.

			The following blocks are defined on the top level:
		section title
			text
				Contains the document title. The contents of the block are parsed as simple text.

				Note that the title can be of arbitrary length or even absent and may contain characters like line feed and various control codes. Implementations are not required to display them as such and may instead prefer to display the title, or its prefix up to a certain length if it's too long, as a single line with all whitespace collapsed even after resolving escape sequences.

				While a title is recommended, a document is not required to have one. Implementations may display that as an empty title (or not show a title at all) or an implementation-defined placeholder or content excerpt of their choice.
			text fmt
				**Example:**
			raw text/cnm
				title
					This is a document title.
			text fmt
				__**TL;DR:** Simple text. May be very long or not present at all. Make sure to handle e.g. newlines.
		section links
			text fmt
				The ``links`` block can contain an arbitrary number of hyperlinks, which are intended to be a page-wide list of links to relevant parts of the website or other websites.

				The block contents are parsed in block mode. Each block inside the contents of the ``links`` block should have a URL as the block name and the hyperlink text as the block arguments joined with spaces. If the argument is not present or empty, the hyperlink name is set to the hyperlink URL.
				The contents of the URL block are parsed as simple text and represent a link description, which may be optionally displayed by the interactive client (for example, as a title that appears on mouse-over or a footnote), but may as well be hidden.

				Links with a missing URL (blank block name) are skipped.
			text fmt
				**Example:**
			raw text/cnm
				links
					/example Clicking this link leads to /example.
					/test
						The above link has no explicit title, so "/test" is used instead.
						However, it has a description.

						Despite the empty line, it's displayed as a single line.
					cnp://example.com/ Links can also be absolute URLs.
			text fmt
				__**TL;DR:** Block mode. Contains nested blocks with URL in name, link text in argument and description in simple text contents.
		section site
			text fmt
				The ``site`` block represents a sitemap. It is used to show a hierarchical tree of the current site.

				The block contents are parsed in block mode. Each block inside the site block should have a filename or filepath as the block name, which represents the path on the current site. The arguments, joined together with spaces, are an optional name of the path that is used as the hyperlink text; if not provided, then the path should be used as the name. The contents of each block are parsed in block mode and recursively contain other path blocks.

				The path blocks represent an absolute hierarchical filepath within the current site. Each block represents a hyperlink to a certain page. To construct the entire filepath for a specific path block, prepend a slash to its name and the name of every parent block all the way to the site block itself, then join them together into a single string. If a block path contains slashes, it represents several levels of directories; path composition rules are unchanged. If a block path has a trailing slash, it should be preserved in the filepath. The final filepath represents a relative URL based on the document root of the current site.
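Sitemap path composition can be sketched as a small recursive Python function over a hypothetical ``SiteNode`` tree (path, optional name, children), which would come from the block-mode parse. It applies the composition rule literally: a slash is prepended to each path, trailing slashes are kept, and repeated slashes are not collapsed. Skipping the whole subtree of an entry with a blank path is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class SiteNode:
    path: str                # block name, e.g. "foo" or "baz/quux"
    name: str = ""           # arguments joined with spaces; "" means none
    children: list = field(default_factory=list)


def flatten_site(nodes, prefix=""):
    """Yield (filepath, link_text) for every sitemap node in document order."""
    for node in nodes:
        if not node.path:
            continue  # entries without a path are skipped (with their subtree)
        # Prepend a slash and append to the parent's filepath; a trailing
        # slash in node.path is kept because nothing is stripped here.
        filepath = prefix + "/" + node.path
        yield filepath, node.name or node.path
        yield from flatten_site(node.children, filepath)
```

A client could render the resulting pairs in order as an indented tree of hyperlinks, without reordering or deduplicating them.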
				The client should display these as a list or tree of hyperlinks for navigating the current site. It may assume that a node whose path matches the current page's location is the current page (e.g. show it in a different color or show all other nodes collapsed). The order of nodes should not be changed, and nodes with a duplicate path or name should be kept as-is.

				Sitemap entries with a missing path (blank block name) are skipped.
			text fmt
				**Example:**
			raw text/cnm
				site
					foo This is a link to /foo
						bar And this to /foo/bar
						baz/quux This one leads to /foo/baz/quux
							test And this to /foo/baz/quux/test
							baz
								quux Above link uses "baz" as the name.
							test2 This leads to /foo/baz/quux/test2
					cnp://example.com/ This leads to /cnp:/example.com/
			text fmt
				__**TL;DR:** Block mode. Contains recursive block mode blocks with paths as names and hyperlink text as arguments. Join the names from the root site block to the selected child node into a filepath.
		section content
			text fmt
				The ``content`` top-level block contains the entire body of the document. All of content's child blocks represent the document content.

				The block contents are parsed in block mode. The meaning of each child block depends on its name. The following content blocks are currently defined:
			section section
				text fmt
					The ``section`` block represents a division of the contents with an optional title. The contents of the section block are parsed in block mode and can be arbitrary content blocks.

					If the block has arguments, they are joined together with spaces and represent the section title. The section title is displayed as a heading and can be used as a content selector inside the document. Nested sections with titles represent subsections.

					A section without a title groups the child blocks together without counting as a section (e.g. no table of contents entry). An example use of that is putting multiple text blocks into a list item.
					As a direct child of the ``content`` or ``section`` block, a title-less section does nothing and is equivalent to a document that has its child blocks directly inside the parent block in the place of the section block.
				text fmt
					**Example:**
				raw text/cnm
					content
						section Section name goes here.
				text fmt
					__**TL;DR:** Group of content blocks with a heading.
			section text
				text fmt
					The ``text`` block represents text contents. It is parsed in raw text mode, with additional formatting being applied on top depending on the block arguments.

					The ``text`` block can be specified with a text format mode as the first argument. The format may be used to add rich text formatting. Currently, there are three text format modes defined: ``plain``, ``pre`` and ``fmt``. If the block argument is empty, the ``plain`` format is used. Contents of blocks with unknown format modes can be parsed as if they were ``raw`` blocks.
				text fmt
					__**TL;DR:** Contains text. Formatting depends on argument.
				section text plain
					text fmt
						The ``text plain`` block represents plain text content. It consists of a sequence of paragraphs of simple text. Since it's the default mode for the ``text`` block, using the ``plain`` argument is not necessary.

						A paragraph is a sequence of consecutive nonempty lines of simple text. A paragraph ends with an empty line or the end of the text block. When displaying paragraphs, spacing should be added between them (such as some padding or a blank line). Escaped line feeds in the text itself do not have this spacing.
					text fmt
						**Example:**
					raw text/cnm
						content
							text
								This is a paragraph of text.
								This sentence is in the same line as the above.

								This one, however, is a new paragraph.\n
								And the escaped line break above splits this sentence into a new line, but not a new paragraph.

								This   is   joined   by   single   spaces.
					text fmt
						__**TL;DR:** Contains paragraphs of simple text and escape sequences.
				section text pre
					text fmt
						The ``text pre`` block represents preformatted plain text content.
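The paragraph rule of ``text plain`` (consecutive nonempty lines form a paragraph; an empty line or the end of the block ends it) can be sketched in Python as below. The helper ``plain_paragraphs`` is hypothetical. It only groups lines and leaves whitespace collapsing and escape resolution to the simple text step described earlier.

```python
def plain_paragraphs(lines):
    """Group text-plain content lines into paragraphs (hypothetical helper).

    Each returned paragraph still contains raw whitespace and escapes; it is
    meant to be passed through simple text parsing afterwards.
    """
    paragraphs, current = [], []
    for line in list(lines) + [""]:  # a trailing empty line flushes the last one
        if line.strip("\t\n\f ") == "":
            if current:
                paragraphs.append("\n".join(current))
                current = []
        else:
            current.append(line)
    return paragraphs
```

Because simple text collapses the joining line feeds into spaces, the two lines of the first example paragraph end up on one displayed line.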
						The ``text pre`` block contents are parsed the same way as a ``raw`` block's, except that simple text escape sequences are still resolved and no syntax highlighting should be done. Whitespace is left untouched and the whole text block is just a single paragraph regardless of blank lines (which are simply literal line feeds).
					text fmt
						**Example:**
					raw text/cnm
						content
							text pre
								This is the first line.
								This is on a new line.
								This sentence is\non two lines.

								The above line is empty, but not a paragraph.
								This   line   contains   triple   spaces.
					text fmt
						__**TL;DR:** Contains preformatted raw text and escape sequences.
				section text fmt
					text fmt
						The ``text fmt`` block represents text that contains simple inline formatting.

						First, the text block is split into paragraphs the same way as a plain text block, with whitespace collapsed as in simple text. After that, the CNMfmt formatting is applied to each paragraph. Finally, escape sequences (including CNMfmt-specific ones) are resolved.

						See the @@#/The\ CNMfmt\ inline\ formatting\ submarkup CNMfmt@@ section below for more information.
					text fmt
						**Example:**
					raw text/cnm
						content
							text fmt
								This is **emphasized**, __alternate__, ``code``, ""quoted"" and @@/ a hyperlink to /@@.
								**emphasized __emphasized+alternate **alternate ""alternate+quoted
								still alternate+quoted **alternate+quoted+emphasized

								This is no longer emphasized, alternate, or quoted. It is also a new paragraph containing a single line without formatting.

								@@# This link contains **emphasized** text.

								**@@# This hyperlink is emphasized,**@@ but this text isn't.
					text fmt
						__**TL;DR:** Contains paragraphs of text containing inline CNMfmt formatting.
			section raw
				text fmt
					The ``raw`` block represents preformatted text contents. The block contents are parsed in raw mode. When possible, the contents should be displayed with a monospaced font with all whitespace preserved.

					If present, the first block argument represents the type of the contents.
					That should generally be the MIME type of the data or the lowercased name of the language/syntax in the contents of the ``raw`` block (for example, ``text/html`` or ``html``, ``text/javascript`` or ``application/javascript`` or ``javascript``). When rendering the block contents, the type may be used to perform syntax highlighting.

					Note that, as in all other blocks, it's not possible to include leading or trailing blank lines in the ``raw`` block's contents.
				text fmt
					**Example:**
				raw text/cnm
					content
						raw
							this is not **emphasized**
							this is on a new line
							this line is \n all in one line
							above line contains characters "\" and "n"

							the above line was empty
				text fmt
					__**TL;DR:** Raw preformatted text. Argument is type name for optional syntax highlighting.
			section list
				text fmt
					The ``list`` block represents a list of items. The block contents are parsed in block mode and can contain arbitrary content blocks. Each child block represents one list item; several blocks can be grouped into a single item using a section block.

					The first block argument represents the list type. Currently, two list types are defined: ``ordered`` and ``unordered``. Unordered lists are simple lists of items with e.g. bullet points. Ordered lists use Arabic numbers by default; currently, choosing an alternate numbering style is not possible, but it may be added in the future.

					Nested unordered lists may use a different bullet style, but are not required to. Nested ordered lists use the same style of numbering as the parent one; nested numbering style may be configurable in future versions of CNM. Ordered lists always start with 1.
				text fmt
					**Example:**
				raw text/cnm
					content
						list
							text
								This is the first item.
							text
								Second item.
							section
								text
									Third item.
								text
									Still third item.
							list
								text
									Nested list, item 4.1.
				text fmt
					__**TL;DR:** List of content blocks. Argument: ``ordered`` or ``unordered``. Ordered always starts with 1.
			section table
				text fmt
					The ``table`` block represents two-dimensional tabular data.
					The contents are parsed in block mode. A table can contain two different types of blocks: ``header`` and ``row``. The ``header`` and ``row`` blocks both act like a section block without an argument: they can contain arbitrary content blocks. Each of their child blocks represents one table cell; to group multiple blocks into one cell, a ``section`` block without a title can be used.

					The width of the table depends on the longest header or row. Any headers or rows with fewer cells than that are padded with empty cells on the right side. Currently, there is no support for multi-column or multi-row cells.
				section header
					text fmt
						The ``header`` block represents a table header row. It is parsed the same way as a ``section`` block without a title and can contain arbitrary content blocks. Each child block represents a column header cell.

						A header row should be displayed in a more emphasized manner and may optionally allow sorting all following rows, up to the next header or the end of the table, by column. A table is not required to start with a header, nor to include one at all.
				section row
					text fmt
						The ``row`` block represents a table data row. It is parsed the same way as a ``section`` block without a title and can contain arbitrary content blocks. Each child block represents a table body cell.
				text fmt
					**Example:**
				raw text/cnm
					content
						table
							header
								text
									Header of column 1
								text
									Header of column 2
								text
									Header of column 3
							row
								text
									Row 1 column 1
								text
									Row 1 column 2
							row
								text
									Row 2 column 1
								text
									Row 2 column 2
							row
								section
									text
										Row 3 column 1
									text
										Still Row 3 column 1
								text
									Row 3 column 2
								text
									Row 3 column 3
				text
					Row 1 column 3 and row 2 column 3 are empty.
				text fmt
					__**TL;DR:** Contains headers and rows. Child blocks of these are cells.
			section embed
				text fmt
					The ``embed`` block is used to embed external content into the document.
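The table width rule (shorter header and row blocks are padded with empty cells on the right) and the header-delimited groups that a client may offer to sort can be sketched in Python. The ``(kind, cells)`` representation and both function names are hypothetical, and ``None`` stands for an added empty cell.

```python
def normalize_table(rows):
    """Pad every header/row on the right to the width of the widest one.

    rows: list of (kind, cells) pairs, kind being "header" or "row";
    None stands for an added empty cell (hypothetical representation).
    """
    width = max((len(cells) for _, cells in rows), default=0)
    return [(kind, cells + [None] * (width - len(cells))) for kind, cells in rows]


def sortable_groups(rows):
    """Split rows into runs that start at a header, for optional sorting.

    Rows before the first header belong to no group and stay unsortable.
    """
    groups = []
    for kind, cells in rows:
        if kind == "header":
            groups.append((cells, []))
        elif groups:
            groups[-1][1].append(cells)
    return groups
```

Applied to the example table, the first two rows each gain one empty third cell, and all three rows fall under the single header.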
					The first block argument represents the MIME type of the embedded content. It can be used by the user agent to decide how to handle it. Graphical browsers are recommended to display at least common image types (e.g. ``image/png``, ``image/jpeg``, ``image/webp`` and ``image/svg+xml``) inside the page by default. An empty argument or invalid MIME type can be treated as the ``application/octet-stream`` type and not be embedded.

					The second argument is the URL pointing to the embedded content. An embed block without a URL should be ignored. The URL may also be a data URI.

					The contents of the block are parsed in simple text mode and represent the description of the embedded content. If present, the description can be displayed as e.g. a caption, a mouse-over title, or a placeholder when the content cannot be embedded, but may as well be hidden.

					If the content type is unknown or cannot be embedded within the page, the embedded content should be presented as a hyperlink instead.
				text fmt
					**Example:**
				raw text/cnm
					content
						embed image/png /static/example.png
							This is an embedded image's caption/title/hover text.
				text fmt
					__**TL;DR:** Arguments are the MIME type and the URL, contents are the description. Embed inside the page if possible, otherwise provide a hyperlink.
	section Selectors
		text fmt
			CNM selector queries can be used to identify specific sections in a CNM document. Selectors can be used to select a section in the document (e.g. to scroll an open document so that the section is visible) or to filter a document to only show certain sections and their content.
		section Section selector
			text fmt
				A section selector query identifies a specific section in the document. It's usually used in the hash fragment part of a URL to move the visible document to the named section. Section title selectors are case-sensitive.

				Section selectors can select sections either by a section title, a path of section titles or a path of section indices.
				A section without a title does not count as a section and cannot be selected by section selectors; any mention of sections in the specification of selectors refers exclusively to sections with non-empty titles. A section with an empty title can essentially be regarded as a generic container block.
			section Title selector
				raw
					#{title}
				text fmt
					The title selector selects the first section with the given title (``{title}``) in the document. The section order is defined by their vertical position; block depth is irrelevant. If multiple sections in the document have the same title, this selector only selects the first one.

					The title must use URL percent-encoding where at least the slash character (``U+002F``) is encoded into ``%2F`` or ``%2f``. An empty title matches the top of the document contents.

					Note that the ``#`` character (``U+0023``) in the selector is not the same as the one separating the URL hash fragment. An example URL with a title selector is @@cnp://example.com/file.cnm##title@@.
			section Title path selector
				raw
					/{path}
				text fmt
					The title path selector selects a section based on a path of section titles. The ``{path}`` part of the query consists of zero or more section titles (escaped just like in the title selector) separated by a single slash character.

					Each title in the path selects a section using the same method as the title selector, but only considers sections that aren't a child block of another section in the current context (are accessible from the current context without passing through another section). The initial context is the top-level ``content`` block. Each time a section in the path is matched, the new context becomes this section's contents. If any part of the path fails to find a matching section, the query does not match anything.

					An empty path matches the top of the document contents. An empty title in a non-empty path does not match anything.
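The title and title path selectors can be sketched in Python over a simplified section tree. The sketch assumes untitled sections and lists have already been flattened away, so each ``Section`` lists only the titled sections reachable from it without passing through another titled section. The ``Section`` class, the ``TOP`` marker and the function names are hypothetical. The query strings exclude the leading ``#`` or ``/``.

```python
from dataclasses import dataclass, field
from urllib.parse import unquote

TOP = object()  # hypothetical marker for "the top of the document contents"


@dataclass(eq=False)
class Section:
    title: str
    # Titled sections reachable from this one without passing through another
    # titled section; untitled sections and lists are already flattened away.
    sections: list = field(default_factory=list)


def walk(sections):
    """Yield sections in document (vertical) order, at any depth."""
    for s in sections:
        yield s
        yield from walk(s.sections)


def select_title(root, query):
    """'#{title}': the first section with this percent-decoded title."""
    title = unquote(query)
    if title == "":
        return TOP
    return next((s for s in walk(root) if s.title == title), None)


def select_title_path(root, query):
    """'/{path}': follow titles, looking only at the current context."""
    if query == "":
        return TOP
    context, found = root, None
    for part in query.split("/"):  # split first, so "%2F" stays inside a title
        title = unquote(part)
        if title == "":
            return None  # an empty title in a non-empty path matches nothing
        found = next((s for s in context if s.title == title), None)
        if found is None:
            return None
        context = found.sections
    return found
```

Modeled on the document in the Examples section, ``select_title_path(root, "A/C")`` returns the section C that contains T5, while ``select_title(root, "C")`` returns the earlier one that contains T4.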
			section Index path selector
				raw
					${indices}
				text fmt
					The index path selector selects a section based on a path of section indices. The ``{indices}`` part of the query is a dot-separated path of zero or more section indices represented by decimal numbers.

					Each index in the path selects a section within the current context (as in the title path selector). The first section has the index 1. If any index in the path is zero or higher than the number of sections in its context, the query does not match anything.

					An empty path matches the top of the document contents.
		section Content selector
			text fmt
				A content selector is a selector that selects a subset of the document contents based on a section. The content selectors have the same syntax as the section selectors, but may be optionally prefixed with an exclamation mark (``U+0021``) for a shallow selector.

				Using a content selector query on a document returns a new document consisting of only the named section, all of its contents and all parent block names up to the top level, without any of their sibling blocks or other contents. A shallow selector selects a similar document, but excludes the contents of any child sections of the selected section (the section block name lines and any non-section blocks with their contents are kept).

				For the cases where a specific selector selects the top of the document contents, the entire ``content`` block with all of its contents is selected (or, in the case of a shallow selector, without child section contents).

				An empty content selector selects the entire document with all of its contents, including non-``content`` top-level blocks, unmodified (though the actual document may be recomposed, as long as the contents aren't changed). A content selector consisting only of the shallow selector modifier ``!`` selects the same document, but without the contents of any sections.
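The index path selector and the optional shallow ``!`` prefix of content selectors can be sketched in the same simplified model: a hypothetical ``Section`` tree containing only titled sections, with a ``TOP`` marker for the top of the document contents. Treating malformed indices and unknown selector prefixes as matching nothing is an assumption, since the specification does not address them.

```python
import re
from dataclasses import dataclass, field

TOP = object()  # hypothetical marker for "the top of the document contents"


@dataclass(eq=False)
class Section:
    title: str
    sections: list = field(default_factory=list)  # titled subsections in context


def select_index_path(root, query):
    """'${indices}': follow 1-based, dot-separated section indices."""
    if query == "":
        return TOP
    context, found = root, None
    for part in query.split("."):
        if not re.fullmatch(r"[0-9]+", part):
            return None  # assumption: malformed indices match nothing
        index = int(part)
        if index < 1 or index > len(context):
            return None  # zero or past the last section in this context
        found = context[index - 1]
        context = found.sections
    return found


def parse_content_selector(selector):
    """Split a content selector into (shallow, kind, query)."""
    shallow = selector.startswith("!")
    body = selector[1:] if shallow else selector
    if body == "":
        return shallow, "document", ""  # "" or "!": the whole document
    kind = {"#": "title", "/": "title_path", "$": "index_path"}.get(body[0])
    if kind is None:
        return None  # assumption: unknown prefixes select nothing
    return shallow, kind, body[1:]
```

Note that ``"!"`` and ``"!/"`` differ: the first selects the whole document, including non-``content`` blocks, while the second selects the top of the ``content`` block through an empty title path.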
		section Examples
			text fmt
				Example CNM document:
			raw text/cnm
				title
					Test
				content
					section A
						text
							T1
						section B
							text
								T2
							list
								text
									T3
								section C
									text
										T4
						section C
							text
								T5
					list
						section
							text
								T6
							section E
								text
									T7
							text
								T8
					section E
						text
							T9
			text fmt
				Section selectors:
			list
				text fmt
					``#A`` selects the section ""A"" containing the text ""T1"", section ""B"" and section ""C"".
				text fmt
					``#C`` selects the section ""C"" containing the text ""T4"".
				text fmt
					``#F`` does not select anything.
				text fmt
					``/A`` selects the section ""A"" containing the text ""T1"", section ""B"" and section ""C"".
				text fmt
					``/A/B/C`` selects the section ""C"" containing the text ""T4"".
				text fmt
					``/A/C`` selects the section ""C"" containing the text ""T5"".
				text fmt
					``/E`` selects the section ""E"" containing the text ""T7"".
				text fmt
					``/B`` does not select anything.
				text fmt
					``$1`` selects the section ""A"" containing the text ""T1"", section ""B"" and section ""C"".
				text fmt
					``$2`` selects the section ""E"" containing the text ""T7"".
				text fmt
					``$3`` selects the section ""E"" containing the text ""T9"".
				text fmt
					``$1.1.1`` selects the section ""C"" containing the text ""T4"".
				text fmt
					``$1.3`` does not select anything.
			text fmt
				Content selectors:
			list
				section
					text fmt
						``#C`` selects the following document:
					raw text/cnm
						content
							section A
								section B
									list
										section C
											text
												T4
				section
					text fmt
						``!/A`` selects the following document:
					raw text/cnm
						content
							section A
								text
									T1
								section B
								section C
				section
					text fmt
						``!/`` selects the following document:
					raw text/cnm
						content
							section A
							list
								section
									text
										T6
									section E
									text
										T8
							section E
				section
					text fmt
						``!`` selects the following document:
					raw text/cnm
						title
							Test
						content
							section A
							list
								section
									text
										T6
									section E
									text
										T8
							section E
	section The CNMfmt inline formatting submarkup
		text fmt
			The CNMfmt markup is used within ``text fmt`` content blocks to provide inline formatting of text. CNMfmt extends the CNM ``text plain`` block by introducing toggles of various format options.
			These toggles consist of two symbol characters. If the format of the toggle is currently not in effect, the toggle enables it. Otherwise, the format is disabled. Formats do **not** have to be toggled in LIFO order. All formats are implicitly closed at the end of the paragraph.

			The following toggles and formats are currently defined:
		raw
			** emphasized
			__ alternate
			`` code
			"" quotation
			@@ hyperlink
		section Emphasized
			text fmt
				The **\*\*emphasized\*\*** format indicates emphasized text. It uses two asterisks (``\*\*``) as the toggle. The usual way to style emphasized text is with a bold font, but implementations may choose to use a different style.
		section Alternate
			text fmt
				The __\_\_alternate\_\___ format indicates text in an alternate voice that is offset from the normal text. It uses two underscores (``\_\_``) as the toggle. The usual way to style alternate text is with an italic font, but implementations may choose to use a different style.
		section Code
			text fmt
				The contents of the ``\`\`code\`\``` format represent computer code or similar text that is usually not in a spoken language. It uses two grave accents (``\`\```) as the toggle. Note that whitespace in this format is **not** preserved; it is collapsed the same way as in the rest of the ``text fmt`` block. Code should be displayed in a monospaced font, if possible.
		section Quote
			text fmt
				The ""\"\"quote\"\""" format represents a quotation. It uses two quote marks (``\"\"``) as the toggle. The usual way to style quoted text is to include quote marks at the beginning and end and/or frame it, but implementations may choose a different style.
		section Hyperlink
			text fmt
				The @@cnp://example.com/ \@\@cnp://example.com/ hyperlink\@\@@@ format represents an inline hyperlink. It uses two at signs (``\@\@``) as the toggle.

				The hyperlink consists of two parts: the URL and the link text. The URL is the first non-whitespace word inside the formatted text.
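The toggle behavior described for CNMfmt (each two-character toggle flips its format, toggles need not nest, and everything closes at the end of the paragraph) can be sketched in Python as below. It is a simplification. ``@@`` is treated as a plain toggle, so the special handling of the hyperlink URL is not modeled. A backslash and the character after it are copied through verbatim so that escaped toggle characters never toggle anything; escapes would be resolved afterwards. ``cnmfmt_spans`` is a hypothetical name.

```python
TOGGLES = {"**": "emphasized", "__": "alternate", "``": "code",
           '""': "quote", "@@": "hyperlink"}


def cnmfmt_spans(paragraph: str):
    """Split one collapsed paragraph into (text, active_formats) runs."""
    active, spans, buf = set(), [], []

    def flush():
        if buf:
            spans.append(("".join(buf), frozenset(active)))
            buf.clear()

    i = 0
    while i < len(paragraph):
        pair = paragraph[i:i + 2]
        if paragraph[i] == "\\" and i + 1 < len(paragraph):
            buf.append(pair)  # escapes stay verbatim; they are resolved later
            i += 2
        elif pair in TOGGLES:
            flush()
            active ^= {TOGGLES[pair]}  # flip the format; no LIFO nesting needed
            i += 2
        else:
            buf.append(paragraph[i])
            i += 1
    flush()  # all formats close implicitly at the end of the paragraph
    return spans
```

For example, ``__a **b__ c**`` yields three runs: ``a `` in alternate, ``b`` in alternate and emphasized, and `` c`` in emphasized only.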
				Inside the URL, no CNMfmt toggles are recognized except ``\@\@``, which ends the entire hyperlink format (for example, if a ``\_\_`` appears inside the URL, it does not toggle the alternate format). Note that the URL can still contain CNM simple text and CNMfmt escape sequences; these can be used to supply Unicode characters and spaces instead of manually percent-encoding the URL.

				If the hyperlink format consists of more than one word, the remainder of the content is used as the hyperlink text. It may contain arbitrary CNMfmt formatting. If the link text is blank, the URL is used as the link text instead.
		text fmt
			Any other sequences of two symbols stand for themselves as text.

			The CNMfmt markup also includes several new escapes alongside the standard CNM ones to allow including the toggle characters as text:
		raw
			"\*" -> U+002A asterisk
			"\_" -> U+005F underscore
			"\`" -> U+0060 grave accent
			"\"" -> U+0022 quotation mark
			"\@" -> U+0040 at sign
site
	spec
		cnm0.4
links
	/spec/ Specifications
	/doc/ Documents
	/draft/ Drafts
	/lib/ Libraries
	/util/ Tools and utilities