McKeeman Form

Douglas Crockford
2017-07-08

McKeeman Form is a notation for expressing grammars. It was proposed by Bill McKeeman of Dartmouth College. It is a simplified Backus-Naur Form with significant whitespace and minimal use of metacharacters.

Grammar

We can express the grammar of McKeeman Form in McKeeman Form.

A grammar is a list of one or more rules. Rules are separated from one another by a newline. Because every rule already ends with a newline, this leaves a blank line between rules.

grammar
    rules

rules
    rule
    rule newline rules

A rule has a name on one line, with alternatives indented below it. Each alternative is on its own line.

rule
    name newline empty alternatives

alternatives
    alternative
    alternative alternatives
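Because the layout is significant, the outer structure of a grammar can be recovered from whitespace alone. The following Python sketch shows one way to do it. It assumes rules are separated by exactly one blank line, keeps each alternative as raw text, and does not validate names or items; `parse_rules` is a hypothetical helper, not part of McKeeman Form.

```python
def parse_rules(text: str) -> dict[str, list[str]]:
    """Split McKeeman Form text into {rule name: [alternative lines]}.

    Sketch assumptions: rules are separated by exactly one blank line, and
    alternatives are returned as raw text with the four spaces removed.
    """
    rules: dict[str, list[str]] = {}
    for block in text.strip("\n").split("\n\n"):
        name, *lines = block.split("\n")
        if not lines:
            raise ValueError(f"rule {name!r} has no alternatives")
        alternatives = []
        for line in lines:
            # Exactly four spaces: an item never begins with a space, so a
            # fifth leading space is an indentation error.
            if not line.startswith("    ") or line.startswith("     "):
                raise ValueError(f"bad indentation in rule {name!r}: {line!r}")
            alternatives.append(line[4:])
        rules[name] = alternatives
    return rules


grammar_text = (
    "rules\n"
    "    rule\n"
    "    rule newline rules\n"
    "\n"
    "alternatives\n"
    "    alternative\n"
    "    alternative alternatives\n"
)

print(parse_rules(grammar_text))
```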

A name is a sequence of one or more letters.

name
    letter
    letter name

letter
    'a' . 'z'
    'A' . 'Z'
    '_'
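The `letter` rule admits only the ASCII letters and the underscore, so a digit can never appear in a name. A small Python check that mirrors the two rules; `is_letter` and `is_name` are illustrative names.

```python
def is_letter(c: str) -> bool:
    # One test per alternative of `letter`.
    return "a" <= c <= "z" or "A" <= c <= "Z" or c == "_"


def is_name(s: str) -> bool:
    # `name` is a letter, or a letter followed by a name: one or more letters.
    return len(s) > 0 and all(is_letter(c) for c in s)


print(is_name("onenine"), is_name("four_spaces"), is_name("hex4"), is_name(""))
```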

The Unicode code point U+000A is used as the newline.

newline
    '000a'

If the first line after the name of a rule is '', then the rule may match nothing. The empty rule says that a '' line, indented four spaces like an alternative, may optionally appear before the first of the alternatives.

empty
    ''
    fourspaces "''" newline
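A tool reading a grammar has to notice this marker before it treats the remaining lines as alternatives. A minimal Python sketch, assuming it is handed the indented lines under one rule name; `split_empty` is a hypothetical name, and it only inspects the first line.

```python
def split_empty(lines: list[str]) -> tuple[bool, list[str]]:
    """Return (may_match_nothing, alternatives) for the indented lines of one rule.

    Only the first line may be the '' marker, and at least one real
    alternative must follow it.
    """
    if not lines or lines[0] != "    ''":
        return False, list(lines)
    if len(lines) == 1:
        raise ValueError("'' must be followed by at least one alternative")
    return True, lines[1:]


sign_lines = ["    ''", '    "+"', '    "-"']
print(split_empty(sign_lines))
```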

Each alternative is indented four spaces. Each alternative contains one or more items followed by a newline.

alternative
    fourspaces items newline

fourspaces
    space space space space

The Unicode code point U+0020 is used as the space.

space
    '0020'

An item is a literal or the name of a rule.

items
    item
    item space items

item
    literal
    name

There are two forms of literal. Single quotes can specify a single character, or a single character from a range of characters. Double quotes can specify multiple characters.

literal
    ''' single ''' range
    '"' characters '"'

Any Unicode code point except the 32 control codes may be placed within the single quotes. The hexcode of a Unicode code point may also be placed within the single quotes. A hexcode can contain 4, 5, or 6 hexadecimal digits. A 6-digit hexcode must begin with 10, so the largest is 10ffff.

single
    ' ' . '10ffff'
    hexcode

hexcode
    "10" hex hex hex hex
    hex hex hex hex hex
    hex hex hex hex

hex
    '0' . '9'
    'A' . 'F'
    'a' . 'f'
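The two forms inside single quotes can be told apart by length: one character stands for itself, and four to six hexadecimal digits name a code point. A minimal Python sketch, assuming the text between the quotes has already been extracted; `decode_single` and `HEXCODE` are hypothetical names.

```python
import re

# hexcode: "10" followed by four hex digits, or five hex digits, or four.
HEXCODE = re.compile(r"10[0-9A-Fa-f]{4}|[0-9A-Fa-f]{4,5}")


def decode_single(body: str) -> int:
    """Return the code point named by the text between a pair of single quotes."""
    if len(body) == 1:
        # A literal character; the 32 control codes are not allowed here.
        if ord(body) < 0x20:
            raise ValueError("control codes must be written as a hexcode")
        return ord(body)
    if HEXCODE.fullmatch(body):
        return int(body, 16)
    raise ValueError(f"not a single character or hexcode: {body!r}")


for body in ["a", "'", "000a", "10ffff"]:
    print(repr(body), hex(decode_single(body)))
```

Six digits that do not start with 10, such as 110000, are rejected, which keeps every hexcode within the Unicode range.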

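A range is specified by following the first single-quoted character with a period and another single-quoted character, with one space on each side of the period. It can optionally be followed by exclusions, each written as a minus sign between spaces followed by a single-quoted character.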

range
    ''
    " . '" single ''' exclude

exclude
    ''
    " - '" single ''' exclude
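Once the endpoints and exclusions have been decoded to code points, testing a character against a range is a bounds check followed by a membership check. In this sketch `in_range` and `is_mckeeman_character` are illustrative names; the latter encodes the `character` rule given at the end of this section, ' ' . '10ffff' - '"'.

```python
def in_range(cp: int, low: int, high: int, excluded: tuple[int, ...] = ()) -> bool:
    """True if code point cp lies in low..high and is not one of the exclusions."""
    return low <= cp <= high and cp not in excluded


def is_mckeeman_character(c: str) -> bool:
    # ' ' . '10ffff' - '"'
    return in_range(ord(c), 0x20, 0x10FFFF, (ord('"'),))


print(is_mckeeman_character("x"), is_mckeeman_character('"'), is_mckeeman_character("\n"))
```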

A character in double quotes can be any of the Unicode code points except the 32 control codes and the double quote character.

characters
    character
    character characters

character
    ' ' . '10ffff' - '"'
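With every literal form defined, a single alternative can be split into items. This Python sketch reads one alternative line, with its four spaces of indentation already removed, and returns tagged tuples. It reports only the first error it finds, and it does not check that names refer to defined rules or that double-quoted characters avoid the control codes. `tokenize_items` and `read_single` are hypothetical names.

```python
import re

NAME = re.compile(r"[A-Za-z_]+")
# A hexcode followed by the closing single quote.
HEXCODE = re.compile(r"(10[0-9A-Fa-f]{4}|[0-9A-Fa-f]{4,5})'")


def read_single(line: str, i: int) -> tuple[int, int]:
    """Read `single '''` starting at line[i]; return (code point, next index)."""
    m = HEXCODE.match(line, i)
    if m:
        return int(m.group(1), 16), m.end()
    if i + 1 < len(line) and line[i + 1] == "'" and line[i] >= " ":
        return ord(line[i]), i + 2
    raise ValueError(f"bad single-quoted literal at column {i}")


def tokenize_items(line: str) -> list[tuple]:
    """Split one alternative into ("name", s), ("string", s), or
    ("single", low, high, excluded) items; low == high when there is no range."""
    items, i = [], 0
    while True:
        if line.startswith("'", i):
            low, i = read_single(line, i + 1)
            high, excluded = low, []
            if line.startswith(" . '", i):  # range
                high, i = read_single(line, i + 4)
                while line.startswith(" - '", i):  # exclude
                    cp, i = read_single(line, i + 4)
                    excluded.append(cp)
            items.append(("single", low, high, excluded))
        elif line.startswith('"', i):
            end = line.find('"', i + 1)
            if end <= i + 1:  # missing close quote, or no characters
                raise ValueError(f"bad double-quoted literal at column {i}")
            items.append(("string", line[i + 1:end]))
            i = end + 1
        else:
            m = NAME.match(line, i)
            if not m:
                raise ValueError(f"unexpected character at column {i}")
            items.append(("name", m.group()))
            i = m.end()
        if i == len(line):
            return items
        if line[i] != " ":  # items are separated by exactly one space
            raise ValueError(f"expected a space at column {i}")
        i += 1


print(tokenize_items("' ' . '10ffff' - '\"'"))
print(tokenize_items("'-' onenine digits"))
```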

JSON

This is the JSON grammar in McKeeman Form.

json
    element

value
    object
    array
    string
    number
    "true"
    "false"
    "null"

object
    '{' ws '}'
    '{' members '}'

members
    member
    member ',' members

member
    ws string ws ':' element

array
    '[' ws ']'
    '[' elements ']'

elements
    element
    element ',' elements

element
    ws value ws

string
    '"' characters '"'

characters
    ''
    character characters

character
    '0020' . '10ffff' - '"' - '\'
    '\' escape

escape
    '"'
    '\'
    '/'
    'b'
    'f'
    'n'
    'r'
    't'
    'u' hex hex hex hex

hex
    '0' . '9'
    'A' . 'F'
    'a' . 'f'

number
    int frac exp

int
    digit
    onenine digits
    '-' digit
    '-' onenine digits

digits
    digit
    digit digits

digit
    '0' . '9'

onenine
    '1' . '9'

frac
    ''
    '.' digits

exp
    ''
    'E' sign digits
    'e' sign digits

sign
    ''
    "+"
    "-"
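The number rules describe a regular language, so they can be collapsed into one regular expression. This Python sketch is a hand translation of `int`, `frac`, `exp`, and `sign`, not something generated from the grammar; `NUMBER` and `is_json_number` are hypothetical names.

```python
import re

# int:  digit | onenine digits | '-' digit | '-' onenine digits
# frac: '' | '.' digits
# exp:  '' | 'E' sign digits | 'e' sign digits, with sign '' | "+" | "-"
NUMBER = re.compile(r"-?(?:[0-9]|[1-9][0-9]+)(?:\.[0-9]+)?(?:[Ee][+-]?[0-9]+)?")


def is_json_number(text: str) -> bool:
    return NUMBER.fullmatch(text) is not None


for s in ["0", "-12.5e+3", "1E9", "012", "1.", ".5", "+1"]:
    print(s, is_json_number(s))
```

Leading zeros, a bare trailing period, a missing integer part, and a leading plus sign are all rejected, because no alternative of `int` or `frac` produces them.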

ws
    ''
    '0009' ws
    '000a' ws
    '000d' ws
    '0020' ws
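Because each rule names what may come next, this grammar maps almost one-to-one onto a recursive descent recognizer. The following Python sketch has a method for most rules, folds `members` and `elements` into loops and the number rules into a regular expression, and only answers whether the text is JSON; it builds no values. The class and method names are illustrative, the escape set includes `\f`, which standard JSON allows, and very deep nesting will exceed Python's recursion limit.

```python
import re

WS = " \t\n\r"  # '0020', '0009', '000a', '000d'
ESCAPES = '"\\/bfnrt'  # allowed after a backslash; 'u' is handled separately
HEX = "0123456789abcdefABCDEF"
NUMBER = re.compile(r"-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[Ee][+-]?[0-9]+)?")


class Recognizer:
    """Answers "is this JSON?" with roughly one method per grammar rule."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.i = 0

    def peek(self) -> str:
        # "" at end of input; callers test for "" before using `in`.
        return self.text[self.i] if self.i < len(self.text) else ""

    def expect(self, c: str) -> None:
        if self.peek() != c:
            raise ValueError(f"expected {c!r} at offset {self.i}")
        self.i += 1

    def ws(self) -> None:
        while self.peek() != "" and self.peek() in WS:
            self.i += 1

    def element(self) -> None:  # ws value ws
        self.ws()
        self.value()
        self.ws()

    def value(self) -> None:
        c = self.peek()
        if c == "{":
            self.object()
        elif c == "[":
            self.array()
        elif c == '"':
            self.string()
        elif c == "-" or (c != "" and c in "0123456789"):
            self.number()
        else:
            for word in ("true", "false", "null"):
                if self.text.startswith(word, self.i):
                    self.i += len(word)
                    return
            raise ValueError(f"no value at offset {self.i}")

    def object(self) -> None:
        self.expect("{")
        self.ws()
        if self.peek() == "}":  # '{' ws '}'
            self.i += 1
            return
        while True:  # members; each member is ws string ws ':' element
            self.ws()
            self.string()
            self.ws()
            self.expect(":")
            self.element()
            if self.peek() != ",":
                break
            self.i += 1
        self.expect("}")

    def array(self) -> None:
        self.expect("[")
        self.ws()
        if self.peek() == "]":  # '[' ws ']'
            self.i += 1
            return
        while True:  # elements
            self.element()
            if self.peek() != ",":
                break
            self.i += 1
        self.expect("]")

    def string(self) -> None:
        self.expect('"')
        while self.peek() != '"':
            c = self.peek()
            if c == "":
                raise ValueError("unterminated string")
            self.i += 1
            if c == "\\":
                e = self.peek()
                self.i += 1
                if e == "u":
                    for _ in range(4):
                        h = self.peek()
                        if h == "" or h not in HEX:
                            raise ValueError(f"bad \\u escape at offset {self.i}")
                        self.i += 1
                elif e == "" or e not in ESCAPES:
                    raise ValueError(f"bad escape at offset {self.i - 1}")
            elif c < " ":  # character starts at '0020'
                raise ValueError(f"control character at offset {self.i - 1}")
        self.i += 1  # closing quote

    def number(self) -> None:
        m = NUMBER.match(self.text, self.i)
        if m is None:
            raise ValueError(f"bad number at offset {self.i}")
        self.i = m.end()


def is_json(text: str) -> bool:
    """json: a single element with nothing left over."""
    r = Recognizer(text)
    try:
        r.element()
    except ValueError:
        return False
    return r.i == len(text)
```

For example, `is_json('{"a": [1, -2.5e3, true, null]}')` is true, while `is_json("[1,]")` and `is_json("012")` are false: the first has no element after the comma, and the second leaves "12" unconsumed after the `int` "0".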