syntactics.txt 0.2.1 UTF-8 2024-01-29 *---|----1----|----2----|----3----|----4----|----5----|----6----|----7----|--* The Miser Project ================= PHRASE-STRUCTURE GRAMMARS AND BNF NOTATION ------------------------------------------ Formal-grammar systems provide for precise specification of (usually-text) languages in terms of their syntactic structure. These can be used along with a formal semantics for all grammatically-valid statements, providing precise and rigorous determination of the intended meaning. This is part of nailing down programming-language definitions for precise realization in operational computer software. This note provides background and definitions that are exploited in establishing the formal grammars introduced in the Miser Project. See . 1. GRAMMARS AND PARSERS Here, grammars are taken as descriptions of the structure of texts in a language, and how those structures are interpreted -- the semantics of the texts. The grammar establishes the acceptable syntax of texts and their interpretation. A parser and a parsing system are related to grammars and their semantics, but grammars do not directly specify parsers. Grammars do constrain what any parser shall accept and how it conforms to an intended semantics. Although it is commonplace in computer science to blur grammars and parsers together, for oFrugal parsing will be distinct, though shown to be faithful to the formal syntactics. 2. PHRASE-STRUCTURE GRAMMARS IN BACKUS-NAUR FORM (BNF) Phrase-structure grammars were introduced by Nohm Chomsky as part of a hiearchy of grammar formalizations below what one might require in formalizing the grammar of a natural language [Chomsky1956]. In later treatment[Chomsky1959], phrase-structure grammars are identified as type 2, context-free, grammars; the terms are interchangeable. The term "context-free" has a precise and very narrow meaning. To avoid confusion with any informal idea of context, "phrase-structure" is preferred Here. BNF (now short for Backus-Naur Form) notation for expressing phrase-structure grammars exploded in the 1960s after inspiration by [Backus1958] and adaptation for ALGOL 60 [Naur1963]. The idea that one could specify a programming language in a formal manner and derive a parser in a mechanical way was compelling. BNS is commonly used in the specification of programming languages and other digital-computer textual forms, additional constraints given as side-conditions. 3. BNF NOTATION FOR THE MISER PROJECT *** TO BE EXPANDED *** NOTATION TO BE SPECIFIED ******************** *** We will use phrase categories instead of non-terminals, but the idea is the same. In particular, when there are no non-terminals in a sequence (i.e., only terminals), that is a terminal phrase. There is no further expansion available. *** The concrete reference syntax consists of Unicode, usually in UTF8 encoding, and a %-escape mechanism is used when a character is not printable. It is recommended that characters that are not easily typed be avoided by authors of ob expressions, recognizing that a recipient may encounter %-encoding instead. If you don't know what this is about, you are probably on safe ground. [SEE HOW INTERNATIONALIZATION OF C LANGUAGE IDENTIFIERS IS HANDLED] Basically, white-space is allowed wherever it does not change the grammar of a 〈term〉 and required where consecutive 〈term〉 occurrences would not be disambiguated. 4. REFERENCES [Backus1958] Backus, J. W. The syntax and semantics of the proposed International Algebraic Language of the Zurich ACM-GAMM Conference. Proc. International Conference on Information Processing. UNESCO (Paris: June 1959) pp. 125-132. Available on the Internet at [Chomsky1956] Chomsky, Noam. Three models for the description of languages. IRE Transactions on Information Theory 2, 3, 113-124. . Available on the Internet at . Seeking a model applicable to natural languages, Chomsky identifies finite-state languages (now type 3: regular expressions), phrase- structure languages (now type 2: context free), and transformational languages, the last having his focus. [Chomsky1959] Chomsky, Noam. On certain formal properties of grammars. Information and Control 2, 2 (1959), 137-167. . Here Chomsky's models are formalized more fully, establishing a hierarchy from recursively-enumerable (type 0, uninteresting for linguistics) and descending into strictly-tighter subsets with context-sensitive (type 1), context-free type 2), and finite state (type 3). There is a layer (recursive languages) within type 0 and containing type 1, so all of types 1-3 are recursive and hence decidable. These are generative grammars: production of grammatical strings is described, not to be confused with notation-borrowing schemes for the parsing of such text strings. [Duncan1967] Duncan, Fraser G. Notational abbreviations applied to the syntax of ALGOL. Section AB26.3.5 of ALGOL Bulletin 26 (August 1967), pp.28-32. Available on the Internet at . This paper made use of nested 〈 ... 〉 in a peculiar way, but it inspired me to use it to use productions to generate naming of phrase categories, providing a vocabulary grammar and generic arrangements, including moving between concrete cases such as 〈integer〉 and 〈〈integer〉-length-string〉. This is not needed here. I just want this reference set to be complete wherever I use it. [Ershov1963] Ershov, A.P., Kozhukhin, G.I., Voloshin, Yu.M. The Input Language for a System of Automatic Programming. Computation Center, Academy of Sciences of the USSR (Moscow, 1961, Russian); Academic Press (London, New York: 1963). This provided a type of generic context-free production that inspired some of this work, although not relied upon in the ob-exp grammar. [Gorn1963] Gorn, Saul. Detection of Generative Ambiguities in Context-Free Mechanical Languages. J. ACM 10, 2 (April 1963), 196-208. . Section 1, Introduction, offers a precise definition of context- free (mechanical) languages along with important considerations such as their decidability. Although it is undecidable, in general, whether a given context-free grammar is ambiguous, it is possible to determine when a particular statement is by relying on a parser such as that of [Cheatham1964]. [Naur1963] Naur, Peter (ed.)., Backus, J.W., Bauer, F.L., Green, J., Katz, C., McCarthy, J., Perlis, A.J., Rutishauser, H., Samelson, K., Vauquois, B., Wegstein, J.H., Van Winjgaarden, A., Woodger, M. Revised Report on the Algorithmic Language ALGOL 60. Comm. ACM 6, 1 (January 1963), 1-17. DOI . Although the 1960 publication is often cited, corrections were called for, leading to the 1963 Revised Report. Some ambiguities in the grammar were repaired. [Wikipedia2022a] Wikipedia. Context-Free Grammar, article version of 2022-06-05T09:33 accessed at . [Wikipedia2022b] Wikipedia. Pāṇini, article version of 2022-07-14 accessed at . *---|----1----|----2----|----3----|----4----|----5----|----6----|----7----|--* Copyright 2018-2024 Dennis E. Hamilton Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. *---|----1----|----2----|----3----|----4----|----5----|----6----|----7----|--* ATTRIBUTION Hamilton, Dennis E. Phrase-Structure Grammars and BNF Notation. Miser Project text file syntactics.txt version 0.2.01dated 2024-01-29, available on the Internet as a version of TODO * Note that this is about grammar and the interpretation mechanism. It is not the grammar specification for oFrugal. That's a different matter. * Revealing examples are needed here. Grace Hopper favored that and I should heed it for that reason alone. [This applies more to ob-exp.txt and oFrugalese.txt though.] * Cite other forms of BNF, such as the EBNF used at IETF, others. * Add the handling of concrete cases and the tie-in to a reference character code, akin to the counterpart in the XML specification. * Use the XML 1.0 (Namespaces) characterization of white space, letters, digits, and the restricted allowance of other characters in ob-exp terms. Provide a reference. * It is necessary to make clear that these are generative grammars and there should be some links about those. The Saul Gorn clarity would be useful. * Identify Pattern Languages, in Alexander's formulation, as similarly linguistical, though conceptually different (i.e., not about texts) * Find the material on Post's production system and also its use in the the material referenced by John W. Carr, III. * https://en.wikipedia.org/wiki/Backus%E2%80%93Naur_form may be useful. I notice I am skimping and not dealing with white space and optional white space. I can't ignore that in the lexical rules. * There can be pragmatics about some things. I just had a thought about that and now I have lost the case that inspired this remark. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 0.2.1 2024-01-29T20:24Z Move unused references on compiler compiling to mindelay.txt 0.2.0 2024-01-29T17:24Z Repurpose from 0.1.0 grammars.txt. 0.0.0 2022-08-11T17:36Z Place-holder and boiler-plate extracted from ob-exp.txt 1.0.1 and mindelay.txt *** end of cs/syntactics.txt ***