Grammatical Inference

27 Feb 2017 16:30

Meaing: inferring the rules of a formal language (its grammar) from examples, positive or negative. I'm mostly interested in the positive case, since I want to describe physical processes as though they were formal languages or (what is equivalent) automata.

— A result which has recently come to vex me is the fact that even finite automata are, in the computational learning theory sense, hard to learn. It's widely believed, but not proved, that common cryptographic systems are hard to crack, meaning that there are no efficient, polynomial-time algorithms for them (unless you have the key, of course...). It turns out that the ability to learn polynomially-sized deterministic finite automata in polynomial time would imply the ability to defeat RSA in polynomial time, which is a pretty good indication that it can't be done. (See Kearns and Vazirani for a very nice discussion.) Initial results were about uniform distributions over words, but it turns out further that this holds even when the distribution of words is generated by a stochastic automaton of the same form.

This is extremely annoying to me, because I want to learn stochastic automata from time series, and to do so in polynomial time (if not better). One possibility is that the time-complexity is just bad, in the worst case, and there is nothing to be done about this. (I fear this might be true.) This would not, however, address the scaling of prediction error and confidence with sample size, which is really what interests me. The other possibility is that I am interested in a rather different set-up than this, one where it's crucial that, as n grows, we see the continuation of a single sample path from a fixed automaton. (In the language, every allowed word is the prefix of infinitely many other allowed words.) Or: the experiment can go on forever. (In fact I'm often interested in the situation where the experiment could have been running forever, so every word is the suffix of infinitely many words.) I think, though I am not sure, that the ability to infer these languages quickly would not cause cryptographic horrors, because I think this restriction breaks the crypto-to-DFA mapping. Computational Learning Theory; Computational Mechanics; Linguistics; Machine Learning, Statistical Inference and Induction; Transducers

    To read:
  • Hendrik Blockeel, Robert Brijder, "Non-Confluent NLC Graph Grammar Inference by Compressing Disjoint Subgraphs", arxiv:0901.4876
  • Andreas Blume, "A Learning-Efficiency Explanation of Structure in Language", Theory and Decision 57 (2004): 265--285
  • Miguel Bugalho and Arlindo L. Oliveira, "Inference of regular languages using state merging algorithms with search", Pattern Recognition 38 (2005): 1457--1467
  • Francisco Casacuberta, Enrique Vidal and David Picó, "Inference of finite-state transducers from regular languages", Pattern Recognition 38 (2005): 1431--1443 ["Given a training corpus of input-output pairs of sentences, the proposed approach uses statistical alignment methods to produce a set of conventional strings from which a stochastic finite-state grammar is inferred. This grammar is finally transformed into a resulting finite-state transducer."]
  • Alexander Clark, Christophe Costa Florencio and Chris Watkins, "Languages as hyperplanes: grammatical inference with string kernels", Machine Learning 82 (2011): 351--373
  • Alexander Clark, Rémi Eyraud, Amaury Habrard, "Using Contextual Representations to Efficiently Learn Context-Free Languages", Journal of Machine Learning Research 11 (2010): 2707--2744
  • Shay B. Cohen and Noah A. Smith
  • Trevor Cohn, Phil Blunsom, Sharon Goldwater, "Inducing Tree-Substitution Grammars", Journal of Machine Learning Research 11 (2010): 3053--3096
  • P. Collet, A. Galves and A. Lopes, "Maximum Likelihood and Minimum Entropy Identfication of Grammars," Random and Computational Dynamics 3 (1995): 241--250
  • Colin de la Higuera, "A bibliographical study of grammatical inference", Pattern Recognition 38 (2005): 1332--1348
  • C. de la Higuera and J. C. Janodet, "Inference of \omega-languages from prefixes", Theoretical Computer Science 313 (2004): 295--312 [abstract]
  • Francois Denis, Aurelien Lemay and Alain Terlutte, "Learning regular languages using RFSAs", Theoretical Computer Science 313 (2004): 267--294 [abstract]
  • Francois Denis, Yann Esposito and Amaury Habrard, "Learning rational stochastic languages", cs.LG/0602062
  • Jeroen Geertzen, "String Alignment in Grammatical Inference: what suffix trees can do" [PDF]
  • C. Lee Giles, Steve Lawrence and Ah Chung Tsoi, "Noisy Time Series Prediction Using Recurrent Neural Networks and Grammatical Inference," Machine Learning 44 (2001): 161--183
  • James Henderson, Ivan Titov, "Incremental Sigmoid Belief Networks for Grammar Learning", Journal of Machine Learning Research 11 (2010): 3541--3570
  • Mark Johnson, Thomas L. Griffiths and Sharon Goldwater, "Adaptor Grammars: A Framework for Specifying Composition Nonparametric Bayesian Models", NIPS 19 [But containing significant typos; see version at Johnson's website]
  • Bill Keller and Rudi Lutz, "Evolutionary induction of stochastic context free grammars", Pattern Recognition 38 (2005): 1393--1406
  • Dan Klein and Christopher D. Manning, "Natural language grammar induction with a generative constituent-context model", Pattern Recognition 38 (2005): 1407--1419 ["We present a generative probabilistic model for the unsupervised learning of hierarchical natural language syntactic structure. Unlike most previous work, we do not learn a context-free grammar, but rather induce a distributional model of constituents which explicitly relates constituent yields and their linear contexts.... [Gets the] best published unsupervised parsing results on the ATIS corpus...."]
  • Leo Kontorovich, John Lafferty and David Blei, "Variational Inference and Learning for a Unified Model of Syntax, Semantics and Morphology" [Abstract, PDF]
  • Steffen Lange and Thomas Zeugmann, "Incremental Learning from Positive Data," Journal of Computer and System Sciences 53 (1996): 88--103
  • S. M. Lucas and T. J. Reynolds, "Learning Deterministic Finite Automata with a Smart State Labeling Evolutionary Algorithm", IEEE Transactions on Pattern Analysis and Machine Intelligence 27 (2005): 1063--1074
  • Marcelo A. Montemurro and Pedro A. Pury, "Long-range fractal correlations in literary corpora," cond-mat/0201139 [The paper doesn't consider grammars, but it's an effect which grammatical inference needs to be able to handle]
  • Katsuhiko Nakamura and Masashi Matsumoto, "Incremental learning of context free grammars based on bottom-up parsing and search", Pattern Recognition 38 (2005): 1384--1392
  • Partha Niyogi
    • The Informational Complexity of Learning: Perspectives on Neural Networks and Generative Grammars [How many licks does it take to get to the core of a context-free grammar, Uncle Noam?]
    • The Computational Nature of Language Learning and Evolution [Blurb]
  • Arlindo L. Oliveira and Joao P. M. Silva, "Efficient Algorithms for the Inference of Minimum Size DFAs," Machine Learning 44 (2001): 93--119
  • David Pico and Francisco Casacuberta, "Some Statistical-Estimation Methods for Stochastic Finite-State Transducers," Machine Learning 44 (2001): 121--141
  • Paul Prasse, Christoph Sawade, Niels Landwehr, Tobias Scheffer, "Learning to Identify Regular Expressions that Describe Email Campaigns", arxiv:1206.4637
  • Detlef Prescher, "A Tutorial on the Expectation-Maximization Algorithm Including Maximum-Likelihood Estimation and EM Training of Probabilistic Context-Free Grammars", cs.CL/0412015
  • Juan Ramón Rico-Juan, Jorge Calera-Rubio and Rafael C. Carrasco, "Smoothing and compression with stochastic k-testable tree languages", Pattern Recognition 38 (2005): 1420--1430
  • Peter Rossmanith and Thomas Zeugmann, "Stochastic Finite Learning of the Pattern Languages," Machine Learning 44 (2001): 67--91
  • Yasubumi Sakakibara
  • Muddassar A. Sindhu, Karl Meinke, "IDS: An Incremental Learning Algorithm for Finite Automata", arxiv:1206.2691
  • Patrick Suppes, Language for Humans and Robots
  • J. L. Verdu-Mas, R. C. Carrasco and J. Calera-Rubio, "Parsing with Probabilistic Strictly Locally Testable Tree Languages", IEEE Transactions on Pattern Analysis and Machine Intelligence 27 (2005): 1040--1050
  • Sicco Verwer, Mathijs de Weerdt and Cees Witteveen, "Efficiently identifying deterministic real-time automata from labeled data", Machine Learning 86 (2012): 295--333

  • Notebooks: