Understanding grammar change: Digital resources and evolutionary modelling

Europe/London
Room G.05 (50 George Square)

50 George Square Edinburgh EH8 9LH
Description

Grammar change is affected by many interacting factors, from how we learn and use languages to their social functions. Learning about how grammar changes can help us understand all these factors.

The raw materials in studying grammar change are corpora: collections of texts annotated with grammatical information. By building models and combining them with corpus data, we can uncover the underlying causes of historical changes. However, current corpora are limited in what they can reveal: some are too small, some are poorly annotated, and English is over-represented. Central challenges are how to build better corpus resources, and how to make better use of the resources we have.

This interdisciplinary workshop will address these challenges, bringing together experts in cutting-edge techniques (grammatical theory, natural language processing, mathematical modelling) from linguistics, physics, and cognitive science, to share ideas, develop best practices, and identify scope for developing better resources and new models.

Registration: Registration is now open. Please use the 'Registration' tab on the left. There is no fee to attend this event. Please note that registration will close at 2pm BST on Friday 31st May.

Conference Dinner: A dinner for speakers and delegates will be held on 4 June at Hotel Du Vin and the cost is £45. If you would like to attend the dinner you should indicate this when registering. Once you have submitted the workshop registration you must also visit this link to pay the fee for the meal, which will confirm your place at the dinner. (Note: to simplify expenses claims, we have badged this as a "conference fee". There is no conference fee to pay if you do not wish to participate in the dinner.)

Speakers: The following is a list of the confirmed speakers so far.

Venue: The conference will be held in two different venues as follows:

  • 4 June – Room G.05, 50 George Square
  • 5 June – Room G.03, Bayes Centre

Please see the venue details in the information box at the bottom of this page.

If you require any additional information, please email sopa.events@ed.ac.uk
 

Organisers: Richard Blythe, Juan Guerrero Montero, Dan Lassiter and Robert Truswell.

Sponsors: We are grateful to the following sponsors for supporting this workshop:

  • Tuesday, 4 June
    • 10:30 11:00
      Welcome 30m Room G.05

      Speakers: Richard Blythe (University of Edinburgh), Juan Guerrero Montero (University of Edinburgh), Dan Lassiter (University of Edinburgh), Rob Truswell (University of Edinburgh)
    • 11:00 12:00
      Syntactic Planning, Informational Risk, and the Information Threshold 1h Room G.05

      This talk picks up on work conducted with colleagues on the project 'Constraints on the Adaptiveness of Information in Language (CAIL)', which involved using information theory to analyze linguistic optionality and its cognitive scaffolding. Building on seminal work by Fenk and Fenk (1980; see also Fenk-Oczlon 2001 and much subsequent work), we suggest that linguistic planning is adapted for noise resistance. Specifically, speakers use whatever syntactic means are at their disposal to reduce the likelihood of catastrophic communication failure in the presence of noise. Thus, part of what motivates the choice between syntactic alternatives is a type of risk mitigation.

      First, we use real and permuted sentences from the Penn-York Computer-annotated Corpus of a Large Amount of English to demonstrate that more uniform ordering of elements confers functional noise resistance. Secondly, data from syntactic change in English and Icelandic (using the Penn Parsed Corpora of Historical English and the Icelandic Parsed Historical Corpus) shows that speakers use the syntactic variants made available by change in progress to approach a certain target or threshold of information uniformity, a threshold that is conserved over historical time. We have updated some prior work in this area on the OV-to-VO changes in English and Icelandic with additional work on the interaction between OV/VO, V2, and constraints on information spread.

      Finally, I will present some data from the Penn-Helsinki Parsed Corpus of Early Modern English (Kroch et al 2004) and Penn Parsed Corpus of Modern British English (Kroch et al 2016) on the decline of DP topicalization in Late Early Modern English and its implications for information uniformity, carrying on the work of Speyer (2008, 2010). Surprisingly, object DP fronting appears to be on a trajectory of slow decline in modern English, quite independently of the well-known phrase structure changes in Middle and Early Modern English. I suggest that this is a "slow change" of the type described in Wallenberg (2016), and that fronted and in-situ orders are partially specialized along the continuous dimension of informational uniformity.

      The existence of an information threshold or target is expected if the human language faculty constantly tries to keep the risk of information loss below a certain amount with a certain probability (not unlike the financial notion of Value at Risk), but at the same time, cannot achieve perfect uniformity due to linguistic constraints.

      Speaker: Joel Wallenberg (University of York)
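
The notion of information uniformity in the abstract above can be made concrete with a toy measure. The sketch below is illustrative only and is not the measure used in the talk: it assumes hypothetical per-word information values (in bits) and scores an ordering by the variance of summed information over sliding two-word windows, so that a lower score means a more even spread. The function name `window_unevenness` and the example values are assumptions.

```python
from statistics import pvariance


def window_unevenness(info, width=2):
    """Population variance of summed information over sliding windows.

    `info` is a list of per-word information values (hypothetical here);
    a lower result means information is spread more evenly along the sentence.
    """
    sums = [sum(info[i:i + width]) for i in range(len(info) - width + 1)]
    return pvariance(sums)


# Two orderings of the same made-up values: one alternates low and high
# information, the other clumps all the high-information words together.
alternating = [1.0, 5.0, 1.0, 5.0, 1.0, 5.0]
clumped = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]

print(window_unevenness(alternating))
print(window_unevenness(clumped))
```

With these made-up values the alternating order scores 0.0 and the clumped order 12.8, although both contain the same words: reordering alone changes how evenly information is spread.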
    • 12:00 13:30
      Lunch 1h 30m Room G.05

    • 13:30 14:30
      Ancestral State Reconstruction of grammatical traits - how can it be done? Case study from Oceania and new approaches 1h Room G.05

      Estimating linguistic pasts is difficult in general, and grammatical change appears harder to predict than lexical change. In this talk, I disentangle the fundamental principles of the traditional Historical Linguistics (HL) toolkit and how it relates to computational approaches. I contrast the task of reconstruction in linguistics with that of Ancestral State Reconstruction in biology, highlighting, in particular, the difficulty of determining appropriate data for historical modelling (cf. Walkden 2013; Evans et al. 2021). Many sources of data for grammatical change lack the structure that lexical material has, in which parts within words correspond to parts in other words and the words themselves are cognates. This kind of cognacy pattern is hard to find in grammatical data. The dearth of such patterns and/or the difficulty of finding them may warrant other ways of evaluating data, such as the permanency of structures independent of their content (cf. Goddard 1994; Evans 2003; Ross 2004) or phylogenetic signal (Skirgård 2024). Furthermore, the dynamics of grammatical change are likely to be subject to pressures hitherto not modelled explicitly, such as those discussed in the workshop proposal: communicative need, social factors, and learning biases. There may also be influences from neurolinguistic processing (Bickel et al. 2015) and demographic correlates (Wray & Grace 2007; Greenhill 2015; Raviv et al. 2019; Shcherbakova et al. 2023). Outlining several case studies, I want to illustrate that the existing toolkit in historical linguistics is currently not developed enough to address grammatical change. I suggest possible ways to proceed.

      References

      Levshina, N. (2022). Communicative efficiency. Cambridge University Press.
      Shcherbakova, O., Michaelis, S. M., Haynie, H. J., Passmore, S., Gast, V., Gray, R. D., ... & Skirgård, H. (2023). Societies of strangers do not speak less complex languages. Science Advances, 9(33), eadf7704.
      Wray, A., & Grace, G. W. (2007). The consequences of talking to strangers: Evolutionary corollaries of socio-cultural influences on linguistic form. Lingua, 117(3), 543-578.
      Bickel, B., Witzlack-Makarevich, A., Choudhary, K. K., Schlesewsky, M., & Bornkessel-Schlesewsky, I. (2015). The neurophysiology of language processing shapes the evolution of grammar: Evidence from case marking. PLoS One, 10(8), e0132819.
      Raviv, L., Meyer, A., & Lev-Ari, S. (2019). Larger communities create more systematic languages. Proceedings of the Royal Society B, 286(1907), 20191262.
      Evans, C. L., Greenhill, S. J., Watts, J., List, J. M., Botero, C. A., Gray, R. D., & Kirby, K. R. (2021). The uses and abuses of tree thinking in cultural evolution. Philosophical Transactions of the Royal Society B, 376(1828), 20200056.
      Skirgård, H. (2024). Disentangling Ancestral State Reconstruction in historical linguistics: Comparing classic approaches and new methods using Oceanic grammar. Diachronica.
      Walkden, G. (2013). The correspondence problem in syntactic reconstruction. Diachronica, 30(1), 95-122.
      Ross, M. D. (2004). The morphosyntactic typology of Oceanic languages. Language and Linguistics, 5(2), 491.
      Goddard, I. (1993). Contamination in Algonquian Languages. In Historical Linguistics 1989: Papers from the 9th International Conference on Historical Linguistics, New Brunswick, 14-18 August 1989 (Vol. 106, p. 129). John Benjamins Publishing.
      Evans, B. E. (2003). A study of valency-changing devices in Proto Oceanic (Pacific linguistics; 539). Pacific Linguistics.
      Greenhill, S. J. (2015). Demographic correlates of language diversity. In The Routledge handbook of historical linguistics (pp. 557-578). Routledge.

      Speaker: Hedvig Skirgård (Max Planck Institute for Evolutionary Anthropology)
    • 14:30 15:00
      Coffee 30m Room G.05

    • 15:00 16:00
      Understanding syntactic change: Constructing the infrastructure 1h Room G.05

      In order to understand syntactic change, it is useful to be able to mine parsed corpora for relevant data - the larger, the better. State-of-the-art parsers now parse ever larger amounts of text, but their output generally does not include information of interest to linguists, such as grammatical functions or empty categories. So parsed corpora that are manually annotated for such information remain important, and they will remain important even as automatic parsers improve, at least as long as those parsers require training data. It is also worth noting that manual annotation of corpora is sensible when the amount of text for a given language or language stage is relatively small.

      Of course, manual annotation has its own drawbacks - it is time-consuming and subject to human error. To some extent, these drawbacks can be addressed by what in business schools would be called best practices. In my presentation, I will present such tricks, methods, and general strategies as I have learned from constructing parsed corpora over the years in the hopes that they will prove useful and suggest further developments of their own to the workshop participants. If desired, I would be happy to discuss specific challenges faced by participants in particular cases of corpus construction.

      Speaker: Beatrice Santorini (University of Pennsylvania)
    • 16:00 17:00
      Prepositional phrases in historical corpora of English: Methodological and theoretical challenges 1h Room G.05

      This paper reports on a research project investigating prepositional phrases in verbal argument structure patterns in the history of English. Prepositional marking, as a more analytic means of expression, presumably increased in use over time (concurrent with a loss of morphological case marking and fixation of constituent order), with prepositions gradually taking on more grammatical, ‘core’ complement functions, such as recipient marking (Fennell 2001; Baugh & Cable 2002; Hawkins 2012; Szmrecsanyi 2012, 2016). The paper then presents a large dataset drawn from the well-known Penn Parsed Corpora of Historical English (ca. 1150-1900; Kroch et al. 2000, 2004, 2010) and selected case studies on these developments. It highlights (a) general methodological challenges incurred by working with historical corpora, such as abundant spelling variation and limited corpus sizes (e.g. Percillier & Trips 2020), and (b) the specific methodological challenges of dealing with the diachrony of prepositional phrases in English. I show that PPs – and especially PPs in historical data – are problematic in that, for example, distinguishing different types (adjuncts vs complements) or different semantic roles of PPs is a non-trivial task for manual classification, but also for more data-driven, automated ways of analysis based on recent developments in NLP (e.g. Merlo & Esteve Ferrer 2006; Hovy et al. 2010; Huang et al. 2020). Among other things, the paper outlines a pilot study of PP-classification making use of MacBERTh, an LLM pre-trained on historical English data (Manjavacas & Fonteyn 2022). Finally, I discuss how such methodological challenges and approaches can then inform (or are in turn informed by) theoretical assumptions about language change in general, and the development of English PPs in particular (see e.g. Hoffmann 2007 and Bergs 2021 for constructionist approaches to Present Day English PP-patterns).

      References

      Baugh, A. & T. Cable. 2002. A history of the English language. 5th edn. London: Routledge.
      Bergs, A. 2021. Complements and adjuncts. In B. Aarts, A. McMahon & L. Hinrichs (eds.), The handbook of English linguistics. Hoboken, NJ: Wiley-Blackwell. https://doi.org/10.1002/9781119540618.ch9
      Fennell, B. A. 2001. A history of English: A sociolinguistic approach. Oxford: Blackwell.
      Hawkins, J. 2012. The drift of English towards invariable word order from a typological and Germanic perspective. In T. Nevalainen & E. C. Traugott (eds.), The Oxford handbook of the history of English, 622-632. Oxford: Oxford University Press. https://doi.org/10.1093/oxfordhb/9780199922765.013.0053
      Hoffmann, T. 2007. Complements versus adjuncts? A construction grammar account of English prepositional phrases. Occasional Papers in Language and Linguistics (University of Nairobi) 3, 92-119.
      Hovy, D., S. Tratz & E. Hovy. 2010. What's in a preposition? Dimensions of sense disambiguation for an interesting word class. In Coling 2010: Poster Volume, 4554-4562.
      Huang, G., J. Wang, H. Tang & X. Ye. 2020. BERT-based contextual semantic analysis for English preposition error correction. Journal of Physics: Conf. Ser. 1693: 012115. https://doi.org/10.1088/1742-6596/1693/1/012115
      Kroch, A., A. Taylor & B. Santorini. 2000. The Penn-Helsinki Parsed Corpus of Middle English (PPCME2). Department of Linguistics, University of Pennsylvania, second edition, release 4. www.ling.upenn.edu/hist-corpora/PPCME2-RELEASE-3/index.html
      Kroch, A., B. Santorini & L. Delfs. 2004. The Penn-Helsinki Parsed Corpus of Early Modern English (PPCEME). Department of Linguistics, University of Pennsylvania, first edition, release 3. https://www.ling.upenn.edu/hist-corpora/PPCEME-RELEASE-3/index.html
      Kroch, A., B. Santorini & A. Diertani. 2016. The Penn Parsed Corpus of Modern British English. http://www.ling.upenn.edu/ppche/ppche-release-2016/PPCMBE2-RELEASE-1
      Manjavacas, E. & L. Fonteyn. 2022. Adapting vs. pre-training Language models for historical languages. Journal of Data Mining & Digital Humanities jdmdh: 9152. https://doi.org/10.46298/jdmdh.9152
      Merlo, P. & E. Esteve Ferrer. 2006. The notion of argument in prepositional phrase attachment. Computational Linguistics 32(3): 341-378. https://doi.org/10.1162/coli.2006.32.3.341
      Percillier, M. & C. Trips. 2020. Lemmatising verbs in Middle English corpora: The benefit of enriching the Penn-Helsinki Parsed Corpus of Middle English 2 (PPCME2), the Parsed Corpus of Middle English Poetry (PCMEP), and A Parsed Linguistic Atlas of Early Middle English (PLAEME). In Proceedings of the 12th Language Resources and Evaluation Conference, 7170-7178. Marseille: European Language Resources Association. https://www.aclweb.org/anthology/2020.lrec-1.886
      Szmrecsanyi, B. 2012. Analyticity and syntheticity in the history of English. In T. Nevalainen & E. C. Traugott (eds.), The Oxford handbook of the history of English, 654-665. Oxford: Oxford University Press. https://doi.org/10.1093/oxfordhb/9780199922765.013.0056
      Szmrecsanyi, B. 2016. An analytic-synthetic spiral in the history of English. In E. van Gelderen (ed.), Cyclical change continued, 93-112. Amsterdam: Benjamins. https://doi.org/10.1075/la.227.04szm

      Speaker: Eva Zehentner (University of Zurich)
  • Wednesday, 5 June
    • 10:00 11:00
      Variational learning: Anatomy of an algorithm 1h G.03 (Bayes Centre)

      47 Potterrow, Edinburgh, EH8 9BT

      Two decades after its introduction, the variational learner (Yang 2002) forms an essential part of the mathematically minded diachronist's toolkit. This model of language acquisition, together with its inter-generational predictions about language change, has been applied to a wide array of phenomena ranging from word order change to morphological simplification. I begin this talk by reviewing the fundamental properties of variational learning, as traditionally understood. I then move on to discuss two extensions of the basic model: multi-population and multi-grammar variational learning dynamics, including an application to sociolinguistic typology. As a positive formal contribution, I present a sufficient condition for the global asymptotic dominance of a single grammar in multi-grammar competition.

      Speaker: Henri Kauhanen (University of Konstanz)
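
The variational learner mentioned in the abstract above is usually described as a linear reward-penalty scheme over competing grammars. Below is a minimal two-grammar sketch, assuming fixed penalty probabilities `c1` and `c2` (the chance that each grammar fails to parse an incoming sentence). The function name, parameter names and default values are illustrative assumptions, not taken from the talk.

```python
import random


def variational_learner(c1, c2, gamma=0.005, steps=40000, p=0.5, seed=0):
    """Return the average weight of grammar 1 over the second half of the run.

    c1, c2 : probability that grammar 1 / grammar 2 fails on an input sentence
    gamma  : learning rate of the linear reward-penalty update
    p      : initial weight of grammar 1 (grammar 2 has weight 1 - p)
    """
    rng = random.Random(seed)
    tail = []
    for step in range(steps):
        picked_g1 = rng.random() < p                      # sample a grammar
        fails = rng.random() < (c1 if picked_g1 else c2)  # does it fail to parse?
        # Grammar 1 gains weight when it succeeds or when grammar 2 fails;
        # in the other two cases it loses weight.
        if picked_g1 != fails:
            p = p + gamma * (1 - p)
        else:
            p = (1 - gamma) * p
        if step >= steps // 2:
            tail.append(p)
    return sum(tail) / len(tail)


print(variational_learner(0.1, 0.3))
```

In this two-grammar setting the expected weight of grammar 1 tends towards c2 / (c1 + c2), so with c1 = 0.1 and c2 = 0.3 the running average should settle near 0.75.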
    • 11:00 11:30
      Coffee 30m Bayes Centre

    • 11:30 12:30
      Modelling grammar change as an evolutionary process 1h G.03 (Bayes Centre)

      Evolutionary models of genetic drift have been successfully applied over the last decade to the analysis of diachronic change in corpus data. Among them, the Wright-Fisher model stands out as a simple but powerful paradigm that is mathematically equivalent to models of the cultural transmission of language, like Iterated Bayesian Learning and the Utterance Selection Model. Wright-Fisher characterises language change as a process where different expressions (variants) realising the same function (meaning) compete against each other for usage in the speech community. Corpus data analyses using Wright-Fisher aim at detecting and quantifying the evolutionary forces (e.g. drift, selection, mutation) shaping this competition process, which can shed light on the underlying diachronic phenomena driving language change.

      This approach is limited in that it assumes isolation of the competition process for each function. In this work, we present an Iterated Bayesian Learning model of grammar change involving the co-evolution of interrelated functions and expressions that better reflects the complex interdependencies often present in language change. We show that this model is equivalent to a modified Wright-Fisher paradigm, and that it maps effects including learning biases, analogy and social preferences to evolutionary forces. This enables its application to hypothesis testing and model selection in the analysis of corpus data, which we illustrate through applications to the study of the evolution of relativisers in Middle and Modern English, and the emergence of periphrastic do in Early Modern English. Our results show that evolutionary models incorporating co-evolving functions are relevant to our empirical and quantitative understanding of language change. The model we introduce is a promising first step towards this.

      Speaker: Juan Guerrero Montero (University of Edinburgh)
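
For readers unfamiliar with the model named in the abstract above, here is a minimal sketch of a textbook two-variant Wright-Fisher process with a selection coefficient `s`. It is not the co-evolution model presented in the talk. The function name, the parameters and the selection formula x(1 + s) / (1 + sx) follow one common convention and are assumptions made for illustration.

```python
import random


def wright_fisher(n, x0, s=0.0, generations=100, seed=0):
    """Simulate the frequency of one variant among n tokens per generation.

    n  : number of tokens (population size) resampled each generation
    x0 : initial frequency of the tracked variant
    s  : selection coefficient (0 means pure drift)
    """
    rng = random.Random(seed)
    x = x0
    trajectory = [x]
    for _ in range(generations):
        # Selection: bias the sampling probability towards the favoured variant.
        p = x * (1 + s) / (1 + s * x)
        # Drift: each of the n tokens copies the variant independently
        # with probability p (binomial sampling).
        count = sum(rng.random() < p for _ in range(n))
        x = count / n
        trajectory.append(x)
    return trajectory


drift_only = wright_fisher(200, 0.5, s=0.0, generations=50)
with_selection = wright_fisher(500, 0.1, s=0.5, generations=100, seed=1)
print(drift_only[-1], with_selection[-1])
```

With s = 0 the trajectory is pure drift; a positive s biases resampling towards the tracked variant. Frequencies 0 and 1 are absorbing, since a variant that has been lost or has taken over cannot return without mutation, which this sketch leaves out.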
    • 12:30 14:00
      Lunch 1h 30m Bayes Centre

    • 14:00 15:00
      Neural Ratio Estimation of Evolutionary Dynamics with Transformer Models 1h G.03 (Bayes Centre)

      Speaker: Folgert Karsdorp (Meertens Institute)
    • 15:00 15:30
      Coffee 30m Bayes Centre

    • 15:30 16:30
      Implicational universals, probabilities and grammar competition 1h G.03 (Bayes Centre)

      At its simplest, grammar competition is the view that individuals associate linguistic variants with probabilities as part of their knowledge, and that these probabilities are reflected in usage. Roberts (2021) has suggested that the grammar-competition worldview is unable to handle a well-motivated linguistic universal, the Final-over-Final Constraint (FOFC). In this talk I sketch a way of unifying competing grammars and FOFC, and show that it makes interesting predictions about usage, stored probabilities and diachronic change. I will also show how this reasoning is applicable to universals other than FOFC, such as Blake’s hierarchy of case systems, in a way that is consistent with the available corpus evidence.

      Speaker: George Walkden (University of Konstanz)