4–5 Jun 2024
50 George Square
Europe/London timezone

Understanding syntactic change: Constructing the infrastructure

4 Jun 2024, 15:00
Room G.05 (50 George Square)

50 George Square Edinburgh EH8 9LH


Beatrice Santorini (University of Pennsylvania)


To understand syntactic change, it is useful to be able to mine parsed corpora for relevant data, and the larger the corpora, the better. State-of-the-art parsers can now process ever larger amounts of text, but their output generally does not include information of interest to linguists, such as grammatical functions or empty categories. Parsed corpora that are manually annotated for such information therefore remain important, and they will remain so even as automatic parsers improve, at least as long as those parsers require training data. Manual annotation is also sensible when the amount of text available for a given language or language stage is relatively small.

Of course, manual annotation has its own drawbacks: it is time-consuming and subject to human error. To some extent, these drawbacks can be addressed by what business schools would call best practices. In my presentation, I will share the tricks, methods, and general strategies that I have learned from constructing parsed corpora over the years, in the hope that they will prove useful to the workshop participants and prompt them to develop further ones of their own. If desired, I would be happy to discuss specific challenges that participants face in their own corpus construction projects.

Primary author

Beatrice Santorini (University of Pennsylvania)
