The Historical Semantics of Temporal Comparisons Through the Lens of Digital Humanities
Promises and Pitfalls
This chapter explores ways in which digital methods may help to study the historical semantics and pragmatic uses of comparisons. From the perspective of traditional conceptual history, an obvious starting point would be to search for occurrences of the noun »comparison« itself, including its derivatives and synonyms. In this chapter, however, we move beyond single-word searches or collocation studies towards what one might call a computer-assisted »historical semantics of utterances«. In ordinary language, most comparisons are indeed »performed« through inconspicuous sentences such as »x is like y« (equations) or »j is better than k« (grading comparisons). When utterances such as these, rather than words, are chosen as the minimal unit of analysis, any computer-assisted study of the semantics and pragmatic uses of comparisons becomes much more complicated. Digitized text corpora have to be specifically tagged, and digital search tools need to be »trained« in order to achieve an error-free identification of the sentences sought. We proceed in three steps. In a first step (section 2), we develop a basic typology of comparative utterances, then limit our present field of inquiry to one particular type of utterance (temporal comparisons) and identify the crucial semantic and syntactic markers which might enable digital tools to recognize temporal comparisons in digitized text corpora. In a second step (section 3), we apply our typological considerations to an existing, highly elaborated digitized text base: the Hansard Corpus of British Parliamentary Debates (1803-2005) (HC), supported by the HTST tagger. We demonstrate how queries have to be formulated in order to achieve valuable and relatively error-free search results.
By contrast, in a third step (section 4), our analytical framework is applied to a self-defined text corpus, a collection of utopias and dystopias ranging from Thomas More's »Utopia« (1516) to Adam Sternbergh's »Shovel Ready« (2014). We explain the laborious process of digitizing and tagging the corpus using a freely available tagger (Corpus Workbench), then describe several rounds of queries and finally evaluate the search results. These results turn out to be relatively weak and defective compared with those achieved with the Hansard Corpus and the HTST tagger, and also compared with a »traditional«, hermeneutical way of filtering out comparative utterances. Hence, our contest between »man« and »machine« ends in a draw. While well-established corpora and refined tools may yield valuable results in some cases, the balance is more mixed when corpora are self-defined and tools are less sophisticated: in these cases we still need to weigh the speed provided by digital tools carefully against the amount of preparatory work and the rate of errors to be corrected manually.
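To make the idea of surface markers for comparative utterances more concrete, the following minimal Python sketch flags sentences of the »x is like y« and »j is better than k« kind. It is an illustration only and not the procedure used in this chapter: it relies on plain regular expressions rather than on the HTST tagger or Corpus Workbench, and the pattern names, the function names and the patterns themselves are assumptions introduced for this example.

```python
import re

# Hypothetical surface markers, chosen for illustration only.
# Equations: a form of "to be" followed by "like" (e.g. "x is like y").
EQUATION = re.compile(r"\b(?:is|are|was|were)\s+(?:just\s+)?like\b", re.IGNORECASE)
# Grading comparisons: a comparative form followed by "than" (e.g. "better than").
GRADING = re.compile(r"\b(?:more|less|better|worse|\w+er)\s+than\b", re.IGNORECASE)


def classify(sentence):
    """Return the comparison types whose surface marker occurs in the sentence."""
    labels = []
    if EQUATION.search(sentence):
        labels.append("equation")
    if GRADING.search(sentence):
        labels.append("grading")
    return labels


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def find_comparisons(text):
    """Return (sentence, labels) pairs for sentences with at least one marker."""
    hits = []
    for sentence in split_sentences(text):
        labels = classify(sentence)
        if labels:
            hits.append((sentence, labels))
    return hits
```

Even this toy version shows why manual correction remains necessary: the grading pattern also fires on »They walked rather than rode«, because »rather« merely looks like a comparative form. Untagged pattern matching of this kind therefore trades speed for a noticeable rate of false positives.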