DiaCollo: collocation analysis in diachronic perspective

In the words of the famous language philosopher Ludwig Wittgenstein: "the meaning of a word is its usage in the language" (Philosophical Investigations, Part I, section 43). In other words, the meaning of a word can be revealed by the context in which it appears. An ambiguous word such as ‘bank’ can be disambiguated given its context: the ‘bank’ bounding a body of water will tend to occur together with terms like “river”, “lake”, or “slope”, while the ‘bank’ which is a financial institution will tend to occur together with expressions like “money”, “cheque”, or “go to”.

Changes in a word's meaning will therefore often be directly associated with changes in its characteristic combinations (the set of words with which it typically occurs together, its collocates). Even political, cultural, or social changes relating to a central term can be revealed and traced through its typical combinations (see the example for ‘revolution’ below).

DiaCollo is a software tool for the discovery, comparison, and interactive visualization of the typical word combinations for a user-specified target term. Characteristic word combination profiles based on various underlying text corpora can be requested for a particular time period, as well as direct comparisons between different time periods. In addition to traditional static tabular display formats, a number of intuitive interactive online visualizations for query result data are also available.

A short guide on how to use DiaCollo

  1. Visit the DiaCollo query form in a browser to query the data from the German Text Archive corpus.
  2. Type the word Revolution in the QUERY field.
  3. Select Cloud from the FORMAT menu. Leave the rest of the fields unchanged.
  4. Click on the submit button (next to the QUERY field).
  5. In the box beneath the query section, the words that typically appear with Revolution will be displayed. The window initially shows the situation in 1610. The presentation format is a word-cloud: the displayed words will differ in size and colour based on their association strengths with respect to the target word, Revolution.
  6. Directly above the display area is a time-line beginning at 1610 and ending at 1910, divided into intervals of 10 years each. To the left of the display area is a scale of the (relative) association strengths for the displayed items for easier interpretation of the results.
  7. Clicking on a date in the time-line (e.g. 1790) displays the typical combinations for Revolution in the corresponding decade. Clicking on a word in the display area opens a window containing detailed information on that word, including a direct link to the respective underlying corpus hits. Alternatively, you can click on the play button to the left of the time-line to start an animation of the changes in typical word combinations over time. Playback speed can be adjusted with the vertical slider next to the play button.
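
The idea behind the per-decade display in the steps above can be illustrated with a small sketch. It groups sentences into time slices, counts which words occur together with the target word in each slice, and ranks them by an association score. The guide does not say which association measure DiaCollo applies, so the log-Dice formula used here is an assumption chosen for illustration; the function names and the toy data are likewise invented and do not come from DiaCollo or from any real corpus.

```python
import math
from collections import Counter, defaultdict


def slice_start(year, slice_size=10):
    """Map a year to the first year of its time slice (e.g. 1793 -> 1790)."""
    return year - year % slice_size


def collocate_profiles(sentences, target, slice_size=10, kbest=3):
    """Return {slice_start: [(collocate, score), ...]} for `target`.

    `sentences` is an iterable of (year, tokens) pairs. Co-occurrence is
    counted once per sentence. The score is log-Dice (an assumed measure):
    14 + log2(2 * f(target, w) / (f(target) + f(w))).
    """
    word_freq = defaultdict(Counter)  # slice -> sentences containing each word
    pair_freq = defaultdict(Counter)  # slice -> sentences containing word and target
    for year, tokens in sentences:
        s = slice_start(year, slice_size)
        types = set(tokens)
        word_freq[s].update(types)
        if target in types:
            pair_freq[s].update(types - {target})

    profiles = {}
    for s, pairs in pair_freq.items():
        f_target = word_freq[s][target]
        scored = [
            (w, 14 + math.log2(2 * f_xy / (f_target + word_freq[s][w])))
            for w, f_xy in pairs.items()
        ]
        # Strongest association first; ties broken alphabetically.
        scored.sort(key=lambda item: (-item[1], item[0]))
        profiles[s] = scored[:kbest]
    return profiles


# Invented toy data, purely for illustration (not real corpus hits).
toy = [
    (1612, ["Revolution", "Planet", "Sonne"]),
    (1615, ["Revolution", "Planet"]),
    (1618, ["Planet", "Mond"]),
    (1791, ["Revolution", "Volk", "Freiheit"]),
    (1794, ["Revolution", "Volk"]),
    (1797, ["Volk", "König"]),
]
profiles = collocate_profiles(toy, "Revolution")
for start in sorted(profiles):
    print(start, profiles[start])
```

Each key of `profiles` corresponds to one position on the time-line, and the ranked list plays the role of the word cloud for that decade. Changing `slice_size` or `kbest` in the sketch mirrors what the SLICE and KBEST fields control in the query form.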

You can modify the basic recipe above in various ways, for example by changing the queried time period (DATE) and/or the size of the intervals on the time-line (SLICE). You can also change the maximum number of displayed collocates (KBEST) or the mode of visual presentation (FORMAT). Additional corpora and further modes of application are also available. For instance, you can use DiaCollo to display the differences or the similarities between two different words on the basis of their typical collocates over a given time period, or to directly compare the typical collocates of a single word in two different time periods. Further details and examples can be found in the full CLARIN-D DiaCollo use-case (in German), as well as in DiaCollo's online help pages.
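
If you want to keep or share a particular combination of these settings, one option is to record them as a query string. The sketch below only shows how such a string could be assembled. The base address is a placeholder, and both the lowercase parameter names (derived from the form fields QUERY, DATE, SLICE, KBEST and FORMAT) and the `1750:1850` date notation are assumptions, so check them against DiaCollo's online help before relying on them.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Placeholder address: substitute the DiaCollo query form you actually use.
BASE_URL = "https://example.org/diacollo/"


def build_query_url(query, date="", slice_size=10, kbest=10, fmt="cloud",
                    base_url=BASE_URL):
    """Assemble a query URL from the form fields described in the guide.

    The parameter names are assumed to be the lowercase form-field labels;
    this is a hypothetical helper, not part of DiaCollo itself.
    """
    params = {
        "query": query,       # QUERY: the target term
        "date": date,         # DATE: queried time period (notation assumed)
        "slice": slice_size,  # SLICE: interval size in years
        "kbest": kbest,       # KBEST: maximum number of collocates shown
        "format": fmt,        # FORMAT: mode of presentation, e.g. "cloud"
    }
    return base_url + "?" + urlencode(params)


url = build_query_url("Revolution", date="1750:1850", slice_size=25, kbest=20)
print(url)

# Decode the query string again to check what would be sent.
decoded = parse_qs(urlsplit(url).query)
print(decoded)
```

Keeping the settings in one dictionary makes it easy to vary a single field, such as the slice size, while leaving the rest of the query unchanged.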

Additional versions of this guide

A more detailed guide with examples in German is available in PDF format.

CLARIN Centre
Berlin-Brandenburg Academy of Sciences and Humanities (BBAW)
Project leader
Bryan Jurish
Acknowledgements

DiaCollo is a use case of the CLARIN-D centre in the Berlin-Brandenburg Academy of Sciences and Humanities (BBAW).

Related CLARIN-D tools and services

  • WebLicht web-based analysis tool
  • DTA::CAB historical German text analysis service