Character-based analysis

Character segmentation is known to be particularly hard for handwriting. By contrast, printed texts have clearly separated letters, which makes optical character recognition much simpler.

There is, however, also a more fundamental ontological issue: boundaries between letters in handwriting generally do not exist. There are only a few exceptions, for example when the writer lifts the writing instrument after drawing each glyph, or in specific scripts such as the Carolingian minuscule.

It is nevertheless meaningful to separate letters within words, because this allows the palaeographer to search the document for specific strings. For this purpose, it does not matter that there is no clear boundary between letters.
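The point that string search tolerates fuzzy boundaries can be illustrated with a minimal sketch. It assumes a transcription in which every character carries an approximate horizontal extent in the image; the `Glyph` record, its fields, and the function `find_string` are hypothetical names chosen for illustration and are not part of Diptychon. Neighbouring extents may overlap, since a search hit only needs the left edge of its first letter and the right edge of its last.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Glyph:
    char: str     # exactly one transcribed character (assumption)
    x_start: int  # approximate left edge in pixels (hypothetical field)
    x_end: int    # approximate right edge in pixels (hypothetical field)


def find_string(glyphs: List[Glyph], query: str) -> List[Tuple[int, int]]:
    """Return the approximate pixel span of every occurrence of query."""
    if not query:
        return []
    # One character per glyph, so string indices equal glyph indices.
    text = "".join(g.char for g in glyphs)
    spans = []
    pos = text.find(query)
    while pos != -1:
        first = glyphs[pos]
        last = glyphs[pos + len(query) - 1]
        spans.append((first.x_start, last.x_end))
        pos = text.find(query, pos + 1)
    return spans


# Each letter is given a 12-pixel extent at a 10-pixel pitch, so that
# neighbouring letters overlap by 2 pixels (no sharp boundary).
line = [Glyph(c, i * 10, i * 10 + 12) for i, c in enumerate("in nomine")]
print(find_string(line, "in"))
```

The overlapping extents do not affect the result: each hit is reported as one region that is wide enough to contain the whole string.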

But there is much more to say about the relevance of character segmentation. From the point of view of one of the traditional palaeographic schools, the treatment of individual letters is mandatory. There are basically four reasons:

  • The visual appearance of individual letters varies to a greater or lesser degree even within the work of a single writer, so the different shapes of individual letters have to be taken into account. For example, there is the traditional distinction between the long ‘s’ and the round one. Such differences between categories, as well as the spectrum of variation within a single category, help to characterise a single writer.
  • Orthography in the Middle Ages was not as standardised as it is today. Many different spellings of the same word can be found in historical documents. But even today, characteristic mistakes can give us valuable insights into the writer.
  • Abbreviations were widespread in the Middle Ages. Their usage can tell us a lot about a single writer. Even more importantly, they generally cannot be resolved without careful treatment by the editor who is transcribing a given document.
  • Holistic word-wise translations do not make much sense. Rather, it is necessary to look precisely at the use of upper and lower case in a handwriting and at the consistent use of specific letters, such as ‘i’ and ‘u’ as vowels in comparison to ‘j’ and ‘v’ as consonants, respectively, to mention just a few examples. More generally, any text is in need of interpretation, and this extends to the analysis of individual letters.
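As an illustration of how letter-level evidence of this kind could be collected, the following minimal sketch tallies the long and the round ‘s’ by position within a word. It assumes a transcription that preserves the distinction by using the Unicode character ‘ſ’ (U+017F) for the long form; the function name `s_form_profile` is hypothetical and does not describe how Diptychon works.

```python
from collections import Counter


def s_form_profile(words):
    """Count long and round s by position (final vs. non-final) per word."""
    profile = Counter()
    for word in words:
        for i, ch in enumerate(word):
            if ch in ("ſ", "s"):  # upper-case S is ignored in this sketch
                form = "long" if ch == "ſ" else "round"
                position = "final" if i == len(word) - 1 else "non-final"
                profile[(form, position)] += 1
    return profile


# Made-up sample words, not taken from any real document.
sample = ["ſanctus", "eſt", "miſsus"]
for key, count in sorted(s_form_profile(sample).items()):
    print(key, count)
```

Comparing such profiles between documents is one simple way of making the habits of a single writer visible; the same pattern would apply to ‘i’/‘j’ or ‘u’/‘v’.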

These are just a few examples of why a research paradigm based on individual letters is of particular relevance. Diptychon provides the basis for such a paradigm.