The digital analysis of writing systems

A discussion paper

 

Any writing system is an information technology. It has an input device, say a pencil or a stylus; an output device, the writing surface itself, say a sheet of paper or a clay tablet; and a means of storing information in conventionally coded form so that it can be retrieved at a later date. What the new information technology of the microcomputer has that its predecessors lacked is a means of processing that information. Because of this processing power, which is now enormous, it is possible to use the new technology to analyse the old.

What kinds of analysis are possible, and interesting? It depends on the writing system. In the cuneiform system, for instance, which could not be read by any living person for nearly two thousand years after it fell out of use, the primary and enormous task was decipherment: unlocking the code and enabling its phonological and semantic realisation. For the writing systems of living languages, this is hardly a pressing problem. But for all writing systems there is another, perennial, interest: what can one read in them other than the meaning that they present? Is it possible to tell when and where a document was written, and, most important of all, is it possible to tell who wrote it, from examining the characteristics of the written trace?

Those who ask this question have two different names; there is no single term to cover both. For ancient documents, students who study and attempt to derive meaning from the form as well as the content are called palaeographers; these are mainly academics and librarians, though some make a living doing consultancy work for auction houses. For modern documents, they are called forensic document analysts or, more casually, handwriting experts. These operate as forensic scientists, and are employed by the police and the government, or are self-employed: their job normally consists of examining documents for evidence of authorship and presenting that evidence in court. Since a substantial proportion of our financial system is dependent on the belief that a signature is identifiably unique to the author of that signature, they have a lot of work to do, and on their judgements depend, very often, enormous sums of money.

Forensic document examiners normally have science degrees, and consider themselves to be scientists. Indeed, their livelihood depends on this, because they give opinion evidence in court; only an accepted scientific expert can give such evidence. Palaeographers generally have arts degrees, and their judgements tend to make less of a claim to scientific status, though it is clear that both sets of experts are doing almost exactly the same job of work [1]. There is no doubt that, for instance, Home Office forensic handwriting experts in the UK are carefully trained and operate a rigorous, cautious, and disciplined methodology; and it is known that the police are happy to have handwriting evidence (when they can get it: properly qualified handwriting experts are notoriously cautious). But the dual expertise of the document analyst and the palaeographer, the science student and the arts student, reflects a real ambiguity in the nature of the discipline: is it, can it be, a science? For many years this has been a common question in the cross-examination of forensic document experts, and the uncertainty has recently become a very large issue with the Daubert ruling [2] in the U.S., which has been used to cast considerable doubt on the scientific credentials of handwriting experts.

One reason for this doubt is the following. When a handwriting expert makes a statement, for instance that two pieces of handwriting (or two different cuneiform tablets) were produced by the same person, what he or she is doing is comparing those two inscriptions with a very large number of other inscriptions. He or she is saying that the two writings in question are similar to each other in respects in which they differ from all those other samples. The other samples may no longer exist: they are members of a reference database that exists in the examiner’s memory, based on years of experience in looking at the physical characteristics of handwriting.
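
The logic of that comparison can be made explicit in a small sketch. It is an illustration under stated assumptions, not an account of how examiners actually work: the feature names (`slant`, `height_ratio`), the measurements, and the threshold are all hypothetical. Each sample is reduced to a few numbers, and a feature counts as shared and distinctive when both questioned samples depart from the reference population in the same direction by more than a chosen number of standard deviations.

```python
from statistics import mean, stdev

def distinctive_shared_features(sample_a, sample_b, reference, threshold=2.0):
    """Return the features on which both samples depart from the reference
    population in the same direction by at least `threshold` standard deviations."""
    shared = []
    for name in sample_a:
        values = [ref[name] for ref in reference]
        mu, sigma = mean(values), stdev(values)
        if sigma == 0:
            continue  # the feature never varies in the reference set, so it cannot discriminate
        z_a = (sample_a[name] - mu) / sigma
        z_b = (sample_b[name] - mu) / sigma
        same_direction = z_a * z_b > 0
        if abs(z_a) >= threshold and abs(z_b) >= threshold and same_direction:
            shared.append(name)
    return shared

# Hypothetical reference population: what the examiner has "seen before".
reference = [
    {"slant": 10.0, "height_ratio": 0.50},
    {"slant": 12.0, "height_ratio": 0.52},
    {"slant": 8.0, "height_ratio": 0.48},
    {"slant": 11.0, "height_ratio": 0.51},
    {"slant": 9.0, "height_ratio": 0.49},
]

# Two hypothetical questioned writings.
questioned_1 = {"slant": 25.0, "height_ratio": 0.50}
questioned_2 = {"slant": 27.0, "height_ratio": 0.51}

result = distinctive_shared_features(questioned_1, questioned_2, reference)
print(result)
```

With these invented figures only the slant is both shared and unusual; the height ratio is shared but ordinary, so it carries no weight. The point of the sketch is that the reference population has to exist somewhere outside the examiner's memory before such a comparison can be checked by anyone else.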

The problem, however, is that this experience, this information, is not accessible to anyone else. It exists only inside the expert’s head. We believe that the internal databases of two different experienced experts will contain more or less the same information, and that consulting them will produce more or less the same results. One can test this by showing the same problem to two or more experts and seeing how much they agree or differ; this is the essence of how the scientific expert works in an adversarial legal system. But we have no more direct method of finding out. The information remains locked in the expert’s head, accessible only to his or her subjective introspection.
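
The indirect test mentioned above, showing the same problems to two experts and seeing how far they agree, can at least be given a number. The sketch below uses Cohen's kappa, a standard chance-corrected agreement statistic which this paper does not itself propose; the verdict labels and both lists of opinions are invented for illustration.

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two raters over the same cases."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two equal-length, non-empty lists of verdicts")
    n = len(ratings_a)
    # Proportion of cases on which the two examiners gave the same verdict.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each examiner's own verdict frequencies.
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both examiners gave one and the same verdict throughout
    return (observed - expected) / (1 - expected)

# Hypothetical opinions of two examiners on the same six questioned documents.
expert_1 = ["same", "same", "different", "inconclusive", "same", "different"]
expert_2 = ["same", "different", "different", "inconclusive", "same", "same"]

kappa = cohens_kappa(expert_1, expert_2)
print(round(kappa, 3))
```

A figure of this kind measures only how far the two internal databases agree in their outputs; it says nothing about what either of them contains.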

The problem, in a word, is that handwriting analysis lacks a taxonomy [3]: a published reference set, searchable and available to all. A reference set that contains answers to questions such as: what are the characteristics of handwriting by someone who is, say, French, or Neo-Babylonian, or suffering from Parkinson’s disease, or left-handed, or drunk, or disguising his or her hand, or female, or… and so on, for a long list. In other words, a reference set that makes public, explicit, and available to all the information that a trained document expert or palaeographer will have in his or her head. Without such a taxonomy, it is difficult for a discipline to be taken seriously as a science. With it, there would be an objective and agreed set of standards to which everyone can refer: lawyers, for instance, or archaeologists in the field. The assertions of experts would become properly testable, and their expertise learnable by more formal methods than the acquisition of experience. Learnable as a science is learned.

The reason for this lack of taxonomy (which is to be found everywhere in the study of writing systems: it does not exist for cuneiform, or seventeenth-century English secretary hand, or ninth-century Carolingian minuscule) is not hard to find. A complete taxonomy of each of these scripts would have to be photographic. No verbal description would be remotely satisfactory; for the reference set to be of any use one would have to see it. And photography, until recently, was expensive, and the publication of photographs in book form prohibitively, impossibly expensive. [4] Moreover, only the simplest possible indexes can operate in a printed book; and reference material like this requires very complex indexing.

However, that has all changed. Megapixel digital cameras are now relatively cheap, standard computers can handle multi-megabyte image files with ease, and the World Wide Web makes the publication of images very easy and the images themselves easily searchable; and all of this, once equipment costs have been paid, is extremely cheap. Extraordinary opportunities are opening up for the digital analysis of writing systems.

These opportunities, however, also bring problems: research problems. The principle of such an analysis is easy to see: one makes digital photographs of examples of the writing system under analysis, whether this system is modern French or Neo-Babylonian, and whether the writing is disguised or produced by someone suffering from Parkinson’s disease; one analyses the photographs, extracts exemplary instances, and puts them in a database that can output to the Internet with some form of hypertext indexing system.
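
As a rough sketch of the last step in that principle, the following shows one way exemplar images might be stored with descriptive labels and retrieved by label. Everything in it is an assumption made for illustration: the table and column names, the file paths, and the trait labels are hypothetical, and an in-memory SQLite database stands in for whatever system a real project would use.

```python
import sqlite3

# In-memory database for the sketch; a real reference set would live on disk or a server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE exemplar (
    id         INTEGER PRIMARY KEY,
    image_path TEXT NOT NULL,   -- where the photograph is stored
    script     TEXT NOT NULL,   -- writing system the exemplar belongs to
    sign       TEXT NOT NULL    -- the character or sign shown
);
CREATE TABLE trait (
    exemplar_id INTEGER NOT NULL REFERENCES exemplar(id),
    label       TEXT NOT NULL   -- one descriptive label per row
);
CREATE INDEX trait_label ON trait(label);
""")

def add_exemplar(image_path, script, sign, traits):
    """Store one photographed exemplar together with its descriptive labels."""
    cur = conn.execute(
        "INSERT INTO exemplar (image_path, script, sign) VALUES (?, ?, ?)",
        (image_path, script, sign),
    )
    conn.executemany(
        "INSERT INTO trait (exemplar_id, label) VALUES (?, ?)",
        [(cur.lastrowid, label) for label in traits],
    )
    return cur.lastrowid

def find_by_trait(label):
    """Return the image paths of every exemplar carrying the given label."""
    rows = conn.execute(
        "SELECT e.image_path FROM exemplar e "
        "JOIN trait t ON t.exemplar_id = e.id "
        "WHERE t.label = ? ORDER BY e.id",
        (label,),
    )
    return [row[0] for row in rows]

# Hypothetical entries.
add_exemplar("img/tablet_001_sign_a.jpg", "Neo-Babylonian cuneiform", "SIGN_A", ["worn surface"])
add_exemplar("img/letter_014_g.jpg", "modern French", "g", ["left-handed", "disguised"])
add_exemplar("img/letter_027_g.jpg", "modern French", "g", ["left-handed"])

left_handed = find_by_trait("left-handed")
print(left_handed)
```

Keeping the labels in their own table lets one image carry any number of them, which is the kind of many-sided indexing that a printed book cannot provide.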

But cuneiform writing is three-dimensional. So, in fact, is all writing, and some of that three-dimensional data is used by forensic analysts. 3D photography is not easy. Another problem: the data for these reference lists is extremely large. How does one make it retrievable? How does one manage it? What kind of interface (or interfaces) would be the optimum for the range of users envisaged? What kind of hardware would carry that interface? A standard PC screen, clearly; but this is not much use to the lawyer in court or the archaeologist in the field, to whom some form of wireless PDA would be much more useful. Another problem: the contents of this (potentially vast) reference database are pictorial. Of course, the data will be analysed and labelled. But the essence of the concept, the fact that the data must be pictorial, means that not every aspect of the data can have a searchable text label. How then does one search it? For instance, if an Assyriologist sees an example of a cuneiform sign that he or she doesn’t recognise, he or she will know that it must be an instance of a member of the entire cuneiform sign set, which has about 600 members; but which one? How to look this up?
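
One possible shape for such a look-up, offered only as a sketch, is to describe a sign by what can be seen rather than by its name, and to rank the sign set by closeness to that description. The descriptor used here (counts of wedges by orientation), the sign names, and the counts themselves are all hypothetical; this is not a description of any existing tool.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class WedgeProfile:
    """Hypothetical graphic descriptor: how many wedges of each kind a sign contains."""
    horizontal: int
    vertical: int
    oblique: int
    corner: int

    def distance(self, other):
        # Sum of absolute differences in the four counts.
        return (abs(self.horizontal - other.horizontal)
                + abs(self.vertical - other.vertical)
                + abs(self.oblique - other.oblique)
                + abs(self.corner - other.corner))

# Invented stand-in for a sign list; a real one would have about 600 entries.
SIGN_LIST = {
    "SIGN_A": WedgeProfile(1, 0, 0, 0),
    "SIGN_B": WedgeProfile(0, 1, 0, 0),
    "SIGN_C": WedgeProfile(2, 1, 0, 0),
    "SIGN_D": WedgeProfile(2, 2, 0, 1),
    "SIGN_E": WedgeProfile(0, 0, 2, 1),
}

def look_up(observed, sign_list, limit=3):
    """Return the names of the `limit` signs whose profiles are closest to `observed`."""
    ranked = sorted(
        sign_list.items(),
        key=lambda item: (item[1].distance(observed), item[0]),  # ties broken by name
    )
    return [name for name, _ in ranked[:limit]]

# What the reader counts on the unrecognised sign.
unknown = WedgeProfile(horizontal=2, vertical=1, oblique=0, corner=1)
candidates = look_up(unknown, SIGN_LIST)
print(candidates)
```

The search returns a short list of candidates rather than a single answer, so that the reader can then compare the photographs; a damaged or miscounted wedge still leaves the right sign near the top.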

These questions can only be answered collaboratively. They are engineering questions, but require input from experts in the writing systems involved. Such a collaboration already exists, and has had considerable success in opening up these questions and beginning to find answers for them. This is the Cuneiform Digital Forensic Project at the University of Birmingham, an interdisciplinary project that brought together an Assyriologist, a forensic document analyst, and experts in database design and digital imagery. The collaboration has developed prototype solutions to some of these questions, in the context of cuneiform writing on clay tablets produced during the third to the first millennia BC, and has been awarded a major Leverhulme grant in order to implement these solutions. This grant, for A Study of the Cuneiform Sign at the Graphemic, Allographic, and Graphetic Levels (2002), will result in a palaeographic survey of all of cuneiform, and a register (known as a sign list) of all of the varieties of all cuneiform signs. The sign list will include a device known as the sign processor, which is a purely graphic search tool.

In one way cuneiform is the most difficult of all writing systems: the data is extremely ancient, and extremely difficult. But in another way it is much easier to apply palaeographical analysis to cuneiform than to more modern pen-based inscriptions, because cuneiform is highly stylised. The signs are produced by making a wedge-shaped dent in clay with the triangular end of a stylus; there is only a very limited number of ways one can arrange these indentations, whereas a pen can go anywhere it likes on a piece of paper. And, although the number of clay tablets containing cuneiform writing in the museums of the world is very large, it is by no means as large as the corpus of manuscript writings, which is virtually limitless. Nonetheless, the initial success of the cuneiform project makes it seem at least possible that similar success can be obtained in the digital analysis of more modern writing systems; and it is the purpose of this paper to open up discussion of that possibility.

Overtures have already been made towards other areas of palaeography. Members of the project have delivered papers to two interdisciplinary workshops at Oxford, hosted by Egyptology and Ancient Near Eastern Studies and the Centre for the Study of Ancient Documents. [5]

Tom Davis
The University of Birmingham
August 2002



[1] For a discussion of the difference in practice between palaeographers and forensic document analysts, see Davis, Tom. "The Analysis of Handwriting: An Introductory Survey." The Book Encompass'd. Ed. Peter Davison. Cambridge: Cambridge University Press, 1992. 57-68.

[2] Daubert v. Merrell Dow Pharms., Inc., 509 U.S. 579, 594 (1993) established four criteria for expert scientific evidence. ‘The four Daubert factors are: (1) whether the methodology upon which the testimony is based has been, or can be, tested; (2) whether the methodology "has been subjected to peer review and publication"; (3) the methodology's "known or potential rate of error" and the availability and use of standards to control the methodology's operation; and (4) the extent to which the methodology is generally accepted in the relevant scientific community.’ Abraham Pais, "Navigating Uncertainty: Gatekeeping in the Absence of Hard Science," Harvard Law Review 113.6 (2000): 1471.

[3] ‘1. Classification, esp. in relation to its general laws or principles; that department of science, or of a particular science or subject, which consists in or relates to classification; esp. the systematic classification of living organisms’ (OED).

[4] In the 1980s, two such taxonomies were attempted, in one-year projects funded by the UK Home Office: Davis, Tom. The Handwriting of Old People, 1985, and Brown, Frances, and Tom Davis. Identification Characteristics of the Handwriting of Eight European Countries, 1989. The results are regularly consulted in forensic practice in Home Office laboratories, though only in photocopied form.

[5] Davis, Tom, and Alasdair Livingstone. "The Work of the Cuneiform Digital Forensic Project." Egyptology and Ancient Near Eastern Studies and the Centre for the Study of Ancient Documents, Oriental Institute, Oxford, 2001.

Davis, Tom. "Forensic Handwriting Analysis: Practice and Theory." Handwriting Identification Ancient and Modern: a Workshop. Centre for the Study of Ancient Documents, Oxford, 2002. See also:
---. "Forensic Handwriting Analysis: Practice and Theory." Centre for the Study of Ancient Documents Newsletter (2002).