The Digital Classification of Ancient Near Eastern Cuneiform Data

Theodoros Arvanitis, Tom Davis, Alasdair Livingstone, Javier Pinilla-Dutoit, Sandra Woolley[1]

Note: a multimedia presentation entitled The Poem under the Desert, which presents an introduction to the material offered here, will be found on the accompanying CD.


This paper introduces the ancient cuneiform sign system, the oldest known writing, and describes some of the problems faced by scholars working in this field. It then shows how digital photography can help, specifically in assisting the construction of a Web-based descriptive list of the cuneiform sign set. Such a database has been designed, and is described.


The poet Shelley imagined, in 1817, the ruined statue of an ancient king, Ozymandias[2], who had caused the original edifice to be erected with the immodest inscription ‘look upon my works, ye mighty, and despair’. The despair would not, the poet suggested, be exactly what this king imagined, for instead of the monuments of an empire, around the statue ‘the lone and level sands stretch far away’.

The statue may not have existed, but it is certainly true that where there is now desert there were once empires. The lone and level sands cover three millennia of human culture and several distinct civilisations, which occupied a geographical area that stretches from the borders of Egypt well up into Russia. And the poet was right in one more respect: the real counterparts of the poet’s imaginary monarch can speak to us. The minds of those who created and lived in these empires, these civilisations, are preserved in written form. This writing is called cuneiform (literally: wedge-shaped), and it is preserved because it was written in clay. The clay tablets that were the 'pages' of cuneiform script, once baked, are extraordinarily durable; they even survived the ransacking and burning of libraries, as one empire laid waste to another.

The earliest writing in clay, dating from about the end of the fourth millennium BC, was mainly pictorial. Scribes used a pointed stylus to draw simple – stylised – images of the objects they wished to name. The process of stylisation took a radical step when, during the early third millennium BC, the pictographic images were replaced by a much simplified set of patterns. The new stylus used for these patterns was square in section, and held not like a pencil but like a screwdriver. The pointed end of two sides of the square section was pushed into the clay, making a wedge-shaped indentation[3]. These ‘wedges’, as they are called, are used in a number of different combinations to create the 600 or so fundamental units of the cuneiform script: the cuneiform signs. Often pictographic in origin, but highly stylised, a cuneiform sign may be made up of anywhere between one and a dozen or more wedges, crossing each other to make complex patterns.

These six hundred[4] cuneiform signs, together with the archaeological evidence, are the key to all of our knowledge of the rich cultures of the Ancient Near East. Whole libraries of cuneiform tablets have been excavated, containing a literature of laws, poems, recipes, customs, religious observances, charms, prophecies, and much else – a vast download, the mental maps of several entire cultures. The signs were used to write three different major languages, and each sign at some point in its long history may have been used to express a concept (as Chinese characters express concepts) or a sound (as the signs of our writing system express sounds)[5] or both. All contained in wedge marks on clay tablets. 

When Shelley wrote 'Ozymandias', no-one in the world could read any cuneiform inscription whatsoever. The languages that they enshrined were dead and had not been spoken by any living person for over two thousand years, and the key to the signification of the signs had been completely lost. Indeed, the eighteenth-century inventor of the word ‘cuneiform’ clung quite tenaciously to the theory that the complex patterns of cuneiform were simply that: decorative patterns, without any meaning at all[6]. Proving him wrong took over a century of incredible mental effort, and the work is not yet complete.


There is much to hinder the study of cuneiform. Relatively few people study it, because it is difficult. Only a small proportion of the preserved texts have been properly deciphered and edited. Moreover no modern Western culture feels instinctively that the Ancient Near East, dead for millennia, is part of their cultural heritage (incorrectly, it has to be said). Thus relatively little money is available for that study. Because of this, there are not many photographs of tablets available, since until recently photography was expensive; and there are vast numbers of tablets to study. Students of the subject are therefore reduced to the simplest and cheapest technology available: pencil and paper. If you wish to copy cuneiform as it stands on the tablet, you have to draw it, and if you wish to see what a text looks like in its original form, you for the most part have to look at a published drawing. The books that contain cuneiform are full of drawings of tablets. The dictionaries of cuneiform, known as sign lists, consist of collections of drawings of signs, in fact usually drawings of drawings of signs.[7] One could even say that some of the most advanced scholars of the subject, world experts, have never seen many of the signs they work with daily, as they appear in the clay: they have only seen drawings. It is an extraordinary situation. The subject, difficult enough in itself, is rendered much more difficult. And so not many people study it, and so there is not much money, and so on.

There is a way out of this situation. This problem, like so many others, is radically transformed by the cheap availability of powerful computing resources. Chemical photography is expensive; digital photography, once the equipment has been bought, is virtually free. And, again at virtually no cost, any number of digital photographs can be published on the World Wide Web, making the tablets that previously could only be seen in drawings or costly visits to museums easily available to anyone. The world of cuneiform studies has passed at one jump from the pre-photographic to the digital age, and will never be the same again.

The Birmingham University Digital Forensic Project came about as an early attempt to take advantage of this wonderful twin resource: limitless cheap photography, and limitless publication. It consists of a coalition of effort between four individuals who work in the University of Birmingham: Alasdair Livingstone, an Assyriologist; Theo Arvanitis, an expert in database design; Sandra Woolley, a digital imaging specialist, and Tom Davis, who teaches Bibliography and Paleography in the English Department and is a Forensic Handwriting Expert. The purpose of this coalition, funded by a generous grant from the University in an attempt to foster interdisciplinary studies, is to develop the potential for digital photography of cuneiform tablets.

To achieve the aim of this project we have concentrated on two areas of research. Firstly, we have found that it is possible to apply the methodology of forensic handwriting analysis to the wedge marks in clay of cuneiform writing. It is possible to discriminate individual characteristics in those marks, just as it is in modern ink-on-paper handwriting, and we believe that these will in many cases enable identification of individual scribes. The description of this work is beyond the scope of this paper. Secondly, we have developed a design for a classification dictionary of cuneiform: a sign list. All previous sign lists are highly selective in scope, and based on drawings of drawings[8]. Ours is intended to be far more global, and based on photographs of the signs as they actually appear in the clay. With it, for the first time, scholars in cuneiform will be able to find out what the signs they spend their lives working on actually look like as they occur on real tablets.

The aim of this paper is to present the design of a Web-based cuneiform data cataloguing system to support the creation of a digital image-based dictionary of cuneiform signs for the period covering 3000 BC to 323 BC.

The Definition of Cataloguing Criteria for a Cuneiform Sign List, and the User Interface

To begin with, we need to establish a terminology, and hence define the relevant cataloguing criteria. The terminology we use derives from the linguistics of written language, as developed by Haas [1976][9] and Sampson [1985][10]. We distinguish three levels at which a sign can be classified. The bottom level, the instance, the actual mark one sees in the clay and wishes to identify, we call a graph. The top level, the level of the label which one will wish to use to identify that graph, the level of the sign, we call a grapheme[11]. There is an intermediate level, because a sign can be validly written in a number of different ways, which may differ considerably one from another but still be considered to be authentic representations of the same grapheme. We call this level the allographic level[12]. So, in alphabetic writing, as you look at what you are now reading, you are seeing a long string of graphs. Effortlessly you are identifying each graph as a valid instance of a particular letter or mark of punctuation, as a depiction of a grapheme. Even in the highly controlled graphic environment of print there is still much allographic variation: the grapheme /a/, for instance, may be a roman 'a' or an italic 'a', (in most typefaces a completely different form). All of these are recognised without hesitation by skilled readers as being different versions of the same thing, allographs of the grapheme /a/, and therefore, in our alphabetic script, inviting the same phonological realisation.

The sign list is organised around those three levels: graph, allograph, grapheme. We take the conventional designation universally used by Assyriologists, the cuneiform ‘sign’, as being synonymous with ‘grapheme’; this is not precisely true, but it is true enough, and no sign list would succeed if it radically departed from normal usage. So, at the top level, we have the sign.

Each sign has a conventional name: here, dingir. At this level of the sign list we present a drawing of the conventional depiction of this sign, here taken from one of the current sign lists (Labat), with a sign number, which is used for referencing in that sign list. It is essential that a usable sign list should have a basis in and strong reference to current practice, as well as offering its own innovations, because otherwise it would simply be rejected as unusable by those trained in that current practice. Underneath we have the ancient name that cuneiform scribes gave to that sign: here, ‘ana’ or ‘dingiraku’.

Underneath that is what will become an overview of the sign’s history and geographical spread. It must be remembered that the cuneiform sign had a long and varied history: three thousand years of continuous use, in widely different civilisations, languages, and geographical areas. 22 subheadings represent a map of that history, giving the main geographical and chronological areas where cuneiform flourished. In the database for each of these subheadings there is a box which contains a number (which may be zero) and the abbreviation ‘ph’. If the number is greater than zero then this will represent the number of distinct allographs of the sign that we have so far included in the sign list for that period or geographical area, and this number will be a hypertext link to the allographic level of the database. ‘Ph’ is short for ‘phonology’, and this offers the opportunity of a link to a description of the phonological realisation for that period/area, if there is one.

The hope is that the sign level of the database, when the sign list has been populated, will enable the user to see more or less at a glance the range and spread of any particular sign, and the amount of allographic variation – in other words, the number of distinctly different ways of forming each sign – in each of the geographical / chronological areas.

At the allographic level the user is presented with, for each period, all of the allographs of the sign that have so far been distinguished and entered. These are named by sign name, abbreviation for the period, and a number to distinguish this from the other allographs found in that period: so, ‘dingir NA1’ names the first allograph listed of the sign dingir found in the Neo-Assyrian period. At this level there will be a picture of a representative instance of each allograph as it is actually found in an ancient source, and a stylised representation of the allograph produced, not by drawing, but by a computer program we call the sign processor; of this, more later. Both of these representations are entirely new developments for the cuneiform sign list.
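The naming scheme and the per-period counts shown at the sign level can be illustrated with a short Java sketch. It is not the project's code: the class name AllographName, the parsing pattern, and the period abbreviation "OB" are assumptions made for illustration; only the form ‘dingir NA1’ comes from the description above.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative sketch only: class name, pattern and the "OB" abbreviation are assumptions. */
final class AllographName {
    // "<sign name> <period abbreviation><running number>", as in "dingir NA1".
    private static final Pattern FORM = Pattern.compile("(\\S+) ([A-Za-z]+)(\\d+)");

    final String sign;
    final String period;
    final int number;

    AllographName(String sign, String period, int number) {
        this.sign = sign;
        this.period = period;
        this.number = number;
    }

    /** Splits an allograph name into sign name, period abbreviation and number. */
    static AllographName parse(String text) {
        Matcher m = FORM.matcher(text.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Not an allograph name: " + text);
        }
        return new AllographName(m.group(1), m.group(2), Integer.parseInt(m.group(3)));
    }

    @Override
    public String toString() {
        return sign + " " + period + number;
    }

    /** Counts the allographs of one sign per period: the number shown in each period box. */
    static Map<String, Integer> countByPeriod(String sign, List<String> names) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String name : names) {
            AllographName a = parse(name);
            if (a.sign.equals(sign)) {
                counts.merge(a.period, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Invented sample data; only "dingir NA1" appears in the text.
        List<String> names = List.of("dingir NA1", "dingir NA2", "dingir OB1");
        System.out.println(countByPeriod("dingir", names)); // prints {NA=2, OB=1}
    }
}
```

A period with no entry in the resulting map corresponds to a box showing zero.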

The allograph name functions as a hypertext link to the next level, the graphic. Here are found examples of the allograph described at the allographic level: up to six in all. It could be said that this level is the basis and foundation of the entire edifice: actual photographs of actual instances, the sign as it appears in the clay, the evidence for the allograph, each carefully and comprehensively provenanced. At this level, the user will be able to see, most probably for the first time, several different examples of what a sign looks like, in a given period.

We envisage that one of the principal functions of the sign list will be to help users confronted by a sign that they cannot identify to determine which sign it is; in other words, to move from the unknown graph in front of them to the allograph of which it is an instance and thence to the grapheme that will identify it. In order to do this, we have designed an entirely new kind of search engine. We have of course straightforward text-based search engines, that will enable Boolean searches for sign names; but the standard problem that the user faces is precisely that they do not know the sign name to search for; what is needed, given that what they have is an unknown graphic object, is a graphic search engine.

The Concept of the Cuneiform Processor

We propose that the sign processor, mentioned above, should provide this. This is still at the development stage: a prototype has been built, tested, and found to work. It works like this. If you imagine a grid laid over an idealised version of a cuneiform sign, it might look something like the image in the sidebar. It can be seen that each cell of the grid contains a graphic shape; and the number of possibilities for the kind of shape that each cell contains is limited. If the user is given a set of buttons, one for each of the possibilities, then by clicking first on the correct cell, then on the appropriate button, they can put that shape into the selected cell and thus begin to build a depiction of the sign. This sign processor would require a certain amount of learning in order to acquire proficiency, but not much, since it is obvious and intuitive. Each allograph of each sign in our database would have its own sign-processed ideal representation, at the allographic level of the database.

This is very useful, but the utility does not stop there. Since each cell can contain only one of a small set of possibilities, each of those possibilities can be represented by a single letter of the alphabet. This produces an array of letters that exactly describes that depiction. An array of letters is a text string, and as such it can easily be searched. Therefore, a user can be asked to create a sign-processed image of the unknown graph, and the computer can translate this into a search string, and find matches for the whole sign, or for any part of it. The latter is of considerable importance: partial and fuzzy searches must be a component of this image search tool, to allow for graphetic variation between instances of the same allograph, user idiosyncrasies in translation from the graph to the computer screen, and, perhaps most important of all, to allow for a crucial characteristic of the cuneiform data: that it is fragmentary. Cuneiform tablets can put up with much hard usage, but not even they can survive entirely untouched by the systematic and thorough destruction of the city that surrounds them. Many cuneiform tablets are broken, which means that many of the signs that the user will want to identify exist in fragmentary forms. We hope that our image search engine will assist the identification of fragmentary signs. We hope too that it will help in the teaching of cuneiform, enabling a simple standardised mechanical means of writing it, and that it will remove from cuneiformists the handicap of needing to know how to draw.
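The idea of turning a grid into a searchable string can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the prototype itself: the 3 x 3 grid, the cell letters ('h', 'v', 'd', and '.' for an empty cell), the use of '?' for a damaged or unknown cell, and the catalogue entries are all invented. Partial matching is modelled by skipping unknown cells, and fuzzy matching by tolerating a given number of differing cells.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch only: cell codes, grid size and names are assumptions. */
final class SignGridSearch {

    /** Hypothetical code for a damaged or unknown cell in a query. */
    static final char UNKNOWN = '?';

    /** Flattens a grid of single-letter cell codes into one row-by-row search string. */
    static String encode(char[][] grid) {
        StringBuilder sb = new StringBuilder();
        for (char[] row : grid) {
            sb.append(row);
        }
        return sb.toString();
    }

    /** Counts cells where query and stored string differ, skipping UNKNOWN cells; -1 if lengths differ. */
    static int mismatches(String query, String stored) {
        if (query.length() != stored.length()) {
            return -1;
        }
        int count = 0;
        for (int i = 0; i < query.length(); i++) {
            char q = query.charAt(i);
            if (q != UNKNOWN && q != stored.charAt(i)) {
                count++;
            }
        }
        return count;
    }

    /** Returns the names of stored allographs within the given number of differing cells. */
    static List<String> search(String query, Map<String, String> catalogue, int tolerance) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> entry : catalogue.entrySet()) {
            int m = mismatches(query, entry.getValue());
            if (m >= 0 && m <= tolerance) {
                hits.add(entry.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Invented encodings standing in for sign-processed allographs.
        Map<String, String> catalogue = new LinkedHashMap<>();
        catalogue.put("example-A", "h.vv.hh.v");
        catalogue.put("example-B", "h.vd.hh.v");
        catalogue.put("example-C", "dddd.dddd");

        // A fragmentary graph: two cells of the middle row are broken away.
        char[][] drawn = { {'h', '.', 'v'}, {'?', '?', 'h'}, {'h', '.', 'v'} };
        String query = encode(drawn);
        System.out.println(query + " -> " + search(query, catalogue, 0));
    }
}
```

With the two broken cells left unknown, both example-A and example-B remain candidates; raising the tolerance widens the net further, at the cost of more false matches.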

The Architectural Design of the Web-based Cuneiform Data Cataloguing System

The design is modular, scalable and platform-independent. The cuneiform database application uses a three-tier architecture (see the sidebar diagram). The clients are web browsers that make requests to a web server through Java servlets. The web server communicates with the database through a JDBC interface to retrieve or update data.

A three-tier design, as opposed to a traditional client-server application, provides a more flexible and efficient configuration. The first tier of our application can use any number of Java-enabled browsers and provides the user interface to the database system. The second tier consists of servlets that encapsulate the logic of the application and provide access to the data. The third tier comprises the data repository. Our database management system (DBMS) uses mSQL. This tier is accessed using a relational database interface. In our design we make use of JDBC. Other technologies can also be supported.

JDBC (Java Database Connectivity) is an Application Program Interface (API) that allows connecting to tabular data sources using the Java programming language. Java is platform-independent and provides cross-platform capabilities. Our system runs on a variety of machines and operating systems without any need to change or recompile the code. Each module can run on a different operating system. We have tested different heterogeneous configurations using Windows NT and Unix operating systems.

'Servlets are protocol- and platform-independent server side components which dynamically extend Java enabled servers. They provide a general framework for services built using the request-response paradigm. Their initial use is to provide secure web-based access to data which is presented using HTML web pages, interactively viewing or modifying that data using dynamic web page generation techniques.'[13]

The database is a SQL relational database built using mSQL. Basically, the database is composed of the following tables:

  • Sign
  • Allograph
  • Instance
  • Image
  • Period
  • Metaperiod
  • Phonology

The figure below shows a high-level class diagram of the above-mentioned components.

Signs are classified by periods and periods are grouped by metaperiods. Signs have allographs and allographs have instances. In addition, allographs are linked to periods, to allow for flexibility. Sign, Allograph and Instance records use pictorial information, which is detailed in the Image table. The user accesses the dictionary through a series of web-based interfaces that allow browsing the data at several levels. These interfaces have been described and illustrated above.
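The relationships between these tables can be pictured as a small object model. The Java sketch below is an illustration only: the class names follow the table names given above, but every field name, the numbering rule in addAllograph, and the sample values are assumptions, and the Phonology table is left out for brevity.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only: table names come from the text; every field name is an assumption. */
final class SignListModel {

    static final class Metaperiod {
        final String name;
        Metaperiod(String name) { this.name = name; }
    }

    static final class Period {
        final String abbreviation;
        final Metaperiod metaperiod;          // periods are grouped by metaperiods
        Period(String abbreviation, Metaperiod metaperiod) {
            this.abbreviation = abbreviation;
            this.metaperiod = metaperiod;
        }
    }

    static final class Image {
        final String location;                // hypothetical: path or URL of the picture
        Image(String location) { this.location = location; }
    }

    static final class Instance {
        final Image photograph;               // the graph as it appears in the clay
        final String provenance;
        Instance(Image photograph, String provenance) {
            this.photograph = photograph;
            this.provenance = provenance;
        }
    }

    static final class Allograph {
        final Period period;                  // direct link from allograph to period
        final int number;                     // running number within sign and period
        final Image representation;
        final List<Instance> instances = new ArrayList<>();
        Allograph(Period period, int number, Image representation) {
            this.period = period;
            this.number = number;
            this.representation = representation;
        }
        void addInstance(Instance instance) { instances.add(instance); }
    }

    static final class Sign {
        final String name;
        final Image drawing;
        final List<Allograph> allographs = new ArrayList<>();
        Sign(String name, Image drawing) {
            this.name = name;
            this.drawing = drawing;
        }

        /** Adds an allograph, numbering it after those already listed for the same period. */
        Allograph addAllograph(Period period, Image representation) {
            Allograph a = new Allograph(period, allographsIn(period).size() + 1, representation);
            allographs.add(a);
            return a;
        }

        /** Returns the allographs of this sign recorded for one period. */
        List<Allograph> allographsIn(Period period) {
            List<Allograph> result = new ArrayList<>();
            for (Allograph a : allographs) {
                if (a.period == period) {
                    result.add(a);
                }
            }
            return result;
        }
    }

    public static void main(String[] args) {
        Period na = new Period("NA", new Metaperiod("Assyrian"));   // invented grouping
        Sign dingir = new Sign("dingir", new Image("dingir-drawing.png"));
        Allograph first = dingir.addAllograph(na, new Image("dingir-na1.png"));
        first.addInstance(new Instance(new Image("photo-001.jpg"), "placeholder provenance"));
        System.out.println(dingir.name + " " + first.period.abbreviation + first.number
                + ": " + first.instances.size() + " instance(s)");
    }
}
```

In a relational schema the object references would become foreign keys, with Allograph holding keys to both Sign and Period.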


The sign list will exist on the World Wide Web, for anyone to consult free of charge. Users will be invited to collaborate in its construction, by sending in images of signs (identified or not identified) that they think should be included. A key to the whole enterprise is availability of data. We have been very fortunate in acquiring the co-operation of a number of major cuneiform collections.

The other key to this task is labour and money. The seed money provided by the University of Birmingham has been used to create the sign list database; now we need to populate it. We are engaged in searching for funds to do this.

Assyriologists, who face an extraordinarily difficult task of interpretation, may sometimes feel themselves as despairing as those who are invited to contemplate the statue of Ozymandias. We hope that our sign list may give them some reasons to rejoice.

Appendix: A survey of Cuneiform Sign Lists

1. Charles Fossey 1926, Manuel d'Assyriologie II: Evolution des Cunéiformes, Paris.

Fossey’s objective was to give one example of each allograph of every sign he encountered in the material available to him, which was in the form of published hand copies by other scholars. He simply listed the signs, giving the place of publication, and arranged the material in a rough chronological and geographical order. His collection is rich and still useful, but it presents numerous impedimenta to the user. For example, it is impossible for the user to distinguish between sign forms that constitute a norm for a place or time and those that may be hapax legomena. Fossey states that his intention was not to give the same allograph form twice for one period and area, but he nevertheless gives allographs that are very nearly identical. In a total of over a thousand pages thirty-six thousand individual sign forms are registered. A whole section is devoted to cases where allographic variation could result in confusion between two signs.

2. René Labat 1948, Manuel d'Epigraphie Akkadienne, Paris.

Labat depended greatly upon Fossey as well as – although to a much lesser degree – specialized sign lists that had appeared in the meantime (such as Burrows’s archaic sign list in Ur Excavations: Texts II (1935), pp. 61 ff.); he also depended on his own intuition. Labat's list is much more clearly organized than that of Fossey, with columns and boxes for archaic, ED, and Old, Middle and Late Assyrian and Old, Middle and Late Babylonian. Although the subject of Assyriology had expanded greatly during the twenty-three years separating Fossey and Labat, the total number of forms given by Labat is well under a quarter of those given by Fossey.

Whereas Fossey’s collection struck deeply into the range of then published cuneiform, Labat’s does not. Labat’s book is by name a Manuel d’Epigraphie Akkadienne. However, apart from the trivial point that most of it is palaeography and not strictly speaking epigraphy, Akkadienne is used also with a certain freedom: the ample ED column means that Sumerian is included, while the Ur III period and Vth dynasty of Lagash are left out. Mari – already then, as now, a pride and joy of French Assyriology – is not singled out for careful treatment, although the first volume of Archives Royales de Mari in the Textes Cunéiformes du Louvre had already appeared, albeit only two years earlier.

3. Rykle Borger 1978, Assyrisch-babylonische Zeichenliste, Neukirchen-Vluyn.

This is a fuller (413 page) version of an earlier Zeichenliste from the same hand of 1971 (124 pages). The paleographic part of this is limited to thirty-two pages at the beginning of the book. This contains eight columns of which only the first six contain signs. These are an undifferentiated mix of Neo- and Middle Assyrian, followed by Neo-Babylonian, Kassite boundary stones, Old Assyrian, Old Babylonian general and Old Babylonian Code of Hammurabi. The final two columns refer the reader to Johannes Friedrich’s Hethitische Lesestuecke (Heidelberg, 1960) and to Fossey. The bulk of the Zeichenliste is taken up with supplying Sumerian and Akkadian readings for groups of signs, and a rough guide to the Sumerian verbal chain. The former was a godsend to students, who until its appearance still had to use Deimel’s outdated Sumerisches Lexikon of 1932 (Rome), which, like Fossey, collected data but did not interpret it.

4. Friedrich Ellermeier 1979, Sumerisches Glossar, Nörten-Hardenberg bei Göttingen.

This comes from the same stable as the Zeichenliste and is in a sense almost as much - or as little - a list of signs. Only Neo-Assyrian sign forms are given, and the purpose is to establish what values signs, including compound signs, can have.

Finally, two specialized sign lists must be mentioned:

5. Chr. Rüster and Erich Neu 1989, Hethitisches Zeichenlexikon. Inventar und Interpretation der Keilschriftzeichen aus den Boghazkoi-Texten, Wiesbaden.

This sign list gives a multiplicity of allographs of individual signs, but, unlike Fossey, does not indicate sources for individual quoted forms. As indicated above, an important desideratum should be that a user would be able to trace the spread of scribal traditions over space and time. Consider the following example. Received opinion is that the Hittite cuneiform derives from a late Old Babylonian cursive script such as was used at sites such as Alalakh (Tell Atchana), and that the script, plus, quite likely, the scribes themselves and their equipment, were carried back to Hattusas as booty by the Great King Hattusilis I during his campaigns in North Syria around 1550 BC. Further, Heinrich Otten, the Altmeister of Hittitology, has expressed the opinion that the development of cuneiform among the Hittites was not an internal Hittite phenomenon but was linked to developments in the Syro-Mesopotamian region. A scholar who wished to test this hypothesis would not be greatly assisted by Rüster and Neu’s sign list.

6. M.-J. Steve 1992, Syllabaire Elamite. Histoire et Paleographie, Paris.

This sign list has to deal with a much smaller corpus of tablets than the others mentioned so far. Allographs are given, though without quoting sources. On the other hand the chronological development and usage is well represented in table form.


[1] This paper represents the work of a highly interdisciplinary project, the collaboration of experts from Departments of Electrical and Electronic Engineering, Archaeology, and English.

[2] Shelley wrote the poem for a sonnet-writing competition run by the periodical The Examiner; it was first published on January 11, 1818, in that journal. He took the name Ozymandias from the Greek historian Diodorus Siculus, who was writing about the monarch we know as Rameses II.

[3] The way in which cuneiform was written on clay is shown in the accompanying multimedia presentation, The Poem in the Desert.

[4] This is of course a very approximate number: the total number of signs in use for any version of cuneiform varied very considerably during the long history of the writing system.

[5] We have concept signs too, of course: for instance, the numerals, which differ in phonological realisation according to the language of the reader, but always express the same concept. And Chinese characters often consist of a combination of concept and phonological (signific and phonetic) indicators.

[6] Thomas Hyde, Regius Professor of Hebrew and Laudian Professor of Arabic at Oxford, in his Historia Religionis Veterum Persarum (Oxford, 1700). He called the signs ‘ductuli pyramidales seu Cuneiformes’ (pp. 517, 526).

[7] A survey of available sign lists is given in the Appendix.

[8] See the Appendix.

[9] Haas, W. 1976, Writing Without Letters, Manchester University Press, Manchester.

[10] Sampson, G. 1985, Writing Systems: a Linguistic Introduction, Hutchinson, London.

[11] Technically, a grapheme is a minimum meaning-bearing unit of a given writing system. If the difference between two units in a writing system is experienced by users of that system as significantly changing the meaning, then that difference is graphemic. If the difference is not significant of a meaning change, then the difference is said to be graphetic. The difference between any instance of /a/ and /b/ is graphemic; that between instances of /a/ in two different fonts is graphetic.

[12] The difference between allographic and graphetic variation is that the distinction between any two graphs of the same grapheme is graphetic, even if they are identical; they are two different instances of the same thing. Allographic variation is graphetic variation that is marked or obvious, seen by users as two different ways of writing the same thing. For instance, the difference between roman /a/ and italic /a/.

[13] 'Java servlet technology': White paper (accessed 2000-12-16).