Restoring and attributing ancient texts using deep neural networks


Yannis Assael1,*, Thea Sommerschield2,3,*, Brendan Shillingford1, Mahyar Bordbar1, John Pavlopoulos4,
Marita Chatzipanagiotou4, Ion Androutsopoulos4, Jonathan Prag3, Nando de Freitas1

1 DeepMind, United Kingdom
2 Ca’ Foscari University of Venice, Italy
3 University of Oxford, United Kingdom
4 Athens University of Economics and Business, Greece
* Authors contributed equally to this work


Ancient History relies on disciplines such as Epigraphy, the study of inscribed
texts known as “inscriptions”, for evidence of the thought, language, society
and history of past civilizations. However, over the centuries many inscriptions
have been damaged to the point of illegibility, transported far from their
original location, and their date of writing is steeped in uncertainty. We
present Ithaca, the first Deep Neural Network for the textual restoration,
geographical and chronological attribution of ancient Greek inscriptions. Ithaca
is designed to assist and expand the historian’s workflow: its architecture
focuses on collaboration, decision support, and interpretability.

Restoration of damaged inscription: this inscription (IG I3 4B) records a decree concerning the Acropolis of Athens and dates to 485/4 BCE. (CC BY-SA 3.0, WikiMedia)

While Ithaca alone achieves 62% accuracy when restoring damaged texts, as soon
as historians use Ithaca their performance leaps from 25% to 72%, confirming
this synergistic research aid’s impact. Ithaca can attribute inscriptions to
their original location with 71% accuracy and can date them with a distance of
less than 30 years from ground-truth ranges, redating key texts of Classical
Athens and contributing to topical debates in Ancient History. This work shows
how models like Ithaca can unlock the cooperative potential between AI and
historians, transformationally impacting the way we study and write about one of
the most significant periods in human history.
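The dating figure above is a distance between a predicted date and a ground-truth date range. The following is a minimal sketch of one plausible reading of such a distance, not the project's evaluation code: the function name `years_from_interval`, its parameters, and the example dates are all hypothetical, and BCE years are written as negative numbers.

```python
def years_from_interval(predicted_year: float, not_before: float, not_after: float) -> float:
    """Distance in years from a predicted date to a ground-truth date range.

    Returns 0.0 when the prediction falls inside [not_before, not_after];
    otherwise returns the gap to the nearest end of the range.
    Hypothetical helper for illustration; BCE years are negative.
    """
    if not_before > not_after:
        raise ValueError("not_before must not exceed not_after")
    if predicted_year < not_before:
        return float(not_before - predicted_year)
    if predicted_year > not_after:
        return float(predicted_year - not_after)
    return 0.0


# Made-up example: an inscription dated by historians to 450-425 BCE.
inside = years_from_interval(-440, -450, -425)     # prediction within the range
too_early = years_from_interval(-470, -450, -425)  # 20 years before the range
too_late = years_from_interval(-400, -450, -425)   # 25 years after the range
print(inside, too_early, too_late)
```

Under this reading, a prediction anywhere inside the accepted range counts as a distance of zero, so the metric only penalises dates that fall outside what historians already consider possible.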

Ithaca’s architecture processing the phrase “δήμο το αθηναίων” (“the people of Athens”). The first 3 characters of the phrase were hidden and their restoration is proposed. In tandem, Ithaca also predicts the inscription’s region and date.
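To make the hidden-characters setup in the caption concrete, here is a small sketch that hides the first 3 characters of the same phrase. The helper `mask_span` is hypothetical, and the use of `?` as the placeholder for a character to be restored is an assumption made for illustration, not something stated above.

```python
import unicodedata


def mask_span(text: str, start: int, length: int, mask_char: str = "?") -> str:
    """Replace `length` characters of `text`, beginning at `start`, with `mask_char`.

    Hypothetical helper; the '?' placeholder is an assumed convention.
    """
    if start < 0 or length < 0 or start + length > len(text):
        raise ValueError("span falls outside the text")
    return text[:start] + mask_char * length + text[start + length:]


# Normalise to NFC so each accented Greek letter is a single code point
# and therefore counts as exactly one character when slicing.
phrase = unicodedata.normalize("NFC", "δήμο το αθηναίων")
masked = mask_span(phrase, 0, 3)
print(masked)
```

The masked string keeps the same length as the original, so each placeholder position corresponds to exactly one character for the model to propose.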


When using any of this project’s source code, please cite:

  title={Restoring and attributing ancient texts using deep neural networks},
  author={Assael*, Yannis and Sommerschield*, Thea and Shillingford, Brendan and Bordbar, Mahyar and Pavlopoulos, John and Chatzipanagiotou, Marita and Androutsopoulos, Ion and Prag, Jonathan and de Freitas, Nando},

Ithaca inference online

To aid further research in the field we created an online interactive Python notebook, where researchers can query one of our trained models to get text restorations, visualise attention weights, and more.

Ithaca inference offline

Advanced users who want to perform inference using the trained model may want
to do so manually using the ithaca library directly.

First, to install the ithaca library and its dependencies, run:

Then, download the model via

curl --output checkpoint.pkl

An example of using the library can be run via

python --input_file=example_input.txt

which will run restoration and attribution on
the text in example_input.txt.

To run it with different input text, run

python --input="..."
# or using text in a UTF-8 encoded text file:
python --input_file=some_other_input_file.txt
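Since the input file must be UTF-8 encoded, it can help to write it with an explicit encoding rather than relying on the platform default. A minimal sketch using only the Python standard library follows; the helper `write_input_file` is hypothetical, and the short phrase stands in for a real inscription text.

```python
import tempfile
from pathlib import Path


def write_input_file(text: str, path) -> Path:
    """Write `text` to `path` as UTF-8 and return the path (hypothetical helper)."""
    out = Path(path)
    out.write_text(text, encoding="utf-8")
    return out


sample = "δήμο το αθηναίων"  # placeholder text, not a full inscription

# A temporary directory keeps the demonstration from leaving files behind.
with tempfile.TemporaryDirectory() as tmp:
    saved = write_input_file(sample, Path(tmp) / "some_other_input_file.txt")
    round_trip = saved.read_text(encoding="utf-8")

print(round_trip == sample)
```

Passing the encoding explicitly matters mainly on systems whose default text encoding is not UTF-8, where Greek characters could otherwise be written incorrectly or raise an error.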

The restoration or attribution JSON can be saved to a file:


For full help, run:

python --help

Dataset generation

Ithaca was trained on The Packard Humanities Institute’s
“Searchable Greek Inscriptions” public
dataset. The processing workflow for generating the machine-actionable text and
metadata, as well as further details on the train, validation and test splits,
is available at I.PHI dataset.

Training Ithaca

See train/ for instructions.


Apache License, Version 2.0
