  • Prof. Sabri A. Mahmoud
    Information and Computer Science Department
    King Fahd University of Petroleum & Minerals, Dhahran 31261, Saudi Arabia

  • Dr. Mohammad Tanvir Parvez
    Assistant Professor
    Computer Engineering Department, Qassim University, Qassim 51477
    Saudi Arabia

  • Hamzah Luqman
    PhD candidate
    Information and Computer Science Department
    King Fahd University of Petroleum & Minerals, Dhahran 31261, Saudi Arabia

  • Baligh Al-Helali
    Master's student
    Information and Computer Science Department
    King Fahd University of Petroleum & Minerals, Dhahran 31261, Saudi Arabia
Online-KHATT database

Online-KHATT is a large, unconstrained online Arabic handwritten text database. It contains 10,040 lines of online Arabic text written with electronic pens by 623 writers of different origins. The database is well suited to research in online Arabic handwriting recognition.
The Online-KHATT database uses the source text of the well-known KHATT database, which ensures a wide variety of topics and a large vocabulary. In addition, it provides natural, unconstrained Arabic text. The database is designed for tasks of central interest to the ICFHR community, such as online handwriting recognition, writer identification and verification, character segmentation, and electronic ink and pen-based systems.

The competition database, Online-KHATT, consists of 10,040 lines of online Arabic text written by 623 writers using Android- and Windows-based devices. A 3-level verification procedure aligns the online text with its ground truth at the line level. The verified ground-truth database contains metadata describing the online text at the line level in text, InkML and XML formats. The text lines are randomly distributed into training, testing, and validation sets that contain 70%, 15% and 15% of the text lines, respectively. Figure 1 shows some samples of online text from the Online-KHATT database.
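The random distribution of text lines into three subsets can be sketched as follows. This is only an illustration, assuming a simple shuffle-and-slice over line identifiers; the function name `split_lines`, the seed, and the integer identifiers are hypothetical, and the organizers' actual procedure is not described here. The subset sizes reported for the competition (for example, 6,974 training lines) differ slightly from exact 70%/15%/15% proportions, so the real split was evidently not computed this way.

```python
import random


def split_lines(line_ids, fractions=(0.70, 0.15, 0.15), seed=0):
    """Randomly partition line identifiers into three disjoint subsets.

    `fractions` gives the target share of each subset; the third subset
    simply receives whatever is left after the first two are cut.
    """
    ids = list(line_ids)
    random.Random(seed).shuffle(ids)  # seeded for reproducibility

    n_first = round(fractions[0] * len(ids))
    n_second = round(fractions[1] * len(ids))

    first = ids[:n_first]
    second = ids[n_first:n_first + n_second]
    third = ids[n_first + n_second:]
    return first, second, third


# Hypothetical identifiers 0..10039 standing in for the 10,040 text lines.
train, test, remainder = split_lines(range(10040))
print(len(train), len(test), len(remainder))
```

Fixing the seed keeps the partition reproducible, which matters when a ground truth has to stay aligned with a fixed set of lines.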

The organizers will provide the competition participants with the following data. The current competition uses only the training and validation sets of the Online-KHATT database, which serve as its training and testing sets respectively. Participants will receive a training set of 6,974 lines of online Arabic text containing 56,547 Arabic words and 561,754 characters. A verified ground-truth text line will be provided for every line of online text. Participants will use this training set to train their systems.

Figure 1 Samples from the Online-KHATT database

Evaluation of participating systems

Participants will submit their Arabic online text recognition systems as executable files. The organizers will test these systems on a hidden evaluation set of 1,533 lines of text (like the ones in Figure 2) containing 12,284 Arabic words and 120,281 characters. The organizers will publish the results of the candidate methods on the evaluation dataset, together with a comparative evaluation of the submitted systems.
For each line of text from the hidden set of Online-KHATT, a participating system must produce an output file of recognized words. For each word in the input line, the system must generate 10 hypotheses, sorted in descending order of probability (from the Top-1, most probable word to the Top-10 word). Different words from the same text line must appear on separate lines of the output file.
Note that participating systems may need to segment the input text lines into Arabic words. Systems may also use a dictionary if needed. However, out-of-vocabulary tokens make up more than 45% of the Online-KHATT test set.
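A minimal sketch of producing such an output file is given below. The call does not specify a separator between hypotheses or a file encoding, so the tab separator, the UTF-8 encoding, the function name `format_line_output`, and the (word, score) input structure are all assumptions made for illustration; check with the organizers for the required format.

```python
def format_line_output(word_hypotheses, n_best=10, sep="\t"):
    """Format the recognition output for one text line.

    `word_hypotheses` holds one entry per word of the line, in writing
    order. Each entry is a list of (word, score) pairs. Every word becomes
    one output row with its hypotheses sorted from most to least probable.
    """
    rows = []
    for hypotheses in word_hypotheses:
        ranked = sorted(hypotheses, key=lambda h: h[1], reverse=True)[:n_best]
        rows.append(sep.join(word for word, _score in ranked))
    return "\n".join(rows) + "\n"


# Toy input: two words, with fewer than 10 hypotheses each for brevity.
example = [
    [("كتب", 0.27), ("كتاب", 0.61), ("كاتب", 0.12)],
    [("جديد", 0.80), ("حديد", 0.20)],
]
text = format_line_output(example)

# Writing the result, assuming UTF-8 (the file name is hypothetical):
# with open("line_0001.txt", "w", encoding="utf-8") as f:
#     f.write(text)
```

Sorting inside the formatter means the recognizer can return hypotheses in any order and the Top-1 word still lands first on each row.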

Figure 2 Sample of Online-KHATT lines

Performance measures

The submitted systems will be ranked according to several criteria. System performance will be evaluated at two levels: word recognition accuracy and character recognition accuracy. For word recognition, the systems will be ranked by their accuracy in recognizing the whole words present in a line of text, and Top-1, Top-5 and Top-10 word recognition accuracies will be published for each participating system. Character recognition accuracy will be measured in two modes. In the first, accuracy is computed over the characters of each recognized word. In the second, it is computed over the entire line of text, at the PAW (Part of Arabic Word) level, to offset any errors in segmenting the line into words. In both modes, only the Top-1 recognized words are used. Finally, the participating systems will be compared on running time: the average word recognition time of each system will be estimated.
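The measures above can be illustrated with a short sketch. The exact formulas used by the organizers are not given, so this assumes that Top-k accuracy is the share of reference words found among the first k hypotheses (with words already aligned one-to-one), and that character accuracy is one minus the Levenshtein edit distance divided by the reference length. The function names are hypothetical, and the line-level variant simply concatenates the Top-1 words rather than working on PAWs.

```python
def top_k_word_accuracy(references, hypotheses, k):
    """Share of reference words that appear among the first k hypotheses.

    `references[i]` is the ground-truth word and `hypotheses[i]` the ranked
    hypothesis list for the same word (one-to-one alignment is assumed).
    """
    hits = sum(1 for ref, hyps in zip(references, hypotheses) if ref in hyps[:k])
    return hits / len(references)


def edit_distance(a, b):
    """Levenshtein distance between sequences a and b (row-by-row DP)."""
    prev = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        curr = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_accuracy(reference, recognized):
    """Assumed definition: 1 - edit_distance / len(reference), floored at 0."""
    if not reference:
        return 1.0 if not recognized else 0.0
    return max(0.0, 1.0 - edit_distance(reference, recognized) / len(reference))


# Toy data in Latin letters for readability.
refs = ["abc", "de"]
hyps = [["abx", "abc"], ["de", "dd"]]
top1_words = [h[0] for h in hyps]

word_top1 = top_k_word_accuracy(refs, hyps, 1)
word_top2 = top_k_word_accuracy(refs, hyps, 2)
# Word-level mode: characters compared word by word, then averaged.
char_per_word = sum(character_accuracy(r, t) for r, t in zip(refs, top1_words)) / len(refs)
# Line-level mode: words concatenated so word boundaries do not matter.
char_per_line = character_accuracy("".join(refs), "".join(top1_words))
```

Comparing concatenated strings in the line-level mode is what makes that measure insensitive to where a system places its word boundaries.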

The deadline for submitting systems is 15 May 2016.
To download the database, please email Dr. Mohammad Tanvir Parvez.

Copyright 2016