
Offline Handwritten Jawi Recognition using the Trace Transform

Mohammad F. Nasrudin#1, Maria Petrou*2

#Centre for Artificial Intelligence Technology, Universiti Kebangsaan Malaysia, 43600 UKM Bangi, Selangor, MALAYSIA
1[email protected]

*Department of Electrical and Electronic Engineering, Imperial College, London SW7 2AZ, UK
2[email protected]

Abstract—The trace transform, a generalization of the Radon transform, allows one to construct image features that are invariant to a chosen group of image transformations. In this paper, we use features known as object signatures, generated by the trace transform and invariant to affine distortion, to discriminate between offline handwritten Jawi sub words. The process consists of tracing an image with straight lines in all possible orientations and computing certain functionals of the image function along each line. For each combination of functionals we derive the object signature, which is a function of the orientation of the tracing lines. We demonstrate the usefulness of the derived signature and compare the recognition results with those obtained using features based on affine moment invariants and the angular radial transform.

Keywords-Trace transform; Jawi; Handwritten Recognition; ART.

I. INTRODUCTION

Jawi is a cursive script derived from the Arabic alphabet and adopted for writing the Malay language. Jawi can largely be found in old Malay manuscripts that have not yet been fully digitized. The Jawi alphabet contains 36 basic characters. Lately, many studies have been devoted to cursive handwritten character recognition, both online and offline, for scripts such as cursive Latin [1], Arabic [2], Urdu, and Farsi [3]. The main reason for this attention is that the area is still open to research: cursive text characteristics differ from, and are more complex than, those of other types of text. This complexity is due to the fact that in handwriting, different characters are often written very similarly to each other.

Offline handwritten recognition requires the use of image features that capture the characteristics of the character or word shown, so that they are invariant to the way the character is presented in the image. Various structural and statistical handwritten Jawi features in local and global contexts have been explored [4]. The process of extracting handwritten features, however, has been defined from a human-oriented perspective, such that the features capture what the human vision system would recognize in the image. Thus, features previously proposed for handwritten Jawi recognition, such as in [5, 6], are features which have some physical meaning. It is not, however, necessary for the features to have a meaning to the human eye in order to represent a character or word well. Limiting the features to those that make sense to the human vision system restricts the number of features that one can use.

Current methods for recognising handwritten Jawi tend to segment Jawi words into characters; implementations of this kind can be observed in [5, 7-9]. The main benefit of recognising at the character level is that the number of classes to be recognised is kept to a minimum. However, the existence of non-standard ligatures, where two or more adjacent characters in the same word overlap, makes segmentation of handwritten words impractical. One way to avoid this problem is to segment only down to the sub-word level. What is left, then, is to find an object descriptor that can accurately characterise the sub words we would like to identify.

The object signature developed in [14], based on the trace transform, has shown its effectiveness for Korean character recognition [10]. It has not been tested on other scripts yet. The Korean script (Hangul) is very distinct from Jawi in terms of cursiveness: Hangul harmonizes well with Chinese characters, which are usually squarely shaped [11]. Jawi characters, on the other hand, connect directly to the character that immediately follows, which gives written text an overall cursive appearance [4].

The alternative state of the art in affine feature construction comprises methods based on moments and on the angular radial transform (ART). These methods have been around for some time and are used in many applications, from optical character recognition to face recognition. In fact, ART was adopted as a region-based shape descriptor in the MPEG-7 standard [12, 13]. In this paper we apply the trace transform theory to the problem of handwritten Jawi recognition, using the moment-based and ART-based methods as benchmarks.

In Section 2 we present the background to the trace transform and the idea of the object signature proposed in [14] for constructing image features that are invariant to affine distortions. In Sections 3 and 4 we elaborate on the experiments and results, respectively. We conclude in Section 5.


II. TRACE TRANSFORM

The trace transform can be understood as a generalization of the well-known Radon transform. The Radon transform of a real image function $f(x,y)$ is a function $p(r,\theta)$ defined by computing the integral of $f(x,y)$ along all lines $L(r,\theta)$:

$$p(r,\theta) = \iint_D f(x,y)\,\delta(r - x\cos\theta - y\sin\theta)\,dx\,dy \qquad (1)$$

Here $r = x\cos\theta + y\sin\theta$ is the normal parameterization of a line $l(r,\theta,t)$: $r$ is the length of the normal from the origin of the axes to the line, $\theta$ is the angle between the normal and the positive $x$ axis, $t$ is a parameter along the line, $D$ is the area of support of $f(x,y)$, and $\delta$ is the Dirac delta function (Figure 1).

The trace transform is similar to the Radon transform in the sense that it also calculates a functional of the image function along lines, but the functional is not necessarily the integral. The Radon transform, therefore, is a special case of the trace transform. Consider criss-crossing image $f(x,y)$ with lines $l(r,\theta,t)$ in all directions, and denote by $L(r,\theta)$ the set of all lines. The trace transform is a function $g(f; T, r, \theta)$ defined on $L(r,\theta)$ with the help of a trace functional $T$, some functional of the image function $f(x,y)$ when it is considered along line $l(r,\theta)$ as a function of parameter $t$. Then

$$g(f; T, r, \theta) = T[f(r, \theta; t)] \qquad (2)$$
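As an illustration, the sketch below computes a discrete trace transform by rotating the image so that the tracing lines of each orientation become image rows, and then applying $T$ to every row. This is a minimal sketch, assuming a grey-level NumPy image; rotating the image is only one of several possible ways of sampling the tracing lines.

```python
import numpy as np
from scipy.ndimage import rotate

def trace_transform(image, T, n_angles=48):
    """Discrete sketch of g(f; T, r, theta).

    For each orientation theta the image is rotated so that the tracing
    lines become rows; T is then applied to every row, giving one value
    per (r, theta).  Returns an array of shape (n_rows, n_angles).
    """
    thetas = np.linspace(0.0, 360.0, n_angles, endpoint=False)
    columns = []
    for theta in thetas:
        rot = rotate(image, theta, reshape=False, order=1)
        columns.append(np.apply_along_axis(T, 1, rot))  # one value per line r
    return np.stack(columns, axis=1)

# Example trace functional T1: the integral of f(t) along the line.
T1 = lambda line: line.sum()
```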

A triple feature, a number which can characterise image $f(x,y)$, is generated with the help of two additional functionals, called the diametric and circus functionals and designated by $P$ and $\Phi$, respectively [15, 16]. The triple feature is defined as:

$$\Pi(f) = \Phi[P[T[f(r, \theta; t)]]] \qquad (3)$$

where $\Pi(f)$ represents the extracted triple feature of image $f(x,y)$, $P$ is a functional applied to parameter $r$, and $\Phi$ is a functional operating on the orientation variable $\theta$, after the two previous operations have been performed.

Fig. 1 Definition of the parameters of an image $f(x,y)$ and a tracing line

The extracted triple feature is strongly influenced by the properties of the chosen functionals $T$, $P$ and $\Phi$. For practical applications, such as feature extraction, these functionals may be chosen so that the triple feature has one of the following properties [15, 16]:

• invariance to rotation, translation and scaling;

• sensitivity to rotation, translation and scaling, so that these parameters can be recovered;

• good correlation with some desired property which we want to identify in a sequence of images.

Using an appropriate combination of functionals $T$, $P$ and $\Phi$, thousands of triple features can be generated, although most of them will not be useful; nevertheless, one can investigate them and make the appropriate choice for the specific task with the help of experimentation.
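Continuing the earlier sketch, the cascade of Eq. (3) reduces the trace-transform matrix to a single number: $P$ collapses the $r$ dimension and $\Phi$ the $\theta$ dimension. The functional choices in the comment are illustrative only.

```python
def triple_feature(image, T, P, Phi, n_angles=48):
    """Cascade of Eq. (3): T along each line, P over r, Phi over theta."""
    g = trace_transform(image, T, n_angles)  # (n_r, n_angles), earlier sketch
    h = np.apply_along_axis(P, 0, g)         # circus function: one number per theta
    return Phi(h)                            # a single number: the triple feature

# e.g. T = integral, P = maximum over r, Phi = median over theta:
# pi = triple_feature(img, T1, np.max, np.median)
```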

A method was proposed in [14] to characterise an object not by a single number produced by the cascaded application of the three functionals $T$, $P$ and $\Phi$, but by using only the first two functionals instead. Applying functionals $T$ and $P$ successively allows one to characterise an object by a string of numbers, which acts like a signature of the object.

The object signature is a function called the associated circus, $h_a(\phi)$, defined in terms of the function $h(\phi)$ which is produced by applying functionals $T$ and $P$:

$$h_a(\phi) \equiv h(\phi)^{1/(\lambda_P - K_T K_P)} \qquad (4)$$

Parameters $\lambda_P$, $K_T$ and $K_P$ are real numbers which characterise functionals $T$ and $P$ (for details refer to [14]). If $\lambda_P - K_T K_P = 0$, the associated circus is defined as:

$$h_a(\phi) \equiv \frac{dh(\phi)}{d\phi} \qquad (5)$$
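A minimal sketch of Eqs. (4) and (5), assuming $h(\phi)$ is sampled uniformly; the sign/abs handling of the fractional power is our own implementation choice to keep the result real, not something prescribed by [14].

```python
import numpy as np

def associated_circus(h, lam_P, K_T, K_P, dphi):
    """Eq. (4): raise h(phi) to the power 1/(lam_P - K_T*K_P);
    Eq. (5): fall back to dh/dphi when that exponent is undefined."""
    kappa = lam_P - K_T * K_P
    if kappa != 0:
        # sign/abs keeps the fractional power real for negative h values
        return np.sign(h) * np.abs(h) ** (1.0 / kappa)
    return np.gradient(h, dphi)
```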

If we plot in polar coordinates the associated circus function of the original image, $h_a^1(\phi)$, and the associated circus function of the affinely distorted image, $h_a^2(\phi)$, they produce two closed shapes which are connected by a linear transformation. To be able to compare these two shapes, they have to be normalised so that their principal axes coincide. This can be done by a linear transformation applied to each shape separately, as described in detail in [14]. The normalised shapes $h_n^1(\phi)$ and $h_n^2(\phi)$ are the signatures of the two images. For practical applications, the task of identifying an object is then just comparing two strings of numbers, $h_n^1(\phi)$ and $h_n^2(\phi)$, that are circularly shifted and possibly scaled versions of each other. Figure 2 shows two signatures of a Jawi character in two different font types; they are very similar in shape and differ mainly by rotation and scaling.


Fig. 2 The signatures of Jawi character “Pa” in two different font types

III. EXPERIMENT

We collected nine sets of scanned articles written by nine different writers. Each article was designed such that all possible combinations of the 36 Jawi characters appear at least once in the text. This ensured that all kinds of character combinations were tested, since cursiveness changes character shape depending on where the character appears within its word or sub word. Each article contains 213 words. We randomly selected six articles as the reference data and the remaining three as the testing data.

All the scanned word images were decomposed into sets of sub-word images using a connected-component segmentation algorithm. This step is possible because there are seven Jawi characters that cannot be connected to their following character. The decomposition process generated 540 sub-word images for each article. These sub words can be grouped into 216 classes. Thus, in total, there were 3240 images in the training (reference dictionary) set and 1620 images in the testing set. We assume that the sub-word images are subject to affine deformations when they are written by different writers. Example images of the sub words in the reference dictionary are shown in Figure 3.

We compare the results of the trace transform method with those of the moment-based and ART-based methods. For the trace transform method we computed the object signature, $h_a$, by applying the functionals $T$ and $P$. We tested seven different $T$ functionals and eleven $P$ functionals. The $T$ functionals were:

− T1: integral of $f(t)$, where $f(t)$ is the value of the image function along the tracing line;

− T2: max of $f(t)$;

− T3: integral of $f'(t)$;

− T4: integral of $f''(t)$;

− T5: $L_p$ quasi-norm ($p = 0.5$), i.e. $q^2$, where $q$ is the integral of $\sqrt{f(t)}$;

− T6: median of $f(t-c)$ over $\mathbb{R}^+$, where $c$ is the median abscissa;

− T7: weighted median of $f(t-c)$ over $\mathbb{R}^+$, where $c$ is the median abscissa and the weights are $f(t)(t-c)$.

The first seven $P$ functionals are the same as T1 to T7, called P1 to P7, respectively. In addition, the following four $P$ functionals were used:

− P8: the median index $t$ dividing the integral of $f(t)$;

− P9: average of the indices $t$ at which $f(t)$ is maximal;

− P10: the gravity centre $t$ of $f(t)$;

− P11: the median index $t$ dividing the integral of $f(t)$.

We then generated all possible pairs of these seven $T$ functionals and eleven $P$ functionals. In total, there were 7 × 11 = 77 candidate circus functions, or signatures, with which to characterise an image.
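For concreteness, here are hedged sketches of a few of the functionals listed above, operating on the sampled image values $f(t)$ along one tracing line. The absolute value in T3 and the clipping in T5 are our own assumptions, made to keep the functionals well defined on discrete, possibly noisy data.

```python
import numpy as np

# f is the sampled image function f(t) along one tracing line (1-D array).
def T1(f):   # integral of f(t)
    return f.sum()

def T2(f):   # max of f(t)
    return f.max()

def T3(f):   # integral of f'(t); |.| assumed, otherwise it is just f(end) - f(start)
    return np.abs(np.diff(f)).sum()

def T5(f):   # L_0.5 quasi-norm: q**2 with q the integral of sqrt(f(t))
    return np.sqrt(np.clip(f, 0.0, None)).sum() ** 2

def P10(f):  # the gravity centre t of f(t)
    t = np.arange(len(f))
    s = f.sum()
    return (t * f).sum() / s if s > 0 else 0.0

# The 77 candidate (T, P) pairs would then be formed as:
# pairs = [(T, P) for T in T_functionals for P in P_functionals]
```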

Not all signatures are useful, and we eliminated the useless ones: signatures that were all zeros or constant were discarded. We then ran numerous simple recognition processes using all possible object signature combinations; each run produced ranked results. To evaluate the rankings, we adopted a measure widely used in information retrieval, the Normalized Discounted Cumulative Gain (NDCG) [17]:

$$N(n) = Z_n \sum_{j=1}^{n} \frac{2^{R(j)} - 1}{\log(1 + j)} \qquad (6)$$

where $n$ denotes position, $R(j)$ denotes the score for rank $j$, and $Z_n$ is a normalization factor which guarantees that a perfect ranking's NDCG at position $n$ equals 1. In evaluation, NDCG is further averaged over all queries. Based on the NDCG measurement, 24 out of the 77 signatures are useful.
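A minimal sketch of Eq. (6) for a single query, with the normalization factor $Z_n$ derived from the ideal ordering of the same relevance scores:

```python
import numpy as np

def ndcg(scores, n):
    """NDCG at position n (Eq. 6) for one query.
    `scores` holds the relevance score R(j) at each rank j (1-based)."""
    scores = np.asarray(scores, dtype=float)[:n]
    j = np.arange(1, len(scores) + 1)
    dcg = np.sum((2.0 ** scores - 1.0) / np.log(1.0 + j))
    ideal = np.sort(scores)[::-1]   # perfect ranking of the same scores
    idcg = np.sum((2.0 ** ideal - 1.0) / np.log(1.0 + j))
    return dcg / idcg if idcg > 0 else 0.0   # Z_n = 1 / idcg
```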

Each image was traced by lines one pixel apart, i.e. the value of parameter $r$ for two successive parallel lines differed by 1. For each value of $r$, 48 different orientations were used, which means that the orientations of lines with the same $r$ differed by 7.5 degrees. Each line was sampled with points one pixel apart, that is to say, parameter $t$ took discrete values with steps equal to 1 inter-pixel distance.

To compare two signature values, one from the testing data and one from the reference data, we computed their circular cross-correlation coefficients. The circular cross-correlation, $CX$, is defined as:

$$CX(d) = \frac{\sum_{i=1}^{N} h_{at}(i)\, h_{al}(i - d)}{\sqrt{\sum_{i=1}^{N} h_{at}(i)^2 \times \sum_{i=1}^{N} h_{al}(i)^2}} \qquad (7)$$

where $h_{at}$ and $h_{al}$ are the signature values of the test image and the reference image respectively, $N$ is the length of the signatures, $d$ is the shift, and the index $i - d$ is taken modulo $N$. Two signatures are most similar when their correlation is minimum. We chose the maximum value over the 48 shifts (corresponding to the 48 orientations used). In this way we produced 24 different numbers when comparing two images, one per useful signature, and used the sum of these numbers as the measure of similarity. We ranked the sums; the smallest number indicates the two most similar signatures.
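A minimal sketch of Eq. (7), with the circular shift implemented via np.roll; evaluating it at every shift $d$ and taking the extremum over the 48 shifts reproduces the per-signature score described above.

```python
import numpy as np

def circular_cross_correlation(h_at, h_al):
    """Normalised circular cross-correlation CX(d) of two signatures,
    evaluated at every circular shift d (Eq. 7)."""
    h_at = np.asarray(h_at, dtype=float)
    h_al = np.asarray(h_al, dtype=float)
    norm = np.sqrt((h_at ** 2).sum() * (h_al ** 2).sum())
    return np.array([(h_at * np.roll(h_al, d)).sum() / norm
                     for d in range(len(h_at))])

# Best alignment over the shifts, one score per useful signature:
# score = circular_cross_correlation(sig_test, sig_ref).max()
```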

For the moment-based method we use the affine invariant functions of Flusser and Suk [18-20]. They are based on the calculation of moments of the object to be described; combinations of these moments form affine invariant descriptors. The six affine invariants are the following:

$$I_1 = \frac{1}{\mu_{00}^4}\left(\mu_{20}\mu_{02} - \mu_{11}^2\right)$$

$$I_2 = \frac{1}{\mu_{00}^{10}}\left(\mu_{30}^2\mu_{03}^2 - 6\mu_{30}\mu_{21}\mu_{12}\mu_{03} + 4\mu_{30}\mu_{12}^3 + 4\mu_{03}\mu_{21}^3 - 3\mu_{21}^2\mu_{12}^2\right)$$

$$I_3 = \frac{1}{\mu_{00}^7}\left(\mu_{20}(\mu_{21}\mu_{03} - \mu_{12}^2) - \mu_{11}(\mu_{30}\mu_{03} - \mu_{21}\mu_{12}) + \mu_{02}(\mu_{30}\mu_{12} - \mu_{21}^2)\right)$$

$$I_4 = \frac{1}{\mu_{00}^{11}}\big(\mu_{20}^3\mu_{03}^2 - 6\mu_{20}^2\mu_{11}\mu_{12}\mu_{03} - 6\mu_{20}^2\mu_{02}\mu_{21}\mu_{03} + 9\mu_{20}^2\mu_{02}\mu_{12}^2 + 12\mu_{20}\mu_{11}^2\mu_{21}\mu_{03} + 6\mu_{20}\mu_{11}\mu_{02}\mu_{30}\mu_{03} - 18\mu_{20}\mu_{11}\mu_{02}\mu_{21}\mu_{12} - 8\mu_{11}^3\mu_{30}\mu_{03} - 6\mu_{20}\mu_{02}^2\mu_{30}\mu_{12} + 9\mu_{20}\mu_{02}^2\mu_{21}^2 + 12\mu_{11}^2\mu_{02}\mu_{30}\mu_{12} - 6\mu_{11}\mu_{02}^2\mu_{30}\mu_{21} + \mu_{02}^3\mu_{30}^2\big)$$

$$I_5 = \frac{1}{\mu_{00}^6}\left(\mu_{40}\mu_{04} - 4\mu_{31}\mu_{13} + 3\mu_{22}^2\right)$$

$$I_6 = \frac{1}{\mu_{00}^9}\left(\mu_{40}\mu_{04}\mu_{22} + 2\mu_{31}\mu_{22}\mu_{13} - \mu_{40}\mu_{13}^2 - \mu_{04}\mu_{31}^2 - \mu_{22}^3\right)$$

where $\mu_{pq}$ is defined as:

$$\mu_{pq} = \iint_{object} f(x,y)\,(x - \bar{x})^p (y - \bar{y})^q\,dx\,dy \qquad (8)$$

with $f(x,y)$ being the grey-level image function and $(\bar{x}, \bar{y})$ the centre of mass of the object. The proposers of the moments method have suggested using either four or six of the invariant functions; we present results for both cases.
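A minimal sketch of Eq. (8) and of the first invariant $I_1$, assuming a grey-level NumPy image in which background pixels are zero:

```python
import numpy as np

def central_moment(f, p, q):
    """Central moment mu_pq of a grey-level image f(x, y), Eq. (8)."""
    y, x = np.mgrid[0:f.shape[0], 0:f.shape[1]]
    m00 = f.sum()
    xb, yb = (x * f).sum() / m00, (y * f).sum() / m00   # centre of mass
    return (f * (x - xb) ** p * (y - yb) ** q).sum()

def I1(f):
    """First affine moment invariant of Flusser and Suk."""
    mu = lambda p, q: central_moment(f, p, q)
    return (mu(2, 0) * mu(0, 2) - mu(1, 1) ** 2) / mu(0, 0) ** 4
```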

For the ART-based method, we use the descriptors discussed in [21]. The ART coefficient $F_{nm}$, of order $n$ and $m$, is defined by:

$$F_{nm} = \int_0^{2\pi}\!\!\int_0^1 V_{nm}(p, \theta)\, f(p, \theta)\, p\, dp\, d\theta \qquad (9)$$

where $f(p,\theta)$ is the image function in polar coordinates and $V_{nm}(p,\theta)$ is the ART basis function, which is separable along the angular and radial directions, that is,

$$V_{nm}(p, \theta) = A_m(\theta)\, R_n(p) \qquad (10)$$

An exponential function is used for the angular basis function, to achieve rotation invariance, while the radial basis function is defined by a cosine function:

$$A_m(\theta) = \frac{1}{2\pi}\exp(jm\theta), \qquad R_n(p) = \begin{cases} 1 & n = 0 \\ 2\cos(\pi n p) & n \neq 0 \end{cases} \qquad (11)$$

The ART descriptor is defined as the set of normalized magnitudes of the ART coefficients. In this paper, we use twelve angular functions ($m < 12$) and three radial functions ($n < 3$), as proposed in MPEG-7. Thus, using the ART method, there were 36 features to characterise a sub word.
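Below is a hedged sketch of Eqs. (9)-(11). The discretisation choices (inscribing the unit disc in the image frame, summing over pixels, using the conjugate basis, normalising by $|F_{00}|$) reflect common MPEG-7 practice rather than details given in the paper.

```python
import numpy as np

def art_coefficients(f, n_max=3, m_max=12):
    """Normalised ART magnitudes |F_nm| for n < n_max, m < m_max (Eqs. 9-11).
    f is a grey-level image; the unit disc is inscribed in the image frame."""
    h, w = f.shape
    y, x = np.mgrid[0:h, 0:w]
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0
    p = np.sqrt(((x - cx) / cx) ** 2 + ((y - cy) / cy) ** 2)  # radius in [0, 1]
    theta = np.arctan2(y - cy, x - cx)
    inside = p <= 1.0
    feats = []
    for n in range(n_max):
        Rn = np.ones_like(p) if n == 0 else 2.0 * np.cos(np.pi * n * p)
        for m in range(m_max):
            Am = np.exp(1j * m * theta) / (2.0 * np.pi)
            # dx dy = p dp dtheta, so a plain pixel sum approximates Eq. (9);
            # the conjugate basis is the usual convention for projections
            F = (np.conj(Am * Rn) * f)[inside].sum()
            feats.append(np.abs(F))
    feats = np.array(feats)
    return feats / feats[0] if feats[0] > 0 else feats  # normalise by |F_00|
```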

A poor recognition result could come from a bad choice of classifier. In order not to rely on just one classifier, we ran the recognition process using various classifiers from a wide range of paradigms: Multi-layer Perceptron (MLP), Radial Basis Function (RBF), Support Vector Machine (SVM) [22], Naive Bayes [23], BayesNet [24], Decision Table [25], J48 [26], and AdaBoost [27]. The code was written based on the WEKA package, and the default parameters were used for each algorithm. Before the runs, the data were normalised by dividing by an appropriate number, so that all features combined were of the same order of magnitude.
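The experiments ran WEKA (Java) classifiers; as a hedged illustration only, an analogous, not identical, protocol in scikit-learn might look as follows. X_train, y_train, X_test and y_test are hypothetical arrays holding the normalised feature vectors and class labels.

```python
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier

# Default parameters throughout, mirroring the paper's use of WEKA defaults.
classifiers = {
    "MLP": MLPClassifier(),
    "SVM": SVC(),
    "NB": GaussianNB(),
    "J48-like": DecisionTreeClassifier(),  # J48 is WEKA's C4.5; CART stands in here
    "AB": AdaBoostClassifier(),
}

for name, clf in classifiers.items():
    clf.fit(X_train, y_train)                        # hypothetical training data
    print(name, 100.0 * clf.score(X_test, y_test))   # % of correct recognition
```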

IV. RESULTS

The results of all our experiments are presented in Tables 1 and 2. In Table 1, we list the recognition results for the benchmark methods: the benchmark features are in the first column, and the second column onwards gives the percentage of correct recognition for the different classifiers.

The recognition results for the proposed method are presented in Table 2. The first column gives the percentage of correct recognition within the top five choices; the second and third columns give the percentages recognised in positions six to ten and eleven to fifteen, respectively; the fourth column gives the percentage of recognition in position sixteen and beyond.

V. CONCLUSIONS

The results of comprehensive experiments evaluating the benchmark methods with a wide range of classifiers for handwritten Jawi recognition are presented in Table 1. ART features were found to be more useful than both moment-based feature sets. The results also identify AdaBoost as the most useful classifier for all kinds of benchmark features: the highest recognition rate, 32.05%, was scored with the ART features.

In Table 2, on the other hand, the object signature clearly shows better results than the benchmark methods. The top-5 recognition rate of the object signature is 72.02%. Even the top-1 result of the proposed method, which is not shown in Table 2, is 52.19%, better than the 32.05% of ART + AdaBoost.


ACKNOWLEDGMENT

The authors would like to thank the university for Research Grant No. UKM-GUP-TMK-07-02-034.

REFERENCES

[1] R. Plamondon, S. N. Srihari: On-Line and Off-Line Handwriting Recognition: A Comprehensive Survey, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 22, No. 1 (2000) 63–84

[2] L. M. Lorigo, V. Govindaraju: Offline Arabic Handwriting Recognition: A Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 28 No. 5 (2006) 712–724

[3] M. S. Baghshah: A Novel Fuzzy Approach to Recognition of Online Persian Handwriting: Proceedings. 5th International Conference on Intelligent Systems Design and Applications 2005 (ISDA '05), 8-10 September 2005, Wroclaw, Poland (2005) 268–273

[4] M. F. Nasrudin, K. Omar, M. S. Zakaria, C. Y. Liong: Handwritten Cursive Jawi Character Recognition: A Survey, Proceedings of the 5th International Conference on Computer Graphics, Imaging and Visualization (2008) 247–256

[5] K. Omar: Jawi Handwritten Text Recognition Using Multi-level Classifier (in Malay), PhD Thesis, Universiti Putra Malaysia (2000)

[6] M. Manaf: Jawi Handwritten Text Recognition Using Recurrent Bama Neural Networks (in Malay), PhD Thesis, Universiti Kebangsaan Malaysia (2002)

[7] R. Mohammad: Modification of Combined Segmentation Technique for Jawi Manuscript (in Malay), Masters Thesis, Universiti Kebangsaan Malaysia (2002)

[8] C. N. Deraman: Extension of Combined Segmentation Technique for Jawi Manuscripts (in Malay). Masters Thesis, Universiti Kebangsaan Malaysia (2005)

[9] V. Mutiawani: Segmentation of Jawi Text Using Voronoi Diagram (in Malay) Masters Thesis, Universiti Kebangsaan Malaysia (2007)

[10] A. Kadyrov, M. Petrou and J. Park: Korean Character Recognition with the Trace Transform, Proceedings of the International Conference on Integration of Multimedia Contents, ICIM 2001, November 15, 2001, Chosun University, Gwangju, South Korea (2001) 7–12

[11] W. S. Kim & R. Park: Off-line Recognition of Handwritten Korean and Alphanumeric Characters Using Hidden Markov Models, Pattern Recognition, Vol. 29, No. 5 (1996), 845–858

[12] M. Bober: MPEG–7 Visual Shape Descriptors, IEEE Transactions on Circuits and Systems for Video Technology, Vol. 11, No. 6 (2001), 716–719

[13] W. Y. Kim & Y. S. Kim: A new region-based shape descriptor, Technical Report Dec 1999, Hanyang University and Konan Technology (1999)

[14] A. Kadyrov, M. Petrou: Object Descriptors Invariant to Affine Distortions, Proceedings MVC 2001, Vol. 2, Manchester, UK (2001) 391–400

[15] A. Kadyrov, M. Petrou: The Trace Transform and Its Applications, IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI, Vol. 23 (2001) 811–828

[16] A. Kadyrov, M. Petrou: The Trace Transform as a Tool to Invariant Feature Construction, Proceedings ICPR98, Brisbane, Australia (1998) 1037–1039

[17] K. Jarvelin & J. Kekalainen: Cumulated gain-based evaluation of IR techniques, ACM Transactions on Information Systems, Vol. 20, No. 4 (2002), 422–446

[18] J. Flusser and T. Suk: Pattern Recognition by Affine Moment Invariants, Pattern Recognition, Vol 26 (1993) 167–174

[19] J. Flusser and T. Suk: Affine Moment Invariants: A New Tool for Character Recognition, Pattern Recognition Letters 15 (1994) 433–436

[20] J. Flusser and T. Suk: A Moment-Based Approach to Registration of Images with Affine Geometric Distortion, IEEE Transactions on Geoscience and Remote Sensing, Vol 32 (1994) 382–387

[21] J. Ricard, D. Coeurjolly & A. Baskurt: Generalization of angular radial transform, International Conference on Image Processing 2004 (ICIP '04), Vol. 4 (2004), 24–27

[22] S. S. Keerthi, S. K. Shevade, C. Bhattacharyya & K. R. K. Murthy: Improvements to Platt's SMO algorithm for SVM classifier design, Neural Computation, Vol. 13, No. 3 (2001), 637–649

[23] G. H. John & P. Langley: Estimating Continuous Distributions in Bayesian Classifiers, Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence (1995), 338–345

[24] R. R. Bouckaert: Bayesian networks in Weka, Technical Report 14/2004, Computer Science Department, University of Waikato (2004)

[25] R. Kohavi: The Power of Decision Tables. In: 8th European Conference on Machine Learning (1995), 174–189

[26] J. R. Quinlan: C4.5: Programs for Machine Learning, Morgan Kaufmann (1993)

[27] Y. Freund & R. E. Schapire: Experiments with a New Boosting Algorithm, In: Proceedings of the Thirteenth International Conference on Machine Learning (1996), 148–156

Fig. 3 Example images of the sub words in the reference dictionary

TABLE I
PERCENTAGE OF CORRECT SUB-WORD RECOGNITION OF THE BENCHMARK METHODS USING SELECTED CLASSIFIERS
(DT IS DECISION TABLE, NB IS NAIVE BAYES, BNET IS BAYESNET, AB IS ADABOOST)

Method       | MLP   | RBF   | SVM   | NB    | BNet  | DT    | J48   | AB    | Avg
Four moments | 14.58 | 14.45 | 12.03 | 14.76 | 19.91 | 19.04 | 20.60 | 20.66 | 17.00
Six moments  | 21.59 | 13.52 | 11.85 | 15.45 | 17.31 | 18.49 | 20.91 | 24.57 | 17.96
ART          | 27.03 | 15.93 | 25.23 | 25.73 | 23.25 | 21.39 | 26.29 | 32.05 | 24.61

TABLE II
PERCENTAGE OF CORRECT SUB-WORD RECOGNITION OF THE PROPOSED METHOD

Top 5 | Positions 6-10 | Positions 11-15 | Position 16+
72.02 | 6.89           | 3.71            | 17.29
