Page 1: Regularized Independent Component Analysis in Face ...sdiwc.net/digital-library/web-admin/upload-pdf/00000797.pdf · Regularized Independent Component Analysis in Face Verification

Regularized Independent Component Analysis in Face Verification

Pang Ying Han1, Teo Chuan Chin1*, Ooi Shih Yin1, Lau Siong Hoe1, Hiew Fu San2, Liew Yee Ping1 1 Faculty of Information Science and Technology, Multimedia University, 75450 Melaka, Malaysia

2 Infineon Technologies (Malaysia) Sdn. Bhd., Free Trade Zone, Batu Berendam, 75450 Melaka, Malaysia

[email protected]; [email protected]*; [email protected]; [email protected]

ABSTRACT

A regularized Independent Component Analysis (denoted as RICA) is proposed for face verification. In RICA, the correlation coefficients between images are used to form a Laplacian matrix. This Laplacian matrix locates localized features by regularizing the facial data before independent component analysis (ICA) feature extraction. Since ICA has two different architectures (ICA I and ICA II), RICA is implemented on both of them, yielding RICA_ICA I and RICA_ICA II, respectively. Two face datasets are adopted to assess the effectiveness of the proposed techniques: Facial Recognition Technology (FERET) and CMU Pose, Illumination, and Expression (CMU PIE). The experimental results demonstrate that both proposed techniques, RICA_ICA I and RICA_ICA II, are superior in face verification.

KEYWORDS

Face; correlation coefficients; Laplacian matrix; regularization; Independent Component Analysis.

1 INTRODUCTION

In pattern recognition, captured image data is usually represented in a very high dimension. Noise and redundant information embedded in this high-dimensional form lead to performance degradation; this is known as the curse of dimensionality. Hence, a number of facial feature extraction techniques have been researched and introduced to transform/project this high-dimensional data into a more compact but informative feature representation [1][2][3][4][5][6].

Principal Component Analysis (PCA) is one of the best-known feature extraction techniques in pattern recognition, especially in face recognition [1]. PCA maximizes the data variance so that uncorrelated coefficients of the data can be used for image representation. However, the robustness of this technique is constrained by its unsupervised learning nature. Belhumeur et al. therefore enhanced it by incorporating a supervised learning mode for better performance [2]. This supervised technique is known as Linear Discriminant Analysis (LDA).

From the perspective of pattern recognition, higher-order dependencies in an image usually comprise nonlinear relations among pixels, and this nonlinear information is significant for recognition. Hence, Independent Component Analysis (ICA) was proposed [3]. ICA attempts to seek a set of basis signals that are statistically independent. In the literature, two different architectures of ICA implementation are proposed: ICA architecture I (denoted as ICA I) treats images as random variables and pixels as outcomes, whereas ICA architecture II (denoted as ICA II) treats image pixels as random variables and images as outcomes [3]. ICA I is claimed to focus on spatially localized features, whereas ICA II focuses mainly on global features [3]. The effectiveness of these two architectures has been demonstrated by Yuen and Lai [7] as well as Bartlett et al. [3] in their research works.

In the literature, there is a relatively new research direction for discriminative data learning based on improving population statistics estimation and preserving data locality [8][9][10]. In feature analysis, training data is what feature extraction techniques use to understand the basic nature of the data. Theoretically, a large and accurate training set is preferable to ensure the performance of these techniques. In practice, however, only a limited number of training samples can be collected due to time and manpower constraints, and this limited training set may result in biased estimates. Regularization is therefore introduced to overcome this problem. In the literature, Jiang et al. introduced Eigenfeature Regularization and Extraction (ERE) to reduce this bias [8]. Another regularized technique, Regularized Locality Preserving Projection, serves the same purpose by regulating Locality Preserving Projection (LPP) features [9]. Recently, a discriminant graph embedding technique that regulates sampling data locality was proposed [10].

Inspired by these works, a regularized Independent Component Analysis (RICA) is proposed here. In this technique, the correlation coefficients between images are used to form a Laplacian matrix, which discovers the local geometric structure of the data. This structure is then used to regularize the data input before it is processed by independent component analysis. Unlike the works of Jiang [8], Lu [9] and Pang [10], which require eigenspace decomposition for weighting function formulation, RICA directly utilizes the eigenvalues of the Laplacian scatter matrix to form the weighting function for regularization.

The performance of RICA is tested on two publicly available face datasets: Facial Recognition Technology (FERET) [11] and CMU Pose, Illumination, and Expression (CMU PIE) [12]. Experimental results show that the proposed RICA is superior to other feature extraction techniques.

2 THE PROPOSED TECHNIQUES: REGULARIZED INDEPENDENT COMPONENT ANALYSIS

In this technique, correlation coefficients between images are utilized to discover the connectivity of

the data samples. These coefficients are the weights of the graph edges, signifying the similarity/correlation of each data pair $(\boldsymbol{x}_i, \boldsymbol{x}_j)$, i.e.

$$W_{ij} = \begin{cases} \dfrac{\mathrm{Cov}(\boldsymbol{x}_i, \boldsymbol{x}_j)}{\sqrt{\mathrm{Cov}(\boldsymbol{x}_i, \boldsymbol{x}_i)\,\mathrm{Cov}(\boldsymbol{x}_j, \boldsymbol{x}_j)}} & \text{if } \boldsymbol{x}_i \text{ and } \boldsymbol{x}_j \text{ are from the same class} \\ 0 & \text{otherwise} \end{cases} \quad (1)$$

where $\mathrm{Cov}(\cdot,\cdot)$ denotes the covariance. A Laplacian matrix $\mathbf{L}$ is formed from the coefficient weights $W_{ij}$,

$$\mathbf{L} = \mathbf{D} - \mathbf{W} \quad (2)$$

where $\mathbf{W} = [W_{ij}]$ and $\mathbf{D}$ is the diagonal matrix with $D_{ii} = \sum_{j} W_{ij}, \forall i \neq j$. $\mathbf{L}$ is then used to model the intrinsic composition of the data by computing the eigenvectors $\boldsymbol{\upsilon}$ of the Laplacian scatter matrix $\mathbf{X}\mathbf{L}\mathbf{X}^{T}$,

$$\mathbf{X}\mathbf{L}\mathbf{X}^{T}\boldsymbol{\upsilon}_i = \varphi_i \boldsymbol{\upsilon}_i \quad (3)$$

where $\mathbf{X} = [\boldsymbol{x}_1, \boldsymbol{x}_2, \ldots, \boldsymbol{x}_n]$ with $\{\boldsymbol{x}_i \in \boldsymbol{R}^{d} \mid i = 1, 2, \ldots, n\}$. The eigenvectors $\boldsymbol{\upsilon}_i$ form an eigenspace $\mathbf{V} = [\boldsymbol{\upsilon}_1, \boldsymbol{\upsilon}_2, \ldots, \boldsymbol{\upsilon}_d]$, and $\varphi_i$ is the eigenvalue corresponding to $\boldsymbol{\upsilon}_i$, where $\varphi_1 > \varphi_2 > \cdots > \varphi_d$. Figure 1 illustrates a plot of the eigenvalues $\varphi_i$ versus the order of the eigenvectors.
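Under our reading of Eqs. (1)–(3), the graph construction and the eigendecomposition of the Laplacian scatter matrix can be sketched in NumPy as follows. This is an illustrative implementation, not the authors' code: the function name is ours, and the use of the Pearson correlation coefficient for $W_{ij}$ is our interpretation of "correlation coefficients between images".

```python
import numpy as np

def laplacian_scatter_eig(X, labels):
    """X: d x n matrix of vectorized face images (columns are samples).
    labels: length-n array of class labels.
    Returns eigenvalues (descending) and eigenvectors of X L X^T."""
    labels = np.asarray(labels)
    # Eq. (1): correlation coefficients for same-class pairs, 0 otherwise.
    C = np.corrcoef(X.T)                       # n x n Pearson correlations between images
    same = labels[:, None] == labels[None, :]
    W = np.where(same, C, 0.0)
    np.fill_diagonal(W, 0.0)                   # no self-edges
    # Eq. (2): L = D - W with D_ii = sum_j W_ij.
    D = np.diag(W.sum(axis=1))
    L = D - W
    # Eq. (3): eigendecomposition of the Laplacian scatter matrix X L X^T.
    S = X @ L @ X.T
    phi, V = np.linalg.eigh((S + S.T) / 2)     # symmetrize for numerical stability
    order = np.argsort(phi)[::-1]              # sort eigenvalues in descending order
    return phi[order], V[:, order]
```

The descending sort matches the paper's convention $\varphi_1 > \varphi_2 > \cdots > \varphi_d$.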

Since $\mathbf{L}$ encodes the connectivity of same-class data, the Laplacian scatter matrix $\mathbf{X}\mathbf{L}\mathbf{X}^{T}$ highlights the disparity of same-class data, dubbed the intra-class variation. A larger eigenvalue $\varphi_i$ indicates higher intra-class variation along $\boldsymbol{\upsilon}_i$. Hence, eigenvectors with large eigenvalues should be regularized with smaller weights to reduce the same-class data discrepancy. Conversely, a zero $\varphi_i$ implies no intra-class variation embedded in $\boldsymbol{\upsilon}_i$. This zero variance, computed from the training set, is data specific and may not carry the same meaning on other data sets. However, the smallest (whether nonzero or zero) eigenvalues always indicate the subspace possessing minimal within-class variation, so that subspace should be heavily weighted. In other words, $\boldsymbol{\upsilon}_i$ is regularized by imposing an eigenvalue-based weight, i.e.,

$$\tilde{\boldsymbol{\upsilon}}_i = \frac{1}{\exp(\varphi_i)}\,\boldsymbol{\upsilon}_i \quad (4)$$


A plot of $\varphi_i$ and the eigenvalue-based weight $1/\exp(\varphi_i)$ is also presented in Figure 1 for better illustration.

Figure 1. A plot of $\varphi_i$ (blue dotted line) and $1/\exp(\varphi_i)$ (red dashed line)

Before ICA analysis, the training data is preprocessed for minimal within-class variance by projecting it onto $\tilde{\mathbf{V}} = [\tilde{\boldsymbol{\upsilon}}_1, \tilde{\boldsymbol{\upsilon}}_2, \ldots, \tilde{\boldsymbol{\upsilon}}_d]$, i.e.
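The regularization and projection of Eqs. (4)–(5) amount to a per-eigenvector rescaling followed by a change of basis. A minimal NumPy sketch, assuming `phi` and `V` hold the eigenvalues and eigenvectors of the Laplacian scatter matrix computed beforehand (the names are ours, not the authors'):

```python
import numpy as np

def regularize_and_project(X, phi, V):
    """Eqs. (4)-(5): weight each eigenvector of X L X^T by 1/exp(phi_i),
    then project the d x n data matrix X onto the reweighted eigenspace.
    phi: eigenvalues (one per column of V); V: d x d eigenvector matrix."""
    weights = 1.0 / np.exp(phi)      # large phi (high intra-class variation) -> small weight
    V_tilde = V * weights[None, :]   # Eq. (4): scale each eigenvector column
    X_tilde = V_tilde.T @ X          # Eq. (5): regularized input handed to ICA
    return X_tilde, V_tilde
```

Note the monotone decay of $1/\exp(\varphi)$: the direction with the largest within-class scatter is suppressed the most, while zero-eigenvalue directions keep weight 1.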

$$\tilde{\boldsymbol{x}} = \tilde{\mathbf{V}}^{T}\boldsymbol{x} \quad (5)$$

ICA attempts to produce a set of linearly independent basis signals from an observed signal [3]. In face analysis, a face image is considered a mixture of unknown, statistically independent source signals combined by an unknown mixing matrix (Figure 2). In this case, the regularized face input is the observed mixture, i.e. $\tilde{\boldsymbol{x}} = \mathbf{A}\boldsymbol{s}$. ICA seeks the separating matrix $\mathbf{W}$ such that

$$\boldsymbol{u} = \mathbf{W}\tilde{\boldsymbol{x}} = \mathbf{W}\mathbf{A}\boldsymbol{s} \quad (6)$$

is an estimate of the true source signals. In this work, the regularized data input $\tilde{\boldsymbol{x}}$ is processed via the two ICA architectures, ICA I and ICA II, for feature extraction. The resulting techniques are denoted RICA_ICA I and RICA_ICA II, respectively.
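The paper does not state which ICA solver it uses, so purely as an illustration of the separation in Eq. (6), here is a minimal symmetric FastICA with a tanh nonlinearity; the choice of algorithm and all names are ours, not necessarily the authors'.

```python
import numpy as np

def fastica(X, n_iter=200, seed=0):
    """Minimal symmetric FastICA (tanh nonlinearity).
    X: m x n matrix of observed mixtures (rows = signals, cols = samples).
    Returns the separating matrix W and source estimates U = W X, as in Eq. (6)."""
    m, n = X.shape
    X = X - X.mean(axis=1, keepdims=True)
    # Whiten: Z = Wh X has identity covariance.
    d, E = np.linalg.eigh(X @ X.T / n)
    Wh = E @ np.diag(d ** -0.5) @ E.T
    Z = Wh @ X
    B = np.random.default_rng(seed).normal(size=(m, m))
    for _ in range(n_iter):
        # Fixed-point update: w <- E[z g(w'z)] - E[g'(w'z)] w, with g = tanh.
        G = np.tanh(B @ Z)
        B_new = G @ Z.T / n - np.diag((1 - G ** 2).mean(axis=1)) @ B
        # Symmetric decorrelation: B <- (B B^T)^{-1/2} B keeps rows orthonormal.
        dk, Ek = np.linalg.eigh(B_new @ B_new.T)
        B = Ek @ np.diag(dk ** -0.5) @ Ek.T @ B_new
    W = B @ Wh
    return W, W @ X
```

For a super-Gaussian mixture (e.g. two Laplace-distributed sources mixed by a random matrix), the recovered rows match the true sources up to permutation and sign, which is the usual ICA indeterminacy.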

2.1 RICA_ICA I

RICA_ICA I seeks a set of basis signals that are statistically independent. In this case, the images $\tilde{\boldsymbol{x}}$ are the variables and the pixels are observations of those variables. Before ICA analysis, PCA is performed on the data to reduce the data dimension and discard small trailing eigenvalues [3]. Let $\mathbf{R}$ be a $d \times p$ matrix whose columns are the first $p$ eigenvectors of the set of $n$ face images, where $d$ is the number of pixels. RICA_ICA I is implemented on $\mathbf{R}^{T}$, where the regularized data inputs in the rows are treated as variables and the pixels in the columns are observations. The independent basis vectors $\boldsymbol{u}_i$ are computed as

$$\mathbf{U} = \mathbf{W}\mathbf{R}^{T} \quad (7)$$

where the rows of $\mathbf{U} = [\boldsymbol{u}_1, \boldsymbol{u}_2, \ldots, \boldsymbol{u}_p]^{T}$ are the independent basis vectors.

Based on the calculated PCA coefficients, $\mathbf{C} = \tilde{\mathbf{X}}^{T}\mathbf{R} = \mathbf{X}^{T}\tilde{\mathbf{V}}\mathbf{R}$, the ICA coefficient matrix is calculated as

$$\mathbf{B} = \mathbf{C}\mathbf{W}^{-1} = \mathbf{X}^{T}\tilde{\mathbf{V}}\mathbf{R}\mathbf{W}^{-1} \quad (8)$$
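The bookkeeping of Eqs. (7)–(8) can be sketched as follows, under the assumption that the ICA coefficients are obtained through the inverse of the separating matrix, consistent with the architecture I formulation of Bartlett et al. [3]; the function and variable names are hypothetical.

```python
import numpy as np

def rica_ica1_features(X_tilde, R, W):
    """RICA_ICA I bookkeeping (Eqs. 7-8).
    X_tilde: d x n regularized data matrix (columns are samples).
    R: d x p matrix of leading PCA eigenvectors of X_tilde.
    W: p x p ICA separating matrix from running any ICA solver on R^T.
    Returns the independent basis U and the per-sample ICA coefficients B."""
    U = W @ R.T               # Eq. (7): statistically independent basis images (p x d)
    C = X_tilde.T @ R         # PCA coefficients of each sample (n x p)
    B = C @ np.linalg.inv(W)  # Eq. (8): ICA representation of each sample (n x p)
    return U, B
```

By construction $\mathbf{B}\mathbf{U} = \mathbf{C}\mathbf{W}^{-1}\mathbf{W}\mathbf{R}^{T} = \mathbf{C}\mathbf{R}^{T}$, so each sample is reconstructed (within the PCA subspace) as a linear combination of the independent basis images.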


Figure 2. Regularized ICA implementation on face data

2.2 RICA_ICA II

RICA_ICA II seeks statistically independent coefficients for the input data. In contrast to RICA_ICA I, the images $\tilde{\boldsymbol{x}}$ are observations and the pixels are variables. As in RICA_ICA I, PCA is performed before feature extraction to reduce the data dimension and discard small trailing eigenvalues [3]. The statistically independent coefficients are calculated as

$$\mathbf{U} = \mathbf{W}\mathbf{C} \quad (9)$$

3 EXPERIMENTAL RESULTS AND DISCUSSIONS

The performance of RICA is evaluated on two publicly available face databases: Facial Recognition Technology (FERET) [11] and CMU Pose, Illumination, and Expression (CMU PIE) [12]. These images exhibit significant illumination and facial expression variations. In this work, 100 subjects with 10 images per subject are taken from the FERET database; five images per subject are used for training and the remaining five for testing. In the CMU PIE database, there are 67 subjects with 20 images per subject; half of the images per subject (10) are used for training and the other half (10) for testing.
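Operationally, the difference between the two architectures is only the orientation of the matrix handed to the ICA solver. A schematic sketch with hypothetical sizes; the solver itself is omitted, and the QR stand-in for the PCA basis is an illustrative assumption, not the paper's procedure.

```python
import numpy as np

# Hypothetical sizes: d pixels, n images, p retained PCA dimensions.
d, n, p = 1024, 50, 8
rng = np.random.default_rng(0)
X_tilde = rng.normal(size=(d, n))    # stand-in for the regularized data of Eq. (5)
R = np.linalg.qr(X_tilde)[0][:, :p]  # stand-in for the d x p leading PCA basis
C = X_tilde.T @ R                    # n x p PCA coefficients of the samples

# RICA_ICA I: images are random variables and pixels are observations,
# so the ICA solver receives R^T (p signals, each observed over d pixels).
mixtures_arch1 = R.T                 # shape (p, d)

# RICA_ICA II: pixels are random variables and images are observations,
# so the ICA solver receives the PCA coefficients (p signals over n images).
mixtures_arch2 = C.T                 # shape (p, n)
```

Running the same solver on `mixtures_arch1` yields independent basis images (localized features), while running it on `mixtures_arch2` yields independent coefficients per image (global features), matching the ICA I / ICA II distinction cited from [3].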

In this paper, we present a performance comparison between RICA and other techniques: PCA [1], ICA I [3], ICA II [3], Locality Preserving Projection (LPP) [13], LDA [2], Supervised LPP (SLPP) [13], MFA [4] and MMC [14].

3.1 Performance on FERET Database

Figure 3 shows the verification errors on the testing set against the reduced dimension t. Table 1 records the best results and the corresponding optimal feature dimension t for the various techniques.

3.2 Performance on CMU PIE Database

Verification error rates of the various techniques are illustrated in Figure 4, and Table 2 lists the lowest error rates and the corresponding feature dimensionality t obtained on the CMU PIE database.


Figure 3. Error rate against feature dimension t on FERET database

Table 1. Verification performance of various techniques on FERET database

Method          Feature dimension t   Error rate (%)
Unsupervised techniques
  PCA                  80                 40.04
  LPP                  60                 40.37
  ICA I                50                 39.26
  ICA II               30                 38.27
Supervised techniques
  LDA                  99                 39.06
  SLPP                 10                 30.3
  MFA                  10                 35.8
  MMC                  99                 39.17
Proposed techniques
  RICA_ICA I           10                 27.7
  RICA_ICA II           9                 28.41


Figure 4. Recognition error rate against the number of features on CMU PIE database

Table 2. Verification performance of various techniques on CMU PIE database

Method          Feature dimension t   Error rate (%)
Unsupervised techniques
  PCA                  90                 66
  LPP                  80                 45.23
  ICA I               190                 60.1
  ICA II               10                 38.8
Supervised techniques
  LDA                  66                 28.67
  SLPP                 70                 33.73
  MFA                 100                 27.64
  MMC                  66                 35.96
Proposed techniques
  RICA_ICA I           10                 27.5
  RICA_ICA II           9                 27.7


3.3 Discussions

From the above experimental results, we observe that:

1. Both proposed techniques, RICA_ICA I and RICA_ICA II, consistently outperform ordinary ICA on all the tested databases. The experimental results show that the regularization in RICA improves the performance of ICA by at least 20% (relative). This validates the effectiveness of RICA's eigenspace regularization in processing the data for minimal intra-class variation, leading to better data discrimination.

2. Supervised feature extraction techniques adopt class membership information for data learning during the training phase. For instance, LDA, MFA and MMC employ class membership information to analyse both same-class and different-class information explicitly through a discriminant criterion, i.e. the Fisher criterion or the Maximum Margin criterion, while SLPP utilizes class-specific information to identify the true neighbourhood of a datum (same-class data) for disclosing the intrinsic data manifold. The proposed RICA techniques, comprising RICA_ICA I and RICA_ICA II, instead adopt class-specific information to model the intrinsic data structure from the labelled training samples; the data eigenspace is then regularized for minimal within-class variation before the feature extraction process. The experimental results show that the feature regularization of RICA is more effective for class discrimination than the discriminant functions employed in the above-mentioned techniques. Generally, RICA obtains better recognition performance than the other techniques, including the supervised ones.

3. RICA obtains its good scores with a small number of features in face verification. In other words, we can deduce that RICA is able to retrieve discriminating features in the lower-order eigenvectors.

4 CONCLUSIONS

In this paper, a discriminant feature extraction technique for face verification is presented, known as Regularized Independent Component Analysis (RICA). The objective of RICA is to construct an optimal projection that produces informative feature representations with minimal within-class variance from facial images. RICA adopts the correlation coefficients between images to form a Laplacian matrix, which retrieves local features for regularizing the data input before independent component analysis (ICA) feature extraction. Through this regularization, the input data is processed for minimal within-class variance, which benefits discrimination. RICA is implemented on the two architectures of ICA (type I and type II), named RICA_ICA I and RICA_ICA II, respectively. The experimental results show that both proposed techniques demonstrate performance superior to other feature extraction techniques.

5 ACKNOWLEDGEMENT

The authors acknowledge the financial support of Telekom Research and Development Sdn. Bhd. of Malaysia.


6 REFERENCES

1. Turk, M., Pentland, A.: Eigenfaces for Recognition. J. Cognitive Neuroscience 3(1), 71--86 (1991).
2. Belhumeur, P.N., Hespanha, J.P., Kriegman, D.J.: Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection. IEEE Transactions on Pattern Analysis and Machine Intelligence 19, 711--720 (1997).
3. Bartlett, M.S., Movellan, J.R., Sejnowski, T.J.: Face Recognition by Independent Component Analysis. IEEE Transactions on Neural Networks 13(6), 1450--1464 (2002).
4. Yan, S.C., Dong, X., Zhang, B.Y., Zhang, H.J., Yang, Q., Lin, S.: Graph Embedding and Extensions: A General Framework for Dimensionality Reduction. IEEE Transactions on Pattern Analysis and Machine Intelligence 29(1), 40--51 (2007).
5. Pang, Y.H., Andrew, T.B.J., David, N.C.L.: Face Authentication System Using Pseudo Zernike Moments on Wavelet Subband. IEICE Electronics Express, 275--280 (2004).
6. Wright, J., Yang, A.Y., Ganesh, A., Sastry, S.S., Yi, M.: Robust Face Recognition via Sparse Representation. IEEE Transactions on Pattern Analysis and Machine Intelligence 31(2), 210--227 (2009).
7. Yuen, P.C., Lai, J.H.: Independent Component Analysis of Face Images. IEEE Workshop on Biologically Motivated Computer Vision, LNCS 1811, 545--553 (2000).
8. Jiang, X., Mandal, B., Kot, A.: Eigenfeature Regularization and Extraction in Face Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 30(3), 383--394 (2008).
9. Lu, J., Tan, Y.: Regularized Locality Preserving Projections and Its Extensions for Face Recognition. IEEE Transactions on Systems, Man, and Cybernetics 40(3), 958--963 (2010).
10. Pang, Y.H., Andrew, T., Fazly, S.A.: Regularized Locality Preserving Discriminant Embedding for Face Recognition. Neurocomputing 77(1), 156--166 (2012).
11. Phillips, P.J., Moon, H., Rauss, P.J., Rizvi, S.: The FERET Evaluation Methodology for Face-Recognition Algorithms. IEEE Transactions on Pattern Analysis and Machine Intelligence 22(10), 1090--1104 (2000).
12. Sim, T., Baker, S., Bsat, M.: The CMU Pose, Illumination, and Expression Database. IEEE Transactions on Pattern Analysis and Machine Intelligence 25(12), 1615--1618 (2003).
13. He, X., Yan, S., Hu, Y., Niyogi, P., Zhang, H.: Face Recognition Using Laplacianfaces. IEEE Transactions on Pattern Analysis and Machine Intelligence 27(3), 328--340 (2005).
14. Li, H., Jiang, T.: Efficient and Robust Feature Extraction by Maximum Margin Criterion. IEEE Transactions on Neural Networks 17(1), 157--165 (2006).

ISBN: 978-0-9891305-2-3 ©2013 SDIWC 67