
SCIENTIA SINICA Informationis, Volume 51, Issue 3: 367 (2021) https://doi.org/10.1360/SSI-2020-0236

Latent regularized generative adversarial network for face spoofing detection

  • Received: Aug 4, 2020
  • Accepted: Sep 21, 2020
  • Published: Feb 25, 2021

Abstract


Funded by

National Natural Science Foundation of China (61772524, 61902129, 61972157)

Shanghai Pujiang Program (19PJ1403100)

Science and Technology Development Fund of Pudong New Area (PKJ2018-Y46)

Beijing Natural Science Foundation, General Program (4182067)

Shanghai Jiao Tong University Translational Medicine Cross Research Fund (ZH2018ZDA25)


References

[1] Galbally J, Marcel S, Fierrez J. Biometric antispoofing methods: a survey in face recognition. IEEE Access, 2014, 2: 1530-1552

[2] Pan G, Sun L, Wu Z, et al. Eyeblink-based anti-spoofing in face recognition from a generic webcamera. In: Proceedings of IEEE International Conference on Computer Vision, 2007. 1--8

[3] Frischholz R W, Werner A. Avoiding replay-attacks in a face recognition system using head-pose estimation. In: Proceedings of IEEE International SOI Conference, 2003. 234--235

[4] Alotaibi A, Mahmood A. Deep face liveness detection based on nonlinear diffusion using convolution neural network. Signal, Image and Video Processing, 2017, 11: 713-720

[5] de Freitas Pereira T, Anjos A, De Martino J M, et al. Can face anti-spoofing countermeasures work in a real world scenario? In: Proceedings of International Conference on Biometrics, 2013. 1--8

[6] Boulkenafet Z, Komulainen J, Hadid A. Face spoofing detection using colour texture analysis. IEEE Transactions on Information Forensics and Security, 2016, 11: 1818-1830

[7] Yang J, Lei Z, Liao S, et al. Face liveness detection with component dependent descriptor. In: Proceedings of International Conference on Biometrics, 2013. 1--6

[8] Chingovska I, Anjos A, Marcel S. On the effectiveness of local binary patterns in face anti-spoofing. In: Proceedings of the International Conference of the Biometrics Special Interest Group (BIOSIG), 2012. 1--7

[9] Yang J, Lei Z, Li S Z. Learn convolutional neural network for face anti-spoofing. Computer Science, 2014, 9218: 373-384. DOI: 10.1007/978-3-319-21963-9_34

[10] Xu Z, Li S, Deng W. Learning temporal features using LSTM-CNN architecture for face anti-spoofing. In: Proceedings of IAPR Asian Conference on Pattern Recognition, 2015. 141--145

[11] Lu C, Shi J, Jia J. Abnormal event detection at 150 FPS in MATLAB. In: Proceedings of IEEE International Conference on Computer Vision, 2013. 2720--2727

[12] Chong Y S, Tay Y H. Abnormal event detection in videos using spatiotemporal autoencoder. In: Proceedings of International Symposium on Neural Networks, 2017. 189--196. DOI: 10.1007/978-3-319-59081-3_23

[13] Wang L, Zhou F, Li Z, et al. Abnormal event detection in videos using hybrid spatio-temporal autoencoder. In: Proceedings of IEEE International Conference on Image Processing, 2018. 2276--2280

[14] Baur C, Wiestler B, Albarqouni S, et al. Deep autoencoding models for unsupervised anomaly segmentation in brain MR images. In: Proceedings of International MICCAI Brainlesion Workshop, 2018. 161--169. DOI: 10.1007/978-3-030-11723-8_16

[15] Zong B, Song Q, Min M R, et al. Deep autoencoding Gaussian mixture model for unsupervised anomaly detection. In: Proceedings of International Conference on Learning Representations, 2018

[16] Zenati H, Foo C S, Lecouat B, et al. Efficient GAN-based anomaly detection. 2018. arXiv

[17] Liu R, Fusi N, Mackey L. Teacher-student compression with generative adversarial networks. 2018. arXiv

[18] Akcay S, Atapour-Abarghouei A, Breckon T P. GANomaly: semi-supervised anomaly detection via adversarial training. In: Proceedings of Asian Conference on Computer Vision, 2018. 622--637

[19] Goodfellow I, Pouget-Abadie J, Mirza M, et al. Generative adversarial nets. In: Proceedings of the 27th International Conference on Neural Information Processing Systems, 2014. 2672--2680

[20] Isola P, Zhu J Y, Zhou T, et al. Image-to-image translation with conditional adversarial networks. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2017. 1125--1134

[21] An J, Cho S. Variational autoencoder based anomaly detection using reconstruction probability. Special Lecture on IE, 2015, 2(1)

[22] Ilg E, Mayer N, Saikia T, et al. FlowNet 2.0: evolution of optical flow estimation with deep networks. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2017. 2462--2470

[23] Yan H, Ding Y, Li P, et al. Mind the class weight bias: weighted maximum mean discrepancy for unsupervised domain adaptation. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2017. 2272--2281

[24] Tan X, Li Y, Liu J, et al. Face liveness detection from a single image with sparse low rank bilinear discriminative model. In: Proceedings of European Conference on Computer Vision, 2010. 504--517

[25] Chingovska I, Anjos A, Marcel S. On the effectiveness of local binary patterns in face anti-spoofing. In: Proceedings of the International Conference of the Biometrics Special Interest Group (BIOSIG), 2012. 1--7

[26] Zhang Z, Yan J, Liu S, et al. A face antispoofing database with diverse attacks. In: Proceedings of IAPR International Conference on Biometrics, 2012. 26--31

[27] Wood S A J. Temporal coordination of articulator gestures. Journal of the Acoustical Society of America, 1996, 99: 2546-2574

[28] Liu Y, Stehouwer J, Jourabloo A, et al. Deep tree learning for zero-shot face anti-spoofing. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2019. 4680--4689

[29] Schlegl T, Seeböck P, Waldstein S M, et al. Unsupervised anomaly detection with generative adversarial networks to guide marker discovery. In: Proceedings of International Conference on Information Processing in Medical Imaging, 2017. 146--157

[30] Yu C, Jia Y. Anisotropic diffusion-based kernel matrix model for face liveness detection. 2017. arXiv

[31] Kim W, Suh S, Han J J. Face liveness detection from a single image via diffusion speed model. IEEE Transactions on Image Processing, 2015, 24: 2456-2465

[32] Li J, Wang Y, Tan T, et al. Live face detection based on the analysis of Fourier spectra. In: Proceedings of SPIE Biometric Technology for Human Identification, 2004. 5404: 296--303

[33] Määttä J, Hadid A, Pietikäinen M. Face spoofing detection from single images using micro-texture analysis. In: Proceedings of International Joint Conference on Biometrics, 2011. 1--7

[34] Jourabloo A, Liu Y, Liu X. Face de-spoofing: anti-spoofing via noise modeling. In: Proceedings of European Conference on Computer Vision, 2018. 290--306

[35] Boulkenafet Z, Komulainen J, Hadid A. Face anti-spoofing based on color texture analysis. In: Proceedings of International Conference on Image Processing, 2015. 2636--2640

[36] Liu Y, Jourabloo A, Liu X. Learning deep models for face anti-spoofing: binary or auxiliary supervision. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2018. 389--398

[37] Tu X, Zhao J, Xie M, et al. Learning generalizable and identity-discriminative representations for face anti-spoofing. ACM Transactions on Intelligent Systems and Technology, 2019. DOI: 10.13140/RG.2.2.29617.84324

  • Figure 1

    (Color online) Our framework consists of a generator, a discriminator, a latent regularizer and an auxiliary encoder. The generator and discriminator are trained by competing with each other while collaborating to capture the underlying concept of the normal class. The latent regularizer further distinguishes normal samples from outliers in a discriminant way in the latent feature space. The auxiliary encoder minimizes the distance between the bottleneck feature of the original input image and the encoded latent feature of the generated image
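The test-time scoring implied by this architecture can be sketched numerically. Below is a minimal numpy sketch, not the paper's implementation: the three networks are replaced by hypothetical toy linear maps (the names `G_enc`, `G_dec`, `E` and all sizes are invented for illustration), and the anomaly score combines the image reconstruction error with the latent mismatch measured through the auxiliary encoder.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear stand-ins for the three networks (names and sizes hypothetical):
# G_enc: encoder half of the generator, G_dec: decoder half,
# E: auxiliary encoder applied to the generated image.
D_IN, D_Z = 16, 4
G_enc = rng.normal(size=(D_Z, D_IN)) * 0.1
G_dec = rng.normal(size=(D_IN, D_Z)) * 0.1
E = rng.normal(size=(D_Z, D_IN)) * 0.1

def anomaly_score(x, w=0.5):
    """w-weighted sum of the image reconstruction error and the mismatch
    between the bottleneck feature and the re-encoded latent feature."""
    z = G_enc @ x        # bottleneck feature of the input image
    x_hat = G_dec @ z    # generated (reconstructed) image
    z_hat = E @ x_hat    # latent feature of the generated image
    rec = np.abs(x - x_hat).mean()
    lat = np.abs(z - z_hat).mean()
    return w * rec + (1.0 - w) * lat

x = rng.normal(size=D_IN)
score = anomaly_score(x)  # larger scores indicate likelier spoof samples
```

Since the networks are trained only on the normal (live) class, reconstructions of attack samples degrade and both error terms grow, which is what makes the score usable for detection.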

  • Figure 2

    (Color online) The first two rows display live face examples with optical flow data visualization. The third and fourth rows display fixed-support attack examples, in which a stand holds the client biometry, with optical flow data visualization. The last two rows display hand attack examples, in which the attacker holds the device, with optical flow data visualization. In each row, the first image is one of the frames from the video, followed by seven optical flow maps generated from these frames
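The optical flow maps above color-code the per-pixel motion between consecutive frames. Purely as an illustration (the flow fields themselves are estimated with FlowNet 2.0 [22], not computed here), a dense flow field can be converted to the magnitude and angle maps that such visualizations encode:

```python
import numpy as np

def flow_to_polar(flow):
    """Convert a dense flow field of shape (H, W, 2) into the per-pixel
    magnitude and angle maps that optical-flow visualizations color-code."""
    dx, dy = flow[..., 0], flow[..., 1]
    magnitude = np.hypot(dx, dy)      # speed of motion at each pixel
    angle = np.arctan2(dy, dx)        # direction, in radians in (-pi, pi]
    return magnitude, angle

flow = np.zeros((2, 2, 2))
flow[0, 0] = [3.0, 4.0]               # one pixel moving right and down
mag, ang = flow_to_polar(flow)
print(mag[0, 0])                      # -> 5.0
```

Rigid planar motion (a photo or screen held on a stand) yields nearly uniform magnitude/angle maps, while a live face produces spatially varied motion, which is the cue the figure visualizes.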

  • Figure 3

    (Color online) The experiments are conducted on (a) CIFAR10 and (b) MNIST; one class is considered as the normal samples and the others are considered as abnormal samples

  • Figure 4

    (Color online) Overall performance of the model for varying sizes of the latent vector on (a) CIFAR10 and (b) MNIST

  • Figure 5

    (Color online) Features learned by the proposed method, shown in 3-dimensional space, for the cross-domain experiment between the CASIA MFSD and REPLAY ATTACK datasets. (a) Model trained on CASIA MFSD and tested on REPLAY ATTACK without the outlier constraint; (b) model trained on CASIA MFSD and tested on REPLAY ATTACK with the outlier constraint

  • Figure 6

    (Color online) Features learned by the proposed method, shown in 3-dimensional space, for the cross-domain experiment between the CASIA MFSD and REPLAY ATTACK datasets. (a) Model trained on REPLAY ATTACK and tested on CASIA MFSD without the outlier constraint; (b) model trained on REPLAY ATTACK and tested on CASIA MFSD with the outlier constraint

  • Figure 7

    (Color online) The distribution of abnormal scores for both normal samples (live faces) and abnormal samples (hand attack or fixed-support attack) in the model trained with the outlier constraint (a) and without the outlier constraint (b). The horizontal axis denotes the abnormal score of each video sample; the vertical axis denotes the frequency of each abnormal score
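The HTER values reported in the tables below are the average of the false acceptance and false rejection rates at a chosen threshold on the abnormal score. A small sketch of the metric (function name hypothetical):

```python
import numpy as np

def hter(scores_live, scores_attack, threshold):
    """Half Total Error Rate at a given abnormal-score threshold:
    the mean of the false rejection rate (live faces scored above the
    threshold) and the false acceptance rate (attacks scored at or
    below it)."""
    frr = np.mean(np.asarray(scores_live) > threshold)
    far = np.mean(np.asarray(scores_attack) <= threshold)
    return 0.5 * (far + frr)

# Perfectly separated score distributions give HTER = 0 at threshold 0.5.
print(hter([0.1, 0.2, 0.3], [0.8, 0.9, 1.0], 0.5))  # -> 0.0
```

The better-separated histograms in panel (a) are exactly the situation in which a single threshold can achieve a low HTER.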

  • Table 1   A summary of three spoof face datasets

    | Database | Subjects | Videos | Camera | Spoof attack | Modal type |
    |---|---|---|---|---|---|
    | NUAA | 15 | 24 genuine, 33 spoof | Web-cam (640 $\times$ 480) | Printed photo | RGB |
    | REPLAY ATTACK | 50 | 200 genuine, 1000 spoof | MacBook 13$^{\prime\prime}$ camera (320 $\times$ 240) | Printed photo; display photo (mobile/HD); replayed video (mobile/HD) | RGB |
    | CASIA MFSD | 50 | 150 genuine, 450 spoof | Low-quality camera (640 $\times$ 480); normal-quality camera (480 $\times$ 640); Sony NEX-5 camera (1280 $\times$ 720) | Printed photo; cut photo; replayed video (HD) | RGB |
  • Table 2   The effect of different loss compositions evaluated in the intra-dataset experiment (HTER, %)

    | Loss composition | NUAA | REPLAY ATTACK |
    |---|---|---|
    | $\mathcal{L}_{\rm irec}+\mathcal{L}_{\rm adv}$ | 27.7 | 38.1 |
    | $\mathcal{L}_{\rm irec}+\mathcal{L}_{\rm adv}+\mathcal{L}_{\rm zrec}$ | 19.9 | 10.5 |
    | $\mathcal{L}_{\rm irec}+\mathcal{L}_{\rm adv}+\mathcal{L}_{\rm ou}$ | 0 | 10.2 |
    | $\mathcal{L}_{\rm irec}+\mathcal{L}_{\rm adv}+\mathcal{L}_{\rm zrec}+\mathcal{L}_{\rm ou}$ | 0 | 10.1 |
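The outlier constraint $\mathcal{L}_{\rm ou}$ is a discriminant term on the latent codes. Its exact form is not reproduced on this page; purely as an illustration of the idea, a hinge-style constraint that pulls normal latent codes toward a common center and pushes outlier codes at least a margin away could be sketched as follows (the function name, the choice of the origin as center, and the hinge form are all assumptions):

```python
import numpy as np

def outlier_loss(z_normal, z_outlier, margin=1.0):
    """Illustrative hinge-style latent constraint (not the paper's exact
    L_ou): compact the normal latent codes around the origin, and push
    each outlier code at least `margin` away from it."""
    pull = np.mean(np.sum(z_normal ** 2, axis=1))              # compactness
    dist = np.sqrt(np.sum(z_outlier ** 2, axis=1))             # outlier radii
    push = np.mean(np.maximum(0.0, margin - dist))             # hinge penalty
    return pull + push

z_live = np.zeros((4, 3))          # already compact: no pull penalty
z_spoof = np.ones((4, 3)) * 10.0   # already far away: no push penalty
loss = outlier_loss(z_live, z_spoof)  # -> 0.0
```

A term of this kind is what produces the two separated clusters visible in Figures 5(b) and 6(b).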
  • Table 3   For the cross-domain experiment, four loss compositions are optimized on CASIA MFSD and evaluated on REPLAY ATTACK, and vice versa (HTER, %)

    | Loss composition | Train: REPLAY ATTACK, Test: CASIA MFSD | Train: CASIA MFSD, Test: REPLAY ATTACK |
    |---|---|---|
    | $\mathcal{L}_{\rm irec}+\mathcal{L}_{\rm adv}$ | 44.8 | 31.5 |
    | $\mathcal{L}_{\rm irec}+\mathcal{L}_{\rm adv}+\mathcal{L}_{\rm zrec}$ | 39.8 | 25.0 |
    | $\mathcal{L}_{\rm irec}+\mathcal{L}_{\rm adv}+\mathcal{L}_{\rm ou}$ | 37.7 | 20.1 |
    | $\mathcal{L}_{\rm irec}+\mathcal{L}_{\rm adv}+\mathcal{L}_{\rm zrec}+\mathcal{L}_{\rm ou}$ | 35.3 | 13.5 |
  • Table 4   Performance comparison using the accuracy measure on the NUAA dataset with other methods (%)

    | Method | Accuracy |
    |---|---|
    | Ours | 99.3 |
    | ADKMM (2017) [30] | 99.3 |
    | ND-CNN (2016) [4] | 99.3 |
    | DS-LSP (2015) [31] | 98.5 |
    | CDD (2013) [7] | 97.7 |
    | DoG-LRBLR (2010) [24] | 87.5 |
    | DoG-F (2004) [32] | 84.5 |
    | DoG-M (2012) [26] | 81.8 |
  • Table 5   Performance comparison using the HTER measure on the REPLAY ATTACK dataset with other methods (%)

    | Method | Development | Test |
    |---|---|---|
    | ${\rm LBP}_{3\times3}^{u2}+\chi^{2}$ (2012) [8] | 31.24 | 34.01 |
    | ${\rm LBP}_{3\times3}^{u2}+{\rm LDA}$ (2012) [8] | 19.60 | 17.17 |
    | ${\rm LBP}_{3\times3}^{u2}$ (2012) [8] | 14.84 | 15.16 |
    | LBP+SVM (2011) [33] | 13.90 | 13.87 |
    | DS-LBP (2015) [31] | 13.73 | 12.50 |
    | ND-CNN (2016) [4] | – | 10 |
    | ADKMM (2017) [30] | 5.16 | 4.30 |
    | Ours | 11.5 | 10.1 |
  • Table 6   Classification performance of the proposed approach in terms of HTER (%). The algorithm is trained using the CASIA MFSD dataset and tested on the REPLAY ATTACK dataset, and vice versa

    | Method | Train: CASIA MFSD, Test: REPLAY ATTACK | Train: REPLAY ATTACK, Test: CASIA MFSD | Average |
    |---|---|---|---|
    | LBP (2013) [5] | 47.0 | 39.6 | 43.3 |
    | LBP-TOP (2013) [5] | 49.7 | 60.6 | 55.2 |
    | Motion (2013) [5] | 50.2 | 47.9 | 49.1 |
    | CNN (2014) [9] | 48.5 | 45.5 | 47.0 |
    | Color LBP (2018) [35] | 37.9 | 35.4 | 36.7 |
    | Color Tex (2018) [35] | 30.3 | 37.7 | 34.0 |
    | Color SURF (2018) [35] | 26.9 | 23.2 | 25.1 |
    | Auxiliary (2018) [36] | 27.6 | 28.4 | 28.0 |
    | De-Spoof (2018) [34] | 28.5 | 41.1 | 34.8 |
    | GFA-CNN (2019) [37] | 21.4 | 34.3 | 28.0 |
    | Proposed method | 13.5 | 35.3 | 24.4 |