
SCIENTIA SINICA Informationis, Volume 51, Issue 6: 900 (2021) https://doi.org/10.1360/SSI-2019-0145

Towards training time attacks for federated machine learning systems

  • Received: Jul 8, 2019
  • Accepted: Aug 2, 2019
  • Published: May 26, 2021

Abstract


Author information






  • Figure 1

    (Color online) First row: original training samples. Second row: adversarial training samples. (a) MNIST ($\epsilon$ = 0.3); (b) CIFAR-10 ($\epsilon$ = 0.032)

  • Figure 2

    Test performance when different numbers of data parties are attacked. The horizontal red line indicates random-guess accuracy. Different histogram colors indicate the test performance of the federated model when the training data of 1 to 4 parties is adversarially modified. (a) MNIST; (b) CIFAR-10

  • Figure 3

    Test performance when using different classifiers under the two-party learning scenario. The horizontal red line indicates random-guess accuracy, blue histograms indicate the original test accuracy, and the orange and green ones indicate the test accuracy when one or two parties' training data is attacked. (a) MNIST; (b) CIFAR-10

  • Figure 4

    (a) MNIST-Train; (b) MNIST-Test; (c) CIFAR-Train; (d) CIFAR-Test. Panels (a) and (c) show deep features of the adversarial training data; panels (b) and (d) show deep features of the clean test data

  • Figure 5

    Clean samples and their corresponding adversarial noises for MNIST and CIFAR-10

  •   

    Algorithm 1 Fed-DeepConfuse

    Require: List of parties accessible to the attacker $L_{\rm atk}$; training data $\{D^1, D^2, \ldots, D^n\}$; number of trials $T$; maximum number of training iterations for a classification model, maxiter; learning rates of the classification models $\alpha_{f^k}$ and of the noise generators $\alpha_{g^k}$; batch size $b$.
    Output: Modified datasets $\{D^1, D^2, \ldots, D^n\}$.

    for $k=1$ to $n$ do
        if $k \in L_{\rm atk}$ then
            $\xi \leftarrow {\rm RandomInit}()$;
            ${g'}^k_{\xi'} \leftarrow g^k_\xi.{\rm copy}()$; // Working copy of the noise generator; its parameters $\xi'$ are initialized to $\xi$
            for $t=1$ to $T$ do
                $\theta_0 \leftarrow {\rm RandomInit}()$;
                for $i=0$ to maxiter do
                    $(x_i, y_i) \sim D^k$; // Sample a mini-batch of training data
                    $\theta' \leftarrow \theta_i - \alpha_{f^k}\nabla_{\theta_i}\mathcal{L}(f^k_{\theta_i}(x_i + {g'}^{k}_{\xi'}(x_i)), y_i)$; // Pseudo-update of $f^k_\theta$ on perturbed data
                    $\xi' \leftarrow \xi' + \alpha_{g^k}\nabla_{\xi'}\mathcal{L}(f^k_{\theta'}(x_i), y_i)$; // Update ${g'}^k_{\xi'}$ by gradient ascent using the pseudo-updated $f^k_{\theta'}$
                    $x^{\rm adv}_i \leftarrow x_i + g^k_\xi(x_i)$;
                    $\theta_{i+1} \leftarrow \theta_{i} - \alpha_{f^k}\nabla_{\theta_{i}}\mathcal{L}(f^k_{\theta_{i}}(x_i^{\rm adv}), y_i)$; // Update $f^k_\theta$ by SGD
                end for
                $g^k_\xi \leftarrow {g'}^k_{\xi'}$;
            end for
            $m \leftarrow \text{len}(D^k)$;
            for $i=1$ to $m$ do
                $(x_i, y_i) \leftarrow D^k[i]$;
                $x'_i \leftarrow x_i + g^k_\xi(x_i)$;
                $D^k[i] \leftarrow (x'_i, y_i)$; // Store the perturbed sample
            end for
        end if
    end for
    return $\{D^1, D^2, \ldots, D^n\}$.
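
    Below is a minimal PyTorch-style sketch of the per-party inner loop of Algorithm 1. It is not the authors' released implementation: the surrogate classifier f and noise generator g are placeholder nn.Module instances, the $\epsilon$-bounded tanh squashing of the generator output is an assumption (consistent with the $\epsilon$ values in Figure 1), torch.func.functional_call is used to differentiate through the pseudo-update of $\theta$, and for brevity the generator is updated in place rather than through the $g$/$g'$ copy kept in the pseudocode.

    import torch
    import torch.nn.functional as F
    from torch.func import functional_call

    def attack_party_dataset(f, g, dataset, trials, maxiter, lr_f, lr_g, eps, batch_size=64):
        # f: surrogate classifier (nn.Module), g: noise generator (nn.Module);
        # both are placeholder modules assumed by this sketch.
        opt_g = torch.optim.SGD(g.parameters(), lr=lr_g)
        loader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=True)

        for _ in range(trials):
            # theta_0 <- RandomInit(): restart the surrogate classifier for each trial.
            for m in f.modules():
                if hasattr(m, "reset_parameters"):
                    m.reset_parameters()
            opt_f = torch.optim.SGD(f.parameters(), lr=lr_f)
            batches = iter(loader)

            for _ in range(maxiter):
                try:
                    x, y = next(batches)
                except StopIteration:
                    batches = iter(loader)
                    x, y = next(batches)

                # Pseudo-update theta' on the perturbed batch, keeping the graph so that
                # theta' stays differentiable with respect to the generator parameters.
                theta = dict(f.named_parameters())
                noise = eps * torch.tanh(g(x))
                loss_adv = F.cross_entropy(functional_call(f, theta, (x + noise,)), y)
                grads = torch.autograd.grad(loss_adv, list(theta.values()), create_graph=True)
                theta_prime = {k: w - lr_f * dw for (k, w), dw in zip(theta.items(), grads)}

                # Gradient ascent on the generator: maximise the pseudo-updated
                # classifier's loss on the clean batch (minimise its negative).
                atk_loss = -F.cross_entropy(functional_call(f, theta_prime, (x,)), y)
                opt_g.zero_grad()
                atk_loss.backward()
                opt_g.step()

                # Real SGD step for the surrogate classifier on the adversarial batch.
                with torch.no_grad():
                    x_adv = x + eps * torch.tanh(g(x))
                opt_f.zero_grad()
                F.cross_entropy(f(x_adv), y).backward()
                opt_f.step()

        # Write the perturbed samples back into the party's dataset.
        with torch.no_grad():
            return [(x + eps * torch.tanh(g(x.unsqueeze(0))).squeeze(0), y) for x, y in dataset]

    The gradient-ascent step on the generator mirrors the line $\xi' \leftarrow \xi' + \alpha_{g^k}\nabla_{\xi'}\mathcal{L}(f^k_{\theta'}(x_i), y_i)$ of Algorithm 1; in practice an adaptive optimizer could be substituted for plain SGD.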

  • Table 1   Experimental configurations under different federated situations

    Scenario               Party     MNIST classes       CIFAR-10 classes
    Two-party learning     Party A   0, 1, 2, 3, 4, 5    Airplane, automobile, bird, cat, deer, dog, frog, horse
                           Party B   4, 5, 6, 7, 8, 9    Bird, cat, deer, dog, frog, horse, ship, truck
    Three-party learning   Party A   0, 1, 2, 3, 4       Airplane, automobile, bird, cat, deer
                           Party B   3, 4, 5, 6, 7       Cat, deer, dog, frog, horse
                           Party C   5, 6, 7, 8, 9       Dog, frog, horse, ship, truck
    Four-party learning    Party A   0, 1, 2, 3          Airplane, automobile, bird, cat
                           Party B   2, 3, 4, 5          Bird, cat, deer, dog
                           Party C   4, 5, 6, 7          Dog, frog, horse, ship
                           Party D   6, 7, 8, 9          Frog, horse, ship, truck
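
    The class splits of Table 1 can be materialized directly from the benchmark datasets. The sketch below builds the two-party MNIST subsets; PARTY_CLASSES and party_subsets are hypothetical names, torchvision is assumed, and the three- and four-party configurations differ only in the class lists.

    from torch.utils.data import Subset
    from torchvision import datasets, transforms

    # Two-party MNIST split from Table 1; the other configurations only change these lists.
    PARTY_CLASSES = {"A": [0, 1, 2, 3, 4, 5], "B": [4, 5, 6, 7, 8, 9]}

    def party_subsets(root="./data"):
        full = datasets.MNIST(root, train=True, download=True,
                              transform=transforms.ToTensor())
        return {
            party: Subset(full, [i for i, y in enumerate(full.targets) if int(y) in classes])
            for party, classes in PARTY_CLASSES.items()
        }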
  • Table 2   Prediction accuracy of the federated model taking only noises as inputs, i.e., the accuracy between the true label and $f_{\theta}(g_{\xi}(x))$, where $x$ is the clean sample

               $\text{Noise}_{\text{train}}$ (%)   $\text{Noise}_{\text{test}}$ (%)
    MNIST      66.78                               66.05
    CIFAR-10   44.60                               45.91
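
    The metric in Table 2 feeds the trained federated model only the generated noise, with the clean image removed. A minimal evaluation sketch follows, assuming the same $\epsilon$-bounded tanh output form used in the training sketch above (noise_only_accuracy is a hypothetical helper).

    import torch

    @torch.no_grad()
    def noise_only_accuracy(f, g, loader, eps):
        # Accuracy of f_theta on g_xi(x) alone, i.e. the perturbation without
        # the clean image it was generated for (the quantity reported in Table 2).
        correct, total = 0, 0
        for x, y in loader:
            pred = f(eps * torch.tanh(g(x))).argmax(dim=1)
            correct += (pred == y).sum().item()
            total += y.numel()
        return 100.0 * correct / total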
  • Table 3   Comparison of the federated learning scenario and the standard learning scenario when half of the training data is adversarially modified

                        Clean                                                   50% attacked
             Standard     2-party        $\delta$-accuracy         Standard     2-party        $\delta$-accuracy
             learning     fed-learning   loss                      learning     fed-learning   loss
    MNIST    99.32        98.38          0.94                      98.01        67.69          30.32
    CIFAR    93.01        70.36          22.65                     91.36        51.37          39.99