
SCIENTIA SINICA Informationis, Volume 50, Issue 10: 1511 (2020) https://doi.org/10.1360/N112018-00214

Distributed logistic regression with differential privacy

  • Received: Aug 6, 2018
  • Accepted: Mar 19, 2019
  • Published: Oct 9, 2020

Abstract


Funded by

National Natural Science Foundation of China (11571011)

NSFC-Guangdong Big Data Science Research Center Project (U1811461)


Supplement

Appendix


The ADMM algorithm

Consider the following optimization problem: $$\underset{Z, w}{\min}\ f(w)+h(Z) \quad {\rm s.t.}\ AZ+Bw=0.$$ First, by introducing a multiplier we convert the constrained problem into an unconstrained one through the augmented Lagrangian $$ L(Z,w,V)=f(w)+h(Z)+\langle V, AZ+Bw \rangle +\frac{c}{2}\|AZ+Bw\|_2^2,$$ where $V$ is the Lagrange multiplier and $c>0$ is a pre-selected penalty parameter. We solve this problem with the ADMM algorithm.

Each iteration of the ADMM algorithm consists of the following three steps.

Step 1: update $Z$, $$Z(k+1)=\underset{Z}{\arg\min}\ L(Z,w(k),V(k)).\eqno({\rm A1})$$

Step 2: update $w$, $$w(k+1)=\underset{w}{\arg\min}\ L(Z(k+1),w,V(k)).\eqno({\rm A2})$$

Step 3: update the Lagrange multiplier $V$, $$V(k+1)= V(k)+c(AZ(k+1)+Bw(k+1)).\eqno({\rm A3})$$

Specifically, in the $(k+1)$-th iteration we first minimize $L(Z,w(k),V(k))$ with respect to $Z$ to obtain the update $Z(k+1)$; we then minimize $L(Z(k+1),w,V(k))$ with respect to $w$ to obtain $w(k+1)$; finally, we update the multiplier via $V(k+1)=V(k)+c(AZ(k+1)+Bw(k+1))$. This process is repeated until the algorithm converges.
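As a concrete illustration of steps (A1)-(A3), the following minimal sketch runs ADMM on a toy instance with $f(w)=\frac{1}{2}\|Dw-y\|_2^2$, $h(Z)=\frac{\lambda}{2}\|Z\|_2^2$ and constraint $Z-w=0$ (i.e., $A=I$, $B=-I$), for which both subproblems have closed-form solutions; the data $D$, $y$ and all parameter values are illustrative only and are not taken from the paper.

```python
import numpy as np

def admm_toy(D, y, lam=1.0, c=10.0, iters=200):
    """ADMM for min_{Z,w} 0.5*||Dw - y||^2 + 0.5*lam*||Z||^2  s.t.  Z - w = 0."""
    p = D.shape[1]
    Z, w, V = np.zeros(p), np.zeros(p), np.zeros(p)
    DtD, Dty = D.T @ D, D.T @ y
    for _ in range(iters):
        # Step (A1): Z-update, closed-form minimizer of L(Z, w, V) in Z.
        Z = (c * w - V) / (lam + c)
        # Step (A2): w-update, closed-form minimizer of L(Z, w, V) in w.
        w = np.linalg.solve(DtD + c * np.eye(p), Dty + V + c * Z)
        # Step (A3): multiplier update.
        V = V + c * (Z - w)
    return w

rng = np.random.default_rng(0)
D = rng.standard_normal((50, 5))
y = rng.standard_normal(50)
print(admm_toy(D, y))
```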

Proof of Theorem 2

Before analyzing the convergence of the ADMM algorithm we introduce some notation. Let $M$ denote the smallest strictly positive eigenvalue of $A^{\rm T}A$, and let $M'$ denote the smallest strictly positive eigenvalue of $B^{\rm T}B$. Clearly $f(w)$ is Lipschitz differentiable. For any $j\in\mathcal{J}$, let $$h(Z_j)=\frac{\lambda}{2J}\|Z_j\|_2^2;$$ then there exists $\beta \geq 0$ such that for any $Z_j,Z_j'$, A4$$ h(Z_j)+\frac{\beta}{2}\|Z_j-Z_j'\|_2^2 \geq h(Z_j')+\langle \nabla h(Z_j'), Z_j-Z_j' \rangle \eqno({\rm A4})$$ holds. To prove Theorem 2 we establish the following three lemmas.
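For this particular $h$, inequality (A4) can be verified directly and holds for every $\beta\geq 0$, since $\nabla h(Z_j')=\frac{\lambda}{J}Z_j'$ and $h$ is a convex quadratic: $$ h(Z_j)-h(Z_j')-\langle \nabla h(Z_j'), Z_j-Z_j' \rangle = \frac{\lambda}{2J}\|Z_j\|_2^2-\frac{\lambda}{2J}\|Z_j'\|_2^2-\frac{\lambda}{J}\langle Z_j', Z_j-Z_j' \rangle = \frac{\lambda}{2J}\|Z_j-Z_j'\|_2^2 \geq -\frac{\beta}{2}\|Z_j-Z_j'\|_2^2. $$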

Lemma A1. For all $k$, the following statements hold:

(1) $\nabla f(w(k+1))=-B^{\rm T}V(k+1)$;

(2) $\|Z(k)-Z(k+1)\|_2^2\leq \frac{1}{M}\|AZ(k)-AZ(k+1)\|_2^2$;

(3) $\|w(k)-w(k+1)\|_2^2\leq \frac{1}{M'}\|Bw(k)-Bw(k+1)\|_2^2$;

(4) $\|V(k)-V(k+1)\|_2^2\leq \frac{L_f^2}{M'}\|w(k)-w(k+1)\|_2^2$.

Proof. From the optimality condition for $w(k+1)$, $$0=\nabla f(w(k+1))+B^{\rm T}V(k)+cB^{\rm T}(AZ(k+1)+Bw(k+1)),$$ together with $$V(k+1)=V(k)+c(AZ(k+1)+Bw(k+1)),$$ statement (1) follows. Since $M$ and $M'$ are the smallest strictly positive eigenvalues of $A^{\rm T}A$ and $B^{\rm T}B$, respectively, statements (2) and (3) are immediate. Since \begin{equation}\begin{split} &\|V(k)-V(k+1)\|_2^2\leq \frac{1}{M'}\|B^{\rm T}(V(k)-V(k+1))\|_2^2 \\ &\overset{\rm (a)}{=}\frac{1}{M'}\|\nabla f(w(k))-\nabla f(w(k+1))\|_2^2 \overset{\rm (b)}{\leq} \frac{L_f^2}{M'}\|w(k)-w(k+1)\|_2^2, \end{split} \end{equation} where (a) uses statement (1) and (b) uses the Lipschitz differentiability of $f$, statement (4) holds.

Lemma A2. If $c>\max\{\frac{2L_{f}}{M'}, L_{f}M', \frac{\beta}{M}\}$, then $L(Z(k+1),w(k+1),V(k+1))$ converges as $k\rightarrow\infty$.

Proof. First, we show that when the pre-selected parameter satisfies $c>\max\{\frac{2L_{f}}{M'}, L_{f}M', \frac{\beta}{M}\}$, $L(Z,w,V)$ decreases monotonically along the ADMM iterations. Consider \begin{equation}\begin{split} &L(Z(k),w(k), V(k))-L(Z(k+1),w(k+1), V(k+1)) \\ &=(L(Z(k),w(k), V(k))-L(Z(k+1),w(k), V(k)))+(L(Z(k+1),w(k), V(k))-L(Z(k+1),w(k+1), V(k+1))). \end{split} \end{equation} For the first part, \begin{equation}\begin{split} &L(Z(k),w(k),V(k))-L(Z(k+1),w(k),V(k)) \\ &= h(Z(k))-h(Z(k+1))+\langle V(k), AZ(k)-AZ(k+1) \rangle +\frac{c}{2}\|AZ(k)+Bw(k)\|_2^2-\frac{c}{2}\|AZ(k+1)+Bw(k)\|_2^2 \\ &\overset{\rm (c)}{=}h(Z(k))-h(Z(k+1))+\langle A^{\rm T}V(k), Z(k)-Z(k+1)\rangle \\ & +\langle cA^{\rm T}(AZ(k+1)+Bw(k)), Z(k)-Z(k+1) \rangle +\frac{c}{2}\|AZ(k)-AZ(k+1)\|_2^2 \\ &\overset{\rm (d)}{=}h(Z(k))-h(Z(k+1))- \langle \partial h(Z(k+1)), Z(k)-Z(k+1) \rangle + \frac{c}{2}\|AZ(k)-AZ(k+1)\|_2^2 \\ &\overset{\rm (e)}{\geq} -\frac{\beta}{2}\|Z(k)-Z(k+1)\|_2^2+\frac{cM}{2}\|Z(k)-Z(k+1)\|_2^2 \\ &= \frac{cM-\beta}{2}\|Z(k)-Z(k+1)\|_2^2, \end{split} \end{equation} where in (c) we use the identity $\|b+c\|_2^2-\|a+c\|_2^2=\|b-a\|_2^2+2\langle a+c,b-a\rangle$ with $a=AZ(k+1)$, $b=AZ(k)$, $c=Bw(k)$; in (d) we use $\nabla h(Z(k+1)) = -(A^{\rm T}V(k)+ cA^{\rm T}(AZ(k+1)+Bw(k)))$; and in (e) we use Lemma A1(2) and inequality (A4). For the second part, \begin{equation}\begin{split} &L(Z(k+1),w(k), V(k))-L(Z(k+1),w(k+1), V(k+1)) \\ &=f(w(k))-f(w(k+1))+\langle V(k),AZ(k+1)+Bw(k) \rangle -\langle V(k+1),AZ(k+1)+Bw(k+1) \rangle \\ & +\frac{c}{2}\|AZ(k+1)+Bw(k)\|_2^2-\frac{c}{2}\|AZ(k+1)+Bw(k+1)\|_2^2 \\ &=f(w(k))-f(w(k+1))+\langle B^{\rm T}V(k+1),w(k)-w(k+1) \rangle -\langle c(AZ(k+1)+Bw(k+1)), AZ(k+1)+Bw(k) \rangle \\ & +\frac{c}{2}\langle AZ(k+1)+Bw(k), AZ(k+1)+Bw(k) \rangle -\frac{c}{2}\|AZ(k+1)+Bw(k+1)\|_2^2 \\ &\overset{\rm (f)}{\geq} -\frac{L_f}{2}\|w(k)-w(k+1)\|_2^2+\frac{c}{2}\|Bw(k)-Bw(k+1)\|_2^2 -\frac{1}{c}\|V(k)-V(k+1)\|_2^2 \\ &\overset{\rm (g)}{\geq} \left(\frac{cM'-L_f}{2}-\frac{L_f^2}{cM'}\right)\|w(k)-w(k+1)\|_2^2, \end{split} \end{equation} where (f) uses Lemma A1(1) and the Lipschitz differentiability of $f$, and (g) uses Lemma A1(3) and (4). Combining the two inequalities above, the ADMM iterations satisfy \begin{equation}\begin{split} &L(Z(k),w(k),V(k))-L(Z(k+1),w(k+1),V(k+1)) \\ &\geq \left(\frac{cM-\beta}{2}\right)\|Z(k)-Z(k+1)\|_2^2 + \left(\frac{cM'-L_f}{2}-\frac{L_f^2}{cM'}\right)\|w(k)-w(k+1)\|_2^2, \end{split} \end{equation} so if $c > \max\{\frac{2L_{f}}{M'},\frac{\beta}{M}\}$, then $L(Z,w,V)$ decreases monotonically along the ADMM iterations. Next we show that the augmented Lagrangian is bounded below for all $k$. Note that for any $Z(k+1)$ there exists $w'$ such that $AZ(k+1)+Bw'=0$; the augmented Lagrangian can then be written as \begin{equation}\begin{split} &L(Z(k+1),w(k+1), V(k+1)) \\ &=h(Z(k+1))+f(w(k+1))+\langle V(k+1), AZ(k+1)+Bw(k+1) \rangle +\frac{c}{2}\|AZ(k+1)+Bw(k+1)\|_2^2 \\ &=h(Z(k+1))+f(w(k+1)) +\langle V(k+1), AZ(k+1)+Bw'+Bw(k+1)-Bw' \rangle+\frac{c}{2}\|AZ(k+1)+Bw(k+1)\|_2^2 \\ &=h(Z(k+1))+f(w(k+1))+\langle \nabla f(w(k+1)), w'-w(k+1) \rangle +\frac{c}{2}\|AZ(k+1)+Bw(k+1)\|_2^2 \\ &\geq h(Z(k+1))+f(w') -\frac{L_f}{2}\|w'-w(k+1)\|_2^2 +\frac{c}{2}\|AZ(k+1)+Bw(k+1)\|_2^2 \\ &\geq h(Z(k+1))+f(w') -\frac{L_fM'}{2}\|Bw'-Bw(k+1)-AZ(k+1)-Bw'\|_2^2 +\frac{c}{2}\|AZ(k+1)+Bw(k+1)\|_2^2 \\ &\overset{\rm (h)}\geq h(Z(k+1))+f(w')+\frac{c-L_fM'}{2}\|AZ(k+1)+Bw(k+1)\|_2^2 \\ &\geq h(Z(k+1))+f(w')\geq 0, \end{split} \end{equation} where (h) holds by the assumption $c>L_fM'$. Hence $L(Z(k+1),w(k+1),V(k+1))$ is bounded below. In summary, $L(Z(k+1),w(k+1),V(k+1))$ converges as $k\rightarrow\infty$.
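To see that both coefficients in the combined inequality are strictly positive under the stated condition on $c$: the first requires $c>\frac{\beta}{M}$, and for the second, writing $x=cM'>0$, $$ \frac{cM'-L_f}{2}-\frac{L_f^2}{cM'}>0 \iff x^2-L_fx-2L_f^2=(x-2L_f)(x+L_f)>0 \iff cM'>2L_f, $$ which is exactly the requirement $c>\frac{2L_f}{M'}$.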

Finally, we give a property of the gradient.

Lemma A3. There exists $d(k+1)\in\nabla L(Z(k+1),w(k+1),V(k+1))$ such that $\|d(k+1)\|_2^2\rightarrow 0$ as $k\rightarrow\infty$.

Proof. Note that $\nabla L(Z(k+1),w(k+1),V(k+1)) =(\nabla_Z L, \nabla_w L, \nabla_V L)$. Since $\nabla_V L = AZ(k+1)+Bw(k+1)=\frac{1}{c}(V(k+1)-V(k))$, we have $\|\nabla_V L\|_2^2 \leq \frac{L_f^2}{c^2M'}\|w(k)-w(k+1)\|_2^2$. Moreover, \begin{equation}\begin{split} \nabla_w L &= \nabla f(w(k+1))+B^{\rm T}V(k+1)+cB^{\rm T}(AZ(k+1)+Bw(k+1)) \\ &=B^{\rm T}(V(k+1)-V(k)), \end{split} \end{equation} so $\|\nabla_w L\|_2^2=\|B^{\rm T}(V(k+1)-V(k))\|_2^2$. To bound $\|\nabla_Z L\|_2^2$, note that \begin{equation}\begin{split} \nabla_Z L&=\nabla_Z h(Z(k+1))+ A^{\rm T}V(k+1)+cA^{\rm T}(AZ(k+1)+Bw(k+1)) \\ & \overset{\rm (I)}{=}A^{\rm T}(V(k+1)-V(k))+cA^{\rm T}(Bw(k+1)-Bw(k)), \end{split} \end{equation} where (I) uses the optimality condition for $Z(k+1)$. Hence $\|\nabla_Z L\|_2^2 \leq \|A^{\rm T}(V(k+1)-V(k))\|_2^2+\|cA^{\rm T}(Bw(k+1)-Bw(k))\|_2^2$. Since $L(Z(k),w(k),V(k))-L(Z(k+1),w(k+1),V(k+1))\rightarrow 0$ as $k\rightarrow\infty$, combining this with Lemma A2 gives $\|Z(k)-Z(k+1)\|_2^2\rightarrow 0$, $\|w(k)-w(k+1)\|_2^2\rightarrow 0$ and $\|V(k)-V(k+1)\|_2^2\rightarrow 0$. Therefore $\|d(k+1)\|_2^2\rightarrow 0$ as $k\rightarrow\infty$, which proves the lemma.

Theorem 2 follows directly from the three lemmas above.

Distributed $K$-fold cross-validation

We select the tuning parameter $\lambda$ via a $K$-fold cross-validation procedure suited to the distributed setting. Suppose $\lambda\in\Lambda =\{\lambda_1, \lambda_2, \ldots, \lambda_L\}$ with $\lambda_1>\lambda_2>\cdots>\lambda_L$. First, randomly split all the data into $K$ parts, denoted $\{\tilde{X}_i,\tilde{Y}_i\}_{i=1}^K$. For the $j$-th machine, its data can then be split into two parts according to the $k$-th fold: $\{X_j^k,Y_j^k\}=\{X_j,Y_j\}\cap \{\tilde{X}_k,\tilde{Y}_k\}$ and $\{X_j^{(-k)},Y_j^{(-k)}\}=\{X_j,Y_j\}\setminus \{\tilde{X}_k,\tilde{Y}_k\}$, i.e., the data that belong to both machine $j$ and fold $k$, and the data that belong to machine $j$ but not to fold $k$. Using $\{X_j^{(-k)},Y_j^{(-k)}\}$ as the training set, run the distributed algorithm for each $\lambda_i$ to obtain $\{w_j^{(-k)}(\lambda_i)\}_{i=1}^L$. Using $\{X_j^k,Y_j^k\}$ as the test set, compute the mean squared error $e_j^k(\lambda_{i})$. Repeat the above steps for every machine and every fold, and compute the total prediction error of each $\lambda_i$, $e(\lambda_i)=\sum_{j=1}^J\sum_{k=1}^K e_j^k(\lambda_i)$. Finally, choose the value with the smallest prediction error, $\lambda_{\rm dcv}={\arg\min}_{\lambda_i} \{e(\lambda_i)\}_{i=1}^L$, as the tuning parameter.
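The procedure above can be sketched in a few lines. The distributed fitting step itself (e.g., Algorithm 1) is not reproduced here and is represented by a user-supplied `fit_distributed` callable, so the names and the squared-error score below are illustrative assumptions rather than the paper's implementation.

```python
import numpy as np

def distributed_cv(machines, lambdas, K, fit_distributed, seed=None):
    """Distributed K-fold cross-validation sketch.

    machines : list of (X_j, Y_j) pairs, one per machine j = 1, ..., J.
    lambdas  : candidate tuning parameters (lambda_1 > ... > lambda_L).
    fit_distributed(train_parts, lam) -> list of per-machine estimates w_j;
        this stands in for the distributed algorithm, which is not shown here.
    """
    rng = np.random.default_rng(seed)
    # Assign every sample on every machine to one of the K folds at random.
    folds = [rng.integers(0, K, size=len(Y)) for _, Y in machines]
    err = {lam: 0.0 for lam in lambdas}
    for k in range(K):
        # On machine j: training data outside fold k, test data inside fold k.
        train = [(X[f != k], Y[f != k]) for (X, Y), f in zip(machines, folds)]
        test = [(X[f == k], Y[f == k]) for (X, Y), f in zip(machines, folds)]
        for lam in lambdas:
            w = fit_distributed(train, lam)              # {w_j^{(-k)}(lambda_i)}
            for j, (Xt, Yt) in enumerate(test):
                if len(Yt):                              # e_j^k(lambda_i)
                    err[lam] += np.mean((Xt @ w[j] - Yt) ** 2)
    # lambda_dcv: the candidate with the smallest total prediction error.
    return min(lambdas, key=lambda lam: err[lam])
```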



Table 1   Efficiency of distributed Logistic algorithm (DLA)

    Data size        $n=10^4,~p=10^3$   $n=10^4,~p=10^4$   $n=10^4,~p=10^5$     $n=10^5,~p=10^3$
    DLA ($J=5$)      175.21 s           1613.15 s          47720.4 s            1533.87 s
    DLA ($J=10$)     84.70 s            832.54 s           23988.1 s            782.69 s
    DLA ($J=100$)    8.63 s             78.43 s            2754.2 s             72.32 s
    Glmfit           114.02 s           6327.79 s          More than 24 hours   1268.64 s

Algorithm 1 Distributed Logistic algorithm (DLA)

    Input: Data $D_{\rm all}=\{D_{j}\}_{j=1}^J$, pre-selected parameter $c$, step size $\alpha$, iteration number $K$;
    For all $j\in\mathcal{J}$, let $w_j(0)=(0,\ldots,0)^{\rm T}$, $Z_j(0)=(0,\ldots,0)^{\rm T}$, $V_j(0)=(0,\ldots,0)^{\rm T}$;
    Each machine $j$ runs locally:
    for $k=1$ to $K$
        Update $Z_j(k+1)$ via (8), and transmit it to neighbors;
        for $m=1$ to $M$
            Update $w^m_j(k+1)$ via (10);
        end for
        Let $w_j(k+1)=w^M_j(k+1)$, and transmit it to neighbors;
        Update $V_{ji}$ via (11), where $i=1,\ldots,J$, and transmit it to neighbors;
    end for
    Output: $w^{*}=w_j(K)$, $\forall j\in\mathcal{J}$.
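Equations (8), (10) and (11) referenced above appear in the main text and are not reproduced in this appendix. As a rough sketch only, the following code simulates the $J$ machines sequentially and uses standard consensus-ADMM-style updates (a ridge-penalized consensus variable, $M$ local gradient steps on each machine's logistic loss, and a dual update) in place of the paper's exact update formulas; every name and parameter value is illustrative.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def dla_sketch(data, lam=1.0, c=1.0, alpha=0.1, K=100, M=20):
    """Consensus-ADMM sketch for distributed logistic regression.

    data : list of (X_j, y_j) with labels y_j in {0, 1}, one pair per machine.
    The J machines are simulated one after another in a single process.
    """
    J, p = len(data), data[0][0].shape[1]
    w = [np.zeros(p) for _ in range(J)]
    V = [np.zeros(p) for _ in range(J)]
    Z = np.zeros(p)
    for _ in range(K):
        # Consensus (Z) update: ridge-penalized average of the local variables.
        Z = (c * sum(w) + sum(V)) / (lam + c * J)
        for j, (X, y) in enumerate(data):
            # Local w_j update: M gradient steps on the local augmented term.
            for _ in range(M):
                grad = X.T @ (sigmoid(X @ w[j]) - y) / len(y)
                w[j] = w[j] - alpha * (grad + V[j] + c * (w[j] - Z))
            # Dual (multiplier) update.
            V[j] = V[j] + c * (w[j] - Z)
    return Z  # consensus estimate

rng = np.random.default_rng(1)
data = [(rng.standard_normal((30, 4)),
         rng.integers(0, 2, size=30).astype(float)) for _ in range(5)]
print(dla_sketch(data))
```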


Algorithm 2 Distributed Logistic output perturbation algorithm (DLP)

    Input: Data $D_{\rm all}=\{D_{j}\}_{j=1}^J$, pre-selected parameter $c$, step size $\alpha$, iteration number $K$, privacy degree $\epsilon$;
    For all $j\in\mathcal{J}$, let $w_j(0)=(0,\ldots,0)^{\rm T}$, $Z_j(0)=(0,\ldots,0)^{\rm T}$, $V_{j}(0)=(0,\ldots,0)^{\rm T}$;
    Generate noise $b \thicksim \exp(-\frac{n\epsilon\lambda}{2}\|b\|_2)$;
    Run the distributed Logistic algorithm, and obtain $w^{*}$;
    Output: $w^{*}+b$.
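The noise $b$ above has density proportional to $\exp(-\frac{n\epsilon\lambda}{2}\|b\|_2)$. One standard way to draw such a vector in $\mathbb{R}^p$ is to sample its norm from a Gamma distribution with shape $p$ and scale $\frac{2}{n\epsilon\lambda}$ and its direction uniformly on the unit sphere; the sketch below does exactly that, with illustrative parameter values.

```python
import numpy as np

def output_perturbation_noise(p, n, eps, lam, seed=None):
    """Draw b in R^p with density proportional to exp(-(n*eps*lam/2)*||b||_2).

    The norm of such a vector follows Gamma(shape=p, scale=2/(n*eps*lam)),
    and its direction is uniform on the unit sphere.
    """
    rng = np.random.default_rng(seed)
    radius = rng.gamma(shape=p, scale=2.0 / (n * eps * lam))
    direction = rng.standard_normal(p)
    direction /= np.linalg.norm(direction)
    return radius * direction

# Example: perturb a hypothetical trained weight vector w_star.
w_star = np.zeros(10)
w_private = w_star + output_perturbation_noise(p=10, n=5000, eps=0.5, lam=1e-2)
print(w_private)
```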


Algorithm 3 Distributed Logistic variable perturbation algorithm (DLVP)

    Input: Data $D_{\rm all}=\{D_{j}\}_{j=1}^J$, pre-selected parameter $c$, step size $\alpha$, iteration number $K$, privacy degree $\epsilon=\sum_{k=1}^K \epsilon(k)$;
    For all $j\in\mathcal{J}$, let $w_j(0)=(0,\ldots,0)^{\rm T}$, $Z_j(0)=(0,\ldots,0)^{\rm T}$, $V_{ij}(0)=(0,\ldots,0)^{\rm T}$;
    Each machine $j$ runs locally:
    for $k=1$ to $K$
        Generate noise $b_j(k) \thicksim \exp(-\frac{n\epsilon(k)}{2\alpha}\|b_j(k)\|_2)$;
        Update $Z_j(k+1)$ via (12), and transmit it to neighbors;
        Update $w_j(k+1)$ via (14), and transmit it to neighbors;
        Update $V_{ji}(k)$ via (15), where $i=1,\ldots,J$, and transmit it to neighbors;
    end for
    Output: $w^{*}=\frac{1}{J}\sum_{j=1}^J w_j(K)$.
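Algorithm 3 spreads the total privacy budget over the iterations, $\epsilon=\sum_{k=1}^K\epsilon(k)$, and draws fresh noise $b_j(k)$ at every iteration. The sketch below shows one possible budget schedule (an even split $\epsilon(k)=\epsilon/K$, an illustrative choice rather than the paper's allocation) together with the corresponding noise draws; the actual updates (12), (14) and (15) are in the main text and are not reproduced here.

```python
import numpy as np

def per_iteration_noise(p, n, eps_k, alpha, rng):
    """Draw b_j(k) with density proportional to exp(-(n*eps_k/(2*alpha))*||b||_2)."""
    radius = rng.gamma(shape=p, scale=2.0 * alpha / (n * eps_k))
    direction = rng.standard_normal(p)
    return radius * direction / np.linalg.norm(direction)

# Illustrative budget allocation: split the total budget evenly over K iterations.
eps_total, K, p, n, alpha = 1.0, 50, 10, 5000, 0.1
eps_schedule = [eps_total / K] * K          # so that sum_k eps(k) = eps_total
rng = np.random.default_rng(2)
noises = [per_iteration_noise(p, n, e, alpha, rng) for e in eps_schedule]
print(len(noises), noises[0][:3])
```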