#  SCIENCE CHINA Information Sciences, Volume 61 , Issue 9 : 092202(2018) https://doi.org/10.1007/s11432-016-9173-8

## Distributed regression estimation with incomplete data in multi-agent networks
• Received Nov 23, 2016
• Accepted Jun 21, 2017
• Published Jan 4, 2018

### Acknowledgment

This work was supported by National Key Research and Development Program of China (Grant No. 2016YFB0901902) and National Natural Science Foundation of China (Grant Nos. 61573344, 61333001, 61374168).

### Supplement

Appendix

Proof of Lemma Lm3

With the observation noise of (14), we obtain \begin{align} \mathbb{E}\|\epsilon^{i}_k\|=\mathbb{E}\|R^{\bar{A},i}_k\xi^{i}_{k}-R^{\bar{A},i}\xi^{i}_{k}+r^{y\bar{A},i}-y^{i}_{k}\bar{A}^{i}_{k}\|\leqslant \mathbb{E}\|R^{\bar{A},i}_k-R^{\bar{A},i}\|\|\xi^{i}_{k}\|+\mathbb{E}\|r^{y\bar{A},i}-y^{i}_{k}\bar{A}^{i}_{k}\|,\quad \forall i \in \mathcal{N}. \tag{28} \end{align} Since $R^{\bar{A},i}_k$ converges to $R^{\bar{A},i}$, for every $\epsilon>0$ there exists an integer $k_{1}$ such that $\|R^{\bar{A},i}_k-R^{\bar{A},i}\|<\epsilon$ for all $k>k_{1}$. Define $M_{1}=\max\{\|R^{\bar{A},i}_{1}-R^{\bar{A},i}\|,\|R^{\bar{A},i}_{2}-R^{\bar{A},i}\|,\ldots,\|R^{\bar{A},i}_{k_{1}}-R^{\bar{A},i}\|,\epsilon\}$. Then $\|R^{\bar{A},i}_k-R^{\bar{A},i}\|\leqslant M_{1}$ for all $k\geqslant 0$. Analogously, $\mathbb{E}\|r^{y\bar{A},i}-y^{i}_{k}\bar{A}^{i}_{k}\|\leqslant M_{2}$ for all $k\geqslant 0$. From Remark Rem3, we have $\|\xi^{i}_{k}\|<C_{x}$. Hence, $\mathbb{E}\|\epsilon^{i}_k\|\leqslant M_1C_x+M_2=M_{\epsilon}$ for all $k\geqslant 0$. By (14), $d^{i}_{k}=\nabla g^{i}_{k}+\epsilon^{i}_{k}$. Thus, $\mathbb{E}\|d^{i}_{k}\|\leqslant \mathbb{E}\|\nabla g^{i}_{k}\|+\mathbb{E}\|\epsilon^{i}_{k}\|\leqslant C_g+M_{\epsilon}=M_d$, which is bounded.
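The boundedness argument rests on the averaged estimator $R^{\bar{A},i}_k$ tracking its limit. A small numerical sketch (illustrative only, not the paper's code; the scalar data model and ratio $\rho_k=1/k$ are assumptions): with $\rho_k=1/k$, the recursion $R_k=(1-\rho_k)R_{k-1}+\rho_k a_k^2$ reproduces the running sample mean of $a_k^2$, so $R_k$ converges to $R=\mathbb{E}[a^2]$ and the deviation $\|R_k-R\|$ stays bounded, as the proof uses.

```python
# Sketch (assumed setup): scalar regressors a_k ~ N(0, 1), so R = E[a^2] = 1.
import random

random.seed(0)
R_k = 0.0
squares = []
for k in range(1, 10001):
    a_k = random.gauss(0.0, 1.0)
    squares.append(a_k * a_k)
    rho_k = 1.0 / k                       # averaging ratio rho_k = 1/k
    R_k = (1.0 - rho_k) * R_k + rho_k * a_k * a_k

# With rho_k = 1/k the recursion is exactly the running sample mean of a_k^2.
print(abs(R_k - sum(squares) / len(squares)) < 1e-9)   # True: exact identity
print(abs(R_k - 1.0))                                   # small deviation from R = 1
```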

Proof of Lemma Lm4

For all $i\in\mathcal{N},\;k\geqslant 0$, define $p^{i}_{k+1}=\xi^{i}_{k+1}-\sum_{j=1}^{N}w_{ij}(k)\xi^j_{k}$. We rewrite (8) compactly in terms of $\Psi(k,s)$ (defined for $k\geqslant s$) as follows: $\xi^{i}_{k+1}=\sum_{j=1}^{N}[\Psi(k,0)]_{ij}\xi^{j}_{0}+p^{i}_{k+1}+\sum_{s=1}^{k}\sum_{j=1}^{N}[\Psi(k,s)]_{ij}p^j_{s}$. Moreover, with Assumption Ass1 and by induction, the following equality holds: $\bar{\xi}_{k+1}=\frac{1}{N}\sum_{i=1}^{N}\xi^{i}_{0}+\frac{1}{N}\sum_{s=1}^{k+1}\sum_{j=1}^{N}p^j_{s}$. Consequently, we obtain that, for $i\in\mathcal{N}$, $\xi^{i}_{k+1}-\bar{\xi}_{k+1}=\sum_{j=1}^{N}\big([\Psi(k,0)]_{ij}-\frac{1}{N}\big)\xi^{j}_{0}+\big(p^{i}_{k+1} -\frac{1}{N}\sum_{j=1}^{N}p^j_{k+1}\big)+\sum_{s=1}^{k}\sum_{j=1}^{N} \big([\Psi(k,s)]_{ij}-\frac{1}{N}\big)p^j_{s}$. Therefore, $\forall i\in\mathcal{N}$, \begin{align} \|\xi^{i}_{k+1}-\bar{\xi}_{k+1}\|\leqslant\sum_{j=1}^{N} \Big|[\Psi(k,0)]_{ij}-\frac{1}{N}\Big|\|\xi^{j}_{0}\|+\|p^{i}_{k+1}\| +\left\|\frac{1}{N}\sum_{j=1}^{N}p^j_{k+1}\right\|+\sum_{s=1}^{k}\sum_{j=1}^{N} \Big|[\Psi(k,s)]_{ij}-\frac{1}{N}\Big|\|p^j_{s}\|. \tag{29} \end{align} Plugging in the estimate of $\Psi(k,s)$ in Lemma Lm1 and $\|\xi^{j}_{0}\|\leqslant\max_{1\leqslant i\leqslant N}\|\xi^{i}_{0}\|$, we have \begin{align} \|\xi^{i}_{k+1}-\bar{\xi}_{k+1}\|\leqslant N\lambda \beta^{k}\max_{1\leqslant i\leqslant N}\|\xi^{i}_{0}\|+\|p^{i}_{k+1}\|+\frac{1}{N}\sum_{j=1}^{N}\|p^{j}_{k+1}\|+\lambda\sum_{s=1}^{k}\beta^{k-s}\sum_{j=1}^{N}\|p^{j}_{s}\|. \tag{30} \end{align} Next, from the definition of $p^{i}_{k+1}$, we get \begin{align} \|p^i_{k+1}\|=\left\|P_{X}\left(\sum_{j=1}^{N}w_{ij}(k)\xi^{j}_{k}-\iota_{k}d^{i}_{k}\right)-\sum_{j=1}^{N}w_{ij}(k)\xi^j_{k}\right\|\leqslant\iota_{k}\|d^{i}_{k}\|, \tag{31} \end{align} which holds because $\sum_{j=1}^{N}w_{ij}(k)\xi^{j}_{k}\in X$ and the projection $P_X$ is nonexpansive. With (30) and (31), the proof is completed.
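The key algebraic step above, writing the iterate through the transition matrices $\Psi(k,s)=W(k)W(k-1)\cdots W(s)$, can be checked numerically. The following sketch (illustrative; the network size, weights, perturbations, and scalar per-agent states are arbitrary choices, not the paper's setting) verifies the decomposition exactly:

```python
# Check xi_{k+1} = Psi(k,0) xi_0 + p_{k+1} + sum_s Psi(k,s) p_s for the
# perturbed consensus recursion xi_{k+1} = W(k) xi_k + p_{k+1}.
import numpy as np

rng = np.random.default_rng(1)
N, K = 4, 6

def rand_W():
    """Random doubly stochastic matrix: average of two permutation matrices."""
    P1 = np.eye(N)[rng.permutation(N)]
    P2 = np.eye(N)[rng.permutation(N)]
    return 0.5 * P1 + 0.5 * P2

W = [rand_W() for _ in range(K)]
p = [rng.normal(size=N) for _ in range(K + 1)]   # p[s] plays the role of p_s (p[0] unused)
xi0 = rng.normal(size=N)

# Direct recursion.
xi = xi0.copy()
for k in range(K):
    xi = W[k] @ xi + p[k + 1]

def Psi(k, s):
    """Psi(k, s) = W(k) W(k-1) ... W(s)."""
    M = np.eye(N)
    for t in range(s, k + 1):
        M = W[t] @ M
    return M

# Closed form from the proof of Lemma Lm4.
closed = Psi(K - 1, 0) @ xi0 + p[K]
for s in range(1, K):
    closed = closed + Psi(K - 1, s) @ p[s]

print(np.allclose(xi, closed))  # True
```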

Proof of Theorem Thm2

From Theorem Thm1, $\|\xi^i_{k+1}-\bar{\xi}_{k+1}\|$ converges to $0$ in mean. Then, by Fatou's lemma1), the following relation holds: $0\leqslant\mathbb{E}[\underset{k\rightarrow\infty}{\liminf}\|\xi^i_{k+1}-\bar{\xi}_{k+1}\|]\leqslant\underset{k\rightarrow\infty}{\liminf}\mathbb{E}[\|\xi^i_{k+1} -\bar{\xi}_{k+1}\|]=0$, which yields $\mathbb{E}[\underset{k\rightarrow\infty}{\liminf}\|\xi^i_{k+1}-\bar{\xi}_{k+1}\|]=0$. Therefore, $\underset{k\rightarrow\infty}{\liminf}\|\xi^i_{k+1}-\bar{\xi}_{k+1}\|=0$ holds almost surely. Since $\xi^i_{k+1}=P_X(\hat{\xi}^{i}_{k+1})$ and $\bar{\xi}_{k}\in X$, we have $\|\xi^i_{k+1}-\bar{\xi}_{k}\|^2\leqslant\|\hat{\xi}^{i}_{k+1}-\bar{\xi}_{k}\|^2$, and hence \begin{align} \|\xi^i_{k+1}-\bar{\xi}_{k}\|^2\leqslant \|\hat{\xi}^{i}_{k+1}-\bar{\xi}_{k}\|^2\leqslant\sum_{j=1}^{N}w_{ij}(k)\|\xi^j_{k}-\bar{\xi}_{k}\|^2+\iota^2_k\|d^{i}_{k}\|^2+2\iota_{k}\|d^{i}_{k}\|\sum_{j=1}^{N}w_{ij}(k)\|\xi^j_{k}-\bar{\xi}_{k}\|. \tag{32} \end{align} Summing (32) over $i\in\mathcal{N}$ and noting that the double stochasticity of $W(k)$ gives $\sum_{i=1}^{N}\sum_{j=1}^{N}w_{ij}(k)\|\xi^{j}_{k}-\bar{\xi}_{k}\|^2=\sum_{j=1}^{N}\|\xi^{j}_{k}-\bar{\xi}_{k}\|^2$, we obtain \begin{align} \sum_{i=1}^{N}\|\xi^i_{k+1}-\bar{\xi}_{k}\|^2\leqslant\sum_{i=1}^{N}\|\xi^{i}_{k}-\bar{\xi}_{k}\|^2+\sum_{i=1}^{N}\iota^2_k\|d^{i}_{k}\|^2 +2\sum_{i=1}^{N}\iota_{k}\|d^{i}_{k}\|\sum_{j=1}^{N}w_{ij}(k)\|\xi^j_{k}-\bar{\xi}_{k}\|. \tag{33} \end{align} Since the average $\bar{\xi}_{k+1}$ minimizes $c\mapsto\sum_{i=1}^{N}\|\xi^{i}_{k+1}-c\|^2$, the left-hand side of (33) dominates $\sum_{i=1}^{N}\|\xi^i_{k+1}-\bar{\xi}_{k+1}\|^2$. Taking the conditional expectation of both sides of (33) then yields \begin{align} \sum_{i=1}^{N}\mathbb{E}[\|\xi^i_{k+1}-\bar{\xi}_{k+1}\|^2|F_k]&\leqslant \sum_{i=1}^{N}\|\xi^{i}_{k}-\bar{\xi}_{k}\|^2+2M_{d}\sum_{j=1}^{N}\iota_{k}\|\xi^j_{k}-\bar{\xi}_{k}\|+N\iota^2_kM_{d}^2. 
\tag{34} \end{align} According to Theorem 6.2 of , $\sum_{k=1}^{\infty}\iota_{k}\|\xi^j_{k}-\bar{\xi}_{k}\|<\infty$ with probability $1$. Therefore, together with $\sum_{k=1}^{\infty}N\iota^2_kM_{d}^2<\infty$, Lemma Lm2 implies that $\sum_{i=1}^{N}\|\xi^i_{k}-\bar{\xi}_{k}\|^2$ converges almost surely. A convergent sequence whose limit inferior is $0$ must converge to $0$, so the conclusion follows.

1) Rudin W. Real and Complex Analysis. New York: McGraw-Hill, 1986. 5--71.
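Both this proof and that of Theorem Thm3 lean on the step-size conditions $\sum_{k}\iota_{k}=\infty$ and $\sum_{k}\iota^2_{k}<\infty$. A quick numerical illustration with the common choice $\iota_k=1/k$ (an assumption for the demo, not the paper's experimental setting):

```python
# Partial sums of iota_k = 1/k keep growing (~ ln k), while partial sums of
# iota_k^2 = 1/k^2 stay below the convergent limit pi^2/6 ≈ 1.6449341.
S1 = sum(1.0 / k for k in range(1, 100001))
S2 = sum(1.0 / k ** 2 for k in range(1, 100001))

print(S1 > 12.0)        # True: the divergent series keeps growing
print(S2 < 1.6449341)   # True: the squared series is summable
```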

Proof of Theorem Thm3

Clearly, $\|\xi^i_{k+1}-\xi^*\|^2\leqslant\|\hat{\xi}^{i}_{k+1}-\xi^{*}\|^2$, and then $\|\xi^i_{k+1}-\xi^*\|^2\leqslant\|\sum_{j=1}^{N}w_{ij}(k)\xi^{j}_{k}-\xi^*\|^2+ \iota^2_k \|d^{i}_{k}\|^2-2\iota_{k}(d^{i}_{k})^\text{T}(\sum_{j=1}^{N}w_{ij}(k)\xi^{j}_{k}-\xi^*)$. Together with the convexity inequality $g(x_{2})\geqslant g(x_{1})+\nabla g(x_{1})^\text{T}(x_{2}-x_{1})$, $\forall x_{1},x_{2}$, and recalling that $\mathbb{E}\|d^{i}_{k}\|\leqslant M_{d}$ in Lemma Lm3 and $\|\nabla g^{i}(\xi)\|\leqslant C_{g}$ in Remark Rem5, we have \begin{align} (\nabla g^{i}_{k})^\text{T}\left(\sum_{j=1}^{N}w_{ij}(k)\xi^{j}_{k}-\xi^*\right)\geqslant g^{i}(\bar{\xi}_{k})-g^{i}(\xi^*)-C_{g}\left\|\sum_{j=1}^{N}w_{ij}(k)\xi^{j}_{k}-\bar{\xi}_{k}\right\|, \tag{35} \end{align} and $\mathbb{E}[(\epsilon^{i}_{k})^\text{T}(\sum_{j=1}^{N}w_{ij}(k)\xi^{j}_{k}-\xi^*)]\leqslant \mathbb{E}\|\epsilon^{i}_k\|\|\sum_{j=1}^{N}w_{ij}(k)\xi^{j}_{k}-\xi^*\|$ for all $k=0,1,2,\ldots$. Therefore, \begin{align} \mathbb{E}[\|\xi^i_{k+1}-\xi^*\|^2|F_{k}] \leqslant & \mathbb{E}\left[\left\|\sum_{j=1}^{N}w_{ij}(k)\xi^{j}_{k}-\xi^* \right\|^2\Bigg|F_{k}\right]+\iota^2_k\mathbb{E}\|d^{i}_{k}\|^2+2\iota_{k}C_{g}\mathbb{E} \left[\left\|\sum_{j=1}^{N}w_{ij}(k)\xi^{j}_{k}-\bar{\xi}_{k}\right\|\Bigg|F_{k}\right] \\ &-2\iota_{k}\mathbb{E}\|\epsilon^{i}_k\|\left\|\sum_{j=1}^{N}w_{ij}(k)\xi^{j}_{k}-\xi^*\right\| -2\iota_{k}(g^{i}(\bar{\xi}_{k})-g^{i}(\xi^*)). \tag{36} \end{align} By the double stochasticity of the matrix $W(k)$, \begin{align} \begin{cases}\displaystyle \sum_{i=1}^N \mathbb{E}\left[\left\|\sum_{j=1}^{N}w_{ij}(k)\xi^{j}_{k}-\xi^*\right\|^2 \Bigg|F_{k}\right]\leqslant \sum_{i=1}^N \|\xi^i_{k}-\xi^*\|^2, \tag{37} \\ \displaystyle \sum_{i=1}^N \mathbb{E}\left[\left\|\sum_{j=1}^{N}w_{ij}(k)\xi^{j}_{k}-\bar{\xi}_{k}\right\| \Bigg|F_{k}\right]\leqslant \sum_{i=1}^N \|\xi^i_{k}-\bar{\xi}_{k}\|. 
\end{cases} \tag{38} \end{align} Then, with probability $1$, for $i\in\mathcal{N}$, it holds that \begin{align} \sum_{i=1}^N \mathbb{E}[\|\xi^i_{k+1}-\xi^* \|^2|F_{k}]&\leqslant\sum_{i=1}^{N} \|\xi^i_{k}-\xi^*\|^2+w_{k}-v_{k}, \tag{39} \end{align} where \begin{align} \begin{cases}\displaystyle w_{k}=\sum_{i=1}^N\iota^2_k\mathbb{E}\|d^{i}_{k}\|^2+2\iota_{k}C_{g}\sum_{i=1}^N\mathbb{E}\|\xi^{i}_{k}-\bar{\xi}_{k}\|, \tag{40} \\ \displaystyle v_{k}=2\sum_{i=1}^N\iota_{k}\mathbb{E}[\|\epsilon^{i}_k\|]\left\|\sum_{j=1}^{N}w_{ij}(k)\xi^{j}_{k}-\xi^*\right\|+2\iota_{k}(g(\bar{\xi}_{k})-g(\xi^*)). \end{cases} \tag{41} \end{align} By Theorem 6.2 in , $\sum_{k=1}^{\infty}2\iota_{k}C_{g}\sum_{i=1}^N\mathbb{E}\|\xi^{i}_{k}-\bar{\xi}_{k}\|<\infty$. Since $\sum_{k=1}^{\infty}\iota^2_k<\infty$, we have $\sum_{k=1}^{\infty} \iota^2_k\mathbb{E}\|d^{i}_{k}\|^2\leqslant \sum_{k=1}^{\infty}\iota^2_kNM_{d}^{2}<\infty$. Therefore, $\sum_{k=1}^{\infty}w_{k}<\infty$.

From Lemma Lm2, the sequence $\sum_{i=1}^{N}~\|\xi^i_{k}-\xi^*\|^2$ converges with probability 1 and $\sum_{k=1}^{\infty}v_{k}<\infty$.

As for $v_{k}$, according to the boundedness of $\xi^{i}_{k}$ and the ergodicity of $\bar{A}^{i}_{k}$, we conclude that $\lim_{k\rightarrow\infty}(R^{\bar{A},i}_k-R^{\bar{A},i})\xi^{i}_{k}=0$. Moreover, $\lim_{k\rightarrow\infty}(r^{y\bar{A},i}-y^{i}_{k}\bar{A}^{i}_{k})=0$ by the stationarity of $y^{i}_{k}$ and $a^{i}_{k}$.

Therefore \begin{equation} \lim_{k\rightarrow\infty}2\sum_{i=1}^N\mathbb{E}[\|\epsilon^{i}_{k}\|]\left\|\sum_{j=1}^{N}w_{ij}(k)\xi^{j}_{k}-\xi^*\right\|=0. \tag{42}\end{equation}

Similar to the proof of Theorem 6.2 in , we get $\sum_{k=1}^{\infty}2\sum_{i=1}^N\iota_{k}\mathbb{E}[\|\epsilon^{i}_k\|]\|\sum_{j=1}^{N}w_{ij}(k)\xi^{j}_{k}-\xi^*\|<\infty$, which implies $\sum_{k=1}^{\infty}2\iota_{k}(g(\bar{\xi}_{k})-g(\xi^*))<\infty$. Since $\sum_{k=1}^{\infty}2\iota_{k}(g(\bar{\xi}_{k})-g(\xi^*))<\infty$ and $\sum_{k=1}^{\infty}\iota_{k}=\infty$, $\liminf_{k\rightarrow \infty} g(\bar{\xi}_{k})=g(\xi^*)$ holds almost surely. Combined with $\lim_{k\rightarrow\infty}\|\xi^i_{k}-\bar{\xi}_{k}\|=0$ almost surely for all $i$ from Theorem Thm2, this yields the conclusion.

Proof of Lemma Lm5

Define $r^{i}_{k}=\xi^{i}_{k}-\hat{\xi}^{i}_{k}=P_{X}(\hat{\xi}^{i}_{k})-\hat{\xi}^{i}_{k}$. Since $X$ is convex and $W(k)$ is doubly stochastic, we have $\sum_{j=1}^{N}w_{ij}(k)\xi^{j}_k\in X$, which leads to $\|r^{i}_{k+1}\|\leqslant\|P_{X}(\hat{\xi}^{i}_{k+1})-\sum_{j=1}^{N}w_{ij}(k)\xi^{j}_{k}\|+\iota_{k}\|d^{i}_{k}\|\leqslant 2\iota_{k}\|d^{i}_{k}\|$. By Algorithm 1, we obtain $\bar{\xi}_{k+1}=\bar{\xi}_{k}-\frac{\iota_{k}}{N}\sum_{i=1}^{N}(\nabla g^{i}_k+\epsilon^{i}_k)+\frac{1}{N}\sum_{i=1}^{N}r^{i}_{k+1}$. As a result, we can decompose $\|\bar{\xi}_{k+1}-\xi\|^{2}$ as \begin{align} \|\bar{\xi}_{k+1}-\xi\|^{2}=\|\bar{\xi}_{k}-\xi\|^{2}+\frac{1}{N^{2}} \left\|\sum_{i=1}^{N}(r^{i}_{k+1}-\iota_{k}d^{i}_{k})\right\|^{2} +\frac{2}{N}\sum_{i=1}^{N}\langle r^{i}_{k+1},\bar{\xi}_{k}-\xi\rangle-\frac{2\iota_{k}}{N}\sum_{i=1}^{N}\langle\nabla g^{i}_{k},\bar{\xi}_{k}-\xi\rangle-\frac{2\iota_{k}}{N}\sum_{i=1}^{N}\langle\epsilon^{i}_k, \bar{\xi}_{k}-\xi\rangle. \tag{43} \end{align} Let us examine $-\sum_{i=1}^{N}\langle\nabla g^{i}_{k},\bar{\xi}_{k}-\xi\rangle$. Based on Lemma Lm1, we obtain \begin{align} -\langle \nabla g^{i}_{k},\bar{\xi}_{k}-\xi\rangle =& -\langle \nabla g^{i}_{k},\bar{\xi}_{k}-\xi^{i}_{k}\rangle-\langle \nabla g^{i}_{k},\xi^{i}_{k}-\xi\rangle \leqslant \|\nabla g^{i}_{k}\|\|\bar{\xi}_{k}-\xi^{i}_{k}\|+g^{i}(\bar{\xi}_{k})-g^{i}_{k}-\frac{\mu}{2}\|\xi^{i}_{k}-\xi\|^{2}+g^{i}(\xi)-g^{i}(\bar{\xi}_{k}) \\ \leqslant & \|\nabla g^{i}_{k}\|\|\bar{\xi}_{k}-\xi^{i}_{k}\|+\langle \nabla \bar{g}^{i}_{k},\bar{\xi}_{k}-\xi^{i}_{k}\rangle -\frac{\mu}{2}\|\xi^{i}_{k}-\xi\|^{2}-\frac{\mu}{2}\|\xi^{i}_{k}-\bar{\xi}_{k}\|^{2}+g^{i}(\xi)-g^{i}(\bar{\xi}_{k}). 
\tag{44} \end{align} Since $\langle\nabla\bar{g}^{i}_{k},\bar{\xi}_{k}-\xi^{i}_{k}\rangle\leqslant\|\nabla\bar{g}^{i}_{k}\|\|\bar{\xi}_{k}-\xi^{i}_{k}\|$ and $\|\xi^{i}_{k}-\xi\|^{2}+\|\xi^{i}_{k}-\bar{\xi}_{k}\|^{2}\geqslant\frac{1}{2}\|\bar{\xi}_{k}-\xi\|^{2}$, we can estimate $-\langle\nabla g^{i}_{k},\bar{\xi}_{k}-\xi\rangle$ as follows: $-\langle\nabla g^{i}_{k},\bar{\xi}_{k}-\xi\rangle\leqslant(\|\nabla g^{i}_{k}\|+\|\nabla\bar{g}^{i}_{k}\|)\|\bar{\xi}_{k}-\xi^{i}_{k}\| +g^{i}(\xi)-g^{i}(\bar{\xi}_{k})-\frac{\mu}{4}\|\bar{\xi}_{k}-\xi\|^{2}$.

Summing up over $i=1,2,\ldots,N$, the following inequality holds: \begin{align} -\sum_{i=1}^{N}\langle \nabla g^{i}_{k},\bar{\xi}_{k}-\xi\rangle\leqslant \sum_{i=1}^{N} (\|\nabla g^{i}_{k}\|+\|\nabla \bar{g}^{i}_{k}\|) \|\bar{\xi}_{k}-\xi^{i}_{k}\|+g(\xi)-g(\bar{\xi}_{k})-\frac{\mu N}{4}\|\bar{\xi}_{k}-\xi\|^{2}. \tag{45} \end{align} Next, for $k=0,1,\ldots$, \begin{align} -\sum_{i=1}^{N}\langle \epsilon^{i}_k,\bar{\xi}_{k}-\xi\rangle\leqslant \sum_{i=1}^{N}\|\epsilon^{i}_k\|\|\bar{\xi}_{k}-\xi^{i}_{k}\|+\sum_{i=1}^{N}\| \epsilon^{i}_k\|\|\xi^{i}_{k}-\xi\|. \tag{46} \end{align} Then \begin{align} \langle r^{i}_{k+1},\bar{\xi}_{k}-\xi\rangle\leqslant \langle r^{i}_{k+1},\bar{\xi}_{k}-\hat{\xi}^{i}_{k+1}\rangle+\langle P_{X}(\hat{\xi}^{i}_{k+1}) -\hat{\xi}^{i}_{k+1},\hat{\xi}^{i}_{k+1}-\xi\rangle. \tag{47} \end{align} Because the projection operator satisfies \begin{align} \langle P_{X}(\hat{\xi})-\hat{\xi},\hat{\xi}-\xi\rangle\leqslant -\|P_{X}(\hat{\xi})-\hat{\xi}\|^{2}\leqslant 0,\; \forall \xi\in X, \tag{48} \end{align} it follows from (48) and (47) that \begin{align} \langle r^{i}_{k+1},\bar{\xi}_{k}-\xi\rangle\leqslant \langle r^{i}_{k+1},\bar{\xi}_{k}-\hat{\xi}^{i}_{k+1}\rangle\leqslant 2\iota_{k}\|d^{i}_{k}\|\|\bar{\xi}_{k} -\hat{\xi}^{i}_{k+1}\|. \tag{49} \end{align} Moreover, by the triangle inequality and $\|r^{i}_{k+1}\|\leqslant 2\iota_{k}\|d^{i}_{k}\|$, \begin{align} \frac{1}{N^{2}}\left\|\sum_{i=1}^{N}(r^{i}_{k+1}-\iota_{k}d^{i}_{k})\right\|^{2}\leqslant \frac{1}{N^{2}}\left(\sum_{i=1}^{N}\big(\|r^{i}_{k+1}\|+\iota_{k}\|d^{i}_{k}\|\big)\right)^{2}\leqslant \frac{9\iota_{k}^{2}}{N^{2}}\left(\sum_{i=1}^{N}\|d^{i}_{k}\|\right)^{2}. \tag{50} \end{align} Combining (45), (46), (49), and (50) with (43) yields the conclusion.
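Inequality (48) is the standard obtuse-angle property of projection onto a convex set. A small numerical check (illustrative only; the box constraint set, dimension, and random points are assumptions for the demo):

```python
# Verify <P_X(xi_hat) - xi_hat, xi_hat - xi> <= -||P_X(xi_hat) - xi_hat||^2
# for projection onto the box X = [-1, 1]^n, over many random points.
import numpy as np

rng = np.random.default_rng(0)
lo, hi = -1.0, 1.0

def proj(x):
    """Euclidean projection onto the box [lo, hi]^n (a convex set)."""
    return np.clip(x, lo, hi)

ok = True
for _ in range(1000):
    xi_hat = rng.normal(scale=3.0, size=5)    # arbitrary point, possibly outside X
    xi = rng.uniform(lo, hi, size=5)          # arbitrary point inside X
    r = proj(xi_hat) - xi_hat
    ok &= np.dot(r, xi_hat - xi) <= -np.dot(r, r) + 1e-12   # small fp tolerance

print(bool(ok))  # True
```

The inequality holds componentwise for a box, which is why the check never fails.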

### References

 Nedic A, Ozdaglar A. Distributed subgradient methods for multi-agent optimization. IEEE Trans Automat Contr, 2009, 54: 48--61

 Shi G, Johansson K H. Robust consensus for continuous-time multiagent dynamics. SIAM J Control Optim, 2013, 51: 3673--3691

 Zhang Y Q, Lou Y C, Hong Y G, et al. Distributed projection-based algorithms for source localization in wireless sensor networks. IEEE Trans Wirel Commun, 2015, 43: 3131--3142

 Feng H, Jiang Z D, Hu B, et al. The incremental subgradient methods on distributed estimations in-network. Sci China Inf Sci, 2014, 57: 092103

 Lou Y, Hong Y, Wang S. Distributed continuous-time approximate projection protocols for shortest distance optimization problems. Automatica, 2016, 69: 289--297

 Yi P, Hong Y, Liu F. Initialization-free distributed algorithms for optimal resource allocation with feasibility constraints and application to economic dispatch of power systems. Automatica, 2016, 74: 259--269

 Kokaram A C. On missing data treatment for degraded video and film archives: a survey and a new Bayesian approach. IEEE Trans Image Process, 2004, 13: 397--415

 Molenberghs G, Kenward M G. Missing Data in Clinical Studies. New York: Wiley, 2007

 Ibrahim J G, Chen M H, Lipsitz S R. Missing-data methods for generalized linear models. J Am Statistical Association, 2005, 100: 332--346

 Gholami M R, Jansson M, Strom E G. Diffusion estimation over cooperative multi-agent networks with missing data. IEEE Trans Signal Inf Process over Networks, 2016, 2: 276--289

 Davey A, Savla J. Statistical Power Analysis with Missing Data: A Structural Equation Modeling Approach. Oxford, UK: Routledge Academic, 2009

 Sundhar Ram S, Nedić A, Veeravalli V V. Distributed stochastic subgradient projection algorithms for convex optimization. J Optim Theor Appl, 2010, 147: 516--545

 Graybill F, Iyer H K. Regression Analysis: Concepts and Applications. Belmont, California: Duxbury Press, 1994

 Yan F, Sundaram S, Vishwanathan S V N. Distributed autonomous online learning: regrets and intrinsic privacy-preserving properties. IEEE Trans Knowl Data Eng, 2013, 25: 2483--2493

 Hazan E, Kale S. Beyond the regret minimization barrier: optimal algorithms for stochastic strongly-convex optimization. J Mach Learn Res, 2014, 15: 2489--2512

 Shamir O, Zhang T. Stochastic gradient descent for non-smooth optimization: convergence results and optimal averaging schemes. In: Proceedings of International Conference on Machine Learning, Edinburgh, 2012. 71--79

 Towfic Z J, Chen J, Sayed A H. On distributed online classification in the midst of concept drifts. Neurocomputing, 2013, 112: 138--152

 Widrow B, Stearns S D. Adaptive Signal Processing. Englewood Cliffs: Prentice-Hall, 1985. 1--32

 Sayed A. Adaptation, learning, and optimization over networks. Found Trends Mach Learn, 2014, 7: 311--801

 Sayed A H, Tu S Y, Chen J. Diffusion strategies for adaptation and learning over networks: an examination of distributed strategies and network behavior. IEEE Signal Process Mag, 2013, 30: 155--171

 Polyak B T. Introduction to Optimization. New York: Optimization Software Inc., 1983. 2--8

 Godsil C, Royle G. Algebraic Graph Theory. New York: Springer-Verlag, 2001. 1--18

 Ferguson T S. A Course in Large Sample Theory. London: Chapman and Hall, 1996. 3--4

 Durrett R. Probability: Theory and Examples. Cambridge, UK: Cambridge University Press, 2010. 328--347

 Enders C K. Applied Missing Data Analysis. New York: The Guilford Press, 2010

 Kushner H J, Yin G. Stochastic Approximation and Recursive Algorithms and Applications. New York: Springer-Verlag, 1997. 117--157

 Widrow B, McCool J M, Larimore M G. Stationary and nonstationary learning characteristics of the LMS adaptive filter. Proc IEEE, 1976, 64: 1151--1162

 Yi P, Hong Y. Stochastic sub-gradient algorithm for distributed optimization with random sleep scheme. Control Theor Technol, 2015, 13: 333--347

 Larsen R J, Marx M L. An Introduction to Mathematical Statistics and Its Applications. 4th ed. New York: Pearson, 2006. 221--280

• Figure 1

The topology of the networks.

• Figure 2

(Color online) The learning curves. (a) Missing data with threshold $[-10,-4,-1,-2,-10]^\top$; (b) missing data with threshold $[-10,-10,-10,-10,-10]^\top$.

• Figure 3

(Color online) The performances of $R(T)$ for all agents.

• Algorithm 1

Distributed adaptive gradient-based algorithm (DAGA)

$R^{\bar{A},i}_k=(1-\rho_{k})R^{\bar{A},i}_{k-1}+\rho_{k}\bar{A}^{i}_{k}(\bar{A}^{i}_{k})^\text{T}$;

$r^{y\bar{A},i}_{k}=(1-\rho_{k})r^{y\bar{A},i}_{k-1}+\rho_{k}y^{i}_{k}\bar{A}^{i}_{k}$;

$d^{i}_{k}=R^{\bar{A},i}_k\xi^{i}_{k}-r^{y\bar{A},i}_{k}$;

$\hat{\xi}^{i}_{k+1}=\sum_{j=1}^{N}w_{ij}(k)\xi^{j}_{k}-\iota_{k}d^{i}_{k}$;

$\xi^{i}_{k+1}=P_{X}(\hat{\xi}^{i}_{k+1})$.
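The steps above can be sketched in code. The following is a minimal runnable illustration, not the paper's implementation: the three-agent network, data model, step sizes $\rho_k=\iota_k=1/k$, and box constraint set $X=[-5,5]^2$ are all assumptions made for the demo.

```python
# Sketch of Algorithm 1 (DAGA) on synthetic data (assumed setup, see above).
import numpy as np

rng = np.random.default_rng(0)
N, n = 3, 2                                   # agents, parameter dimension
xi_true = np.array([1.0, -2.0])               # regression parameter to estimate
W = np.array([[0.50, 0.25, 0.25],             # fixed doubly stochastic weights
              [0.25, 0.50, 0.25],
              [0.25, 0.25, 0.50]])

xi = [np.zeros(n) for _ in range(N)]          # local estimates xi_k^i
R = [np.zeros((n, n)) for _ in range(N)]      # running estimates of E[a a^T]
r = [np.zeros(n) for _ in range(N)]           # running estimates of E[y a]

for k in range(1, 5001):
    rho, iota = 1.0 / k, 1.0 / k
    new_xi = []
    for i in range(N):
        a = rng.normal(size=n)                # regressor observed by agent i
        y = a @ xi_true + 0.1 * rng.normal()  # noisy scalar measurement
        R[i] = (1 - rho) * R[i] + rho * np.outer(a, a)
        r[i] = (1 - rho) * r[i] + rho * y * a
        d = R[i] @ xi[i] - r[i]               # gradient estimate d_k^i
        hat = sum(W[i, j] * xi[j] for j in range(N)) - iota * d
        new_xi.append(np.clip(hat, -5.0, 5.0))  # projection P_X onto the box
    xi = new_xi

errs = [float(np.linalg.norm(x - xi_true)) for x in xi]
print(all(e < 0.5 for e in errs))
```

With these choices the agents' iterates reach consensus and drift toward `xi_true`, mirroring the consensus and convergence results of Theorems Thm2 and Thm3.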