## A two-step method for estimating high-dimensional Gaussian graphical models

• AcceptedJul 29, 2018
• PublishedMay 14, 2020
Share
Rating

### Acknowledgment

This work was supported by National Natural Science Foundation of China (Grant No. 11671059).

### Supplement

Appendix

Lemma 1. Let $\hat~\beta$ be the solution to $$\{ \hat{\beta}_{jj'} \} = \arg\min \dfrac{1}{2n} \sum\limits_{j=1}^{p} \sum\limits_{i=1}^{n} \bigg(x_{ij} - \sum\limits_{j' \neq j} x_{ij'} \beta_{jj'}\bigg)^2 + \lambda_1 \sum\limits_{j \neq j'} |\beta_{jj'}|. \tag{3}$$ The optimality conditions for 3 can be written as \begin{align} \dfrac{1}{n}\sum\limits_{i=1}^{n} x_{ij'} \cdot \bigg(x_{ij} - \sum\limits_{j'\neq j} x_{ij'} \hat \beta_{jj'} \bigg) = \lambda_1 \gamma_{jj'}, \tag{4} \end{align} where $$\gamma_{jj'}= \left \{\! \begin{array}{ll} {\rm sign} (\hat \beta_{jj'}), \text{if} j \neq j' \text{ and } \hat \beta_{jj'} \neq 0, \\ \text{a real number }\in [-1,1], \text{if} j \neq j' \text{ and } \hat \beta_{jj'} =0. \end{array} \right. \tag{5}$$

Note that given $~j~$, 4 can be seen as an LASSO solution and hence has at most $~\min\{~n,p~\}~$ nonzero components.

ProofProof of Lemma rm1. The partial derivative of the first part of 3 with respect of $\beta_{jj'}$ can be calculated as follows: \begin{align}&\partial \dfrac{1}{2n} \bigg\{ \sum\limits_{j=1}^{p} \cdot \sum\limits_{i=1}^{n} \bigg(x_{ij}-\sum\limits_{j' \neq j}x_{ij'} \beta_{jj'}\bigg)^2 \bigg\}\bigg{/} \partial \beta_{jj'} \\ & = \dfrac{1}{2n}\sum\limits_{i=1}^{n} \partial \bigg\{ \sum\limits_{j=1}^{p}\bigg(x_{ij}-\sum\limits_{j' \neq j}x_{ij'} \beta_{jj'}\bigg)^2 \bigg\}\bigg{/} \partial \beta_{jj'} \\ & = - \dfrac{1}{n}\sum\limits_{i=1}^{n}x_{ij'}\cdot \bigg(x_{ij} - \sum\limits_{j' \neq j} x_{ij'} \beta_{jj'}\bigg). \end{align} Note that $~\gamma_{jj'}~$ of 5 is the subgradient of the function $f(x)~=~\|x\|_1$ evaluated at $x~=~\hat~\beta$. Therefore, by the Karush-Kuhn-Tucker (KKT) conditions, we have $~\{\hat~\beta_{jj'}\}~$ is a solution of 3 if and only if $\hat~\beta_{jj'}$ satisfies 4 and 5.

The following notation will be used throughout the proof of Theorem 3.1. With a bit abuse of notation, we let $~\hat~u~=~\sqrt{n}(\hat~\beta~-~\beta)~$ and $W$ be $~(p-1)~\times~p~$ matrices with elements $~\sqrt{n}(\hat~\beta_{jj'}~-\beta_{jj'}~)$ and $~\frac{1}{\sqrt{n}}\sum~_{i=1}^n~x_{ij'}~\epsilon_{ij}~$,$~j~\neq~j'~$, respectively, and $C$ be the $~p\times~p$ matrix with elements $\frac{1}{n}\sum~_{i=1}^nx_{ij}x_{ij'}$, where $~j,~j'~=~1,\ldots,p~$. Let $~\beta_{\cdot~j}~$, $~\hat~u_{\cdot~j}~$ and $W_{\cdot~j}$ denote the $~j~$-th column of $~\beta~$, $\hat~u~$ and $W$, respectively, e.g., $$\hat u_{\cdot j} = (\hat u_{1j}, \ldots, \hat u_{(j-1)j}, \hat u_{(j+1)j}, \ldots, \hat u_{pj})^\T .$$ Similarly, $~W_{S_j}~$ and $~\hat~u_{S_j}~$ denote the sub-vector of $~W_{\cdot~j}~$ and $~\hat~u_{\cdot~j}~$ corresponding to $S_j$, respectively.

Lemma 2. Conditional on $\{~2\|W_{\cdot~j}\|_{\infty}~\leqslant~\sqrt{n}\lambda_1\}$, we have $$\|\hat u_{S^c_j}\|_1 \leqslant 3\|\hat u_{S_j}\|_1, \textrm{for} j= 1,\ldots,p.$$

Proof. Given $~j~=~1,\ldots,p~$, we have \begin{align}& \dfrac{1}{2n} \sum\limits_{i=1}^n \bigg(x_{ij} - \sum_{j' \neq j} x_{ij'} \hat \beta_{jj'}\bigg)^2 + \lambda_1 \sum_{j' \neq j} |\hat \beta_{jj'}|_1 \\ & \leqslant \dfrac{1}{2n} \sum\limits_{i=1}^n \bigg(x_{ij} - \sum_{j' \neq j} x_{ij'} \beta_{jj'}\bigg)^2 + \lambda_1 \sum_{j' \neq j} |\beta_{jj'}|_1. \end{align} Since $\sum_{j'~\neq~j}~|\hat~\beta_{jj'}|_1~=~\|\hat~\beta_{S_j}\|_1~+~\|\hat~\beta_{S^c_j}\|_1$ and $\|\hat~\beta_{S_j}\|_1~\geqslant~-\|\hat~\beta_{S_j}~-~\beta_{S_j}\|_1~+~\|\beta_{S_j}\|_1,$ we have \begin{align} & \dfrac{1}{n}\sum_{i=1}^{n}\sum_{j' \neq j} x^2_{ij'}(\hat \beta_{jj'} - \beta_{jj'})^2 + 2\lambda_1\|\hat \beta_{S^c_j}\|_1 \\ & \leqslant 2\dfrac{1}{n} \sum_{i=1}^{n}\epsilon_i\sum_{j' \neq j} x_{ij'}(\hat \beta_{jj'} - \beta_{jj'}) +2 \lambda_1\|\hat \beta_{S_j} - \beta_{S_j}\|_1. \tag{6} \end{align} Conditional on $~\{~2\|W_{\cdot~j}\|_{\infty}~\leqslant~\sqrt{n}\lambda_1\}~$, we also have \begin{align} 2\dfrac{1}{n} \sum\limits_{i=1}^{n}\epsilon_{ij}\sum\limits_{j' \neq j} x_{ij'}(\hat \beta_{jj'} - \beta_{jj'}) & \leqslant \lambda_1\sum_{j' \neq j} | \hat \beta_{jj'}-\beta_{jj'}| \\ & \leqslant \lambda_1\| \hat \beta_{S_j}-\beta_{S_j}\|_1 + \lambda_1\|\hat \beta_{S^c_j}\|_1. \tag{7} \end{align} Combining 6 and 7, we have \begin{align}& \dfrac{1}{n}\sum_{i=1}^{n}\sum_{j' \neq j} x^2_{ij'}(\hat \beta_{jj'} - \beta_{jj'})^2 + 2\lambda_1\|\hat \beta_{S^c_j}\|_1 \leqslant 3\lambda_1 \|\hat \beta_{S_j} - \beta_{S_j}\|_1 + \lambda_1 \|\hat \beta_{S^c_j}\|_1. \end{align} Therefore, $\|\hat~u_{S^c_j}\|_1~\leqslant~3\|\hat~u_{S_j}\|_1.$

Lemma 3. Given $j~=~1,\ldots,p$, according to rm (C.1) and rm(C.2) conditional on $$\{ 2\|W_{S_j}\|_{2} \leqslant \sqrt{l_0}\sqrt{n}\lambda_1\} \cap \{ 2\|W_{\cdot j}\|_{\infty} \leqslant \sqrt{n}\lambda_1\},$$ we have $$\|\hat u_{S_j} \|_2\leqslant \dfrac{3}{ \kappa_1}\sqrt{l_0} \sqrt{n}\lambda_1.$$

Proof. It holds that $$\text{Set} F(\beta) = \dfrac{1}{2n} \sum\limits_{j=1}^{p} \sum\limits_{i=1}^{n}\bigg (x_{ij} - \sum\limits_{j' \neq j} x_{ij'} \beta_{jj'}\bigg)^2 + \lambda_1 \sum\limits_{j \neq j'} |\beta_{jj'}|.$$ Define $~V(u)~=~F(\hat~\beta)~-~F(\beta)~$: \begin{align}V(u)& = \sum\limits_{j=1}^p \bigg\{ \dfrac{1}{2n} u^\T_{\cdot j} C_{-j} u_{\cdot j} - \dfrac{1}{n}u^\T_{\cdot j} W_{\cdot j} \bigg\} + \lambda_1 \sum\limits_{j \neq j'}\bigg(\bigg| \beta_{jj'} + \dfrac{u_{jj'}}{\sqrt{n}}\bigg|-|\beta_{jj'}|\bigg) \\ & = \sum\limits_{j=1}^p \bigg\{ \dfrac{1}{2n} u^\T_{\cdot j} C_{-j} u_{\cdot j} -\dfrac{1}{n}u^\T_{\cdot j} W_{\cdot j} +\lambda_1\bigg(\bigg\|\beta_{\cdot j} + \dfrac{u_{\cdot j}}{\sqrt{n}}\bigg\|_1 - \|\beta_{\cdot j}\|_1\bigg) \bigg\} \\ & \triangleq \sum\limits_{j=1}^p \{ G_j(u) \}, \end{align} where $~C_{-j}~$ denotes the $~(p~-~1)~\times~(p-1)~$ matrix without the $~j~$-th column and the $~j~$-th row, $~j=1,\ldots,p~$. We have $\hat~u~=~\arg\min~_u~V(u).$ Note that according to [17] there are universal positive constants $c'$ and $c$ such that with probability at least $1~-~c'\exp(-cn)$, we have for all $v~\in~\mR^p$, $$\|C^{1/2}v\|_2\geqslant \dfrac{1}{8}\|\Sigma^{1/2} v\|_2.$$ By (C.2) and $\|\hat~u_{S^c_j}\|_1~\leqslant~3\|\hat~u_{S_j}\|_1$, we have $\hat~u^\T_{\cdot~j}~C_{-j}~\hat~u_{\cdot~j}~\geqslant~\kappa_1\|\hat~u_{S_j}\|^2_2.$ Then for $~j~=1,\ldots,p~$, we have, for $u_{\cdot~j}$ satisfying (C.2), that \begin{align} G_j(u ) &= \dfrac{1}{2n} u^\T_{\cdot j} C_{-j} u_{\cdot j} -\dfrac{1}{n}u^\T_{\cdot j} W_{\cdot j} +\lambda_1\bigg(\bigg\|\beta_{\cdot j} + \dfrac{u_{\cdot j}}{\sqrt{n}}\bigg\|_1 - \|\beta_{\cdot j}\|_1\bigg) \\ & \geqslant \dfrac{1}{n} \bigg[\dfrac{ \kappa_1}{2}\|u_{S_j}\|_2^2 -|u^\T_{S_j} W_{S_j}|- \sqrt{n}\lambda_1\|u_{S_j}\|_1\bigg] + \bigg[\dfrac{\lambda_1}{\sqrt{n}}\|u_{S^c_j}\|_1 - \dfrac{1}{n}|u^\T_{S^c_j} W_{S^c_j}| \bigg]. \tag{8} \end{align} Conditional on $~\{~2\|W_{\cdot~j}\|_{\infty}~\leqslant~\sqrt{n}\lambda_1~\}~$, the second part of 8 can be calculated as $$\dfrac{\lambda_1}{\sqrt{n}}\|u_{S^c_j}\|_1 - \dfrac{1}{n}|u^\T_{S^c_j} W_{S^c_j}| \geqslant \sum_{j' \notin S^c_j} |u_{jj'}|\bigg[ \dfrac{\lambda_1}{\sqrt{n}}- \dfrac{\|W_{S^c_j}\|_\infty}{n} \bigg]>0. \tag{9}$$ By (C.1), the first term of the right-hand side of 8 is bounded as \begin{align*}& \dfrac{1}{n} \bigg[\dfrac{ \kappa_1}{2}\|u_{S_j}\|_2^2 -|u^\T_{S_j} W_{S_j}|- \sqrt{n}\lambda_1\|u_{S_j}\|_1\bigg] \\ & \geqslant \dfrac{1}{n}\|u_{S_j}\|_2 \bigg[\dfrac{ \kappa_1}{2}\|u_{S_j}\|_2 - \|W_{S_j}\|_2 - \sqrt{n}\lambda_1\cdot \sqrt{l_0}\bigg]. \end{align*} Hence $G_j(u)$ is greater than zero as well when $$\|u_{S_j}\|_2 >\dfrac{2}{ \kappa_1} \{ \|W_{S_j}\|_2 + \sqrt{l_0}\sqrt{n}\lambda_1 \}.$$ Conditional on $\{~2\|W_{S_j}\|_{2}~\leqslant~\sqrt{l_0}\sqrt{n}\lambda_1\}$, define $$M_0 \equiv \dfrac{3}{ \kappa_1}\sqrt{l_0} \sqrt{n}\lambda_1.$$ Since we have $G_{j}(0)~=0$, it follows that the minimum of $G_j(u)$ cannot be attained at any $u$ satisfying $\|\hat~u_{S_j}\|_2~>~M_0$. Thus, conditional on $~\{~2\|W_{S_j}\|_2~\leqslant~\sqrt{l_0}\sqrt{n}\lambda_1\}~$ and (C.2), we have $~\|\hat~u_{S_j}\|_{2}~\leqslant~M_0$.

ProofProof of Theorem rm3.1. According to Lemma 3, (C.1) and (C.2), conditional on $$\{ 2\|W_{S_j}\|_{2} \leqslant \sqrt{l_0}\sqrt{n}\lambda_1\} \cap \{ 2\|W_{\cdot j}\|_{\infty} \leqslant \sqrt{n}\lambda_1\},$$ we have for $j~=~1,\ldots,p$, $$\|\hat u_{S_j} \|_2\leqslant \dfrac{3}{ \kappa_1}\sqrt{l_0} \sqrt{n}\lambda_1$$ and $$\sqrt{n}\|\hat \beta_{S_j} - \beta_{S_j}\|_\infty \leqslant \|\hat u_{S_j} \|_2 \leqslant \dfrac{3}{ \kappa_1} \sqrt{l_0} \sqrt{n}\lambda_1.$$ Since $\lambda_1~=~K_1\sqrt{\log~p/n}$, we have for every $(j,j')~\in~S$, $\hat~\beta_{jj'}~\neq~0$.

For the event in Lemma 3, by $n~>~K_2~l_0~\log~p$, we have \begin{align*}&{\rm P}\bigg(\|W_{\cdot j}\|_{\infty} > \frac{K_1}{2} \sqrt{\log p}\bigg) \leqslant p \cdot \bigg(\frac{K_1}{2} \sqrt{\log p/n}\bigg)^{-1} \exp\bigg(-\dfrac{K_1}{4}\log p\bigg) < \frac{1}{p}, \\ &{\rm P}\bigg(\|W_{S_j}\|_{2} > \frac{K_1}{2} \sqrt{l_0}\sqrt{\log p} \bigg) \leqslant l_0 \cdot \bigg(\frac{K_1}{2} \sqrt{\log p/n}\bigg)^{-1} \exp\bigg(-\dfrac{K_1}{4}\log p\bigg) < \frac{1}{p} \end{align*} and for all $v~\in~\mR^p$, $${\rm P}\bigg( \|C^{1/2}v \|_2\geqslant \dfrac{1}{8} \|\Sigma^{1/2} v \|_2\bigg) <\frac{1}{p}.$$ Thus, we have \begin{align}{\rm P}( S \nsubseteq \hat A)& \leqslant\frac{1}{p}\rightarrow 0, \textrm{as} n \rightarrow \infty. \end{align} This completes the proof.

Now we prove Theorem 3.2. We change the definition of $~S~$ a little. Specifically, let $~S~$ be the set containing both diagonal and off-diagonal nonzero entries of $~\Theta~$, i.e., $$S = \{ (j,j'): \theta_{jj'} \neq 0 \}, \tag{10}$$ and $~\mbox{Cardinality}(S)~=q~+p~$. Then we introduce the following Lemma, where more details can be found in [37] and [9].

Lemma 4. By rm (C.4) let $~Z_i$'s be independent identically distributed from $~\mathcal{N}(0,\Sigma)~$ and $~\Lambda_{\max}(\Sigma)~<~\kappa_2~<~\infty~$. Then for $~\Sigma~=~\{\sigma_{jj'}\}~$, the associated sample covariance $~\hat~\Sigma~=\{~\hat~\sigma_{jj'}~\}~$ satisfies the tail bound $${\rm P}(|\hat \sigma_{jj'} - \sigma_{jj'}|> \delta) \leqslant K_3 \exp(-K_4n\delta^2),$$ where $~K_3~$ and $~K_4~$ are positive constants depending on the maximum eigenvalue of $~\Sigma~$, and $~|\delta|~\leqslant~\kappa_3~$, where $~\kappa_3~$ depends on $~\kappa_2~$ as well.

Proof. We omit the proof and refer the readers to [9,37].

Note that given the property of the solution of LASSO, we have $~\mbox{Cardinality}(\hat~A)~<~n\cdot~p$, and thus $\hat~A$ satisfies (C.3). Furthermore, according to Theorem 3.1, we have $S~\subseteq~\hat~A$ with high probability. For notational simplicity, we denote it as $~A~$ for the rest of the proof instead of $~\hat~A~$.

Lemma 5. For any $~\lambda_2~>0~$ and sample covariance $~\hat~\Sigma~$ with strictly positive diagonal elements, the restricted log-determinant problem has a unique solution $~\hat~\Theta~\succ~0~$ characterized by $$- \hat \Theta^{-1} + \hat \Sigma + \lambda_2 \hat Z =0, \tag{11}$$ where $~\hat~Z_A~$ is an element of the subdifferential $\partial~(\sum~_{(j,j')~\in~{A}}~|w_{jj'}\theta_{jj'}|)$ such that $$\hat Z_{jj'} = \left \{\! \begin{array}{ll} {\rm sign} (\hat \theta_{jj'}) \cdot \hat w_{jj'}, \textrm{if} \hat \theta_{jj'} \neq 0 , \\ \text{a real number } \in [-\hat w_{jj'},\hat w_{jj'}], \textrm{if} \hat \theta_{jj'} = 0. \end{array} \right.$$

This result is similar to [10], and hence we omit the proof here.

Lemma 6. Suppose rm(C.1)–(C.3) hold. Conditional on the first step solution, with appropriately chosen $~\lambda_2~$ and the sample size, the following event holds with probability at least $~1-~\frac{1}{p^{\tau-1}}~$ $(~\tau~>~1)$, $${\rm sign}(\hat \Theta_{ S^c}) ={\rm sign}(\Theta_{S^c}) = 0.$$

Proof. We assume there exists a solution $\bar{\Theta}~$ for the following restricted log-determinant problem: $$\bar{\Theta} = \arg \min\limits_{\Theta \in \mathcal{M}_{S}} \{-\log |\Theta| + \mbox{tr} (\Theta \hat \Sigma) + \lambda_2 \widetilde{Z}\}, \tag{12}$$ where $\mathcal{M}_{S}~=~\{\Theta~\in~\mR^{p~\times~p};~\Theta~\succ~0~~\text{and}~~\theta_{jj'}~=~0~~\text{for~all}~~(j,j')~\notin~S,~j~\neq~j'\}$.

By construction, set $\widetilde{Z}_{jj'}~=~\frac{1}{\lambda_2}~(-~\hat~\Sigma_{jj'}~+~[\bar{\Theta}^{-1}]_{jj'})$, which makes $~\widetilde{Z}~$ satisfy 11. If we are able to show $|\widetilde{Z}_{jj'}~|~\leqslant~w_{jj'}$, where $~(j,j')~\in~A/S~$, then it implies $~\bar{\Theta}~$ is equal to the solution $~\hat~\Theta~$ of the original criterion. The argument is along the lines of a result due to [10]. We apply it to the second step of the proposed method, which has shrunk many edges to zeros.

Let $~\Delta~=~\bar{\Theta}-~\Theta~$, and $~R(\Delta)~$ be the difference between the gradient $~\nabla~(\log~|\bar~\Theta|)~=~\bar{\Theta}^{-1}~$ and its first-order Taylor expansion around $~\Theta~$ by using the first and second derivatives of the log-determinant function, which is $$R(\Delta) = \bar{\Theta}^{-1} - \Theta^{-1} + \Theta^{-1} \Delta \Theta^{-1}.$$

Then the solution of 12 can be rewritten as $$\Theta^{-1} \Delta \Theta^{-1} + H - R(\Delta) + \lambda_2 \widetilde{Z} = 0,$$ where $~H~\triangleq~\hat~\Sigma~-~\Theta^{-1}=\hat~\Sigma~-~\Sigma$. Let $~\Gamma~=~\Theta^{-1}~\otimes~\Theta^{-1}~$, where $~\otimes~$ denotes the Kronecker matrix product. Consider the sub-block of $~\Gamma~$, i.e., $$\Gamma_{SS} := [\Theta^{-1} \otimes \Theta^{-1}]_{SS} \in R^{(q+p) \times (q+p)},$$ where $~q~$ is the number of nonzero off-diagonal entries of the concentration matrix (divided by 2) and $~p~$ is the number of diagonal elements of the concentration matrix. Let $~\bar~\Delta~$ be the vector version of $~\Delta~$. The above equality can be rewritten into two blocks, \begin{align*}&\Gamma_{SS}\bar \Delta + \bar H_{S} - \bar R_{S} + \lambda_2 \bar{\hat {Z}}_{S}= 0, \\ &\Gamma_{(A/S)S}\bar \Delta + \bar H_{A/S} - \bar R_{A/S} + \lambda_2 \bar{\widetilde{Z}}_{A/S}= 0. \end{align*} It follows that $$\bar{\widetilde{Z}}_{A/S} =- \frac{1}{\lambda_2} (\Gamma_{(A/S)S}\bar \Delta + \bar H_{(A/S)}- \bar R_{(A/S)} ). \tag{13}$$

We now consider the upper bounds of $~\|\bar~H_A\|_{\infty}$ and $~\|\bar~R_A\|_{\infty}~$. First, consider a multivariate Gaussian vector; the deviation of the sample covariance matrix $~\hat~\Sigma~$ has an exponential type tail bound. Let $~\delta~=~(\frac{\tau~\log(np)}{n})^{1/2}~$. We have \begin{align} {\rm P}(\|\bar H_A\|_{\infty}>\delta)&< n \cdot p \cdot {\rm P}(|\hat \sigma_{jj'} - \sigma_{jj'}| > \delta) \\ & \leqslant n\cdot p\cdot K_3\exp(-K_4n\delta^2) \\ &\le \dfrac{1}{p^{\tau-1}}. \end{align} Second, use [10], and set $~G_0~=~\frac{3}{2}~l_0~\|\Sigma\|^3_\infty$, where $\|\Sigma\|_{\infty}=\max_{j~=~1,\ldots,p}\sum_{j'=1}^p|\Sigma_{jj'}|$. Then conditional on the first step, we have \begin{align}\| \bar R_A\|_{\infty} & \leqslant G_0 \|\hat \Theta_A - \Theta_A\|_{\infty}. \end{align} Note that the above inequality holds according to the result $$\|\hat \Theta_A - \Theta_A\|_{\infty} < (3 l_0 \cdot \|\Sigma\|_\infty)^{-1}.$$

Given $j$, since $~\mbox{Cardinality}(A_j)~\leqslant~n~$, the maximum likelihood estimate of $~\Theta~$ based on $~A~$ exists and is unique [38]. By [16,39], $~\widetilde{\Theta}~$ converges to a certain matrix with the rate at least $~1/p^{\tau~-~1}~$.

Define $~\mathcal{B}~$ as \begin{align}\mathcal{B} := \{\Delta_S: \|\Delta_S\|_\infty \leqslant 2K(\Gamma) (\| H_S\|_{\infty} +\lambda_2 M_1 \}, \end{align} where $~K(\Gamma)~=~\|\Gamma^{-1}_{SS}\|_{\infty}$ and there exists a constant $~M_1~$ such that $~\|\hat~w_{S}\|_{\infty}~<~M_1~$. By [10], the solution $\Delta_S~=~\bar{\Theta}_S-~\Theta_S$ is contained inside the $~l_{\infty}~$ ball. Conditional on the event $~\mathcal{B}~$ and choosing a suitable $~n~$, we have $\|\bar~R_S\|_{\infty}\leqslant\delta~$.

Now we prove the upper bound of $~|\bar{\widetilde{Z}}_{A/S}|~$ in 13. By setting $\widetilde{Z}_{jj'}$, we have $~-\bar{\Theta}^{-1}_S~+~\hat~\Sigma_S~+~\lambda_2~\widetilde{Z}_S~=~0~$. Thus the following equality holds: $$\bar\Delta_S-(\Gamma_{SS})^{-1}{\rm vec}(-[(\Theta+\Delta)^{-1}]_S + \hat \Sigma_S + \lambda_2 \widetilde{Z}_S ) = \bar\Delta_S. \tag{14}$$

By a direct calculation, we have \begin{align} &\bar\Delta_S-(\Gamma_{SS})^{-1}{\rm vec}(-[(\Theta+\Delta)^{-1}]_S + \hat \Sigma_S + \lambda_2 \widetilde{Z}_S ) \\ & =(\Gamma_{SS})^{-1}R_S - (\Gamma_{SS})^{-1}(\bar H_S +\lambda_2 \bar{\widetilde{Z}}_S). \tag{15} \end{align} By letting $~\lambda_2=~\frac{4~M_2}{\alpha~}\delta$ and setting $~\max_{(j,j')~\in~A/S}|\widetilde{\theta}_{jj'}|~\leqslant~M_2~$, from 14 and 15, with $e~\in~A~/~S~$, it follows \begin{align}|e\bar{\widetilde{Z}}_{A/S}| & = e\bigg|\frac{1}{\lambda_2} (\Gamma_{(A/S)S}\bar \Delta + \bar H_{(A/S)}- \bar R_{(A/S)} )\bigg| \\ & = \bigg|-\dfrac{1}{\lambda_2}e \Gamma_{(A/S)S}(\Gamma_{SS})^{-1} (\bar H_S - \bar R_S) +e \Gamma_{(A/S)S} (\Gamma_{SS})^{-1} \bar{\hat {Z}}_S \\ & \,- e\dfrac{1}{\lambda_2} (\bar H_{(A/S)} - \bar R_{(A/S)})\bigg| \\ & \leqslant \dfrac{1}{\lambda_2} |e\Gamma_{(A/S)S}(\Gamma_{SS})^{-1} | (\|\bar H_S\|_{\infty} + \|\bar R_S\|_{\infty}) + |e\Gamma_{(A/S)S} (\Gamma_{SS})^{-1} \bar{\widetilde{Z}}_S | \\ & \, + \dfrac{1}{\lambda_2} (\|\bar H_S\|_{\infty} + \|\bar R_S\|_{\infty}) \\ &\leqslant \dfrac{\alpha}{2M_2}(1-\alpha) +(1- \alpha)M_1 + \dfrac{\alpha}{2M_2} \\ & <\frac{q}{M_2}. \end{align} Then we have $|\widetilde{Z}_{jj'}|~\leqslant~w_{jj'}$ where $(j,j')~\in~A/S$, which implies $\bar~\Theta$ is equal to the solution $\hat~\Theta$ and ${\rm~sign}(\hat~\Theta_{S^c})~=~0.$

ProofProof of Theorem rm3.2. We now have all necessary ingredients to prove Theorem 3.2. We show that with high probability the nonzero edge set of $\bar~\Theta$ is equal to the solution of the proposed two-step method $\hat~\Theta$. Thus we have $$\|\hat{\Theta}_S -\Theta_S\|_{\infty} \leqslant 2K(\Gamma) (\| H_S\|_{\infty} +\lambda_2M_1).$$

Let $~M~=~M_1~\times~M_2~$. With probability at least $~1-~\frac{1}{p^{\tau-1}}~$, we have $$\|\hat{\Theta}_S -\Theta_S\|_{\infty} \leqslant 2K(\Gamma)\bigg(1+\dfrac{4M}{\alpha}\bigg) \bigg(\dfrac{\tau\log(np)}{n}\bigg)^{1/2}.$$

The minimum absolute value $~\theta_{\min}~$ of nonzero entries of $\Theta$ satisfies $$\theta_{\min} >4K(\Gamma)\bigg(1+\dfrac{4M}{\alpha}\bigg) \bigg(\dfrac{\tau\log(np)}{n}\bigg)^{1/2}.$$ Hence it equals to saying ${\rm~sign}(\hat~\Theta_{S})~={\rm~sign}(\Theta_{S}).$

In addition, according to the result of Lemma 6, with probability at least $~1-~\frac{1}{p^{\tau-1}}~$, we have $${\rm sign}(\hat \Theta_{S^c}) ={\rm sign}(\Theta_{S^c}) = 0. \tag{16}$$

Since 16 is built on the first step of the proposed method, overall, we have \begin{align}{\rm P}({\rm sign}(\hat \Theta) \neq {\rm sign}(\Theta)) & \leqslant {\rm P}({\rm sign}(\hat \Theta_A) \neq {\rm sign}(\Theta_A), S \subseteq A) + {\rm P}(S \nsubseteq A) \\ &\leqslant \dfrac{1}{p^{\tau-1}}+ \exp(-n^{\eta}). \end{align} This completes the proof.

ProofProof of Theorem rm3.3. Note we have $$\|\hat{\Theta}_S -\Theta_S\|_{\infty} \leqslant 2K(\Gamma)\bigg(1+\dfrac{4M}{\alpha}\bigg) \bigg(\dfrac{\tau\log (np)}{n}\bigg)^{1/2},$$ with probability at least $~1-\frac{1}{p^{\tau-1}}-~\exp(-n^{\eta})~$. By the setting of $~S~$ in 10, we have $~\mbox{Cardinality}(S)~=~p+q~$. It is straightforward to calculate that \begin{align}\|\hat \Theta -\Theta\|_{F} &\leqslant 2K(\Gamma)\bigg(1+\dfrac{4M}{\alpha}\bigg) \bigg(\dfrac{\tau(p+q)\log(np)}{n}\bigg)^{1/2}. \end{align} This completes the proof.

### References

[1] Banerjee O, Ghaoui L-E, d'Aspremont A. Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. J Mach Learn Res, 2008, 9: 485--516. Google Scholar

[2] Bickel P J, Levina E. Regularized estimation of large covariance matrices. Ann Statist, 2008, 36: 199-227 CrossRef Google Scholar

[3] Burnatowska-Hledin M A, Kossoris J B, Van Dort C J. T47D breast cancer cell growth is inhibited by expression of VACM-1, a cul-5 gene.. Biochem BioPhys Res Commun, 2004, 319: 817-825 CrossRef PubMed Google Scholar

[4] Chang H-Y, Nuyten D-S, Sneddon J-B, et al. Robustness, scalability, and integration of a wound-response gene expression signature in predicting breast cancer survival. Proc Natl Acad Sci USA, 2001, 102: 3738--343. Google Scholar

[5] Chen M, Ren Z, Zhao H. Asymptotically Normal and Efficient Estimation of Covariate-Adjusted Gaussian Graphical Model.. J Am Statistical Association, 2016, 111: 394-406 CrossRef PubMed Google Scholar

[6] Dempster A-P. Covariance selection. Biometrics, 1972, 28: 157--175. Google Scholar

[7] Fan J-Q, Feng Y, Wu Y-C. Network exploration via the adaptive LASSO and scad penalties. Ann Appl Stat, 2009, 1: 521--541. Google Scholar

[8] Fan J, Li R. Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties. J Am Statistical Association, 2001, 96: 1348-1360 CrossRef Google Scholar

[9] Fan J, Lv J. Sure independence screening for ultrahigh dimensional feature space. J R Statistical Soc-Ser B (Statistical Methodology), 2008, 70: 849-911 CrossRef Google Scholar

[10] Ferguson T S. An Inconsistent Maximum Likelihood Estimate. J Am Statistical Association, 1982, 77: 831-834 CrossRef Google Scholar

[11] Friedman J, Hastie T, Tibshirani R. Sparse inverse covariance estimation with the graphical lasso.. Biostatistics, 2008, 9: 432-441 CrossRef PubMed Google Scholar

[12] Han X, Poon R Y C. Critical differences between isoforms of securin reveal mechanisms of separase regulation.. Mol Cell Biol, 2013, 33: 3400-3415 CrossRef PubMed Google Scholar

[13] Hastie T, Tibshirani R, Friedman J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed. New York: Springer, 2009. Google Scholar

[14] Hayward B E, Moran V, Strain L. Bidirectional Imprinting of a Single Gene: GNAS1 Encodes Maternally, Paternally, and Biallelically Derived Proteins. Proc Natl Acad Sci USA, 1998, 95: 15475-15480 CrossRef PubMed ADS Google Scholar

[15] Janková J, van de Geer S. Confidence intervals for high-dimensional inverse covariance estimation. Electron J Statist, 2015, 9: 1205-1229 CrossRef Google Scholar

[16] Lam C, Fan J-Q. Sparsistency and rates of convergence in large covariance matrix estimation. Ann Statist, 2009, 37: 42--54. Google Scholar

[17] Li N, Zhang J, Liao D. Association between C4, C4A, and C4B copy number variations and susceptibility to autoimmune diseases: a meta-analysis. Sci Rep, 2017, 7: 42628 CrossRef PubMed ADS Google Scholar

[18] Meinshausen N. Relaxed Lasso. Comput Stat Data Anal, 2007, 52: 374-393 CrossRef Google Scholar

[19] Meinshausen N, Bühlmann P. High-dimensional graphs and variable selection with the Lasso. Ann Statist, 2006, 34: 1436-1462 CrossRef Google Scholar

[20] Meinshausen N, Yu B. Lasso-type recovery of sparse representations for high-dimensional data. Ann Statist, 2009, 37: 246-270 CrossRef Google Scholar

[21] Miyoshi Y, Iwao K, Egawa C. Association of centrosomal kinase STK15/BTAK mRNA expression with chromosomal instability in human breast cancers.. Int J Cancer, 2001, 92: 370-373 CrossRef PubMed Google Scholar

[22] Negahban S, Ravikumar P, Wainwright M-J, et al. A unified framework for high-dimensional analysis of $m$-estimators with decomposable regularizers. Statist Sci, 2012, 27: 1348--1356. Google Scholar

[23] Ning Y, Liu H. A general theory of hypothesis tests and confidence regions for sparse high dimensional models. Ann Statist, 2017, 45: 158-195 CrossRef Google Scholar

[24] Otterbach F, Callies R, Frey U H. The T393C polymorphism in the gene GNAS1 of G protein is associated with survival of patients with invasive breast carcinoma.. Breast Cancer Res Treat, 2007, 105: 311-317 CrossRef PubMed Google Scholar

[25] Peng J, Wang P, Zhou N. Partial Correlation Estimation by Joint Sparse Regression Models.. J Am Statistical Association, 2009, 104: 735-746 CrossRef PubMed Google Scholar

[26] Raskutti G, Wainwright M-J, Yu B. Restricted eigenvalue properties for correlated gaussian designs. J Mach Learn Res, 2010, 11: 2241--2259. Google Scholar

[27] Raskutti G, Wainwright M J, Yu B. Minimax Rates of Estimation for High-Dimensional Linear Regression Over $\ell_q$-Balls. IEEE Trans Inform Theor, 2011, 57: 6976-6994 CrossRef Google Scholar

[28] Ravikumar P, Wainwright M J, Raskutti G. High-dimensional covariance estimation by minimizing $\ell_1$-penalized log-determinant divergence. Electron J Statist, 2011, 5: 935-980 CrossRef Google Scholar

[29] Rothman A J, Bickel P J, Levina E. Sparse permutation invariant covariance estimation. Electron J Statist, 2008, 2: 494-515 CrossRef Google Scholar

[30] Taylor J, Tibshirani R. Can J Stat, 2018, 46: 41-61 CrossRef PubMed Google Scholar

[31] Uhler C, Raskutti G, Bühlmann P. Geometry of the faithfulness assumption in causal inference. Ann Statist, 2013, 41: 436-463 CrossRef Google Scholar

[32] van de Geer S, Bühlmann P, Ritov Y. On asymptotically optimal confidence regions and tests for high-dimensional models. Ann Statist, 2014, 42: 1166-1202 CrossRef Google Scholar

[33] van de Vijver M-J, He Y-D, van't Veer L-J, et al. A gene-expression signature as a predictor of survival in breast cancer. N Engl J Med, 2002, 37: 1999--2009. Google Scholar

[34] Wang H-C, Chiu C-F, Tsai R-Y, et al. Association of genetic polymorphisms of EXO1 gene with risk of breast cancer in taiwan. Anticancer Res, 2009, 29: 3897--3901. Google Scholar

[35] West M, Blanchette C, Dressman H. Predicting the clinical status of human breast cancer by using gene expression profiles. Proc Natl Acad Sci USA, 2001, 98: 11462-11467 CrossRef PubMed ADS Google Scholar

[36] Yuan M. Efficient computation of $\ell_1$ regularized estimates in Gaussian graphical models. J Comput Graphical Stat, 2008, 17: 809-826 CrossRef Google Scholar

[37] Yuan M, Lin Y. Model selection and estimation in the Gaussian graphical model. Biometrika, 2007, 94: 19-35 CrossRef Google Scholar

[38] Zhang C H, Zhang S S. Confidence intervals for low dimensional parameters in high dimensional linear models. J R Stat Soc B, 2014, 76: 217-242 CrossRef Google Scholar

[39] Zhou S-H, Rütimann P, Xu M, et al. High-dimensional covariance estimation based on Gaussian graphical models. J Mach Learn Res, 2011, 12: 2975--3026. Google Scholar

• Figure 3

(Color online) The inferred graph from the breast cancer microarray data set by the proposed method. The yellow nodes correspond to the seven genes that have the highest estimated degrees

• Table 1

Table 1Comparison of Frobenius norm errors for four simulation models

 $n=100$ $p=50$ Two-step method Adaptive GLASSO Gelato SCAD Model 1 0.16 (0.06) 0.17 (0.07) 0.17 (0.09) 0.38 (0.05) Model 2 2.23 (0.12) 2.32 (0.16) 2.91 (0.11) 2.29 (0.07) Model 3a 1.41 (0.14) 2.63 (0.11) 1.68 (0.31) 2.16 (0.12) Model 3b 2.10 (0.14) 2.79 (0.11) 2.65 (0.41) 2.54 (0.09) $n=50$ $p=100$ Two-step method Adaptive GLASSO Gelato SCAD Model 1 0.19 (0.07) 0.26 (0.13) 0.24 (0.14) 0.82 (0.02) Model 2 4.36 (0.16) 4.61 (0.07) 4.86 (0.04) 5.81 (0.02) Model 3a 4.08 (0.25) 4.43 (0.25) 4.22 (0.36) 5.65 (0.35) Model 3b 5.69 (0.33) 6.19 (0.17) 5.86 (0.88) 6.98 (0.15) $n=200$ $p=400$ Two-step method Adaptive GLASSO Gelato SCAD Model 1 0.15 (0.02) 0.82 (0.03) 0.61 (0.02) 0.60 (0.02) Model 2 4.25 (0.07) 5.19 (0.08) 4.83 (0.06) 5.23 (0.05) Model 3a 6.55 (0.05) 6.70 (0.09) 5.41 (0.08) 6.56 (0.06) Model 3b 7.16 (0.19) $10.09~(0.04)~\,$ 9.17 (0.05) 9.83 (0.05) $n=400$ $p=800$ Two-step method Adaptive GLASSO Gelato SCAD Model 1 0.08 (0.04) 0.78 (0.02) 0.25 (0.02) 1.03 (0.01) Model 2 4.07 (0.04) 6.40 (0.03) 4.13 (0.02) 6.45 (0.04) Model 3a 9.28 (0.20) $14.77~(0.36)~\,$ $15.99~(0.12)~\,$ 9.93 (0.08) Model 3b $10.51~(0.04)~\,$ $16.57~(0.17)~\,$ $18.21~(0.10)~\,$ $14.39~(0.03)~\,$

Citations

Altmetric