logo

SCIENTIA SINICA Informationis, Volume 47 , Issue 11 : 1493-1509(2017) https://doi.org/10.1360/N112017-00111

An algorithm for differential privacy streaming data publication based on matrix mechanism under exponential decay mode

More info
  • ReceivedMay 17, 2017
  • AcceptedJun 13, 2017
  • PublishedNov 14, 2017

Abstract


Funded by

国家自然科学基金(61300026)

福建省自然科学基金(2017J01754)


References

[1] Fung B, Wang K, Chen R, et al. Privacy-preserving data publishing: a survey of recent developments. ACM Comput Surv, 2010, 42: 2623--2627. Google Scholar

[2] Dwork C. Differential privacy. In: Proceedings of the 33rd International Conference on Automata, Languages and Programming, Venice, 2006. 1--12. Google Scholar

[3] Zhou S G, Li F, Tao Y F. Privacy preservation in database applications: a survey. Chin J Comp, 2009, 32: 847-861 CrossRef Google Scholar

[4] Xiong P, Zhu T Q, Wang X F. A survey on differential privacy and applications. Chin J Comput, 2014, 37: 101--122. Google Scholar

[5] Zhang X J, Meng X F. Differential privacy in data publication and analysis. Chin J Comput, 2014, 37: 927--949. Google Scholar

[6] Dwork C, Naor M, Pitassi T, et al. Differential privacy under continual observation. In: Proceedings of the 42nd ACM Symposium on Theory of Computing, Cambridge, 2010. 715--724. Google Scholar

[7] Chan T H H, Shi E, Song D. Private and continual release of statistics. ACM Trans Inf Syst Secur, 2011, 14: 405--417. Google Scholar

[8] Cao J, Xiao Q, Ghinita G, et al. Efficient and accurate strategies for differentially-private sliding window queries. In: Proceedings of the 16th International Conference on Extending Database Technology, Genoa, 2013. 191--202. Google Scholar

[9] Zhang X J, Meng X F. Stream histogram publication method with differential privacy. J Softw, 2016, 27: 381--393. Google Scholar

[10] Bolot J, Fawaz N, Muthukrishnan S, et al. Private decayed predicate sums on streams. In: Proceedings of the 16th International Conference on Database Theory, Genoa, 2013. 284--295. Google Scholar

[11] Li C, Hay M, Rastogi V, et al. Optimizing linear counting queries under differential privacy. In: Proceedings of the 29th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, Indianapolis, 2010. 123--134. Google Scholar

[12] Yuan G, Zhang Z, Winslett M. Low-rank mechanism. Proc VLDB Endow, 2012, 5: 1352-1363 CrossRef Google Scholar

[13] Hay M, Rastogi V, Miklau G. Boosting the accuracy of differentially private histograms through consistency. Proc VLDB Endow, 2010, 3: 1021-1032 CrossRef Google Scholar

[14] Kellaris G, Papadopoulos S, Xiao X. Differentially private event sequences over infinite streams. Proc VLDB Endow, 2014, 7: 1155-1166 CrossRef Google Scholar

  • Figure 1

    Noise accumulation

  • Figure 2

    Comparison of query distortion with different moments (Search Logs). (a) $\epsilon=1.0$; (b) $\epsilon=0.1$; (c) $\epsilon=0.01$

  • Figure 3

    Comparison of query distortion with different moments (Nettrace). (a) $\epsilon=1.0$; (b) $\epsilon=0.1$; (c) $\epsilon=0.01$

  • Figure 4

    Comparison of query distortion with different moments (WorldCup98). (a) $\epsilon=1.0$; (b) $\epsilon=0.1$; (c) $\epsilon=0.01$

  • Figure 5

    Comparison of query distortion with different decay factors (Search Logs). (a) $\epsilon=1.0$; (b) $\epsilon=0.1$; (c) $\epsilon=0.01$

  • Figure 6

    Comparison of query distortion with different decay factors (Nettrace). (a) $\epsilon=1.0$; (b) $\epsilon=0.1$; (c) $\epsilon=0.01$

  • Figure 7

    Comparison of query distortion with different decay factors (WorldCup98). (a) $\epsilon=1.0$; (b) $\epsilon=0.1$; (c) $\epsilon=0.01$

  • Figure 8

    Comparison of different algorithm efficiency

  • Figure 9

    Run time of diagonal optimization algorithm

  • Table 1   Data sets
    Data set Search Logs NetTrace WorldCup98
    Size 32768 65536 7518579
  •   

    Algorithm 1 策略矩阵构造算法BuildL

    Require:数据规模$N$,衰减因子$p$;

    Output:策略矩阵${\boldsymbol~L}$;

    初始化策略矩阵${\boldsymbol~L}$,将所有元素置为0;

    for $j=1$ to $N$

    $i=j$;

    while $i<N$ do

    更新矩阵元素$\textbf{{L}}[i][j]=p^{i-j}$;

    $i\leftarrow~i+\mathrm{lowbit}(i)$;

    end while

    end for

    返回矩阵${\boldsymbol~L}$.

  •   

    Algorithm 2 策略矩阵构造算法BuildB

    Require:数据规模$N$,衰减因子$p$;

    Output:还原矩阵${\boldsymbol~B}$;

    初始化策略矩阵${\boldsymbol~B}$,将所有元素置为0;

    for $i=1$ to $N$

    $j=i$;

    while $j>0$ do

    更新矩阵元素$\textbf{{B}}[i][j]=p^{i-j}$;

    $j\leftarrow~j-\mathrm{lowbit}(j)$;

    end while

    end for

    返回矩阵${\boldsymbol~B}$.

  •   

    Algorithm 3 指数衰减模式下差分隐私流数据发布DM

    Require:预设时刻上限$T$,衰减因子$p$;

    Output:每一时刻的发布结果$s_{t}$;

    for $t=1$ to $T$

    更新实际统计量$\Phi~_{\mathrm{lowbit}(t)}\leftarrow~D_{t}+\sum_{j=0}^{\mathrm{lowbit}(t)-1}\Phi~_{j}$;

    添加噪声$\overset{\sim~}{\Phi~_{\mathrm{lowbit}(t)}}\leftarrow~\Phi~_{\mathrm{lowbit}(t)}+\mathrm{Lap}(\frac{\Delta~\textbf{{L}}}{\epsilon~})$;

    $k\leftarrow~t,~s_{t}\leftarrow~0$;

    while $k>0$ do

    $s_{t}\leftarrow~s_{t}+(\overset{\sim~}{\Phi~_{\mathrm{lowbit}(k)}}\times~p^{i-k})$;

    $k\leftarrow~k-\mathrm{lowbit}(k)$;

    end while

    发布隐私数据$s_{t}$.

    end for

  •   

    Algorithm 4 对角阵系数求解算法getLambda

    Require:时刻上限$T$, 下标$k$, 衰减因子$p$;

    Output:对角阵系数$\lambda~_{k}$;

    初始化$\lambda~_{k}$为1, 根据式(32)计算出所需的系数$\delta~_{1}~\sim~\delta~_{\mathrm{log}_{2}(T)+1}$;

    $kt\leftarrow~k,~m\leftarrow~\mathrm{log}_{2}(T)+1,~\mathrm{div}\leftarrow~2^{m-1}$;

    while $\mathrm{div}\neq~kt$ do

    if $kt<\mathrm{div}$ then

    $\lambda~_{k}\leftarrow~\lambda~_{k}\times~\delta~_{m}$;

    else

    $kt\leftarrow~kt-\mathrm{div}$;

    end if

    $\mathrm{div}~\leftarrow~\frac{\mathrm{div}}{2},~m\leftarrow~m-1$;

    end while

    $\lambda~_{k}\leftarrow~\frac{\lambda~_{k}\times~(1-\delta~_{m})}{p}$;

    返回对角阵系数$\lambda~_{k}$.

  •   

    Algorithm 5 指数衰减模式下的基于对角矩阵优化差分隐私流数据发布算法DMFDA

    Require:时刻上限$T$, 衰减因子$p$;

    Output:每一时刻的发布结果$s_{t}$;

    for $t=1$ to $T$

    更新实际统计量$\Phi~_{\mathrm{lowbit}(t)}\leftarrow~D_{t}+\sum_{j=0}^{\mathrm{lowbit}(t)-1}\Phi~_{j}$;

    $\lambda~_{t}\leftarrow~\mathrm{getLambda}(T,t,p)$;

    添加噪声$\overset{\sim~}{\Phi~_{\mathrm{lowbit}(t)}}\leftarrow~\lambda~_{t}\times~\Phi~_{\mathrm{lowbit}(t)}+\mathrm{Lap}(\frac{1}{\epsilon~})$;

    $k\leftarrow~t,~s_{t}\leftarrow~0$;

    while $k>0$ do

    $s_{t}\leftarrow~s_{t}+\frac{(\overset{\sim~}{\Phi~_{\mathrm{lowbit}(k)}}\times~p^{i-k})}{\lambda~_{k}}$;

    $k\leftarrow~k-\mathrm{lowbit}(k)$;

    end while

    发布隐私数据$s_{t}$.

    end for