
SCIENTIA SINICA Mathematica, Volume 50, Issue 5: 695 (2020). https://doi.org/10.1360/N012019-00092

Hyperparameter tuning methods in automated machine learning

  • Received: Apr 1, 2019
  • Accepted: Jun 20, 2019
  • Published: May 7, 2020

Abstract



  • Figure 1

    (Color online) The automated machine learning framework

  • Figure 2

    (Color online) Comparison of four methods: 25 trial points generated in a two-dimensional space. (a) Grid search; (b) random search; (c) Latin hypercube sampling; (d) Sobol sequence (see the sampling sketch after the figure list)

  • Figure 3

    (Color online) An illustrative example of Bayesian optimization

  • Figure 4

    (Color online) Illustration of SNTO and SNTO-New. (a) Uniform design $\boldsymbol{U_{20}(20^{2})}$; (b) SNTO; (c) SNTO-New

  • Figure 5

    (Color online) Performance of each method on SVM hyperparameter optimization under different maximum numbers of trials. (a) Heart dataset; (b) Breast dataset

  • Figure 6

    (Color online) Performance of each method on XGBoost hyperparameter optimization under different maximum numbers of trials. (a) Heart dataset; (b) Breast dataset

  • Figure 7

    (Color online) Distribution of trial points for different hyperparameter optimization methods (SVM on the Heart dataset as an example). (a) Grid search; (b) random search; (c) Latin hypercube sampling; (d) Sobol sequence; (e) uniform design; (f) SMAC; (g) GP-EI; (h) TPE; (i) SNTO-New
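
    The four space-filling strategies compared in Figure 2 can be reproduced with standard scientific Python tools. Below is a minimal sketch, assuming NumPy and SciPy are available (SciPy's `scipy.stats.qmc` module provides Latin hypercube and Sobol generators); the function names, the fixed seeds, and the 5 × 5 grid layout are illustrative choices, not details taken from the paper.

```python
import numpy as np
from scipy.stats import qmc

N, DIM = 25, 2  # 25 trial points in a 2-dimensional unit cube, as in Figure 2

def grid_points(n_per_dim=5):
    """(a) Grid: the Cartesian product of 5 equally spaced levels per dimension."""
    levels = np.linspace(0.0, 1.0, n_per_dim)
    xx, yy = np.meshgrid(levels, levels)
    return np.column_stack([xx.ravel(), yy.ravel()])

def random_points(n=N, dim=DIM, seed=0):
    """(b) Random search: i.i.d. uniform draws on [0, 1]^dim."""
    rng = np.random.default_rng(seed)
    return rng.uniform(size=(n, dim))

def lhs_points(n=N, dim=DIM, seed=0):
    """(c) Latin hypercube sampling: one point in each of n equal slices per dimension."""
    return qmc.LatinHypercube(d=dim, seed=seed).random(n)

def sobol_points(n=N, dim=DIM, seed=0):
    """(d) Sobol sequence: a low-discrepancy quasi-random design.
    (SciPy warns when n is not a power of 2; harmless for this illustration.)"""
    return qmc.Sobol(d=dim, seed=seed).random(n)

if __name__ == "__main__":
    for name, pts in [("grid", grid_points()), ("random", random_points()),
                      ("LHS", lhs_points()), ("Sobol", sobol_points())]:
        print(name, pts.shape)
```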

  • Table 1   Performance of different hyperparameter optimization methods on each machine learning model and dataset (maximum number of trials: 100)

    Model    Dataset  Grid    Rand    LHS     Sobol   UD      SMAC    GP-EI   TPE     SNTO-New
    SVM      Heart    0.8560  0.8589  0.8582  0.8563  0.8586  0.8600  0.8630  0.8600  0.8619
    SVM      Breast   0.9793  0.9789  0.9782  0.9789  0.9789  0.9789  0.9798  0.9802  0.9808
    XGBoost  Heart    -       0.8571  0.8571  0.8556  0.8556  0.8600  0.8545  0.8571  0.8600
    XGBoost  Breast   -       0.9796  0.9800  0.9800  0.9791  0.9807  0.9817  0.9816  0.9817
  •

    Algorithm 1 Bayesian optimization for machine learning hyperparameter optimization

    Generate $n_{0}$ initial trial points $\{\boldsymbol{x}_{1}, \ldots, \boldsymbol{x}_{n_{0}}\}$, each representing a hyperparameter configuration.
    For each configuration, train the machine learning model and compute its performance $\{y_{1}, \ldots, y_{n_{0}}\}$.
    From the evaluated trial points and their performances, build the surrogate model $p(y \mid \boldsymbol{x})$ and the corresponding acquisition function $\alpha(\boldsymbol{x})$.
    for $t = 1, 2, \ldots$
        Add the new trial point $\boldsymbol{x}_{n_{0}+t} = \mathop{\arg\max}_{\boldsymbol{x}} \alpha(\boldsymbol{x})$ and evaluate the model performance of this configuration.
        Update the surrogate model and the acquisition function $\alpha(\boldsymbol{x})$.
    end for
    Output the best hyperparameter configuration.
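
    A minimal sketch of the loop in Algorithm 1, assuming a Gaussian process surrogate for $p(y \mid \boldsymbol{x})$ (via scikit-learn) and expected improvement as the acquisition function $\alpha(\boldsymbol{x})$. The acquisition is maximized over random candidate points rather than with a dedicated inner optimizer, and `evaluate_model` is a placeholder for training and validating the machine learning model at a given hyperparameter configuration; none of these choices are prescribed by the paper.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def expected_improvement(gp, X_cand, y_best, xi=0.01):
    """Acquisition alpha(x): expected improvement over the current best score."""
    mu, sigma = gp.predict(X_cand, return_std=True)
    sigma = np.maximum(sigma, 1e-12)
    z = (mu - y_best - xi) / sigma
    return (mu - y_best - xi) * norm.cdf(z) + sigma * norm.pdf(z)

def bayes_opt(evaluate_model, dim, n_init=5, n_iter=30, n_cand=1000, seed=0):
    """Algorithm 1: surrogate-based sequential search over the unit hypercube [0, 1]^dim."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(size=(n_init, dim))            # initial trial points
    y = np.array([evaluate_model(x) for x in X])   # their validation performance
    for _ in range(n_iter):
        gp = GaussianProcessRegressor(kernel=Matern(nu=2.5),
                                      alpha=1e-6, normalize_y=True)
        gp.fit(X, y)                               # surrogate model p(y | x)
        X_cand = rng.uniform(size=(n_cand, dim))   # random candidates for arg max alpha(x)
        x_next = X_cand[np.argmax(expected_improvement(gp, X_cand, y.max()))]
        X = np.vstack([X, x_next])
        y = np.append(y, evaluate_model(x_next))   # evaluate the new configuration
    return X[np.argmax(y)], y.max()                # best hyperparameters found

# Example: maximize a toy "validation score" with optimum near (0.3, 0.7).
if __name__ == "__main__":
    score = lambda x: -np.sum((x - np.array([0.3, 0.7])) ** 2)
    x_best, y_best = bayes_opt(score, dim=2)
    print(x_best, y_best)
```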

  •

    Algorithm 2 The SNTO algorithm

    Generate a uniform design $D = D_{0}$ with $n_{0}$ points in the search region $C^{s}$ and evaluate the corresponding hyperparameter performance.
    for $t = 1, 2, \ldots$
        Find the best-performing trial point $\boldsymbol{x}^{(t-1)}$ among the evaluated points.
        Centered at $\boldsymbol{x}^{(t-1)}$, compute the boundaries $\{a_{i}^{(t)}, b_{i}^{(t)}\}$ of the contracted space according to (12).
        if $(b_{i}^{(t)} - a_{i}^{(t)}) < \delta$ then
            Terminate the algorithm.
        end if
        Add a uniform design $D_{t}$ with $n_{t}$ trial points in the contracted subspace.
        Evaluate the new trial points and update $D = D \cup D_{t}$.
    end for
    Output the optimal trial point $\boldsymbol{x}^{*} \in D$.
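
    A minimal sketch of Algorithm 2 under two simplifying assumptions: a scrambled Sobol set from `scipy.stats.qmc` stands in for a genuine uniform design $D_{t}$, and the contraction rule (12), which is not reproduced in this section, is replaced by halving each side of the box around the current best point and clipping it to the original region. `evaluate` plays the same role as the hyperparameter evaluation step in the pseudocode.

```python
import numpy as np
from scipy.stats import qmc

def design(n, lower, upper, seed):
    """A space-filling design with n points in the box [lower, upper].
    (Stand-in for a uniform design; a scrambled Sobol set is used here.)"""
    pts = qmc.Sobol(d=len(lower), seed=seed).random(n)
    return qmc.scale(pts, lower, upper)

def snto(evaluate, lower, upper, n0=32, nt=16, shrink=0.5, delta=1e-3, max_iter=50):
    """Algorithm 2 (SNTO): sequentially contract the search box around the best point."""
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    X = design(n0, lower, upper, seed=0)               # initial design D0 on C^s
    y = np.array([evaluate(x) for x in X])
    lo, hi = lower.copy(), upper.copy()
    for t in range(1, max_iter + 1):
        x_best = X[np.argmax(y)]                       # best point found so far
        half = shrink * (hi - lo) / 2.0                # placeholder for rule (12)
        lo = np.maximum(x_best - half, lower)          # contracted box [a^(t), b^(t)],
        hi = np.minimum(x_best + half, upper)          # clipped to the original region
        if np.all(hi - lo < delta):                    # stop when the box is small enough
            break
        X_new = design(nt, lo, hi, seed=t)             # design D_t in the sub-box
        X = np.vstack([X, X_new])
        y = np.append(y, [evaluate(x) for x in X_new]) # evaluate and merge: D = D ∪ D_t
    return X[np.argmax(y)], y.max()
```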

  •

    Algorithm 3 The improved SNTO algorithm

    Generate a uniform design $D = D_{0}$ with $n_{0}$ points in the search region $C^{s}$ and evaluate the corresponding hyperparameter performance.
    for $t = 1, 2, \ldots$
        Find the best-performing trial point $\boldsymbol{x}^{(t-1)}$ among the evaluated points.
        Contract each dimension around the center $\boldsymbol{x}^{(t-1)}$. If part of the contracted space falls outside the boundary, translate it toward the center until it lies entirely within the boundary.
        if $(b_{i}^{(t)} - a_{i}^{(t)}) < \delta$ then
            Terminate the algorithm.
        end if
        Determine a uniform design $D_{t}$ with $n_{t}$ trial points in the contracted subspace.
        Add an independent random perturbation to every factor of each trial point in $D_{t}$, and apply a random permutation to the different factors.
        Evaluate the new trial points and update $D = D \cup D_{t}$.
    end for
    Output the optimal trial point $\boldsymbol{x}^{*} \in D$.
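
    The two steps that distinguish Algorithm 3 from Algorithm 2 can be sketched as small helpers for the loop above: translating the contracted box back inside the original region instead of clipping it, and perturbing the new design. The helper names and the perturbation scale are illustrative, and the permutation step is read here as independently shuffling each factor's column across the trial points; this is one interpretation of the pseudocode, not a quotation of the paper's implementation.

```python
import numpy as np

def shift_inside(center, half, lower, upper):
    """Contracted box centered at the best point; if it sticks out of the original
    region, translate it inward until it fits (Algorithm 3, contraction step)."""
    lo, hi = center - half, center + half
    lo_shift = np.maximum(lower - lo, 0.0)   # push right where the box underflows
    hi_shift = np.maximum(hi - upper, 0.0)   # push left where the box overflows
    return lo + lo_shift - hi_shift, hi + lo_shift - hi_shift

def jitter_design(X_new, lo, hi, seed=0, scale=0.05):
    """Add an independent random perturbation to every factor of every trial point,
    then randomly permute each factor (column) across the points (Algorithm 3, perturbation step)."""
    rng = np.random.default_rng(seed)
    width = hi - lo
    X_jit = X_new + rng.uniform(-scale, scale, size=X_new.shape) * width
    X_jit = np.clip(X_jit, lo, hi)           # keep the perturbed points in the sub-box
    for j in range(X_jit.shape[1]):          # independent permutation per factor
        X_jit[:, j] = rng.permutation(X_jit[:, j])
    return X_jit
```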

