
SCIENCE CHINA Information Sciences, Volume 62, Issue 11: 212102 (2019) https://doi.org/10.1007/s11432-019-9932-3

Accelerating DNN-based 3D point cloud processing for mobile computing

  • Received: Apr 13, 2019
  • Accepted: Jun 3, 2019
  • Published: Sep 19, 2019

Abstract


References

[1] Gallardo N, Gamez N, Rad P, et al. Autonomous decision making for a driver-less car. In: Proceedings of IEEE System of Systems Engineering Conference (SoSE), Waikoloa, 2017. 1--6

[2] Lin S C, Zhang Y, Hsu C H, et al. The architectural implications of autonomous driving: constraints and acceleration. In: Proceedings of International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2018. 751--766

[3] Kuindersma S, Deits R, Fallon M, et al. Optimization-based locomotion planning, estimation, and control design for the Atlas humanoid robot. Auton Robot, 2016, 40: 429--455

[4] Wang X J, Zhou Y F, Pan X, et al. A robust 3D point cloud skeleton extraction method (in Chinese). Sci Sin Inform, 2017, 47: 832--845

[5] Qi C R, Su H, Mo K, et al. PointNet: deep learning on point sets for 3D classification and segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017. 652--660

[6] Qi C R, Yi L, Su H, et al. PointNet++: deep hierarchical feature learning on point sets in a metric space. In: Proceedings of Advances in Neural Information Processing Systems, 2017. 5099--5108

[7] Vazou N, Seidel E L, Jhala R. Refinement types for Haskell. SIGPLAN Not, 2014, 49: 269--282

[8] Chen Y H, Emer J, Sze V. Eyeriss: a spatial architecture for energy-efficient dataflow for convolutional neural networks. In: Proceedings of ACM/IEEE International Symposium on Computer Architecture (ISCA), 2016. 367--379

[9] Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks. In: Proceedings of Advances in Neural Information Processing Systems, 2012. 1097--1105

[10] He K, Zhang X, Ren S, et al. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016. 770--778

[11] Su H, Maji S, Kalogerakis E, et al. Multi-view convolutional neural networks for 3D shape recognition. In: Proceedings of IEEE International Conference on Computer Vision (ICCV), 2015. 945--953

[12] Soltani A A, Huang H, Wu J, et al. Synthesizing 3D shapes via modeling multi-view depth maps and silhouettes with deep generative networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017. 1511--1519

[13] Qi C R, Su H, Niessner M, et al. Volumetric and multi-view CNNs for object classification on 3D data. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016. 5648--5656

[14] Zhou Y, Tuzel O. VoxelNet: end-to-end learning for point cloud based 3D object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018. 4490--4499

[15] Hua B S, Tran M K, Yeung S K. Pointwise convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018. 984--993

[16] Song L, Wang Y, Han Y, et al. C-Brain: a deep learning accelerator that tames the diversity of CNNs through adaptive data-level parallelization. In: Proceedings of Design Automation Conference (DAC), 2016. 1--6

[17] Wu Z, Song S, Khosla A, et al. 3D ShapeNets: a deep representation for volumetric shapes. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015. 1912--1920

[18] Armeni I, Sener O, Zamir A R, et al. 3D semantic parsing of large-scale indoor spaces. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016. 1534--1543

[19] Muralimanohar N, Balasubramonian R, Jouppi N P. CACTI 6.0: a tool to model large caches. HP Laboratories Technical Report, 2009. 22--31

  • Figure 1

    (Color online) Illustrative example of point-based DNNs for point cloud data. (a) An instance of mobile robotic applications; (b) a point-based DNN.

  • Figure 2

    (Color online) Neighbor pixels/points. (a) Neighbor pixels are regular in conventional Conv layers; (b) irregular neighbors within a radius $r$ in a convolution-like layer; (c) irregular neighbors within a kernel size $K$ in a pointwise layer. Both (b) and (c) are drawn in 2D and extend directly to 3D metric space under the same formulations.
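In both cases the neighbor test reduces to a simple predicate on the center point $p$ and a candidate $q$: the ball query of (b) accepts $q$ when $\Vert p-q \Vert < r$, while the pointwise layer of (c) accepts $q$ when every coordinate offset is at most $K/2$. A minimal Python sketch of the two predicates (the helper names are ours, not from the paper):

    import numpy as np

    def is_ball_neighbor(p, q, r):
        # Figure 2(b): q lies inside a Euclidean ball of radius r around p
        return np.linalg.norm(p - q) < r

    def is_pointwise_neighbor(p, q, K):
        # Figure 2(c): q lies inside an axis-aligned kernel box of side K around p
        return bool(np.all(np.abs(p - q) <= K / 2))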

  • Algorithm 1 Grid-based neighbor point search

    //Initialization
    Inputs: $P$: input points; $g$: a grid size;
    Find the minimum/maximum boundaries of $P$ ($\langle Mn_x, Mx_x \rangle$, $\langle Mn_y, Mx_y \rangle$);
    Build grids based on $P$, $Mn$/$Mx$, and $g$;
    Store the grids of points, and the start address and count of each grid;

    //Retrieval
    Inputs: $G$: grid-based points; $\langle S, C \rangle$: start address and count of grids; $p(p_x, p_y)$: a center point; $g$: a grid size; $r$: a radius size; $K(K_x, K_y)$: a kernel size;
    Outputs: out: neighbor results (out = 0); cnt: their count (cnt = 0);
    if (Ball Query) then
        for $x_i = -r/g : r/g$; $y_i = -r/g : r/g$ do
            $p_{gx} = p_x + g \cdot x_i$, $p_{gy} = p_y + g \cdot y_i$;
            if $Mn_x \leq p_{gx} \leq Mx_x$ and $Mn_y \leq p_{gy} \leq Mx_y$ then
                addr = $(p_{gx} - Mn_x)/g + ((p_{gy} - Mn_y)/g) \cdot (Mx_x - Mn_x)/g$;
                $t\_{\rm out}$ = Retrieve(addr, $S$, $C$, $G$);
                for $i = 1$ : Count($t\_{\rm out}$) do
                    if $\Vert p - t\_{\rm out}(i) \Vert < r$ then
                        out += $t\_{\rm out}(i)$; cnt++;
                    end if
                end for
            end if
        end for
    else if (Pointwise) then
        for $k_x = 1 : K_x$; $k_y = 1 : K_y$ do
            $p_{gx} = p_x - K_x/2 + k_x$, $p_{gy} = p_y - K_y/2 + k_y$;
            for $x_i = -1/(2g) : 1/(2g)$; $y_i = -1/(2g) : 1/(2g)$ do
                if $Mn_x \leq p_{gx} + x_i \cdot g \leq Mx_x$ and $Mn_y \leq p_{gy} + y_i \cdot g \leq Mx_y$ then
                    addr = $(p_{gx} + x_i \cdot g - Mn_x)/g + ((p_{gy} + y_i \cdot g - Mn_y)/g) \cdot (Mx_x - Mn_x)/g$;
                    $t\_{\rm out}$ = Retrieve(addr, $S$, $C$, $G$);
                    if the grid lies fully inside the kernel then
                        out += $t\_{\rm out}$; cnt += Count($t\_{\rm out}$);
                    else
                        for $i = 1$ : Count($t\_{\rm out}$) do
                            if $\vert p - t\_{\rm out}(i) \vert \leq K/2$ then
                                out += $t\_{\rm out}(i)$; cnt++;
                            end if
                        end for
                    end if
                end if
            end for
        end for
    end if
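For concreteness, the grid build and the ball-query branch of Algorithm 1 can be sketched in a few lines of Python (a 2D illustration with our own variable names, not the PointPU implementation):

    import numpy as np

    def build_grid(P, g):
        # Initialization: bucket the points P (N x 2) into square cells of side g.
        mn, mx = P.min(axis=0), P.max(axis=0)            # <Mn, Mx> boundaries
        dims = np.floor((mx - mn) / g).astype(int) + 1   # cells per axis
        cell = np.floor((P - mn) / g).astype(int)        # cell coordinates per point
        flat = cell[:, 0] + cell[:, 1] * dims[0]         # row-major cell address
        order = np.argsort(flat)                         # group points by cell
        P, flat = P[order], flat[order]
        start = np.searchsorted(flat, np.arange(dims[0] * dims[1]))  # S: start addresses
        count = np.diff(np.append(start, len(P)))                    # C: per-cell counts
        return P, start, count, mn, dims

    def ball_query(p, r, P, start, count, mn, dims, g):
        # Retrieval: visit only the cells overlapping the ball of radius r around p.
        out = []
        span = int(np.ceil(r / g))
        cx, cy = np.floor((p - mn) / g).astype(int)
        for yi in range(cy - span, cy + span + 1):
            for xi in range(cx - span, cx + span + 1):
                if 0 <= xi < dims[0] and 0 <= yi < dims[1]:   # boundary check
                    addr = xi + yi * dims[0]
                    s, c = start[addr], count[addr]
                    out += [q for q in P[s:s + c] if np.linalg.norm(p - q) < r]
        return np.array(out)

    # Example: 200 random points, grid size g = 0.1, radius r = 0.2
    rng = np.random.default_rng(0)
    pts, start, count, mn, dims = build_grid(rng.random((200, 2)), 0.1)
    neighbors = ball_query(np.array([0.5, 0.5]), 0.2, pts, start, count, mn, dims, 0.1)

The pointwise branch differs only in which cells are visited (those covering the $K_x \times K_y$ kernel) and in the per-point test ($\vert p - q \vert \leq K/2$ per coordinate).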

  • Table 1   Characteristics of benchmarks
    Description | Network abbreviation | Total layers | Neighbor search layers | Input points | Dataset
    PointNet in scene recognition [5] | PN$_{-}$r | 7 | 0 | 1024 | ModelNet40 [17]
    PointNet in semantic segmentation [5] | PN$_{-}$s | 8 | 0 | 2048 | ShapeNet [17]
    PointNet++ in scene recognition [6] | PNpp$_{-}$r | 7 | 2 | 1024 | ModelNet40
    PointNet++ in semantic segmentation [6] | PNpp$_{-}$s | 10 | 2 | 2048 | ShapeNet
    Pointwise CNN in scene recognition [15] | Pw$_{-}$r | 6 | 4 | 2048 | ModelNet40
    Pointwise CNN in semantic segmentation [15] | Pw$_{-}$s | 5 | 5 | 4096 | S3DIS [18]
  • Table 2   Normalized energy consumption compared with GPU
    Platform | PN$_{-}$r | PN$_{-}$s | PNpp$_{-}$r | PNpp$_{-}$s | Pw$_{-}$r | Pw$_{-}$s | Gmean
    GPU | 1 | 1 | 1 | 1 | 1 | 1 | 1
    PointPU | 0.1% | 0.1% | 0.1% | 0.0063% | 0.1% | 0.0015% | 0.05%
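The Gmean column appears to be the geometric mean of the six per-benchmark ratios, i.e.,

$${\rm Gmean} = \Big(\prod_{i=1}^{6} E_i\Big)^{1/6},$$

where $E_i$ is PointPU's normalized energy on benchmark $i$; plugging in the rounded table entries gives roughly 0.03%, so the reported 0.05% presumably reflects the unrounded measurements.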
  • Table 3   Detailed characteristics of PointPU against the GPU baseline
    Platform | NVIDIA Tesla K40 | PointPU
    Technology | 28 nm | 45 nm, 1.1 V
    Frequency (MHz) | 745 | 700
    Average power (W) | 89 | 0.726
    Area (mm$^{2}$) | -- | 2.67