CUDA-BEVFusion

This repository contains the source code and models for BEVFusion inference using CUDA & TensorRT.

3D Object Detection (on nuScenes validation set)

  • For all models, we used the BEVFusion-Base configuration.
    • The camera resolution is 256x704.
  • For the camera backbones, we chose Swin-Tiny and ResNet50.
| Model | Framework | Precision | mAP | NDS | FPS |
|---|---|---|---|---|---|
| Swin-Tiny (BEVFusion-Base) | PyTorch | FP32+FP16 | 68.52 | 71.38 | 8.4 (on RTX 3090) |
| ResNet50 | PyTorch | FP32+FP16 | 67.93 | 70.97 | - |
| ResNet50 | TensorRT | FP16 | 67.89 | 70.98 | 18 (on Orin) |
| ResNet50-PTQ | TensorRT | FP16+INT8 | 67.66 | 70.81 | 25 (on Orin) |
  • Note: The times reported on Orin are averaged over the 6019 nuScenes validation samples.
    • The number of lidar points is the main factor affecting FPS.
    • Please refer to the 3DSparseConvolution readme for more details.

Demonstration

Model and Data

  • For quick practice, we provide example data from nuScenes. You can download it from ( NVBox ) or ( Baidu Drive ). It contains the following:
    1. Camera images in 6 directions.
    2. Transformation matrices for camera/lidar/ego.
    3. example-data.pth, the input data for bevfusion-pytorch, which allows exporting ONNX without depending on the full dataset (see the inspection sketch after this list).
  • All models (model.zip) can be downloaded from ( NVBox ) or ( Baidu Drive ). It contains the following:
    1. swin-tiny ONNX models.
    2. resnet50 ONNX and PyTorch models.
    3. resnet50 INT8 ONNX and PTQ models.
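
As a hedged, optional sanity check (not part of this repository), the sketch below inspects the downloaded files. It assumes PyTorch and the onnx package (installed in the Prerequisites step) are available, and that example-data.pth loads as a plain torch pickle; file paths are taken from the directory tree in the Quick Start section.

    # Hypothetical inspection of the downloaded files; paths come from the
    # directory tree shown below in this README.
    import torch
    import onnx

    # Load the sample input used for ONNX export. Newer PyTorch versions may
    # need weights_only=False if custom classes were pickled into the file.
    data = torch.load("example-data/example-data.pth", map_location="cpu")
    print(type(data))

    # List the I/O signature of one exported model.
    model = onnx.load("model/resnet50int8/camera.backbone.onnx")
    print("inputs: ", [i.name for i in model.graph.input])
    print("outputs:", [o.name for o in model.graph.output])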

Prerequisites

Building BEVFusion depends on the following libraries:

  • CUDA >= 11.0
  • CUDNN >= 8.2
  • TensorRT >= 8.5.0
  • libprotobuf-dev
  • Compute Capability >= sm_80
  • Python >= 3.6

The numbers in the performance table above were measured on the NVIDIA Orin platform with TensorRT 8.6, CUDA 11.4, and cuDNN 8.6.
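
As a rough check of these prerequisites, the sketch below queries versions through the TensorRT and PyTorch Python bindings; both packages are assumptions here (neither is strictly required for the C++ build).

    # Hedged version check; assumes the tensorrt and torch Python packages.
    import tensorrt as trt
    import torch

    print("TensorRT:", trt.__version__)                # expect >= 8.5.0
    print("CUDA:", torch.version.cuda)                 # expect >= 11.0
    print("cuDNN:", torch.backends.cudnn.version())    # e.g. 8200 means 8.2
    major, minor = torch.cuda.get_device_capability()
    print(f"Compute capability: sm_{major}{minor}")    # expect >= sm_80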

Quick Start for Inference

  • Note: please use git clone --recursive to pull this repository, to ensure the integrity of the dependencies.

1. Download models and data to the CUDA-BEVFusion directory

  • download model.zip from ( NVBox ) or ( Baidu Drive )
  • download nuScenes-example-data.zip from ( NVBox ) or ( Baidu Drive )

    # download models and data to CUDA-BEVFusion
    cd CUDA-BEVFusion
    
    # unzip models and data
    unzip model.zip
    unzip nuScenes-example-data.zip
    
    # directory structure after unzipping
    CUDA-BEVFusion
    |-- example-data
    |   |-- 0-FRONT.jpg
    |   |-- 1-FRONT_RIGHT.jpg
    |   |-- ...
    |   |-- camera_intrinsics.tensor
    |   |-- ...
    |   |-- example-data.pth
    |   `-- points.tensor
    |-- src
    |-- qat
    |-- model
    |   |-- resnet50int8
    |   |   |-- bevfusion_ptq.pth
    |   |   |-- camera.backbone.onnx
    |   |   |-- camera.vtransform.onnx
    |   |   |-- default.yaml
    |   |   |-- fuser.onnx
    |   |   |-- head.bbox.onnx
    |   |   `-- lidar.backbone.xyz.onnx
    |   |-- resnet50
    |   `-- swint
    |-- bevfusion
    `-- tool
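
A minimal, hypothetical check that unzipping produced the layout above (file names are taken from this tree; the script itself is not part of the repository):

    # Verify a few expected files before building.
    import os

    expected = [
        "example-data/0-FRONT.jpg",
        "example-data/example-data.pth",
        "example-data/points.tensor",
        "model/resnet50int8/camera.backbone.onnx",
        "model/resnet50int8/lidar.backbone.xyz.onnx",
    ]
    for path in expected:
        status = "ok" if os.path.exists(path) else "MISSING"
        print(f"{status:8s}{path}")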

2. Configure the environment.sh

  • Install the dependency libraries

    apt install libprotobuf-dev
    pip install onnx
    
  • Modify the TensorRT/CUDA/CUDNN/BEVFusion variable values in the tool/environment.sh file.

    # change the path to the directory you are currently using
    export TensorRT_Lib=/path/to/TensorRT/lib
    export TensorRT_Inc=/path/to/TensorRT/include
    export TensorRT_Bin=/path/to/TensorRT/bin
    
    export CUDA_Lib=/path/to/cuda/lib64
    export CUDA_Inc=/path/to/cuda/include
    export CUDA_Bin=/path/to/cuda/bin
    export CUDA_HOME=/path/to/cuda
    
    export CUDNN_Lib=/path/to/cudnn/lib
    
    # For CUDA-11.x:    SPCONV_CUDA_VERSION=11.4
    # For CUDA-12.x:    SPCONV_CUDA_VERSION=12.6
    export SPCONV_CUDA_VERSION=11.4
    
    # resnet50/resnet50int8/swint
    export DEBUG_MODEL=resnet50int8
    
    # fp16/int8
    export DEBUG_PRECISION=int8
    export DEBUG_DATA=example-data
    export USE_Python=OFF
    
  • Apply the environment to the current terminal.

    . tool/environment.sh
    
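To confirm the variables were exported into the current shell, a small sketch (run it in the same terminal; the variable names come from tool/environment.sh above):

    # Check that environment.sh exported valid directories into this shell.
    import os

    for var in ("TensorRT_Lib", "TensorRT_Inc", "CUDA_Lib", "CUDNN_Lib", "CUDA_HOME"):
        path = os.environ.get(var, "")
        ok = os.path.isdir(path)
        print(f"{var}: {path or '<unset>'} ({'ok' if ok else 'missing'})")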

3. Compile and run

  1. Build the models for TensorRT

    bash tool/build_trt_engine.sh
    
  2. Compile and run the program

    # Generate the protobuf code
    bash src/onnx/make_pb.sh
    
    # Compile and run
    bash tool/run.sh
    

Export ONNX and PTQ

  • For more details, please refer here

For Python Interface

  1. Set USE_Python=ON in environment.sh to enable compilation of the Python interface.
  2. Run bash tool/run.sh to build libpybev.so.
  3. Run python tool/pybev.py to test the Python interface (a minimal load check is sketched below).
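
If python tool/pybev.py fails, a hedged first step is to confirm that the shared library itself loads. The build output path below is an assumption; the actual test entry point remains tool/pybev.py.

    # Minimal load check for the Python binding. Source tool/environment.sh
    # first so the TensorRT/CUDA libraries it depends on can resolve.
    import ctypes

    lib = ctypes.CDLL("build/libpybev.so")  # adjust to your build output path
    print("libpybev.so loaded:", lib)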

For PyTorch BEVFusion

  • Use the following commands to check out a specific commit, to avoid failures caused by upstream changes.

    git clone https://github.com/mit-han-lab/bevfusion
    
    cd bevfusion
    git checkout db75150717a9462cb60241e36ba28d65f6908607
    

Further performance improvement

  • The number of lidar points fluctuates from frame to frame, which has a significant impact on FPS.
    • Consider the ground-removal or range-filter algorithms provided in cuPCL, which can reduce the lidar inference time (a conceptual illustration follows this list).
  • We implemented only the recommended partial quantization method. Users can further reduce inference latency through sparse pruning and 2:4 structured sparsity.
    • For the resnet50 model at large resolutions, the --sparsity=force option can significantly improve inference performance. For more details, please refer to ASP (automatic sparsity tools).
  • In general, the camera backbone has little impact on accuracy but a large impact on latency.
    • A lighter camera backbone (such as resnet34) will achieve lower latency.
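
To illustrate the range-filter idea mentioned above, here is a conceptual CPU sketch in numpy. This is not the cuPCL API, and the 54 m radius mirrors a typical nuScenes BEV detection range; both are assumptions.

    # Drop lidar points outside a fixed x/y radius to cut lidar inference time.
    import numpy as np

    def range_filter(points: np.ndarray, max_range: float = 54.0) -> np.ndarray:
        """Keep points within max_range meters of the sensor in the x/y plane."""
        dist = np.linalg.norm(points[:, :2], axis=1)
        return points[dist <= max_range]

    # Fake cloud: N x 4 (x, y, z, intensity).
    points = np.random.uniform(-100.0, 100.0, size=(100000, 4)).astype(np.float32)
    print(points.shape[0], "->", range_filter(points).shape[0], "points")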
