CUDA-BEVFusion

This repository contains the source code and models for BEVFusion inference using CUDA & TensorRT.

3D Object Detection (on nuScenes validation set)

  • For all models, we used the BEVFusion-Base configuration.
    • The camera resolution is 256x704.
  • For the camera backbones, we chose Swin-Tiny and ResNet50.
| Model | Framework | Precision | mAP | NDS | FPS |
|---|---|---|---|---|---|
| Swin-Tiny (BEVFusion-Base) | PyTorch | FP32+FP16 | 68.52 | 71.38 | 8.4 (on RTX 3090) |
| ResNet50 | PyTorch | FP32+FP16 | 67.93 | 70.97 | - |
| ResNet50 | TensorRT | FP16 | 67.89 | 70.98 | 18 (on Orin) |
| ResNet50-PTQ | TensorRT | FP16+INT8 | 67.66 | 70.81 | 25 (on Orin) |
  • Note: The times reported on Orin are averaged over the 6019 nuScenes validation samples.
    • The number of lidar points is the main factor affecting FPS.
    • Please refer to the 3DSparseConvolution readme for more details.

Demonstration

Model and Data

  • For quick practice, we provide example data from nuScenes. You can download it from ( NVBox ) or ( Baidu Drive ). It contains the following:
    1. Camera images in 6 directions.
    2. Transformation matrices for camera/lidar/ego.
    3. example-data.pth, the input data for bevfusion-pytorch, which allows exporting ONNX without depending on the full dataset (see the inspection sketch after this list).
  • All models (model.zip) can be downloaded from ( NVBox ) or ( Baidu Drive ). It contains the following:
    1. swin-tiny ONNX models.
    2. resnet50 ONNX and PyTorch models.
    3. resnet50 INT8 ONNX and PTQ models.
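
As a hedged, optional sanity check (not part of this repository), the sketch below inspects the downloaded files. It assumes PyTorch and the onnx package (installed in the Prerequisites step) are available, and that example-data.pth loads as a plain torch pickle; file paths are taken from the directory tree in the Quick Start section.

    # Hypothetical inspection of the downloaded files; paths come from the
    # directory tree shown below in this README.
    import torch
    import onnx

    # Load the sample input used for ONNX export. Newer PyTorch versions may
    # need weights_only=False if custom classes were pickled into the file.
    data = torch.load("example-data/example-data.pth", map_location="cpu")
    print(type(data))

    # List the I/O signature of one exported model.
    model = onnx.load("model/resnet50int8/camera.backbone.onnx")
    print("inputs: ", [i.name for i in model.graph.input])
    print("outputs:", [o.name for o in model.graph.output])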

Prerequisites

Building BEVFusion depends on the following libraries:

  • CUDA >= 11.0
  • CUDNN >= 8.2
  • TensorRT >= 8.5.0
  • libprotobuf-dev
  • Compute Capability >= sm_80
  • Python >= 3.6

The numbers in the performance table above were measured on the NVIDIA Orin platform with TensorRT 8.6, CUDA 11.4, and cuDNN 8.6.
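
As a rough check of these prerequisites, the sketch below queries versions through the TensorRT and PyTorch Python bindings; both packages are assumptions here (neither is strictly required for the C++ build).

    # Hedged version check; assumes the tensorrt and torch Python packages.
    import tensorrt as trt
    import torch

    print("TensorRT:", trt.__version__)                # expect >= 8.5.0
    print("CUDA:", torch.version.cuda)                 # expect >= 11.0
    print("cuDNN:", torch.backends.cudnn.version())    # e.g. 8200 means 8.2
    major, minor = torch.cuda.get_device_capability()
    print(f"Compute capability: sm_{major}{minor}")    # expect >= sm_80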

Quick Start for Inference

  • Note: please use git clone --recursive to pull this repository, to ensure the integrity of the dependencies.

1. Download models and data to the CUDA-BEVFusion directory

  • download model.zip from ( NVBox ) or ( Baidu Drive )
  • download nuScenes-example-data.zip from ( NVBox ) or ( Baidu Drive )

    # download models and data to CUDA-BEVFusion
    cd CUDA-BEVFusion
    
    # unzip models and data
    unzip model.zip
    unzip nuScenes-example-data.zip
    
    # directory structure after unzipping
    CUDA-BEVFusion
    |-- example-data
    |   |-- 0-FRONT.jpg
    |   |-- 1-FRONT_RIGHT.jpg
    |   |-- ...
    |   |-- camera_intrinsics.tensor
    |   |-- ...
    |   |-- example-data.pth
    |   `-- points.tensor
    |-- src
    |-- qat
    |-- model
    |   |-- resnet50int8
    |   |   |-- bevfusion_ptq.pth
    |   |   |-- camera.backbone.onnx
    |   |   |-- camera.vtransform.onnx
    |   |   |-- default.yaml
    |   |   |-- fuser.onnx
    |   |   |-- head.bbox.onnx
    |   |   `-- lidar.backbone.xyz.onnx
    |   |-- resnet50
    |   `-- swint
    |-- bevfusion
    `-- tool
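
A minimal, hypothetical check that unzipping produced the layout above (file names are taken from this tree; the script itself is not part of the repository):

    # Verify a few expected files before building.
    import os

    expected = [
        "example-data/0-FRONT.jpg",
        "example-data/example-data.pth",
        "example-data/points.tensor",
        "model/resnet50int8/camera.backbone.onnx",
        "model/resnet50int8/lidar.backbone.xyz.onnx",
    ]
    for path in expected:
        status = "ok" if os.path.exists(path) else "MISSING"
        print(f"{status:8s}{path}")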

2. Configure the environment.sh

  • Install the dependency libraries

    apt install libprotobuf-dev
    pip install onnx
    
  • Modify the TensorRT/CUDA/CUDNN/BEVFusion variable values in the tool/environment.sh file.

    # change the path to the directory you are currently using
    export TensorRT_Lib=/path/to/TensorRT/lib
    export TensorRT_Inc=/path/to/TensorRT/include
    export TensorRT_Bin=/path/to/TensorRT/bin
    
    export CUDA_Lib=/path/to/cuda/lib64
    export CUDA_Inc=/path/to/cuda/include
    export CUDA_Bin=/path/to/cuda/bin
    export CUDA_HOME=/path/to/cuda
    
    export CUDNN_Lib=/path/to/cudnn/lib
    
    # For CUDA-11.x:    SPCONV_CUDA_VERSION=11.4
    # For CUDA-12.x:    SPCONV_CUDA_VERSION=12.6
    export SPCONV_CUDA_VERSION=11.4
    
    # resnet50/resnet50int8/swint
    export DEBUG_MODEL=resnet50int8
    
    # fp16/int8
    export DEBUG_PRECISION=int8
    export DEBUG_DATA=example-data
    export USE_Python=OFF
    
  • Apply the environment to the current terminal.

    . tool/environment.sh
    
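To confirm the variables were exported into the current shell, a small sketch (run it in the same terminal; the variable names come from tool/environment.sh above):

    # Check that environment.sh exported valid directories into this shell.
    import os

    for var in ("TensorRT_Lib", "TensorRT_Inc", "CUDA_Lib", "CUDNN_Lib", "CUDA_HOME"):
        path = os.environ.get(var, "")
        ok = os.path.isdir(path)
        print(f"{var}: {path or '<unset>'} ({'ok' if ok else 'missing'})")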

3. Compile and run

  1. Build the models for TensorRT

    bash tool/build_trt_engine.sh
    
  2. Compile and run the program

    # Generate the protobuf code
    bash src/onnx/make_pb.sh
    
    # Compile and run
    bash tool/run.sh
    

Export ONNX and PTQ

  • For more details, please refer here

For Python Interface

  1. Set USE_Python=ON in environment.sh to enable compilation of the Python interface.
  2. Run bash tool/run.sh to build libpybev.so.
  3. Run python tool/pybev.py to test the Python interface (a minimal load check is sketched below).
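
If python tool/pybev.py fails, a hedged first step is to confirm that the shared library itself loads. The build output path below is an assumption; the actual test entry point remains tool/pybev.py.

    # Minimal load check for the Python binding. Source tool/environment.sh
    # first so the TensorRT/CUDA libraries it depends on can resolve.
    import ctypes

    lib = ctypes.CDLL("build/libpybev.so")  # adjust to your build output path
    print("libpybev.so loaded:", lib)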

For PyTorch BEVFusion

  • Use the following commands to check out a specific commit, to avoid failures caused by upstream changes.

    git clone https://github.com/mit-han-lab/bevfusion
    
    cd bevfusion
    git checkout db75150717a9462cb60241e36ba28d65f6908607
    

Further performance improvement

  • The number of lidar points fluctuates from frame to frame, which has a significant impact on FPS.
    • Consider the ground-removal or range-filter algorithms provided in cuPCL, which can reduce the lidar inference time (a conceptual illustration follows this list).
  • We implemented only the recommended partial quantization method. Users can further reduce inference latency through sparse pruning and 2:4 structured sparsity.
    • For the resnet50 model at large resolutions, the --sparsity=force option can significantly improve inference performance. For more details, please refer to ASP (automatic sparsity tools).
  • In general, the camera backbone has little impact on accuracy but a large impact on latency.
    • A lighter camera backbone (such as resnet34) will achieve lower latency.
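
To illustrate the range-filter idea mentioned above, here is a conceptual CPU sketch in numpy. This is not the cuPCL API, and the 54 m radius mirrors a typical nuScenes BEV detection range; both are assumptions.

    # Drop lidar points outside a fixed x/y radius to cut lidar inference time.
    import numpy as np

    def range_filter(points: np.ndarray, max_range: float = 54.0) -> np.ndarray:
        """Keep points within max_range meters of the sensor in the x/y plane."""
        dist = np.linalg.norm(points[:, :2], axis=1)
        return points[dist <= max_range]

    # Fake cloud: N x 4 (x, y, z, intensity).
    points = np.random.uniform(-100.0, 100.0, size=(100000, 4)).astype(np.float32)
    print(points.shape[0], "->", range_filter(points).shape[0], "points")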
