Benchmarks
This section provides benchmarks for the Vollo Trees accelerator for a variety decision tree models.
Performance figures are given for two configurations of the Vollo accelerator. A 256-unit configuration which is provided for the IA-840F accelerator card and a 128-unit configuration which is provided for the IA-420F accelerator card. If you require a different configuration, please contact us at vollo@myrtle.ai.
All these performance numbers can be measured using the vollo-trees-sdk
with the correct accelerator card
by running the provided benchmark script.
IA-840F: 256 units
Raw buffer API
This is using buffers allocated with vollo_rt_get_raw_buffer
which lets the runtime skip IO copy.
model | num trees | max depth | input features | fully populated | mean latency (us) | 99th percentile latency (us) |
---|---|---|---|---|---|---|
single-decision-t1-d1-f32 | 1 | 1 | 32 | No | 1.7 | 1.9 |
example-t1000-d5-f128 | 1000 | 5 | 128 | No | 1.9 | 2.0 |
example-t1000-d5-f128-full | 1000 | 5 | 128 | Yes | 1.8 | 2.0 |
example-t1000-d8-f128 | 1000 | 8 | 128 | No | 1.9 | 2.1 |
example-t1000-d8-f128-full | 1000 | 8 | 128 | Yes | 1.9 | 2.1 |
example-t512-d8-f512 | 512 | 8 | 512 | No | 1.9 | 2.1 |
example-t1024-d8-f1024 | 1024 | 8 | 1024 | No | 2.0 | 2.2 |
example-t4096-d8-f1024-full | 4096 | 8 | 1024 | Yes | 2.2 | 2.3 |
User buffers
model | numtrees | max depth | input features | fully populated | mean latency (us) | 99th percentile latency (us) |
---|---|---|---|---|---|---|
single-decision-t1-d1-f32 | 1 | 1 | 32 | No | 1.8 | 2.0 |
example-t1000-d5-f128 | 1000 | 5 | 128 | No | 1.9 | 2.1 |
example-t1000-d5-f128-full | 1000 | 5 | 128 | Yes | 1.9 | 2.1 |
example-t1000-d8-f128 | 1000 | 8 | 128 | No | 2.0 | 2.2 |
example-t1000-d8-f128-full | 1000 | 8 | 128 | Yes | 2.0 | 2.1 |
example-t512-d8-f512 | 512 | 8 | 512 | No | 2.1 | 2.3 |
example-t1024-d8-f1024 | 1024 | 8 | 1024 | No | 2.3 | 2.5 |
example-t4096-d8-f1024-full | 4096 | 8 | 1024 | Yes | 2.5 | 2.7 |
IA-420F
Raw buffer API
model | num trees | max depth | input features | fully populated | mean latency (us) | 99th percentile latency (us) |
---|---|---|---|---|---|---|
single-decision-t1-d1-f32 | 1 | 1 | 32 | No | 1.7 | 1.9 |
example-t1000-d5-f128 | 1000 | 5 | 128 | No | 1.9 | 2.1 |
example-t1000-d5-f128-full | 1000 | 5 | 128 | Yes | 1.9 | 2.1 |
example-t1000-d8-f128 | 1000 | 8 | 128 | No | 1.9 | 2.1 |
example-t1000-d8-f128-full | 1000 | 8 | 128 | Yes | 1.9 | 2.1 |
example-t512-d8-f512 | 512 | 8 | 512 | No | 1.9 | 2.1 |
example-t1024-d8-f1024 | 1024 | 8 | 1024 | No | 2.0 | 2.2 |
example-t4096-d8-f1024-full | 4096 | 8 | 1024 | Yes | 2.5 | 2.6 |
User buffers
model | numtrees | max depth | input features | fully populated | mean latency (us) | 99th percentile latency (us) |
---|---|---|---|---|---|---|
single-decision-t1-d1-f32 | 1 | 1 | 32 | No | 1.8 | 1.9 |
example-t1000-d5-f128 | 1000 | 5 | 128 | No | 2.0 | 2.2 |
example-t1000-d5-f128-full | 1000 | 5 | 128 | Yes | 2.0 | 2.1 |
example-t1000-d8-f128 | 1000 | 8 | 128 | No | 2.1 | 2.2 |
example-t1000-d8-f128-full | 1000 | 8 | 128 | Yes | 2.0 | 2.2 |
example-t512-d8-f512 | 512 | 8 | 512 | No | 2.1 | 2.3 |
example-t1024-d8-f1024 | 1024 | 8 | 1024 | No | 2.3 | 2.5 |
example-t4096-d8-f1024-full | 4096 | 8 | 1024 | Yes | 2.8 | 3.0 |