Benchmarks

This section provides benchmarks of the Vollo Trees accelerator on a variety of decision tree models.

Performance figures are given for two configurations of the Vollo accelerator: a 256-unit configuration for the IA-840F accelerator card and a 128-unit configuration for the IA-420F accelerator card. If you require a different configuration, please contact us at vollo@myrtle.ai.

All of these performance numbers can be reproduced with the vollo-trees-sdk on the corresponding accelerator card by running the provided benchmark script.

IA-840F: 256 units

Raw buffer API

These figures use buffers allocated with vollo_rt_get_raw_buffer, which lets the runtime skip the IO copy.

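| model | num trees | max depth | input features | fully populated | mean latency (µs) | 99th percentile latency (µs) |
|---|---|---|---|---|---|---|
| single-decision-t1-d1-f32 | 1 | 1 | 32 | No | 1.7 | 1.9 |
| example-t1000-d5-f128 | 1000 | 5 | 128 | No | 1.9 | 2.0 |
| example-t1000-d5-f128-full | 1000 | 5 | 128 | Yes | 1.8 | 2.0 |
| example-t1000-d8-f128 | 1000 | 8 | 128 | No | 1.9 | 2.1 |
| example-t1000-d8-f128-full | 1000 | 8 | 128 | Yes | 1.9 | 2.1 |
| example-t512-d8-f512 | 512 | 8 | 512 | No | 1.9 | 2.1 |
| example-t1024-d8-f1024 | 1024 | 8 | 1024 | No | 2.0 | 2.2 |
| example-t4096-d8-f1024-full | 4096 | 8 | 1024 | Yes | 2.2 | 2.3 |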
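
The model names appear to encode the values in the other columns: `t` is the number of trees, `d` the maximum depth, `f` the number of input features, and a `-full` suffix marks a fully populated model. The Python sketch below decodes a name under that assumed pattern. It also computes the maximum number of split nodes, assuming that "max depth" counts levels of splits (so the depth-1 `single-decision` model has exactly one split) and that a fully populated tree is a complete binary tree with 2^d - 1 split nodes. `parse_model_name` and `ModelSpec` are hypothetical helpers for illustration, not part of the vollo-trees-sdk.

```python
import re
from dataclasses import dataclass

# Assumed naming pattern, inferred from the model names in the benchmark tables:
#   <prefix>-t<num trees>-d<max depth>-f<input features>[-full]
NAME_PATTERN = re.compile(
    r"^(?P<prefix>.+)-t(?P<trees>\d+)-d(?P<depth>\d+)-f(?P<features>\d+)(?P<full>-full)?$"
)


@dataclass
class ModelSpec:
    name: str
    num_trees: int
    max_depth: int
    input_features: int
    fully_populated: bool

    def max_split_nodes(self) -> int:
        # Assumption: a binary tree with `max_depth` levels of splits has at most
        # 2**max_depth - 1 split nodes; a fully populated tree reaches this bound.
        return self.num_trees * (2 ** self.max_depth - 1)


def parse_model_name(name: str) -> ModelSpec:
    """Decode a benchmark model name into its (assumed) parameters."""
    m = NAME_PATTERN.match(name)
    if m is None:
        raise ValueError(f"unrecognised model name: {name!r}")
    return ModelSpec(
        name=name,
        num_trees=int(m["trees"]),
        max_depth=int(m["depth"]),
        input_features=int(m["features"]),
        fully_populated=m["full"] is not None,
    )


spec = parse_model_name("example-t4096-d8-f1024-full")
print(spec.num_trees, spec.max_depth, spec.input_features, spec.fully_populated)
print(spec.max_split_nodes())
```

Under these assumptions, `example-t4096-d8-f1024-full` has 4096 × 255 = 1,044,480 split nodes, which makes it the largest model in these benchmarks.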

User buffers

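| model | num trees | max depth | input features | fully populated | mean latency (µs) | 99th percentile latency (µs) |
|---|---|---|---|---|---|---|
| single-decision-t1-d1-f32 | 1 | 1 | 32 | No | 1.8 | 2.0 |
| example-t1000-d5-f128 | 1000 | 5 | 128 | No | 1.9 | 2.1 |
| example-t1000-d5-f128-full | 1000 | 5 | 128 | Yes | 1.9 | 2.1 |
| example-t1000-d8-f128 | 1000 | 8 | 128 | No | 2.0 | 2.2 |
| example-t1000-d8-f128-full | 1000 | 8 | 128 | Yes | 2.0 | 2.1 |
| example-t512-d8-f512 | 512 | 8 | 512 | No | 2.1 | 2.3 |
| example-t1024-d8-f1024 | 1024 | 8 | 1024 | No | 2.3 | 2.5 |
| example-t4096-d8-f1024-full | 4096 | 8 | 1024 | Yes | 2.5 | 2.7 |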
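
Because the raw buffer API lets the runtime skip the IO copy, the difference between the two IA-840F tables gives a rough indication of what that copy costs. The Python sketch below copies the mean latencies from both IA-840F tables and subtracts them per model. The dictionary names and `user_buffer_overhead` are illustrative only and are not part of any Vollo API.

```python
# Mean latencies (us) on the IA-840F (256 units), copied from the
# raw buffer API and user buffer tables for that card.
RAW_BUFFER_MEAN = {
    "single-decision-t1-d1-f32": 1.7,
    "example-t1000-d5-f128": 1.9,
    "example-t1000-d5-f128-full": 1.8,
    "example-t1000-d8-f128": 1.9,
    "example-t1000-d8-f128-full": 1.9,
    "example-t512-d8-f512": 1.9,
    "example-t1024-d8-f1024": 2.0,
    "example-t4096-d8-f1024-full": 2.2,
}
USER_BUFFER_MEAN = {
    "single-decision-t1-d1-f32": 1.8,
    "example-t1000-d5-f128": 1.9,
    "example-t1000-d5-f128-full": 1.9,
    "example-t1000-d8-f128": 2.0,
    "example-t1000-d8-f128-full": 2.0,
    "example-t512-d8-f512": 2.1,
    "example-t1024-d8-f1024": 2.3,
    "example-t4096-d8-f1024-full": 2.5,
}


def user_buffer_overhead(raw: dict, user: dict) -> dict:
    """Extra mean latency (us) per model when using user buffers instead of raw buffers."""
    # Round to one decimal place, the precision of the published figures,
    # which also hides floating-point noise such as 0.30000000000000004.
    return {model: round(user[model] - raw[model], 1) for model in raw}


for model, delta in user_buffer_overhead(RAW_BUFFER_MEAN, USER_BUFFER_MEAN).items():
    print(f"{model:30s} +{delta:.1f} us")
```

On these figures, the difference is 0.0 to 0.1 µs for models with 32 or 128 input features and 0.2 to 0.3 µs for models with 512 or 1024 input features. That pattern is consistent with (though it does not prove) a copy cost that grows with input size. Since the published figures are rounded to 0.1 µs, differences of 0.1 µs are at the resolution of the tables.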

IA-420F: 128 units

Raw buffer API

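| model | num trees | max depth | input features | fully populated | mean latency (µs) | 99th percentile latency (µs) |
|---|---|---|---|---|---|---|
| single-decision-t1-d1-f32 | 1 | 1 | 32 | No | 1.7 | 1.9 |
| example-t1000-d5-f128 | 1000 | 5 | 128 | No | 1.9 | 2.1 |
| example-t1000-d5-f128-full | 1000 | 5 | 128 | Yes | 1.9 | 2.1 |
| example-t1000-d8-f128 | 1000 | 8 | 128 | No | 1.9 | 2.1 |
| example-t1000-d8-f128-full | 1000 | 8 | 128 | Yes | 1.9 | 2.1 |
| example-t512-d8-f512 | 512 | 8 | 512 | No | 1.9 | 2.1 |
| example-t1024-d8-f1024 | 1024 | 8 | 1024 | No | 2.0 | 2.2 |
| example-t4096-d8-f1024-full | 4096 | 8 | 1024 | Yes | 2.5 | 2.6 |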
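
Comparing the raw buffer API tables for the two cards shows where the 128-unit IA-420F configuration diverges from the 256-unit IA-840F configuration. The Python sketch below pairs the mean latencies for each model and reports the absolute gap and the relative gap. `RAW_MEAN_US` and `card_gap` are illustrative names, not SDK APIs.

```python
# Raw buffer API mean latencies (us) per model:
# (IA-840F with 256 units, IA-420F with 128 units), copied from each card's table.
RAW_MEAN_US = {
    "single-decision-t1-d1-f32": (1.7, 1.7),
    "example-t1000-d5-f128": (1.9, 1.9),
    "example-t1000-d5-f128-full": (1.8, 1.9),
    "example-t1000-d8-f128": (1.9, 1.9),
    "example-t1000-d8-f128-full": (1.9, 1.9),
    "example-t512-d8-f512": (1.9, 1.9),
    "example-t1024-d8-f1024": (2.0, 2.0),
    "example-t4096-d8-f1024-full": (2.2, 2.5),
}


def card_gap(table: dict) -> dict:
    """Return {model: (absolute gap in us, relative gap in %)} of IA-420F over IA-840F."""
    gaps = {}
    for model, (ia840f, ia420f) in table.items():
        diff = ia420f - ia840f
        # Round to match the one-decimal precision of the published figures.
        gaps[model] = (round(diff, 1), round(100 * diff / ia840f, 1))
    return gaps


gaps = card_gap(RAW_MEAN_US)
worst = max(gaps, key=lambda m: gaps[m][0])  # model with the largest absolute gap
print(worst, gaps[worst])
```

With these figures, six of the eight models have identical mean latency on both cards. The gap is 0.1 µs for `example-t1000-d5-f128-full` and 0.3 µs (about 13.6%) for `example-t4096-d8-f1024-full`, the largest model benchmarked.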

User buffers

modelnumtreesmax depthinput featuresfully populatedmean latency (us)99th percentile latency (us)
single-decision-t1-d1-f321132No1.81.9
example-t1000-d5-f12810005128No2.02.2
example-t1000-d5-f128-full10005128Yes2.02.1
example-t1000-d8-f12810008128No2.12.2
example-t1000-d8-f128-full10008128Yes2.02.2
example-t512-d8-f5125128512No2.12.3
example-t1024-d8-f1024102481024No2.32.5
example-t4096-d8-f1024-full409681024Yes2.83.0