Espressif DSP Library Benchmarks
================================
The table below lists benchmarks for functions provided by the ESP-DSP library. Each value is the number of CPU cycles taken to execute the function. The "ESP32" column gives the count for the optimized (assembly) implementation; the "ANSI C" column gives the count for the non-optimized implementation.

+----------------------------------------------------------+----------+----------+
| Function name and arguments                              | CPU cycles          |
+----------------------------------------------------------+----------+----------+
|                                                          | ESP32    | ANSI C   |
+==========================================================+==========+==========+
|                                                          |          |          |
+----------------------------------------------------------+----------+----------+
| **Dot Product**                                          |          |          |
+----------------------------------------------------------+----------+----------+
| dsps_dotprod_f32 for N=256 points                        | 1057     | 2597     |
+----------------------------------------------------------+----------+----------+
| dsps_dotprode_f32 for N=256 points, with step 1          | 1318     | 2601     |
+----------------------------------------------------------+----------+----------+
| dsps_dotprod_s16 for N=256 points                        | 448      | 5185     |
+----------------------------------------------------------+----------+----------+
|                                                          |          |          |
+----------------------------------------------------------+----------+----------+
| **FIR Filters**                                          |          |          |
+----------------------------------------------------------+----------+----------+
| dsps_fir_f32 1024 input samples and 256 coefficients     | 1338418  | 3583556  |
+----------------------------------------------------------+----------+----------+
| dsps_fird_f32 1024 samples, 256 coeffs and decimation 4  | 37582    | 82535    |
+----------------------------------------------------------+----------+----------+
|                                                          |          |          |
+----------------------------------------------------------+----------+----------+
| **FFTs**                                                 |          |          |
+----------------------------------------------------------+----------+----------+
| dsps_fft2r_fc32 for 64 complex points                    | 5451     | 8187     |
+----------------------------------------------------------+----------+----------+
| dsps_fft2r_fc32 for 128 complex points                   | 12400    | 18756    |
+----------------------------------------------------------+----------+----------+
| dsps_fft2r_fc32 for 256 complex points                   | 27829    | 42381    |
+----------------------------------------------------------+----------+----------+
| dsps_fft2r_fc32 for 512 complex points                   | 61755    | 94616    |
+----------------------------------------------------------+----------+----------+
| dsps_fft2r_fc32 for 1024 complex points                  | 135745   | 209058   |
+----------------------------------------------------------+----------+----------+
|                                                          |          |          |
+----------------------------------------------------------+----------+----------+
| **IIR Filters**                                          |          |          |
+----------------------------------------------------------+----------+----------+
| dsps_biquad_f32 - biquad filter for 1024 input samples   | 17451    | 31778    |
+----------------------------------------------------------+----------+----------+
|                                                          |          |          |
+----------------------------------------------------------+----------+----------+
| **Matrix Multiplication**                                |          |          |
+----------------------------------------------------------+----------+----------+
| dspm_mult_f32 - C[16,16] = A[16,16]*B[16,16];            | 24669    | 59690    |
+----------------------------------------------------------+----------+----------+
| dspm_mult_s16 - C[16,16] = A[16,16]*B[16,16];            | 24964    | 114150   |
+----------------------------------------------------------+----------+----------+
| dspm_mult_3x3x1_f32 - C[3,1] = A[3,3]*B[3,1];            | 80       | 242      |
+----------------------------------------------------------+----------+----------+
| dspm_mult_3x3x3_f32 - C[3,3] = A[3,3]*B[3,3];            | 212      | 541      |
+----------------------------------------------------------+----------+----------+
| dspm_mult_4x4x1_f32 - C[4,1] = A[4,4]*B[4,1];            | 112      | 362      |
+----------------------------------------------------------+----------+----------+
| dspm_mult_4x4x4_f32 - C[4,4] = A[4,4]*B[4,4];            | 404      | 1130     |
+----------------------------------------------------------+----------+----------+

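
The cycle counts in the table can be turned into a speedup factor or an approximate execution time with simple arithmetic. The following is a minimal sketch, not part of the ESP-DSP library: the helper names ``speedup`` and ``cycles_to_us`` are hypothetical, and the CPU frequency is a parameter the caller must supply, since the table does not state the clock the measurements were taken at.

```c
#include <assert.h>
#include <stdint.h>

/* Speedup of the optimized implementation over the ANSI C one.
 * Values above 1.0 mean the optimized version needs fewer cycles. */
static double speedup(uint32_t ansi_cycles, uint32_t optimized_cycles)
{
    assert(optimized_cycles > 0);
    return (double)ansi_cycles / (double)optimized_cycles;
}

/* Convert a cycle count to microseconds.
 * cpu_mhz is an assumption chosen by the caller; the benchmark
 * table itself does not state a clock frequency. */
static double cycles_to_us(uint32_t cycles, uint32_t cpu_mhz)
{
    assert(cpu_mhz > 0);
    return (double)cycles / (double)cpu_mhz;
}
```

For example, ``speedup(5185, 448)`` for ``dsps_dotprod_s16`` gives roughly 11.6, and ``cycles_to_us(135745, 240)`` gives roughly 566 microseconds for the 1024-point FFT if a 240 MHz clock is assumed.
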
The benchmarks can be reproduced by running the test cases in :repo_file:`test/test_dsp.c`.
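
For readers who want a feel for how a cycle count like the ones above might be taken, here is a rough sketch of the general pattern: read a counter, run the function, read the counter again, and subtract. It is not the code in ``test/test_dsp.c``. All names here (``cycle_counter_fn``, ``dotprod_ref_f32``, ``measure_dotprod_cycles``, ``host_counter``) are hypothetical, the dot product is a plain reference loop rather than the library source, and ``host_counter`` is only a stand-in based on ``clock()`` so that the sketch builds on a host machine. On a target it would be replaced by the platform's CPU cycle counter.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical hook that returns a free-running tick or cycle count. */
typedef uint32_t (*cycle_counter_fn)(void);

/* Host stand-in: clock() ticks, NOT CPU cycles. Replace on target. */
static uint32_t host_counter(void)
{
    return (uint32_t)clock();
}

/* Plain C dot product over n points (illustrative reference loop). */
static float dotprod_ref_f32(const float *a, const float *b, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++) {
        acc += a[i] * b[i];
    }
    return acc;
}

/* Run one dot product and return the elapsed counter ticks.
 * Unsigned subtraction stays correct across a single counter wrap. */
static uint32_t measure_dotprod_cycles(cycle_counter_fn now,
                                       const float *a, const float *b,
                                       size_t n, float *result)
{
    uint32_t start = now();
    *result = dotprod_ref_f32(a, b, n);
    uint32_t end = now();
    return end - start;
}
```

A real measurement would typically repeat the call and account for the overhead of reading the counter; this sketch leaves both out for brevity.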