feat(ml)!: cuda and openvino acceleration (#5619)

* cuda and openvino ep, refactor, update dockerfile * updated workflow * typing fixes * added tests * updated ml test gh action * updated README * updated docker-compose * added compute to hwaccel.yml * updated gh matrix updated gh matrix updated gh matrix updated gh matrix updated gh matrix give up * remove cuda/arm64 build * add hwaccel image tags to docker-compose * remove unnecessary quotes * add suffix to git tag * fixed kwargs in base model * armnn ld_library_path * update pyproject.toml * add armnn workflow * formatting * consolidate hwaccel files, update docker compose * update hw transcoding docs * add ml hwaccel docs * update dev and prod docker-compose * added armnn prerequisite docs * support 3.10 * updated docker-compose comments * formatting * test coverage * don't set arena extend strategy for openvino * working openvino * formatting * fix dockerfile * added type annotation * add wsl configuration for openvino * updated lock file * copy python3 * comment out extends section * fix platforms * simplify workflow suffix tagging * simplify aio transcoding doc * update docs and workflow for `hwaccel.yml` change * revert docs
2024-01-21 18:22:39 -05:00
parent 6b419a984c
commit 95cfe22866
23 changed files with 962 additions and 460 deletions
@@ -6,10 +6,11 @@
 # Setup

 This project uses [Poetry](https://python-poetry.org/docs/#installation), so be sure to install it first.
-Running `poetry install --no-root --with dev` will install everything you need in an isolated virtual environment.
+Running `poetry install --no-root --with dev --with cpu` will install everything you need in an isolated virtual environment.
+CUDA and OpenVINO are supported as acceleration APIs. To use them, you can replace `--with cpu` with either of `--with cuda` or `--with openvino`.

 To add or remove dependencies, you can use the commands `poetry add $PACKAGE_NAME` and `poetry remove $PACKAGE_NAME`, respectively.
-Be sure to commit the `poetry.lock` and `pyproject.toml` files to reflect any changes in dependencies.
+Be sure to commit the `poetry.lock` and `pyproject.toml` files with `poetry lock --no-update` to reflect any changes in dependencies.


 # Load Testing
@@ -20,4 +21,4 @@ You can change the models or adjust options like score thresholds through the Lo

 To get started, you can simply run `locust --web-host 127.0.0.1` and open `localhost:8089` in a browser to access the UI. See the [Locust documentation](https://docs.locust.io/en/stable/index.html) for more info on running Locust. 

-Note that in Locust's jargon, concurrency is measured in `users`, and each user runs one task at a time. To achieve a particular per-endpoint concurrency, multiply that number by the number of endpoints to be queried. For example, if there are 3 endpoints and you want each of them to receive 8 requests at a time, you should set the number of users to 24.
+Note that in Locust's jargon, concurrency is measured in `users`, and each user runs one task at a time. To achieve a particular per-endpoint concurrency, multiply that number by the number of endpoints to be queried. For example, if there are 3 endpoints and you want each of them to receive 8 requests at a time, you should set the number of users to 24.