Release notes
v1.5.0
warning

Please update the `model` parameter in your request body. The old value (`hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4`) is outdated, and support will be dropped in the next minor release. If you always want to use the latest model, use the new `model` value (`latest`). For more information, see Example Prompting.
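For illustration, here is a minimal sketch of the updated request, assuming an OpenAI-compatible chat completions endpoint; the endpoint URL and API key are placeholders, not part of these release notes:

```python
import requests

# A minimal sketch: the base URL and API key below are placeholders.
# "latest" always resolves to the newest supported model, replacing the
# deprecated pinned model name in the "model" field.
response = requests.post(
    "https://<your-continuum-endpoint>/v1/chat/completions",
    headers={"Authorization": "Bearer <your-api-key>"},
    json={
        "model": "latest",
        "messages": [{"role": "user", "content": "Hello!"}],
    },
    timeout=30,
)
print(response.json())
```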
- Upgrade to the Llama 3.3 70B model (`ibnzterrell/Meta-Llama-3.3-70B-Instruct-AWQ-INT4`) for improved quality.
- The `disableUpdate` flag is deprecated. Providing a manifest file via `--manifestPath` will automatically disable the update behavior, as sketched after this list. Refer to Manifest management for more details.
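A sketch of the new behavior, assuming the `continuum-proxy` binary is on your `PATH`; the manifest path is illustrative:

```python
import subprocess

# Illustrative only: with a manifest supplied via --manifestPath, the proxy
# disables its update behavior automatically, so no separate disableUpdate
# flag is needed. The manifest path here is a placeholder.
subprocess.run(
    ["continuum-proxy", "--manifestPath", "./manifest.json"],
    check=True,
)
```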
v1.4.0
- Major rewrite of the documentation
- Support token-based billing for Stripe
- Fixes a bug to return errors as type `text/event-stream` if requested by the client (see the sketch after this list)
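A sketch of a streaming request affected by this fix; the endpoint, key, and message payload are placeholders:

```python
import requests

# Illustrative only: when the client asks for text/event-stream, errors are
# now returned in that format as SSE events rather than as a plain JSON body.
resp = requests.post(
    "https://<your-continuum-endpoint>/v1/chat/completions",
    headers={
        "Authorization": "Bearer <your-api-key>",
        "Accept": "text/event-stream",
    },
    json={
        "model": "latest",
        "messages": [{"role": "user", "content": "Hello!"}],
        "stream": True,
    },
    stream=True,
)
for line in resp.iter_lines():
    if line:
        print(line.decode())  # error events also arrive as SSE data lines
```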
v1.3.1
- Improve stability for cases where the AMD Key Distribution Service is unavailable.
v1.3.0
- Internal changes to license management.
v1.2.2
- Fixes a bug for streaming requests that made optional parameters required if `stream_options: {"include_usage": true}` wasn't set
v1.2.0
- Add `arm64` support for the `continuum-proxy`. Find information on how to use it in the Continuum-proxy section.
- Token tracking is now automatically enabled for streaming requests by transparently setting `include_usage` in the `stream_options`, as sketched after this list.
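A sketch of what this means for a streaming client; the endpoint and key are placeholders. The final SSE chunk carries a `usage` object even if the client doesn't request it explicitly:

```python
import json
import requests

resp = requests.post(
    "https://<your-continuum-endpoint>/v1/chat/completions",
    headers={"Authorization": "Bearer <your-api-key>"},
    json={
        "model": "latest",
        "messages": [{"role": "user", "content": "Count to three."}],
        "stream": True,
        # Equivalent to what the proxy now injects automatically:
        # "stream_options": {"include_usage": True},
    },
    stream=True,
)
for line in resp.iter_lines():
    if not line or not line.startswith(b"data: "):
        continue
    payload = line[len(b"data: "):]
    if payload == b"[DONE]":
        break
    chunk = json.loads(payload)
    if chunk.get("usage"):
        print("token usage:", chunk["usage"])
```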
v1.1.0
- Increase peak performance by more than 40% through improved request scheduling
- Increase performance by about 6% through vLLM upgrade to `v0.6.1`