
Release notes

v1.5.0

warning

Please update the model parameter in your request body. The old value (hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4) is outdated, and support for it will be dropped in the next minor release. To always use the latest model, set the model parameter to latest. For more information, see Example Prompting.
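The snippet below is a minimal sketch of the updated request body, assuming an OpenAI-compatible chat completions endpoint; the endpoint URL and API key are placeholders for your deployment's values.

```python
import requests

# Placeholder endpoint and key; substitute your deployment's values.
response = requests.post(
    "https://<continuum-endpoint>/v1/chat/completions",
    headers={"Authorization": "Bearer <api-key>"},
    json={
        # "latest" always tracks the newest model; the deprecated value
        # hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4 still works
        # until the next minor release.
        "model": "latest",
        "messages": [{"role": "user", "content": "Hello"}],
    },
)
print(response.json())
```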

  • Upgrade to the Llama 3.3 70B model (ibnzterrell/Meta-Llama-3.3-70B-Instruct-AWQ-INT4) for improved quality.
  • The disableUpdate flag is deprecated. Providing a manifest file via --manifestPath now automatically disables the update behavior; see the sketch after this list, and refer to Manifest management for more details.
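As a sketch of the replacement behavior, the invocation below pins a manifest via --manifestPath, which now implies that automatic updates are disabled; the manifest path is a placeholder, and the binary is assumed to be on your PATH.

```python
import subprocess

# Sketch: providing a manifest file replaces the deprecated disableUpdate
# flag; passing --manifestPath automatically disables the update behavior.
# Assumes continuum-proxy is on PATH and ./manifest.json exists.
subprocess.run(
    ["continuum-proxy", "--manifestPath", "./manifest.json"],
    check=True,
)
```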

v1.4.0

  • Major rewrite of the documentation.
  • Support token-based billing for Stripe.
  • Fixes a bug so that errors are returned as text/event-stream when the client requests that content type.

v1.3.1

  • Improve stability for cases where the AMD Key Distribution Service is unavailable.

v1.3.0

  • Internal changes to license management.

v1.2.2

  • Fixes a bug in streaming requests that made optional parameters required when stream_options: {"include_usage": true} wasn't set; see the sketch below.
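For context, the sketch below streams a completion with the OpenAI Python client and opts in to the final usage chunk; the base URL, API key, and model name are placeholders for an OpenAI-compatible endpoint.

```python
from openai import OpenAI

# Placeholder endpoint, key, and model; substitute your deployment's values.
client = OpenAI(base_url="https://<continuum-endpoint>/v1", api_key="<api-key>")

stream = client.chat.completions.create(
    model="latest",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
    # Opt in to a final chunk carrying token counts. Since v1.2.0 the
    # proxy sets this transparently; v1.2.2 fixes optional parameters
    # being treated as required when it was absent.
    stream_options={"include_usage": True},
)
for chunk in stream:
    if chunk.choices:
        print(chunk.choices[0].delta.content or "", end="")
    if chunk.usage:  # present only on the final chunk
        print(f"\ntotal tokens: {chunk.usage.total_tokens}")
```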

v1.2.0

  • Add arm64 support for continuum-proxy. See the Continuum-proxy section for information on how to use it.
  • Token tracking is now automatically enabled for streaming requests by transparently setting include_usage in the stream_options.

v1.1.0

  • Increase peak performance by more than 40% through improved request scheduling.
  • Increase performance by about 6% by upgrading vLLM to v0.6.1.