# serving-runtime

**Repository Path**: mirrors_ovh/serving-runtime

## Basic Information

- **Project Name**: serving-runtime
- **Description**: Exposes a serialized machine learning model through a HTTP API.
- **Primary Language**: Unknown
- **License**: BSD-3-Clause
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2020-09-25
- **Last Updated**: 2026-06-21

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Serving Runtime

Exposes a serialized machine learning model through an HTTP API written in Java.

![TUs & TIs](https://github.com/ovh/serving-runtime/workflows/TUs%20&%20TIs/badge.svg?branch=master)
[![Maintenance](https://img.shields.io/maintenance/yes/2020.svg)]()
[![Chat on gitter](https://img.shields.io/gitter/room/ovh/ai.svg)](https://gitter.im/ovh/ai)

**This project is under active development**

## Description

The purpose of this project is to expose a generic HTTP API from serialized machine learning models.

Supported serialized models are :

* [ONNX][ONNX] `1.5`
* TensorFlow `<=1.15` SavedModel or HDF5
* [HuggingFace Tokenizer](https://github.com/huggingface/tokenizers)

## Prerequisites

* Maven for compiling the project
* `Java 11` for running the project

The `HDF5` serialization format is supported through a conversion into the `SavedModel` format. That conversion relies on the following dependencies :

* Python `3.7`
* TensorFlow `<=1.15` (`pip install tensorflow`)

For the HuggingFace tokenizer :

* Cargo (Rust stable)

### HDF5 support (Optional)

The Tensorflow module supports HDF5 files through an executable, `h5_converter`, which exports the model from an HDF5 file to a Tensorflow SavedModel (`.pb`).
To generate the converter, use the `initialize_tensorflow` goal of the `Makefile`:

```bash
make initialize_tensorflow
```

The generated executable can be found here: `evaluator-tensorflow/h5_converter/dist/h5_converter`

### HuggingFace (Optional)

To build the Java binding, use the `initialize_huggingface` goal of the `Makefile`:

```bash
make initialize_huggingface
```

### Torch (Optional)

To install libtorch, use the `initialize_torch` goal of the `Makefile`:

```bash
make initialize_torch
```

[Convert pyTorch model and more](evaluator-torch/README.md)

## Build & Launch the project locally

Several profiles are available depending on the support you require for the built project.

- `full` which includes both Tensorflow and ONNX, requires the [ONNX support](#onnx-support-optional), [HDF5 support](#hdf5-support-optional) and [Torch support](#torch-optional).
- `tensorflow` which only includes Tensorflow, requires the [HDF5 support](#hdf5-support-optional)
- `onnx` which only includes ONNX, requires the [ONNX support](#onnx-support-optional).
- `torch` which only includes Torch, requires the [Torch support](#torch-optional).

Set your desired profile:

```bash
export MAVEN_PROFILE=
```

If not specified, the default profile is `full`.

### Launch tests

```bash
make test MAVEN_PROFILE=$MAVEN_PROFILE
```

### Building JAR

```bash
make build MAVEN_PROFILE=$MAVEN_PROFILE
```

The JAR can then be found in `api/target/api-*.jar`

### Launching JAR

In the following command, replace `` with the path to your compiled JAR and `` with the directory containing your serialized model.

```bash
java -Dfiles.path= -jar
```

If you wish to load a model from an HDF5 file, you will need to specify the path to the executable generated in [HDF5 support](#hdf5-support-optional).
```bash
java -Dfiles.path= -Devaluator.tensorflow.h5_converter.path= -jar
```

Inside the `` it will look for the first file ending with :

* `.onnx` for an ONNX model
* `.pb` for a TensorFlow SavedModel
* `.h5` for an HDF5 model

#### Available parameters

On the launch command you can also specify the following parameters :

* `-Dserver.port` : The port on which the HTTP server listens
* `-Dswagger.title` : The title that will be displayed on the swagger
* `-Dswagger.description` : The description that will be displayed on the swagger

## Build & Launch the project using docker

### Building the docker container

```bash
make docker-build-api MAVEN_PROFILE=$MAVEN_PROFILE
```

It will build the docker image `serving-runtime-$MAVEN_PROFILE:latest`

### Running the docker container

In the following command, replace `` with the absolute path of the directory containing your serialized model.

```bash
docker run --rm -it -p 8080:8080 -v :/deployments/models serving-runtime-$MAVEN_PROFILE:latest
```

## Using the API

By default the API will be running on `http://localhost:8080`. Reaching this URL in your browser will display the SwaggerUI describing the API for your model.

There are 2 routes available for each model :

* `/describe` : Describe your model (what are the inputs, outputs and transformations)
* `/eval` : Send the expected inputs to the model and receive the resulting outputs

### Describe the models inputs and outputs

Each serialized model takes a list of named tensors as **inputs** and also returns a list of named tensors as **outputs**.

A **named tensor** is an **N-Dimensional array** with :

* An identifier name. Example: `my-tensor-name`
* A data type. Example: `integer` or `double` or `string`
* A shape. Example: `(5)` for a vector of length **5**, `(3, 2)` for a matrix whose first dimension is of size **3** and whose second dimension is of size **2**. Etc.

You can access the model inputs and outputs by calling the HTTP `GET` method on the `/describe` path of the model.
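To make the notion of shape concrete, here is a minimal Python sketch that infers the shape of a nested list, which is the layout an n-dimensional JSON array would have. The helper name `infer_shape` is hypothetical and not part of the serving runtime, and it assumes a rectangular (non-ragged) array.

```python
def infer_shape(value):
    """Return the shape of a rectangular nested list as a tuple.

    Hypothetical helper for illustration; not part of the serving runtime.
    Only the first element of each level is inspected, so ragged input
    is not detected.
    """
    shape = []
    while isinstance(value, list):
        shape.append(len(value))
        if not value:
            break
        value = value[0]
    return tuple(shape)


vector = [1, 2, 3, 4, 5]
matrix = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

print(infer_shape(vector))  # (5,)
print(infer_shape(matrix))  # (3, 2)
```

A bare number yields the empty shape `()`, which corresponds to a scalar.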
#### Example of a describe query with curl

```bash
curl \
  -X GET \
  http:///describe
```

#### Example of a describe response

You will get a **JSON** object describing the list of **input tensors** that are needed to query your model as well as the list of **output tensors** that will be returned.

```json
{
  "inputs": [
    { "name": "sepal_length", "type": "float", "shape": [-1] },
    { "name": "sepal_width", "type": "float", "shape": [-1] },
    { "name": "petal_length", "type": "float", "shape": [-1] },
    { "name": "petal_width", "type": "float", "shape": [-1] }
  ],
  "outputs": [
    { "name": "output_label", "type": "long", "shape": [-1] },
    { "name": "output_probability", "type": "float", "shape": [-1, 2] }
  ]
}
```

In this example, the deployed model expects 4 tensors as inputs :

* `sepal_length` of shape `(-1)` (i.e. a vector of any size)
* `sepal_width` of shape `(-1)` (i.e. a vector of any size)
* `petal_length` of shape `(-1)` (i.e. a vector of any size)
* `petal_width` of shape `(-1)` (i.e. a vector of any size)

It will respond with 2 tensors as outputs :

* `output_label` of shape `(-1)` (i.e. a vector of any size)
* `output_probability` of shape `(-1, 2)` (i.e. a matrix whose first dimension is of any size and whose second dimension is of size 2)

### Query the model

Once you know what kind of **input tensors** the model needs, fill the **body** of your **HTTP query** with your chosen tensor representation (see below) and send it to the model with a `POST` method on the path `/eval`.

Two headers are available for your query:

* The [Content-Type][Content Type Header] header indicating the [media type][Media Type] of the input tensor data contained in your message body.
* The (optional) [Accept][Accept Header] header indicating what kind of [media type][Media Type] you want to receive for the output tensors in the response body. If you don't provide an `Accept` header, it defaults to `application/json`.
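The query described above can be sketched in Python with the standard library. This is a minimal sketch rather than an official client: `build_eval_request` is a hypothetical helper, the base URL `http://localhost:8080` is only the default mentioned in this README, and the body is assumed to be JSON keyed by input tensor name. The request is built but not sent.

```python
import json
import urllib.request


def build_eval_request(base_url, tensors, accept="application/json"):
    """Build (without sending) a POST request for the /eval route.

    Hypothetical helper: `tensors` maps input tensor names to nested lists.
    """
    body = json.dumps(tensors).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/eval",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "Accept": accept},
    )


request = build_eval_request(
    "http://localhost:8080",  # assumed default address
    {
        "sepal_length": [0.1],
        "sepal_width": [0.2],
        "petal_length": [0.3],
        "petal_width": [0.4],
    },
)
print(request.get_method(), request.full_url)
```

Passing the resulting object to `urllib.request.urlopen(request)` would send it to a running server.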
### Supported Content-Type headers

* `application/json` : A JSON document whose **keys** are the **input tensor** names and whose **values** are the n-dimensional JSON arrays matching your tensors.
* `image/png` : Binary content representing a **png** encoded image.
* `image/jpeg` : Binary content representing a **jpeg** encoded image.

> `image/png` and `image/jpeg` are only available for models taking a single tensor as input. That tensor's shape should also be compatible with an image representation.

* `multipart/form-data` : A multipart body, each part of which is named after an **input tensor**.

> Each part (i.e. tensor) in the **multipart** should have its own **Content-Type**

### Supported Accept headers

* `application/json` : A JSON document whose **keys** are the **output tensor** names and whose **values** are the n-dimensional JSON arrays matching your tensors.
* `image/png` : Binary content representing a **png** encoded image.
* `image/jpeg` : Binary content representing a **jpeg** encoded image.

> `image/png` and `image/jpeg` are only available for models returning a single tensor as output. That tensor's shape should also be compatible with an image representation.

* `text/html` : An HTML document displaying the **output tensors** representation.
* `multipart/form-data` : A multipart body, each part of which is named after an **output tensor** and contains the tensor's JSON representation.

> If you want some of the output tensors in a `multipart/form-data` or `text/html` response to be interpreted as images, you can specify it as a parameter in the header.
>
> **Example** : The header `text/html; tensor_1=image/png; tensor_2=image/png` returns the global response as HTML content. Inside the HTML page, `tensor_1` and `tensor_2` are displayed as **png** images.

### Tensor interpretable as image

For a tensor to be interpretable as raw image data, it should have a compatible shape in your exported model.
Here are the supported ones :

* `(x, y, z, 1)` : Batch of **x** grayscale images with **y** pixels height and **z** pixels width
* `(x, 1, y, z)` : Batch of **x** grayscale images with **y** pixels height and **z** pixels width
* `(x, y, z, 3)` : Batch of **x** RGB images with **y** pixels height and **z** pixels width. The last dimension should be the array of `(red, green, blue)` components.
* `(x, 3, y, z)` : Batch of **x** RGB images with **y** pixels height and **z** pixels width. The second dimension should be the array of `(red, green, blue)` components.
* `(y, z, 1)` : Single grayscale image with **y** pixels height and **z** pixels width
* `(1, y, z)` : Single grayscale image with **y** pixels height and **z** pixels width
* `(y, z, 3)` : Single RGB image with **y** pixels height and **z** pixels width. The last dimension should be the array of `(red, green, blue)` components.
* `(3, y, z)` : Single RGB image with **y** pixels height and **z** pixels width. The first dimension should be the array of `(red, green, blue)` components.
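The layouts listed above can be checked with a few lines of Python. This is only a sketch under stated assumptions: `image_layout` is a hypothetical helper, not the runtime's own logic, and for ambiguous shapes such as `(3, 5, 3)` it arbitrarily prefers the channels-last reading, which may differ from what the runtime does.

```python
def image_layout(shape):
    """Classify a tensor shape against the image layouts listed above.

    Returns a short description, or None when no layout matches.
    Hypothetical helper; channels-last is checked first by assumption.
    """
    channels = {1: "grayscale", 3: "RGB"}
    if len(shape) == 4:
        # First dimension is the batch size; the rest describes one image.
        kind, image_dims = "batch", shape[1:]
    elif len(shape) == 3:
        kind, image_dims = "single", shape
    else:
        return None
    if image_dims[-1] in channels:
        return f"{kind} {channels[image_dims[-1]]}, channels last"
    if image_dims[0] in channels:
        return f"{kind} {channels[image_dims[0]]}, channels first"
    return None


for shape in [(10, 28, 28, 1), (10, 3, 224, 224), (28, 28, 3), (5,)]:
    print(shape, "->", image_layout(shape))
```

A batch dimension reported as `-1` (any size) does not affect the result, since only the channel dimension is inspected.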
## Examples

### Example of a query with curl for a single prediction

In the following example, we want to receive a prediction from our model for the following item :

* `sepal_length` : 0.1
* `sepal_width` : 0.2
* `petal_length` : 0.3
* `petal_width` : 0.4

```bash
curl \
  -H 'Content-Type: application/json' \
  -H 'Accept: application/json' \
  -X POST \
  -d '{ "sepal_length": 0.1, "sepal_width": 0.2, "petal_length": 0.3, "petal_width": 0.4 }' \
  http:///eval
```

### Example of response for a single prediction

* HTTP Status code: `200`
* Header: `Content-Type: application/json`

```json
{
  "output_label": 0,
  "output_probability": [0.88, 0.12]
}
```

In this example, our model predicts the **output_label** for our **input item** to be `0` with the following probabilities :

* 88% chance of being `0`
* 12% chance of being `1`

### Example of query with curl for several predictions in one call

In the following example, we want to receive a prediction from our model for the two following items :

**First Item**

* `sepal_length` : 0.1
* `sepal_width` : 0.2
* `petal_length` : 0.3
* `petal_width` : 0.4

**Second Item**

* `sepal_length` : 0.2
* `sepal_width` : 0.3
* `petal_length` : 0.4
* `petal_width` : 0.5

**Query**

```bash
curl \
  -H 'Content-Type: application/json' \
  -H 'Accept: application/json' \
  -X POST \
  -d '{ "sepal_length": [0.1, 0.2], "sepal_width": [0.2, 0.3], "petal_length": [0.3, 0.4], "petal_width": [0.4, 0.5] }' \
  http:///eval
```

### Example of response for several predictions in one call

* HTTP Status code: `200`
* Header: `Content-Type: application/json`

```json
{
  "output_label": [0, 1],
  "output_probability": [
    [0.88, 0.12],
    [0.01, 0.99]
  ]
}
```

In this example, our model predicts the **output_label** for our **first input item** to be `0` with the following probabilities :

* 88% chance of being `0`
* 12% chance of being `1`

It also predicts the **output_label** for our **second input item** to be `1` with the following probabilities :

* 1% chance of being `0`
* 99% chance of being `1`

# Related links

* Contribute: https://github.com/ovh/serving-runtime/blob/master/CONTRIBUTING.md
* Report bugs: https://github.com/ovh/serving-runtime/issues

# License

See https://github.com/ovh/serving-runtime/blob/master/LICENSE

[Content Type Header]: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Type
[Accept Header]: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Accept
[Media Type]: https://developer.mozilla.org/en-US/docs/Glossary/MIME_type
[ONNX]: https://onnx.ai/