# serving-runtime
**Repository Path**: mirrors_ovh/serving-runtime
## Basic Information
- **Project Name**: serving-runtime
- **Description**: Exposes a serialized machine learning model through a HTTP API.
- **Primary Language**: Unknown
- **License**: BSD-3-Clause
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2020-09-25
- **Last Updated**: 2026-06-21
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
# Serving Runtime
Exposes a serialized machine learning model through a HTTP API written in Java.
 []() [](https://gitter.im/ovh/ai)
**This project is under active development**
## Description
The purpose of this project is to expose a generic HTTP API from a machine learning serialized models.
Supported serialized models are :
* [ONNX][ONNX] `1.5`
* TensorFlow `<=1.15` SavedModel or HDF5
* [HuggingFace Tokenizer](https://github.com/huggingface/tokenizers)
## Prerequisites
* Maven for compiling the project
* `Java 11` for running the project
`HDF5` serialization format is supported through a conversion into `SavedModel` format. That conversion relies on following dependencies :
* Python `3.7`
* TensorFlow `<=1.15` (`pip install tensorflow`)
For HuggingFace tokenizer :
* Cargo (Rust stable)
### HDF5 support (Optional)
The Tensorflow module requires the support of HDF5 files through the creation of an executable `h5_converter` wich exports the model from HDF5 file to a Tensorflow SavedModel (`.pb`).
To generate the converter simply use the initialize_tensorflow goal of the `Makefile`:
```bash
make initialize_tensorflow
```
The generated executable can be found here: `evaluator-tensorflow/h5_converter/dist/h5_converter`
### HuggingFace (Optional)
To build the Java binding use the initialize_huggingface goal of the `Makefile`:
```bash
make initialize_huggingface
```
### Torch (Optional)
To install libtorch use initialize_torch goal of the `Makefile`:
```bash
make initialize_torch
```
[Convert pyTorch model and more](evaluator-torch/README.md)
## Build & Launch the project locally
Several profiles are available depending on the support you require for the built project.
- `full` which includes both Tensorflow and ONNX, requires the [ONNX support](#onnx-support-optional), [HDF5 support](#hdf5-support-optional) and [Torch support](#torch-optional).
- `tensorflow` which only includes Tensorflow, requires the [HDF5 support](#hdf5-support-optional)
- `onnx` which only includes ONNX, requires the [ONNX support](#onnx-support-optional).
- `torch` which only includes Torch, requires the [Torch support](#torch-optional).
Set your desired profile:
```bash
export MAVEN_PROFILE=
```
If not specified the default profile is set to `full`.
### Launch tests
```bash
make test MAVEN_PROFILE=$MAVEN_PROFILE
```
### Building JAR
```bash
make build MAVEN_PROFILE=$MAVEN_PROFILE
```
The JAR could then be found in `api/target/api-*.jar`
### Launching JAR
In the following command, replace `` with the path on your compiled jar and `` with the directory where to find your serialized model.
```bash
java -Dfiles.path= -jar
```
If you wish to load a model from a HDF5 model you will need to specify the path to the executable generated in [HDF5 support](#hdf5-support-optional).
```bash
java -Dfiles.path= -Devaluator.tensorflow.h5_converter.path= -jar
```
Inside the `` it will look for the first file ending with :
* `.onnx` for an ONNX model
* `.pb` for a TensorFlow SavedModel
* `.h5` for a HDF5 model
#### Available parameters
On the launch command you can also specify the following parameters :
* `-Dserver.port` : the host port to request for the http server
* `-Dswagger.title` : The title that will be dispayed on the swagger
* `-Dswagger.description` : The description that will be displayed on the swagger
## Build & Launch the project using docker
### Building the docker container
```bash
make docker-build-api MAVEN_PROFILE=$MAVEN_PROFILE
```
It will build the docker image `serving-runtime-$MAVEN_PROFILE:latest`
### Running the docker container
In the following command, replace `` with the absolute path on directory where to find your serialized model.
```bash
docker run --rm -it -p 8080:8080 -v :/deployments/models serving-runtime-$MAVEN_PROFILE:latest
```
## Using the API
By default the API will be running on `http://localhost:8080`. Reaching this URL in your browser will display the SwaggerUI describing the API for your model.
There is 2 routes available in each models :
* `/describe` : Describe your model (what are the inputs, outputs and transformations)
* `/eval` : Send expected inputs on model and receive expected outputs results
### Describe the models inputs and outputs
Each serialized model takes a list of named tensors as **inputs** and also returns a list of named tensors as **outputs**.
A **named tensors** is a **N-Dimensional array** with :
* A identifier name. Example: `my-tensor-name`
* A data type. Example: `integer` or `double` or `string`
* A shape. Example: `(5)` for a vector of length **5**, `(3, 2)` for a matrix which first dimension is of size **3** and second dimension is of size **2**. Etc.
You can get access to the model inputs and outputs by calling the http `GET` method on `/describe` path of the model.
#### Example of a describe query with curl
```bash
curl \
-X GET \
http:///describe
```
#### Example of a describe response
You will get a **JSON** object describing the list of **inputs tensors** that are needed to query your model as well as the list of **outputs tensors** that will be returning.
```json
{
"inputs": [
{
"name": "sepal_length",
"type": "float",
"shape": [-1]
},
{
"name": "sepal_width",
"type": "float",
"shape": [-1]
},
{
"name": "petal_length",
"type": "float",
"shape": [-1]
},
{
"name": "petal_width",
"type": "float",
"shape": [-1]
}
],
"outputs": [
{
"name": "output_label",
"type": "long",
"shape": [-1]
},
{
"name": "output_probability",
"type": "float",
"shape": [-1, 2]
}
]
}
```
In this example, the deployed model is waiting for 4 tensors as inputs :
* `sepal_length` of shape `(-1)` (i.e. a vector of any size)
* `sepal_width` of shape `(-1)` (i.e. a vector of any size)
* `petal_length` of shape `(-1)` (i.e. a vector of any size)
* `petal_width` of shape `(-1)` (i.e. a vector of any size)
It will answer a response with 2 tensors as outputs :
* `output_label` of shape `(-1)` (i.e. a vector of any size)
* `output_probability` of shape `(-1, 2)` (i.e. a matrix which first dimension is of any size and which second dimension is of size 2)
### Query the model
Once you know what kind of **input tensors** are needed by the model, just fill a correct **body** on your **HTTP query** with your wanted representation of tensor (see below) and send it to the model with a `POST` method on the path `/eval`.
Two attached headers are available for your query:
* The [Content-Type][Content Type Header] header indicating the [media type][Media Type] of your input tensors data contained in your body message.
* The (optional) [Accept][Accept Header] header indicating what kind of [media type][Media Type] your want to receive for output tensors in the response body. The default `Accept` header if you don't provide one will be `application/json`.
### Supported Content-Type headers
* `application/json` : A json document which **key** are the **input tensors** names and **values** are the n-dimensional json arrays matching your tensors.
* `image/png` : A bytes content which representation is a **png** encoded image.
* `image/jpeg` : A bytes content which representation is a **jpeg** encoded image.
>
> `image/png` and `image/jpeg` are only available for models taking a single tensor as input. That tensor's shape should also be compatible with an image representation.
>
* `multipart/form-data` : A multipart body, each part of which is named by an **input tensor**.
>
> Each part (i.e. tensor) in the **multipart** should have its own **Content-Type**
>
### Supported Accept headers
* `application/json` : A JSON document which **key** is the **output tensors** names and **values** are the n-dimensional json arrays matching your tensors.
* `image/png` : A bytes content which representation is a **png** encoded image.
* `image/jpeg` : A bytes content which representation is a **jpeg** encoded image.
>
> `image/png` and `image/jpeg` are only available for models returning a single tensor as output. That tensor's shape should also be compatible with an image representation.
>
* `text/html` : A HTML document displaying the **output tensors** representation.
* `multipart/form-data` : A multipart body, each part of which is named by an **output tensor** and the content is the tensor json representation.
>
> If you want some of the output tensors in `multipart/form-data` and `text/html` header to be interpreted as an image, you can specify it as a parameter in the header.
>
> **Example** : The header `text/html; tensor_1=image/png; tensor_2=image/png` returns the global response as HTML content. Inside the HTML page, `tensor_1` and `tensor_2` are displayed as **png** images.
>
### Tensor interpretable as image
For a tensor to be interpretable as image raw data, it should be of a compatible shape in your exported model. Here are the supported ones :
* `(x, y, z, 1)` : Batch of **x** grayscale images with **y** pixels height and **z** pixels width
* `(x, 1, y, z)` : Batch of **x** grayscale images with **y** pixels height and **z** pixels width
* `(x, y, z, 3)` : Batch of **x** RGB images with **y** pixels height and **z** pixels width. The last dimension should be the array of `(red, green, blue)` components.
* `(x, 3, y, z)` : Batch of **x** RGB images with **y** pixels height and **z** pixels width. The last dimension should be the array of `(red, green, blue)` components.
* `(y, z, 1)` : Single grayscale image with **y** pixels height and **z** pixels width
* `(1, y, z)` : Single grayscale image with **y** pixels height and **z** pixels width
* `(y, z, 3)` : Single RGB image with **y** pixels height and **z** pixels width. The last dimension should be the array of `(red, green, blue)` components.
* `(3, y, z)` : Single RGB image with **y** pixels height and **z** pixels width. The last dimension should be the array of `(red, green, blue)` components.
## Examples
### Example of a query with curl for a single prediction
In the following example, we want to receive a prediction from our model for the following item :
* `sepal_length` : 0.1
* `sepal_width` : 0.2
* `petal_length` : 0.3
* `petal_width` : 0.4
```bash
curl \
-H 'Content-Type: application/json' \
-H 'Accept: application/json' \
-X POST \
-d '{
"stepal_length": 0.1,
"stepal_width": 0.2,
"petal_length": 0.3,
"petal_width": 0.4
}' \
http:///eval
```
### Example of response for a single prediction
* HTTP Status code: `200`
* Header: `Content-Type: application/json`
```json
{
"output_label": 0,
"output_probability": [0.88, 0.12]
}
```
In this example, our model predicts the **output_label** for our **input item** to be `0` with the following probabilities :
* 88% of chance to be `0`
* 12% of chance to be `1`
### Example of query with curl for several predictions in one call
In the following example, we want to receive a prediction from our model for the two following items :
**First Item**
* `sepal_length` : 0.1
* `sepal_width` : 0.2
* `petal_length` : 0.3
* `petal_width` : 0.4
**Second Item**
* `sepal_length` : 0.2
* `sepal_width` : 0.3
* `petal_length` : 0.4
* `petal_width` : 0.5
**Query**
```bash
curl \
-H 'Content-Type: application/json' \
-H 'Accept: application/json' \
-X POST \
-d '{
"stepal_length": [0.1, 0.2],
"stepal_width": [0.2, 0.3],
"petal_length": [0.3, 0.4],
"petal_width": [0.4, 0.5]
}' \
http:///eval
```
### Example of response for several predictions in one call
* HTTP Status code: `200`
* Header: `Content-Type: application/json`
```json
{
"output_label": [0, 1],
"output_probability": [
[0.88, 0.12],
[0.01, 0.99]
]
}
```
In this example, our model predicts the **output_label** for our **first input item** to be `0` with the following probabilities :
* 88% of chance to be `0`
* 12% of chance to be `1`
It also predicts the **output_label** for our **second input item** to be `1` with the following probabilities :
* 1% of chance to be `0`
* 99% of chance to be `1`
# Related links
* Contribute: https://github.com/ovh/serving-runtime/blob/master/CONTRIBUTING.md
* Report bugs: https://github.com/ovh/serving-runtime/issues
# License
See https://github.com/ovh/serving-runtime/blob/master/LICENSE
[Content Type Header]: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Type
[Accept Header]: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Accept
[Media Type]: https://developer.mozilla.org/en-US/docs/Glossary/MIME_type
[ONNX]: https://onnx.ai/