LiquidAI Inference server
LiquidAI builds native device foundation models for advanced intelligence outside data centers, with a focus on low latency, privacy, and hardware-constrained environments.
Their LFM runs onboard Clustergate 2 and accepts API requests in the same general way as the Ollama server. The main difference is the endpoint URL, which points to LiquidAI instead of Ollama.
The example in examples/liquidai/vlm_infer.py shows how to submit an image plus a natural-language prompt to the server using an OpenAI-compatible chat completions request.
API requests are tracked and chargeable. LiquidAI usage is considered a paid metric.
This is the same basic interaction pattern you would use with other chat-completions APIs:
- send a POST request
- include the model name
- pass a messages array
- encode the image as a data: URL
- read the assistant response from the first completion choice
Usage
To use it onboard Clustergate 2, make a POST request to http://liquid-3b.dphi-public/v1/chat/completions.
The endpoint accepts a JSON body and returns a standard chat-completions response. For the example in this repository, the server is used for image understanding rather than plain text-only chat.
The request payload should follow the OpenAI chat format, with the image included in the user message content before the text prompt. The image-first ordering matters in the example because the multimodal message content is structured as an array of typed parts.
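As a rough illustration of the request described above, the following sketch wraps a payload in a JSON POST using only the standard library. The endpoint URL comes from this section; the helper names build_request and send, the Content-Type header, and the 120-second timeout are assumptions made for the example and are not taken from vlm_infer.py.

```python
import json
import urllib.request

# Endpoint from the text above. Helper names and the timeout are hypothetical.
ENDPOINT = "http://liquid-3b.dphi-public/v1/chat/completions"


def build_request(payload, url=ENDPOINT):
    """Wrap a chat-completions payload (a dict) in a JSON POST request."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(payload, url=ENDPOINT, timeout=120):
    """Send the request and return the decoded JSON response as a dict."""
    request = build_request(payload, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Splitting request construction from sending keeps the network call in one place, so the request can be inspected without reaching the server.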
Example
The example script examples/liquidai/vlm_infer.py does the following:
- reads an image from disk
- detects the file type with mimetypes.guess_type
- base64-encodes the raw image bytes
- wraps the encoded image in a data:<mime>;base64,... URL
- sends the image and prompt together in a messages array
- prints the final answer from choices[0].message.content
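The image-handling steps in the list above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the function name image_to_data_url is hypothetical, and the image/jpeg fallback mirrors the behavior this document describes for the script.

```python
import base64
import mimetypes


def image_to_data_url(path):
    """Read an image file and return it as a base64 data: URL."""
    # Guess the MIME type from the filename; fall back to image/jpeg
    # when the extension is missing or unknown.
    mime, _ = mimetypes.guess_type(path)
    if mime is None:
        mime = "image/jpeg"

    # Base64-encode the raw bytes and decode to str for embedding in JSON.
    with open(path, "rb") as image_file:
        encoded = base64.b64encode(image_file.read()).decode("ascii")

    return f"data:{mime};base64,{encoded}"
```

Note that the MIME type is inferred from the filename only, so a misnamed file would be labeled incorrectly.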
Example request body:
{
  "model": "local",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "image_url",
          "image_url": {
            "url": "data:image/png;base64,..."
          }
        },
        {
          "type": "text",
          "text": "describe this"
        }
      ]
    }
  ],
  "max_tokens": 256,
  "temperature": 0.2,
  "cache_prompt": false
}
The important parts of the payload are:
- model: set to local in the example
- messages[0].role: user
- messages[0].content[0]: the image part
- messages[0].content[1]: the text prompt
- max_tokens: controls the length of the generated answer
- temperature: controls sampling randomness
- cache_prompt: disabled in the example
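For illustration, the same payload could be assembled in Python along these lines. The function name build_payload and its signature are hypothetical; the field names and default values are copied from the example request body in this section.

```python
def build_payload(data_url, prompt, max_tokens=256, temperature=0.2):
    """Build a chat-completions payload with the image part before the text part."""
    return {
        "model": "local",
        "messages": [
            {
                "role": "user",
                "content": [
                    # Image first, then the text prompt, as in the example body.
                    {"type": "image_url", "image_url": {"url": data_url}},
                    {"type": "text", "text": prompt},
                ],
            }
        ],
        "max_tokens": max_tokens,
        "temperature": temperature,
        "cache_prompt": False,  # disabled, matching the example
    }
```

Passing the result through json.dumps yields a body with the same shape as the example request.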
Request notes
- image_url.url must be a base64-encoded data: URL.
- The image should come before the text in the content array.
- max_tokens and temperature are optional parameters used by the example, and they are useful knobs if you want to tune answer length or variability.
- The response text is available at choices[0].message.content.
- The example uses a python3 client with urllib.request, so no external SDK is required.
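Reading the answer out of a decoded response might look like the sketch below. The helper name extract_answer and the empty-choices check are assumptions added for the example; only the choices[0].message.content path comes from this document.

```python
def extract_answer(response):
    """Return the assistant text from a decoded chat-completions response."""
    choices = response.get("choices") or []
    if not choices:
        # Hypothetical guard: fail with a clear message instead of an IndexError.
        raise ValueError("response contains no completion choices")
    return choices[0]["message"]["content"]


# Made-up response, trimmed to the fields this helper reads.
sample_response = {
    "choices": [
        {"message": {"role": "assistant", "content": "A placeholder answer."}}
    ]
}
print(extract_answer(sample_response))
```

The guard is a design choice for the sketch; a client that prefers to surface the raw response on failure could log it instead of raising.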
Example script
Inside a container, you can run the sample like this:
python3 vlm_infer.py --server http://liquid-3b.dphi-public --image ./image.png --prompt "describe this"
If you want to test a different image, pass any local file path supported by your image tooling.
The script will infer the MIME type from the filename and fall back to image/jpeg if it cannot determine one.
For a full Operation YAML example, use examples/liquidai/liquidai.yaml. It is the YAML you would fill in through the dashboard. This operation does the following:
- uplink uploads vlm_infer.py and the image into a working volume
- pod_run_job runs the script inside a python:3.10-slim container and pipes the response to a file
- downlink_results retrieves the generated output file