LiquidAI Inference server

LiquidAI builds native device foundation models for advanced intelligence outside data centers, with a focus on low latency, privacy, and hardware-constrained environments.

Their LFM (Liquid Foundation Model) runs onboard Clustergate 2 and accepts API requests in the same general way as the Ollama server. The main difference is the endpoint URL, which points to LiquidAI instead of Ollama.

The example in examples/liquidai/vlm_infer.py shows how to submit an image plus a natural-language prompt to the server using an OpenAI-compatible chat completions request.

Warning: API requests are tracked and chargeable. LiquidAI usage is considered a paid metric.

Requests to the server follow the same basic interaction pattern you would use with other chat-completions APIs:

send a POST request
include the model name
pass a messages array
encode the image as a data: URL
read the assistant response from the first completion choice
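The pattern above can be sketched with the standard library alone. This is a minimal illustration, not the repository's script: the helper names build_request and send and the 120-second timeout are assumptions made for this sketch, while the endpoint URL is the one given in the Usage section.

```python
import json
import urllib.request

# Endpoint as documented in the Usage section of this page.
ENDPOINT = "http://liquid-3b.dphi-public/v1/chat/completions"


def build_request(payload: dict, endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Wrap a chat-completions payload in a JSON POST request (hypothetical helper)."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 120.0) -> dict:
    """Send the request and decode the JSON response.

    Requires network access to the server; the timeout value is an assumption.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Building the request separately from sending it makes it possible to inspect the body and headers without contacting the server, which matters here because requests are chargeable.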

Usage

To use it onboard Clustergate 2, make a POST request to http://liquid-3b.dphi-public/v1/chat/completions.

The endpoint accepts a JSON body and returns a standard chat-completions response. For the example in this repository, the server is used for image understanding rather than plain text-only chat.

The request payload should follow the OpenAI chat format. The user message's content is an array of typed parts (image_url and text), and the image part should come before the text part, as it does in the example.

Example

The example script examples/liquidai/vlm_infer.py does the following:

reads an image from disk
detects the file type with mimetypes.guess_type
base64-encodes the raw image bytes
wraps the encoded image in a data:<mime>;base64,... URL
sends the image and prompt together in a messages array
prints the final answer from choices[0].message.content
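The image-handling steps in this list might look roughly like the following. It is a sketch under stated assumptions: the function name image_to_data_url is hypothetical, and the image/jpeg fallback mirrors the behaviour this page describes for the script when the MIME type cannot be determined.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(path: str, default_mime: str = "image/jpeg") -> str:
    """Read an image file and return it as a base64 data: URL (hypothetical helper)."""
    # guess_type looks only at the filename, not at the file contents.
    mime, _encoding = mimetypes.guess_type(path)
    if mime is None:
        mime = default_mime
    # Base64-encode the raw bytes; the result is plain ASCII.
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

Because the type is inferred from the file extension, a misnamed file yields a mislabelled data URL, so it is worth keeping extensions accurate.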

Example request body:

{
  "model": "local",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "image_url",
          "image_url": {
            "url": "data:image/png;base64,..."
          }
        },
        {
          "type": "text",
          "text": "describe this"
        }
      ]
    }
  ],
  "max_tokens": 256,
  "temperature": 0.2,
  "cache_prompt": false
}

The important parts of the payload are:

model: set to local in the example
messages[0].role: user
messages[0].content[0]: the image part
messages[0].content[1]: the text prompt
max_tokens: controls the length of the generated answer
temperature: controls sampling randomness
cache_prompt: disabled in the example
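For illustration, the same payload can be assembled in Python as a plain dictionary. The function name build_payload is hypothetical; the field names and default values are copied from the example request body above.

```python
import json


def build_payload(data_url: str, prompt: str,
                  max_tokens: int = 256, temperature: float = 0.2) -> dict:
    """Assemble a chat-completions payload with the image part first (hypothetical helper)."""
    return {
        "model": "local",
        "messages": [
            {
                "role": "user",
                "content": [
                    # Image part comes before the text part, as in the example.
                    {"type": "image_url", "image_url": {"url": data_url}},
                    {"type": "text", "text": prompt},
                ],
            }
        ],
        "max_tokens": max_tokens,
        "temperature": temperature,
        "cache_prompt": False,
    }


if __name__ == "__main__":
    # The "..." placeholder stands in for real base64 data.
    print(json.dumps(build_payload("data:image/png;base64,...", "describe this"), indent=2))
```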

Request notes

image_url.url must be a base64-encoded data: URL.
The image should come before the text in the content array.
max_tokens and temperature are optional. The example sets both, and they are useful knobs if you want to tune answer length or variability.
The response text is available at choices[0].message.content.
The example uses a Python 3 client built on urllib.request, so no external SDK is required.
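Reading the answer from the response could be done defensively along these lines. The function name extract_answer and the sample response in the usage section are assumptions for illustration; only the choices[0].message.content path comes from this page.

```python
def extract_answer(response: dict) -> str:
    """Return the assistant text from the first completion choice (hypothetical helper)."""
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("response contains no completion choices")
    content = choices[0].get("message", {}).get("content")
    if content is None:
        raise ValueError("first choice has no message content")
    return content


if __name__ == "__main__":
    # Made-up response with the standard chat-completions shape, for demonstration only.
    sample = {"choices": [{"message": {"role": "assistant", "content": "A red bicycle."}}]}
    print(extract_answer(sample))
```

Raising an explicit error when the expected keys are missing gives a clearer failure than a bare KeyError or IndexError if the server returns an error body instead of a completion.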

Example script

Inside a container, you can run the sample like this:

python3 vlm_infer.py --server http://liquid-3b.dphi-public --image ./image.png --prompt "describe this"

To test a different image, pass the path to any local image file. The script infers the MIME type from the filename and falls back to image/jpeg if it cannot determine one.
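The command passes only the server's base URL, so the client presumably appends the chat-completions path itself. The sketch below shows one way such a command line could be parsed. It is a guess based on the flags in the command above, not the script's actual code: the function names, the help texts, and the default prompt are assumptions.

```python
import argparse


def parse_args(argv=None) -> argparse.Namespace:
    """Parse the three flags used in the sample command (hypothetical reconstruction)."""
    parser = argparse.ArgumentParser(description="Send an image and a prompt to the server.")
    parser.add_argument("--server", required=True, help="base URL of the inference server")
    parser.add_argument("--image", required=True, help="path to a local image file")
    # Default prompt is an assumption; the sample command always passes one.
    parser.add_argument("--prompt", default="describe this", help="natural-language prompt")
    return parser.parse_args(argv)


def endpoint_for(server: str) -> str:
    """Join the base URL with the chat-completions path, tolerating a trailing slash."""
    return server.rstrip("/") + "/v1/chat/completions"


if __name__ == "__main__":
    args = parse_args()
    print(endpoint_for(args.server))
```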

For a full Operation YAML example, see examples/liquidai/liquidai.yaml, which contains the YAML you would enter through the dashboard. This operation does the following:

uplink uploads vlm_infer.py and the image into a working volume
pod_run_job runs the script inside a python:3.10-slim container and pipes the response to a file
downlink_results retrieves the generated output file
