Transcript
In Lattica, AI inference is executed through a mechanism called a transcript - a structured JSON file that describes a step-by-step sequence of operations needed to evaluate an AI model. This transcript serves as a deterministic execution plan, processed by the HEAL Runtime.
We generate a transcript from an AI model. During this generation process, we determine exactly which operations are needed and which implementation to call for each operation. For instance:
- If a hardware backend implements a function such as `key_switch`, the transcript calls the hardware-accelerated version of that function.
- If a hardware backend does not implement the function, the transcript instead points to our implementation, built from a minimal set of low-level operations.
This dynamic generation process enables each transcript to be customized for the specific capabilities of the hardware it's targeting.
To support testing and benchmarking, we also provide in HEAL a special "sandbox" transcript. Instead of retrieving real query data from a client, the sandbox transcript contains pre-encrypted input data embedded directly inside the file. This design enables:
Functionality testing of any hardware-accelerated implementation.
Performance evaluation on real AI model workloads.
Repeatable experiments without dependency on live inputs.
Transcript Structure
Each entry in the transcript is a tuple:
Where:

- `ExecutionTranscriptOpType`: Specifies the type of instruction:
  - `DEVICE_OP`
  - `SEGMENT_START`
  - `SEGMENT_END`
  - `FREE_DEVICE_TENSOR`
- `payload`: Provides the operation's full definition, including input arguments and outputs.
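As a minimal sketch, a transcript entry can be modeled as a pair. The payload field names below are assumptions for illustration, not the exact HEAL JSON schema:

```python
# Hypothetical model of a transcript entry: an (op_type, payload) pair.
# The payload field names are illustrative, not the exact HEAL schema.
entry = (
    "DEVICE_OP",                               # ExecutionTranscriptOpType
    {"name": "ntt", "args": [], "out": None},  # payload: full op definition
)

op_type, payload = entry
print(op_type)  # -> DEVICE_OP
```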
Operation Types
1. `DEVICE_OP`: Hardware-Targeted Functions

Each `DEVICE_OP` represents a function call defined in the HEAL interface specification - these are the operations that hardware vendors are expected to implement.
It consists of:
- A function `name`, such as `"ntt"`, `"host_to_device"`, or `"modmul"`
- A list of input arguments (`args`)
- A designated output (`out`)
Each function call in the transcript includes input and output arguments, all wrapped in `DeviceOpArg` objects. These arguments carry both the data and information about what type of value they represent.
The type of each argument is specified using the `DeviceOpArgType` enum. This tells the runtime how to interpret the associated value.
Below is a reference table for the possible argument types:
| Argument type | Description |
| --- | --- |
| `HOST_TENSOR` | A tensor that resides on the host (CPU). Typically used when transferring data from host to device. |
| `DEVICE_TENSOR` | A pointer (`inf_name`) to a tensor stored on the device (e.g., GPU, FPGA). Used for device-resident operations. |
| `SHAPE` | Shape information for tensors (e.g., dimensions of a matrix). |
| `INT` | A scalar integer value - often used for specifying axes, sizes, or indices. |
| `NONE` | Represents a null or intentionally missing argument. |
| `TENSOR_TYPE` | Metadata that describes the kind of tensor (e.g., int32, int64). |
| `SLICE` | A slicing specification to extract a portion of a tensor. |
| `ELLIPSIS` | Used for slicing across multiple dimensions (similar to Python's `...`). |
These functions operate on tensors and are executed directly on the hardware accelerator.
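To make the argument typing concrete, here is a small Python sketch of how a runtime might dispatch on `DeviceOpArgType` values. The enum members come from the table above; the dataclass layout and the `describe` helper are assumptions for illustration, not HEAL's actual implementation:

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Any

class DeviceOpArgType(Enum):
    # Members taken from the reference table above.
    HOST_TENSOR = auto()
    DEVICE_TENSOR = auto()
    SHAPE = auto()
    INT = auto()
    NONE = auto()
    TENSOR_TYPE = auto()
    SLICE = auto()
    ELLIPSIS = auto()

@dataclass
class DeviceOpArg:
    # Illustrative wrapper: carries the value plus its declared type.
    type: DeviceOpArgType
    value: Any

def describe(arg: DeviceOpArg) -> str:
    """Hypothetical helper: summarize how a runtime would read the arg."""
    if arg.type is DeviceOpArgType.DEVICE_TENSOR:
        return f"device tensor at inf_name={arg.value}"
    if arg.type is DeviceOpArgType.INT:
        return f"scalar int {arg.value}"
    return arg.type.name.lower()

args = [DeviceOpArg(DeviceOpArgType.DEVICE_TENSOR, 55),
        DeviceOpArg(DeviceOpArgType.INT, 2)]
print([describe(a) for a in args])
# -> ['device tensor at inf_name=55', 'scalar int 2']
```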
2. `SEGMENT_START` / `SEGMENT_END`: Device Execution Boundaries

A segment marks a block of operations that are intended to run in sequence and may benefit from hardware-level optimization. Real-world AI models include a mix of:
- Low-level, hardware-accelerated tensor operations (e.g., `ntt`, `modmul`)
- Higher-level operations or control logic that may run on the host (CPU)
By enclosing a sequence of operations between `SEGMENT_START` and `SEGMENT_END`, the transcript defines a logical execution block.
This block allows the hardware backend to analyze and optimize the segment as a whole.
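A minimal sketch of how a backend might collect the operations inside one segment for whole-block analysis. The marker names come from the description above; the entry layout and collection logic are assumptions:

```python
# Hypothetical transcript slice: the ops between SEGMENT_START and
# SEGMENT_END form one logical block the backend may optimize as a whole.
transcript = [
    ("SEGMENT_START", {}),
    ("DEVICE_OP", {"name": "ntt"}),
    ("DEVICE_OP", {"name": "modmul"}),
    ("SEGMENT_END", {}),
]

def collect_segment(entries):
    """Gather the names of the DEVICE_OPs inside the first segment."""
    block, inside = [], False
    for op_type, payload in entries:
        if op_type == "SEGMENT_START":
            inside = True
        elif op_type == "SEGMENT_END":
            break
        elif inside and op_type == "DEVICE_OP":
            block.append(payload["name"])
    return block

print(collect_segment(transcript))  # -> ['ntt', 'modmul']
```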
3. `FREE_DEVICE_TENSOR`: Explicit Memory Release

In HEAL, we take responsibility for managing device memory explicitly.
The `FREE_DEVICE_TENSOR` operation is used to indicate when a tensor's memory should be released.
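A short sketch of how a backend might honor these explicit frees. The memory-pool structure is hypothetical; only the operation name comes from the spec described above:

```python
# Hypothetical device-memory pool keyed by inf_name pointers.
pool = {55: b"...ciphertext bytes..."}

def handle_free(op_type, payload, pool):
    """Release the tensor named in a FREE_DEVICE_TENSOR entry."""
    if op_type == "FREE_DEVICE_TENSOR":
        pool.pop(payload["inf_name"], None)

handle_free("FREE_DEVICE_TENSOR", {"inf_name": 55}, pool)
print(pool)  # -> {}
```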
Encrypted Data for Testing
To enable closed-loop testing, the "sandbox" transcript includes the complete contents of tensors expected to be on the host. This includes both data sent to the device via `host_to_device` and data expected to be received back from the device via `device_to_host`.
This setup allows comparison between the actual device computation outputs and the expected results.
Since it is embedded directly in the transcript:
Hardware engineers can test functionality and performance without any additional setup.
All parties run on identical inputs, which enhances reproducibility.
Example snippet:
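As a hedged sketch of what such an embedded entry could contain (the field names, layout, and values below are assumptions, not the exact HEAL schema):

```python
# Hypothetical sandbox entry: the host-side tensor contents are embedded
# directly in the transcript, both the data to upload and the expected
# result. Field names and values here are illustrative only.
sandbox_entry = {
    "op_type": "DEVICE_OP",
    "payload": {
        "name": "host_to_device",
        "args": [{"type": "HOST_TENSOR", "value": [12345, 67890]}],
        "out": {"type": "DEVICE_TENSOR", "value": {"inf_name": 55}},
    },
    # Expected host-side result for the matching device_to_host call,
    # used to compare actual device outputs against known-good values.
    "expected": {"type": "HOST_TENSOR", "value": [11111, 22222]},
}
```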
This ensures that performance and correctness are validated using deterministic input.
How a Transcript Uses Tensors and Pointers
When the transcript runs, each tensor on the device is identified using a unique pointer called `inf_name`. This lets the runtime (and your hardware) track where each tensor lives in memory.
Here’s how it works in a typical flow:
1. `host_to_device`: The transcript begins by uploading a tensor from the host (CPU) to the device. This function allocates memory on the device and returns a pointer to it - for example, `inf_name: 55`.
2. Operations on device: Next, we perform operations (e.g., `ntt`, `reshape`, `modmul`) using that pointer. Each function receives the pointer as input so it can access the right memory.
3. `device_to_host`: When we've finished processing, we call `device_to_host` with the same pointer. This reads the result from device memory and returns it as a host tensor.
4. `free_device_tensor`: Finally, we free the device memory associated with the pointer. This helps avoid memory leaks and lets your hardware manage resources efficiently.
So in the transcript, you'll often see the same pointer (e.g., `inf_name: 55`) used across several steps, from memory allocation to final cleanup. This is how we track data through the model.
Example: Pointer Lifecycle in a Transcript
Below is a simplified excerpt from a real transcript. It shows how a tensor is uploaded to the device, used in operations, sent back to the host, and then freed:
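A hedged reconstruction of that flow in Python form (the entry layout and payload fields are assumptions; the function names and the `inf_name: 55` pointer follow the steps above):

```python
# Hypothetical lifecycle of one device tensor, tracked by inf_name 55.
lifecycle = [
    ("DEVICE_OP", {"name": "host_to_device",
                   "out": {"inf_name": 55}}),     # allocate + upload
    ("DEVICE_OP", {"name": "ntt",
                   "args": [{"inf_name": 55}]}),  # compute on device
    ("DEVICE_OP", {"name": "device_to_host",
                   "args": [{"inf_name": 55}]}),  # read result back
    ("FREE_DEVICE_TENSOR", {"inf_name": 55}),     # release memory
]

# Summarize the steps: the op name for DEVICE_OPs, the op type otherwise.
steps = [payload.get("name", op_type) for op_type, payload in lifecycle]
print(steps)
# -> ['host_to_device', 'ntt', 'device_to_host', 'FREE_DEVICE_TENSOR']
```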
This example shows how one tensor (`inf_name: 55`) is allocated, used across multiple steps, and then properly released. Every operation that interacts with the tensor simply references the pointer, making memory handling straightforward and efficient for the hardware.
Ready-to-Use Transcript for Testing
To help hardware teams get started quickly, we provide a ready-made transcript in our GitHub repository. It’s designed for sandbox testing and includes:
A real AI model execution sequence.
Encrypted input data already embedded in the file.
Calls to a wide range of HEAL functions.
This transcript allows you to test functionality and measure performance of your hardware implementation without needing to connect a client or handle live input. It’s a self-contained example that can run from start to finish using the HEAL Runtime.