# Transcript

In **Lattica**, AI inference is executed through a mechanism called a **transcript** - a structured JSON file that describes a **step-by-step sequence of operations** needed to evaluate an AI model. This transcript serves as a **deterministic execution plan**, processed by the **HEAL Runtime**.

**We generate a transcript** from an AI model. During this generation process, we determine exactly **which operations are needed** and **which implementation to call for each operation**. For instance:

* If a hardware backend **implements** a function like `key_switch`, the transcript will **call the hardware-accelerated version** of that function.
* If a hardware backend does **not** implement the function, the transcript will instead point to our implementation built from a minimal set of low-level operations.

This dynamic generation process enables each transcript to be **customized** for the specific capabilities of the hardware it's targeting.
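As a sketch of this selection logic (the `hw::`/`fallback::` naming and the backend capability set are illustrative assumptions, not HEAL's actual API):

```python
def select_impl(op_name, backend_ops):
    """Choose which implementation the transcript will reference:
    the hardware-accelerated version if the backend provides the op,
    otherwise a fallback built from low-level operations."""
    if op_name in backend_ops:
        return f"hw::{op_name}"
    return f"fallback::{op_name}"

# Hypothetical backend that accelerates ntt and modmul but not key_switch
backend_ops = {"ntt", "modmul"}
print(select_impl("ntt", backend_ops))         # hardware-accelerated path
print(select_impl("key_switch", backend_ops))  # built from low-level ops
```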

To support **testing and benchmarking**, we also provide in HEAL a special **"sandbox" transcript**. Instead of retrieving real query data from a client, the sandbox transcript contains **pre-encrypted input data embedded directly inside the file**. This design enables:

* **Functionality testing** of any hardware-accelerated implementation.
* **Performance evaluation** on real AI model workloads.
* **Repeatable experiments** without dependency on live inputs.

***

## Transcript Structure

Each entry in the transcript is a tuple:

```python
(ExecutionTranscriptOpType, payload)
```

Where:

* `ExecutionTranscriptOpType`: Specifies the type of instruction:
  * `DEVICE_OP`
  * `SEGMENT_START`
  * `SEGMENT_END`
  * `FREE_DEVICE_TENSOR`
* `payload`: Provides the operation's full definition, including input arguments and outputs.
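A minimal sketch of how these entries can be modeled in Python. The enum mirrors the four instruction types above; the payload dicts are simplified placeholders, not the exact HEAL schema:

```python
from enum import Enum, auto

class ExecutionTranscriptOpType(Enum):
    DEVICE_OP = auto()
    SEGMENT_START = auto()
    SEGMENT_END = auto()
    FREE_DEVICE_TENSOR = auto()

# A transcript is an ordered list of (op_type, payload) tuples.
transcript = [
    (ExecutionTranscriptOpType.SEGMENT_START, {}),
    (ExecutionTranscriptOpType.DEVICE_OP,
     {"name": "ntt", "args": ["..."], "out": "..."}),
    (ExecutionTranscriptOpType.SEGMENT_END, {}),
]

# The runtime walks the list in order and dispatches on the op type
for op_type, payload in transcript:
    print(op_type.name, payload.get("name", ""))
```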

### Operation Types

#### **1. `DEVICE_OP`: Hardware-Targeted Functions**

Each `DEVICE_OP` represents a **function call** defined in the HEAL interface specification - these are the operations that **hardware vendors are expected to implement**.

It consists of:

* A **function `name`**: the function to invoke, such as `"ntt"`, `"host_to_device"`, or `"modmul"`.
* A list of **input arguments** (`args`).
* A designated **output** (`out`).

Each function call in the transcript includes **input and output arguments**, all wrapped in `DeviceOpArg` objects. These arguments carry both the data and information about what type of value they represent.

The **type** of each argument is specified using the `DeviceOpArgType` enum. This tells the runtime how to interpret the associated value.

Below is a reference table for the possible argument types:

| Argument Type   | Description                                                                                                     |
| --------------- | --------------------------------------------------------------------------------------------------------------- |
| `HOST_TENSOR`   | A tensor that resides on the host (CPU). Typically used when transferring data from host to device.             |
| `DEVICE_TENSOR` | A pointer (`inf_name`) to a tensor stored on the device (e.g., GPU, FPGA). Used for device-resident operations. |
| `SHAPE`         | Shape information for tensors (e.g., dimensions of a matrix).                                                   |
| `INT`           | A scalar integer value - often used for specifying axes, sizes, or indices.                                     |
| `NONE`          | Represents a null or intentionally missing argument.                                                            |
| `TENSOR_TYPE`   | Metadata that describes the kind of tensor (e.g., int32, int64).                                                |
| `SLICE`         | A slicing specification to extract a portion of a tensor.                                                       |
| `ELLIPSIS`      | Used for slicing across multiple dimensions (similar to Python’s `...`).                                        |

These functions operate on tensors and are executed directly on the hardware accelerator.
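The table above can be turned into a small dispatch sketch. The `resolve_arg` helper and the dict-of-tensors device model are assumptions for illustration, not HEAL's runtime API:

```python
from enum import Enum

class DeviceOpArgType(Enum):
    HOST_TENSOR = "HOST_TENSOR"
    DEVICE_TENSOR = "DEVICE_TENSOR"
    SHAPE = "SHAPE"
    INT = "INT"
    NONE = "NONE"
    TENSOR_TYPE = "TENSOR_TYPE"
    SLICE = "SLICE"
    ELLIPSIS = "ELLIPSIS"

def resolve_arg(arg, device_tensors):
    """Interpret a serialized DeviceOpArg according to its declared type."""
    arg_type = DeviceOpArgType(arg["arg_type"])
    if arg_type is DeviceOpArgType.DEVICE_TENSOR:
        # Look the tensor up by its inf_name pointer
        return device_tensors[arg["value"]["inf_name"]]
    if arg_type is DeviceOpArgType.INT:
        return int(arg["value"])
    if arg_type is DeviceOpArgType.NONE:
        return None
    # Shapes, slices, tensor types, etc. pass through unchanged here
    return arg["value"]

device_tensors = {55: [1, 2, 3]}
arg = {"arg_type": "DEVICE_TENSOR", "value": {"inf_name": 55}}
```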

#### **2. `SEGMENT_START` / `SEGMENT_END`: Device Execution Boundaries**

A **segment** marks a block of operations that are intended to run in sequence and may benefit from hardware-level optimization. Real-world AI models include a mix of:

* Low-level, hardware-accelerated tensor operations (e.g., `ntt`, `modmul`)
* Higher-level operations or control logic that may run on the host (CPU)

By enclosing a sequence of operations between `SEGMENT_START` and `SEGMENT_END`, the transcript defines a **logical execution block**.

This block allows the hardware backend to **analyze and optimize the segment as a whole**.
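As a sketch, a backend might first collect the operations of each segment before scheduling them; the flat `(op_type, payload)` entry format here is a simplification:

```python
def split_into_segments(entries):
    """Group the DEVICE_OPs enclosed by each SEGMENT_START/SEGMENT_END
    pair so the backend can analyze and optimize each block as a whole."""
    segments, current = [], None
    for op_type, payload in entries:
        if op_type == "SEGMENT_START":
            current = []
        elif op_type == "SEGMENT_END":
            segments.append(current)
            current = None
        elif current is not None:
            current.append(payload)
    return segments

entries = [
    ("SEGMENT_START", {}),
    ("DEVICE_OP", {"name": "ntt"}),
    ("DEVICE_OP", {"name": "modmul"}),
    ("SEGMENT_END", {}),
]
segments = split_into_segments(entries)
```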

#### **3. `FREE_DEVICE_TENSOR`: Explicit Memory Release**

In HEAL, we take responsibility for managing device memory explicitly.

The `FREE_DEVICE_TENSOR` operation is used to indicate when a tensor's memory should be released.
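A minimal sketch of the memory model this implies, with device allocations tracked by `inf_name` (the class and method names are illustrative):

```python
class DeviceMemoryModel:
    """Tracks device allocations by their inf_name pointer (illustrative)."""

    def __init__(self):
        self.allocations = {}

    def allocate(self, inf_name, data):
        self.allocations[inf_name] = data

    def free(self, inf_name):
        # Executed when the transcript reaches a FREE_DEVICE_TENSOR entry
        del self.allocations[inf_name]

mem = DeviceMemoryModel()
mem.allocate(55, [0] * 1024)   # created by e.g. host_to_device
mem.free(55)                   # released by FREE_DEVICE_TENSOR
```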

***

## Encrypted Data for Testing

To enable closed-loop testing, the "sandbox" transcript includes the complete contents of tensors expected to be on the host. This includes both data sent to the device via `host_to_device` and data expected to be received back from the device via `device_to_host`.

This setup allows comparison between the actual device computation outputs and the expected results.

Since it is embedded directly in the transcript:

* Hardware engineers can test functionality and performance **without any additional setup**.
* All parties run on **identical inputs**, which enhances reproducibility.

Example snippet:

```json
{
  "name": "host_to_device",
  "args": [
    {
      "arg_type": "HOST_TENSOR",
      "value": {
        "tensor_base64": {
          "type": "numpy",
          "data": "kSBVTYBZAQ84Z...EAAAAAAA=="
        }
      }
    }
  ]
}
```

This ensures that performance and correctness are validated using **deterministic input**.
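A sketch of decoding such an embedded payload, round-tripped here on made-up bytes; deserializing the decoded bytes of the `"numpy"` payload into an actual tensor is up to the consumer and not shown:

```python
import base64

def decode_embedded(host_tensor_value):
    """Extract the raw bytes of a HOST_TENSOR's tensor_base64 payload."""
    return base64.b64decode(host_tensor_value["tensor_base64"]["data"])

# Round-trip demo: made-up bytes stand in for serialized tensor data
raw = bytes(range(8))
value = {"tensor_base64": {"type": "numpy",
                           "data": base64.b64encode(raw).decode("ascii")}}
decoded = decode_embedded(value)
```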

***

## How a Transcript Uses Tensors and Pointers

When the transcript runs, each tensor on the device is identified using a unique pointer called `inf_name`. This lets the runtime (and your hardware) track where each tensor lives in memory.

Here’s how it works in a typical flow:

1. **host\_to\_device**\
   The transcript begins by uploading a tensor from the host (CPU) to the device.\
   This function allocates memory on the device and returns a **pointer** to it - for example, `inf_name: 55`.
2. **Operations on Device**\
   Next, we perform operations (e.g. `ntt`, `reshape`, `modmul`) using that pointer.\
   Each function receives this pointer as input so it can access the right memory.
3. **device\_to\_host**\
   When we’ve finished processing, we call `device_to_host` with the same pointer.\
   This reads the result from device memory and returns it as a host tensor.
4. **free\_device\_tensor**\
   Finally, we free the device memory associated with the pointer.\
   This helps avoid memory leaks and lets your hardware manage resources efficiently.

So in the transcript, you’ll often see the same pointer (e.g., `inf_name: 55`) used across several steps, from memory allocation to final cleanup. This is how we track data through the model.

### Example: Pointer Lifecycle in a Transcript

Below is a simplified excerpt from a real transcript. It shows how a tensor is uploaded to the device, used in operations, sent back to the host, and then freed:

```json
[
  // Step 1: Upload a tensor to the device
  {
    "type": "DEVICE_OP",
    "op": {
      "name": "host_to_device",
      "args": [ ... ],
      "out": {
        "arg_type": "DEVICE_TENSOR",
        "value": { "inf_name": 55 }
      }
    }
  },

  // Step 2: Perform operations using inf_name 55
  {
    "type": "DEVICE_OP",
    "op": {
      "name": "ntt",
      "args": [
        { "arg_type": "DEVICE_TENSOR", "value": { "inf_name": 55 } },
        ...
      ],
      "out": {
        "arg_type": "DEVICE_TENSOR",
        "value": { "inf_name": 55 }
      }
    }
  },

  // Step 3: Download the result back to the host
  {
    "type": "DEVICE_OP",
    "op": {
      "name": "device_to_host",
      "args": [
        { "arg_type": "DEVICE_TENSOR", "value": { "inf_name": 55 } },
        ...
      ],
      "out": {
        "arg_type": "HOST_TENSOR",
        "value": { ... }
      }
    }
  },

  // Step 4: Free the device memory
  {
    "type": "FREE_DEVICE_TENSOR",
    "tensor_name": 55
  }
]
```

This example shows how one tensor (`inf_name: 55`) is allocated, used across multiple steps, and then properly released. Every operation that interacts with the tensor simply references the pointer, making memory handling straightforward and efficient for the hardware.
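The lifecycle above can be exercised with a tiny interpreter. Everything here is a stand-in: device memory is a plain dict keyed by `inf_name`, the payload keys are simplified, and `ntt` is replaced by doubling each element:

```python
def run_transcript(entries):
    """Walk a simplified transcript, tracking device tensors by inf_name."""
    device = {}       # inf_name -> tensor data on the "device"
    results = []      # host tensors produced by device_to_host
    for op_type, payload in entries:
        if op_type == "FREE_DEVICE_TENSOR":
            del device[payload["tensor_name"]]               # Step 4: cleanup
        elif payload["name"] == "host_to_device":
            device[payload["out"]] = list(payload["host"])   # Step 1: upload
        elif payload["name"] == "device_to_host":
            results.append(device[payload["in"]])            # Step 3: download
        else:  # a device op such as "ntt" (stand-in: double each element)
            device[payload["out"]] = [x * 2 for x in device[payload["in"]]]
    return results, device

steps = [
    ("DEVICE_OP", {"name": "host_to_device", "host": [1, 2, 3], "out": 55}),
    ("DEVICE_OP", {"name": "ntt", "in": 55, "out": 55}),
    ("DEVICE_OP", {"name": "device_to_host", "in": 55}),
    ("FREE_DEVICE_TENSOR", {"tensor_name": 55}),
]
results, device = run_transcript(steps)
```

Note how the same pointer (`55`) threads through upload, compute, download, and free, mirroring the JSON excerpt above.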

***

## Ready-to-Use Transcript for Testing

To help hardware teams get started quickly, we provide a **ready-made transcript** in our GitHub repository. It’s designed for sandbox testing and includes:

* A real AI model execution sequence.
* Encrypted input data already embedded in the file.
* Calls to a wide range of HEAL functions.

This transcript allows you to **test functionality and measure performance** of your hardware implementation without needing to connect a client or handle live input. It’s a self-contained example that can run from start to finish using the HEAL Runtime.

***
