Memory and Execution Model
Overview
HEAL operates in a heterogeneous execution environment, where computations are distributed between two types of processors:
Host (CPU)
Acts as the control unit and manages data flow.
Device (Accelerator / FHE Hardware)
Executes computationally expensive homomorphic operations on tensors. It processes data according to instructions received from the host.
Execution Flow
Below we explain the three-phase execution model used to manage data across host and device memory.
Upload – Transfer input to device
The host prepares the input data in tensor form and transfers it to device memory using:
host_to_device<T>(host_tensor);
host_tensor is a tensor in CPU memory. The function returns a DeviceTensor<T>, a pointer-based wrapper referencing allocated memory on the device.
No computation is done yet; the tensor is simply staged for processing.
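A minimal sketch of the Upload phase is shown below. The heal.h include, the HostTensor<T> container, and the uint64_t element type are illustrative assumptions; only host_to_device<T> and DeviceTensor<T> come from the API described above.

#include "heal.h"   // hypothetical header name for the HEAL API
#include <cstdint>

// Stage a CPU-resident tensor in device memory; no homomorphic work runs here.
DeviceTensor<uint64_t> upload_input(const HostTensor<uint64_t>& host_tensor) {
    DeviceTensor<uint64_t> dev_input = host_to_device<uint64_t>(host_tensor);
    // dev_input is a pointer-based wrapper over device memory and can now be
    // passed to Work-phase functions such as modmul_ttc<T> or ntt<T>.
    return dev_input;
}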
Work – Run computations on the device
The host invokes computational functions on the device, which operate entirely within device memory.
Example functions include:
modmul_ttc<T>(a, b, p_scalar, result);
ntt<T>(input, p, twiddles, result);
expand<T>(a, axis, repeats, result);
Each function receives pointers to its input and output tensors.
The hardware implementation processes these inputs and writes results directly to the output tensor memory location provided by HEAL.
✓ Intermediate results remain on the device. ✘ They are not automatically visible to the host after each function call.
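For illustration, here is a sketch of a Work phase that chains two of the functions above, writing into pre-allocated device buffers. The allocate_device_tensor<T> helper, the shape() accessor, and the scalar types of p_scalar and p are assumptions about the API, not part of the text above; the function names and argument orders follow the examples listed earlier.

#include "heal.h"   // hypothetical header name for the HEAL API
#include <cstdint>

void work_phase(const DeviceTensor<uint64_t>& a,
                const DeviceTensor<uint64_t>& b,
                const DeviceTensor<uint64_t>& twiddles,
                uint64_t p_scalar,   // assumed to be a plain scalar modulus
                uint64_t p) {        // assumed to be a plain scalar modulus for the NTT
    // Output tensors must already exist on the device; this allocation helper
    // is a placeholder for whatever HEAL actually provides.
    DeviceTensor<uint64_t> product  = allocate_device_tensor<uint64_t>(a.shape());
    DeviceTensor<uint64_t> spectrum = allocate_device_tensor<uint64_t>(a.shape());

    // Element-wise modular multiplication: product = (a * b) mod p_scalar.
    modmul_ttc<uint64_t>(a, b, p_scalar, product);

    // Number-theoretic transform of the product, reusing the device-resident result.
    ntt<uint64_t>(product, p, twiddles, spectrum);

    // product and spectrum remain in device memory; the host sees them only
    // after an explicit download in the final phase.
}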