Memory and Execution Model
Overview
HEAL operates in a heterogeneous execution environment, where computations are distributed between two types of processors:
Host (CPU)
Acts as the control unit and manages data flow.
Device (Accelerator / FHE Hardware)
Executes computationally expensive homomorphic operations on tensors. It processes data according to instructions received from the host.
Execution Flow
Below we explain the three-phase execution model used to manage data across host and device memory.
Upload – Transfer input to device
The host prepares the input data in tensor form and transfers it to device memory using:
host_to_device<T>(host_tensor);
host_tensor is a tensor in CPU memory. The function returns a DeviceTensor<T>, a pointer-based wrapper referencing allocated memory on the device.
No computation is done yet; the tensor is simply staged for processing.
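A minimal sketch of the Upload phase is shown below. The heal.h include, the HostTensor<T> container, and the uint64_t element type are illustrative assumptions; only host_to_device<T> and DeviceTensor<T> come from the API described above.

#include "heal.h"   // hypothetical header name for the HEAL API
#include <cstdint>

// Stage a CPU-resident tensor in device memory; no homomorphic work runs here.
DeviceTensor<uint64_t> upload_input(const HostTensor<uint64_t>& host_tensor) {
    DeviceTensor<uint64_t> dev_input = host_to_device<uint64_t>(host_tensor);
    // dev_input is a pointer-based wrapper over device memory and can now be
    // passed to Work-phase functions such as modmul_ttc<T> or ntt<T>.
    return dev_input;
}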
Work – Run computations on the device
The host invokes computational functions on the device, which operate entirely within device memory.
Example functions include:
modmul_ttc<T>(a, b, p_scalar, result);
ntt<T>(input, p, twiddles, result);
expand<T>(a, axis, repeats, result);
Each function receives pointers to its input and output tensors.
The hardware implementation processes these inputs and writes results directly to the output tensor memory location provided by HEAL.
✓ Intermediate results remain on the device. ✘ They are not automatically visible to the host after each function call.
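For illustration, here is a sketch of a Work phase that chains two of the functions above, writing into pre-allocated device buffers. The allocate_device_tensor<T> helper, the shape() accessor, and the scalar types of p_scalar and p are assumptions about the API, not part of the text above; the function names and argument orders follow the examples listed earlier.

#include "heal.h"   // hypothetical header name for the HEAL API
#include <cstdint>

void work_phase(const DeviceTensor<uint64_t>& a,
                const DeviceTensor<uint64_t>& b,
                const DeviceTensor<uint64_t>& twiddles,
                uint64_t p_scalar,   // assumed to be a plain scalar modulus
                uint64_t p) {        // assumed to be a plain scalar modulus for the NTT
    // Output tensors must already exist on the device; this allocation helper
    // is a placeholder for whatever HEAL actually provides.
    DeviceTensor<uint64_t> product  = allocate_device_tensor<uint64_t>(a.shape());
    DeviceTensor<uint64_t> spectrum = allocate_device_tensor<uint64_t>(a.shape());

    // Element-wise modular multiplication: product = (a * b) mod p_scalar.
    modmul_ttc<uint64_t>(a, b, p_scalar, product);

    // Number-theoretic transform of the product, reusing the device-resident result.
    ntt<uint64_t>(product, p, twiddles, spectrum);

    // product and spectrum remain in device memory; the host sees them only
    // after an explicit download in the final phase.
}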