# Memory and Execution Model

## Overview

HEAL operates in a **heterogeneous execution environment**, where computations are distributed between two types of processors:

| Component                               | Role                                                                                                                                      |
| --------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------- |
| **Host** (CPU)                          | Acts as the control unit and manages data flow.                                                                                           |
| **Device** (Accelerator / FHE Hardware) | Executes computationally expensive homomorphic operations on tensors. It processes data according to instructions received from the host. |

***

## Execution Flow

Below we explain the three-phase execution model used to manage data across host and device memory.

{% stepper %}
{% step %}

### **Upload** – *Transfer input to device*

The host prepares input data in tensor form and transfers it to the device memory using:

```cpp
DeviceTensor<T> device_tensor = host_to_device<T>(host_tensor);
```

* `host_tensor` is a tensor in CPU memory.
* The function returns a `DeviceTensor<T>`, a pointer-based wrapper referencing allocated memory on the device.

No computation is done yet; the tensor is simply staged for processing.
{% endstep %}

{% step %}

### **Work** – *Run computations on the device*

The host invokes computational functions on the device, which operate entirely within device memory.

Example functions include:

```cpp
modmul_ttc<T>(a, b, p_scalar, result);
ntt<T>(input, p, twiddles, result);
expand<T>(a, axis, repeats, result);
```

* Each function receives pointers to input and output tensors.
* The hardware implementation processes these inputs and writes results directly to the output tensor memory provided by HEAL.

{% hint style="info" %}
If additional memory is required to store the results, the host is responsible for allocating this memory and providing the correct pointer before calling device functions.
{% endhint %}

✓  Intermediate results remain on the device.\
✘  They are **not automatically visible** to the host after each function.
{% endstep %}

{% step %}

### **Download** – *Transfer results to host*

Once computations are complete, the host explicitly transfers results back using:

```cpp
auto host_result = device_to_host<T>(device_tensor);
```

This step copies the computed tensor back into host memory for further use, such as decryption, visualization, or exporting.
{% endstep %}
{% endstepper %}

<figure><img src="https://72044036-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FMRAVIxKQ9hlGIcjpg3pD%2Fuploads%2FIWFcPcUvQtKsdhwwx5Uz%2FHealTensorFlow3.png?alt=media&#x26;token=52cb97d9-296c-4f74-9aa4-354e914ca5e3" alt=""><figcaption><p>HEAL memory management diagram </p></figcaption></figure>

***
