Each function receives pointers to its input and output tensors.
The hardware implementation processes these inputs and writes results directly to the output tensor memory location provided by HEAL.
If additional memory is required to store the results, the host is responsible for allocating this memory and providing the correct pointer before calling device functions.
✓ Intermediate results remain on the device.
✘ They are not automatically visible to the host after each function.
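The calling pattern described above can be sketched with a small host-only mock. `MockDeviceTensor`, `mock_add`, and `chain_example` are hypothetical names invented for this illustration and are not part of HEAL. A `std::vector` stands in for device memory, so the sketch shows only the shape of the pattern: the host allocates every output (including intermediates) up front, and each function writes into the pointer it is given.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a device-resident tensor (not a HEAL type).
// The vector simulates memory that would live on the accelerator.
template <typename T>
struct MockDeviceTensor {
    std::vector<T> data;
    explicit MockDeviceTensor(std::size_t n) : data(n) {}
};

// Hypothetical device function: reads a and b, writes into the
// caller-provided output tensor. It allocates nothing itself.
template <typename T>
void mock_add(const std::shared_ptr<MockDeviceTensor<T>>& a,
              const std::shared_ptr<MockDeviceTensor<T>>& b,
              const std::shared_ptr<MockDeviceTensor<T>>& out) {
    assert(a->data.size() == b->data.size());
    assert(out->data.size() == a->data.size());
    for (std::size_t i = 0; i < a->data.size(); ++i) {
        out->data[i] = a->data[i] + b->data[i];
    }
}

// Chains two operations. The intermediate tensor tmp is allocated by the
// host before the calls and is consumed by the second call without ever
// being copied back to host memory.
std::shared_ptr<MockDeviceTensor<int>> chain_example() {
    auto a   = std::make_shared<MockDeviceTensor<int>>(3);
    auto b   = std::make_shared<MockDeviceTensor<int>>(3);
    auto tmp = std::make_shared<MockDeviceTensor<int>>(3);  // intermediate
    auto out = std::make_shared<MockDeviceTensor<int>>(3);  // final result
    a->data = {1, 2, 3};
    b->data = {10, 20, 30};
    mock_add<int>(a, b, tmp);    // tmp = a + b
    mock_add<int>(tmp, a, out);  // out = tmp + a
    return out;
}
```

In a real backend the loop body would run on the hardware, but the ownership rule would be the same: the function only fills memory the host has already provided.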
3. Download – Transfer Results to Host
Once computations are complete, the host explicitly transfers results back using:
device_to_host<T>(device_tensor);
This step copies the computed tensor back into host memory for further use, such as decryption, visualization, or exporting.
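As a rough illustration of the download step, the sketch below mimics the copy semantics with a host-only mock. `SketchDeviceBuffer`, `sketch_device_to_host`, and `download_example` are hypothetical names, and the return type is an assumption, since the text above does not state what `device_to_host<T>` returns. The point it illustrates is that the host receives its own copy, so later changes on the device do not affect data that has already been downloaded.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for a tensor held in device memory (not a HEAL type).
template <typename T>
struct SketchDeviceBuffer {
    std::vector<T> data;
};

// Hypothetical analogue of device_to_host<T>: returns a host-side copy of
// the device contents. Returning std::vector<T> is an assumption made for
// this sketch only.
template <typename T>
std::vector<T> sketch_device_to_host(
        const std::shared_ptr<SketchDeviceBuffer<T>>& device_tensor) {
    assert(device_tensor != nullptr);
    return device_tensor->data;  // copy, independent of the device buffer
}

// Downloads a result, then overwrites the device buffer to show that the
// host copy is unaffected.
std::vector<int> download_example() {
    auto result = std::make_shared<SketchDeviceBuffer<int>>();
    result->data = {7, 8, 9};  // pretend this was computed on the device
    std::vector<int> host_copy = sketch_device_to_host<int>(result);
    result->data[0] = -1;      // later device-side change
    return host_copy;          // still holds 7, 8, 9
}
```

Because the transfer is explicit, a typical program would call it once, after the last device function has finished, rather than after every operation.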