.. _how_to_performance:

Optimize Performance with Parallel Execution
============================================

``pyrauli`` is designed for high-performance simulation, and its C++ backend
includes a parallel execution engine powered by OpenMP. This guide shows how
to leverage this feature to significantly speed up simulations, especially
when working with many observables.

The `RuntimePolicy`
-------------------

The execution strategy is controlled by a :py:class:`~pyrauli.RuntimePolicy` object.
``pyrauli`` provides two policies:

* ``pyrauli.seq``: A sequential (single-threaded) execution policy.
* ``pyrauli.par``: A parallel (multi-threaded) execution policy.

You can pass these policies to the ``runtime`` parameter of the simulation methods.

Running a Single Observable in Parallel
---------------------------------------

You can achieve a performance boost even when simulating just a single observable
by using the parallel runtime. The internal operations for evolving the observable—especially
during splitting gates like Rz—are parallelized across multiple CPU cores.

While this is faster than sequential execution, it is not as efficient as parallel
batching. The overhead of managing threads is better amortized when distributed
across many independent observables.

.. literalinclude:: /../tests/snippets/test_performance_guide.py
   :language: python
   :start-after: # [single_parallel]
   :end-before: # [single_parallel]
   :dedent: 4

Batching Observables for Maximum Throughput
-------------------------------------------

The greatest performance benefit of the parallel engine comes from batching—that is,
running the circuit simulation for a *list* of observables in a single call.
``pyrauli`` will automatically distribute the work across multiple CPU cores.

The example below computes the expectation values for three different observables
in one parallel call.

.. literalinclude:: /../tests/snippets/test_performance_guide.py
   :language: python
   :start-after: # [batch_observables]
   :end-before: # [batch_observables]
   :dedent: 4

.. TIP::
    Always prefer passing a list of observables to a single ``.expectation_value()``
    or ``.run()`` call over looping in Python. The performance gain from
    processing the batch in C++ is substantial.

Enabling Parallelism in Qiskit
-------------------------------

When using ``pyrauli`` with Qiskit, you can enable parallel execution by default
by passing the policy during the backend or estimator's initialization.

.. literalinclude:: /../tests/snippets/test_performance_guide.py
   :language: python
   :start-after: # [qiskit_parallel]
   :end-before: # [qiskit_parallel]
   :dedent: 4