Array API Compatibility#

ezmsg-learn uses the Array API standard to allow processors to operate on arrays from different backends — NumPy, CuPy, PyTorch, and others — without code changes.

How It Works #

Modules that support the Array API derive the array namespace from their input data using array_api_compat.get_namespace():

from array_api_compat import get_namespace

def process(self, data):
    xp = get_namespace(data)       # numpy, cupy, torch, etc.
    result = xp.linalg.inv(data)   # dispatches to the right backend
    return result

This means that if you pass a CuPy array, all computation stays on the GPU. If you pass a NumPy array, it behaves exactly as before.

Helper utilities from ezmsg.sigproc.util.array handle device placement and creation functions portably:

array_device(x) — returns the device of an array, or None
xp_create(fn, *args, dtype=None, device=None) — calls creation functions (zeros, eye) with optional device
xp_asarray(xp, obj, dtype=None, device=None) — portable asarray

Module Compatibility #

The table below summarises the Array API status of each module.

Fully compatible #

These modules perform all computation in the source array namespace.

Module	Notes
`process.ssr`	LRR / self-supervised regression. Full Array API.
`model.cca`	Incremental CCA. Replaced `scipy.linalg.sqrtm` with an eigendecomposition-based inverse square root using only Array API ops.
`process.rnn`	PyTorch-native; operates on `torch.Tensor` throughout.

Mostly compatible (with NumPy boundaries)#

These modules use the Array API for data manipulation but fall back to NumPy at specific points where a dependency requires it.

Module	NumPy boundary	Reason
`model.refit_kalman`	`_compute_gain()`	`scipy.linalg.solve_discrete_are` has no Array API equivalent. Matrices are converted to NumPy for the DARE solver, then converted back.
`model.refit_kalman`	`refit()` mutation loop	Per-sample velocity remapping uses `np.linalg.norm` on small vectors and scalar element assignment.
`process.refit_kalman`	Inherits boundaries from model	State init and output arrays use the source namespace.
`process.slda`	`predict_proba`	sklearn `LinearDiscriminantAnalysis` requires NumPy input.
`process.adaptive_linear_regressor`	`partial_fit` / `predict`	sklearn and river models require NumPy / pandas input.
`dim_reduce.adaptive_decomp`	`partial_fit` / `transform`	sklearn `IncrementalPCA` and `MiniBatchNMF` require NumPy input.

Not converted #

These modules use NumPy directly. Conversion would provide little benefit because the underlying estimator is the bottleneck.

Module	Reason
`process.linear_regressor`	Thin wrapper around sklearn `LinearModel.predict`. Could be made compatible if sklearn’s `array_api_dispatch` is enabled (see below).
`process.sgd`	sklearn `SGDClassifier` has no Array API support.
`process.sklearn`	Generic wrapper for arbitrary models; cannot assume Array API support.
`dim_reduce.incremental_decomp`	Delegates to `adaptive_decomp`; trivial numpy usage (`np.prod` on Python tuples).

sklearn Array API Dispatch #

scikit-learn 1.8+ has experimental support for Array API dispatch on a subset of estimators. Two estimators used in ezmsg-learn are on the supported list:

Estimator	Used in	Constraint
`LinearDiscriminantAnalysis`	`process.slda`	Requires `solver="svd"` (the `"lsqr"` solver with `shrinkage` is not supported)
`Ridge`	`process.linear_regressor`	Requires `solver="svd"`

To use dispatch, enable it before creating the estimator:

from sklearn import set_config
set_config(array_api_dispatch=True)

Warning

array_api_dispatch is marked experimental in sklearn.
Solver constraints (solver="svd") may produce slightly different numerical results compared to other solvers.
Enabling dispatch globally may affect other sklearn estimators in the same process.
ezmsg-learn does not enable dispatch by default.

Estimators that do not support Array API dispatch:

IncrementalPCA, MiniBatchNMF — only batch PCA is supported
SGDClassifier, SGDRegressor, PassiveAggressiveRegressor
All river models

Writing Array API Compatible Code #

When adding or modifying processors in ezmsg-learn, follow these patterns.

Deriving the namespace #

Always derive xp from the input data, not from a hardcoded numpy:

from array_api_compat import get_namespace
from ezmsg.sigproc.util.array import array_device, xp_create

def _process(self, message):
    xp = get_namespace(message.data)
    dev = array_device(message.data)

Transposing matrices #

The Array API does not support .T. Use xp.linalg.matrix_transpose():

# Before (numpy-only)
result = A.T @ B

# After (Array API)
_mT = xp.linalg.matrix_transpose
result = _mT(A) @ B

Creating arrays #

Use xp_create to handle device placement portably:

# Before
I = np.eye(n)
z = np.zeros((m, n), dtype=np.float64)

# After
I = xp_create(xp.eye, n, device=dev)
z = xp_create(xp.zeros, (m, n), dtype=xp.float64, device=dev)

Handling sklearn boundaries #

When calling into sklearn (or other NumPy-only libraries), convert at the boundary and convert back:

from array_api_compat import is_numpy_array

# Convert to numpy for sklearn
X_np = np.asarray(X) if not is_numpy_array(X) else X
result_np = estimator.predict(X_np)

# Convert back to source namespace
result = xp.asarray(result_np) if not is_numpy_array(X) else result_np

Checking for NaN #

Use xp.isnan instead of np.isnan:

if xp.any(xp.isnan(message.data)):
    return

Norms #

Use xp.linalg.matrix_norm (Frobenius by default) instead of np.linalg.norm for matrices. For vectors, use xp.linalg.vector_norm.

Array API Compatibility#

This Page