## HybridBuffer The HybridBuffer is a stateful, FIFO buffer that combines a deque for fast appends with a contiguous circular buffer for efficient, advancing reads. The synchronization between the deque and the circular buffer can be immediate, upon threshold reaching, or on demand, allowing for flexible data management strategies. This buffer is designed to be agnostic to the array library used (e.g., NumPy, CuPy, PyTorch) via the Python Array API standard. ### Basic Reading and Writing Behaviour The following diagram illustrates the states of the HybridBuffer across data writes and reads when `update_strategy="on_demand"`: ![HybridBuffer Basic States](img/HybridBufferBasic.svg) **Figure 1** A. In the initial state, the buffer is empty, with no data in either the deque or the circular buffer. * deq_len=0; available=0, tell=0 B. After we `write()` 4 samples, the deque contains the new data, but the circular buffer is still empty. * deq_len=4; available=4, tell=0 C. After we `write()` 4 more samples, the deque now has 2 messages, each with 4 samples, and the circular buffer remains untouched. * deq_len=8; available=8, tell=0 D. Panels D-F depict a single call to `read(4)` which is implemented as calls to other methods. If we don't have 4 unread samples in the circular buffer, but we do have >= 4 samples 'available' (i.e., including the deque), then a `flush()` is performed: the entirety of the data in the deque are copied to the circular buffer and the deque is cleared. * deq_len=0; available=8, tell=0 * TODO: Currently `flush()` copies the data twice, once from the deque to a contiguous array, and then from that contiguous array to the circular buffer. This should be optimized to copy directly from the deque to the circular buffer. E. Next we `peek(4)` which returns the first 4 samples from the circular buffer; the return value may be a view on the data if the data are contiguous in the circular buffer, or a copy if the data are not contiguous. Note that the tail (read pointer) does not advance with `peek()`. * deq_len=0; available=8, tell=0 F. Finally, we `seek(4)` to advance the tail. * deq_len=0; available=4, tell=4 G. We `write()` 4 more samples, which are appended to the deque, leaving the circular buffer unchanged from the previous step. * deq_len=4; available=8, tell=4 H. We then `read(4)` again. This time, a `flush()` is not triggered because we have enough unread samples in the circular buffer, but `peek(4)` and `seek(4)` are still called. The read pointer advances by 4, leaving 0 unread samples in the circular buffer and 4 in the deque. * deq_len=4; available=0, tell=8 Note: `peek(n)` and `seek(n)`, where `n` > `n_available` will raise an error. However, `peek(None)` will return all available samples without error, and `seek(None)` will advance the tail to the end of the available data. ### Overflow Behaviour The criteria to trigger an overflow are as follows: * the deque has more data than there is space in the circular buffer, where space is the combination of previously read samples and unwritten samples in the circular buffer. * the caller triggers a flush either manually (`flush()`) or by requesting (via `read`, `peek`, or `seek`) more samples than are available in the circular buffer but not more than the total size of the available samples in the buffer + available samples in the deque. ![HybridBuffer Overflow Behaviour](img/HybridBufferOverflow.svg) **Figure 2** A. We start with a circular buffer that has been running for a while (it has wrapped around several times). At this particular moment, we have more data in the deque (12) than we have room in the buffer (8). The remaining figures describe what happens when `flush()` is called with different overflow strategies. The samples are labeled to make it easier to follow the flow of data. * deq_len=12; available=20, tell=1 B. "warn-overwrite": If the overflow_strategy is set to 'warn-overwrite', the HybridBuffer will log a warning and overwrite the oldest data in the circular buffer with the new data from the deque. Here, samples 'a-d' are lost. * deq_len=0; available=16, tell=0 C. "drop": As much as possible of the data from the deque are copied into the circular buffer, but remaining data are dropped. In this case, samples 'q-t' are lost. * deq_len=0; available=16, tell=0 D. "grow": The HybridBuffer will attempt to grow the circular buffer to the lesser of double its current size or the size required to accommodate all read + unread + deque data. If the buffer cannot grow (e.g., due to memory constraints; default max_size is 1GB), it will raise an error. * deq_len=0; available=20, tell=8 Additionally, one can configure the HybridBuffer overflow_strategy to 'raise', which will raise an error if there is insufficient space (empty or read samples) in the buffer to perform the flush. There are a few mitigations to defer flushing to help prevent overflows: * If the requested number of samples to read, peek, or seek is less than the number of unread samples in the circular buffer, then no flush is performed. * Helper methods `peek_at(k, allow_flush=False)` (False is default), and `peek_last()` will retrieve the target sample from the buffer-OR-deque without flushing. * Be cautious relying on repeated calls to `peek_at(k, allow_flush=False)` as it scans over the items in the deque which can be slow. * When calling `read(n)`, if a flush is necessary, and it will cause an overflow, and the overflow could be prevented with a pre-emptive read up to `n`, then it will do the read in 2 parts. First it will call `peek(n_unread_in_buffer)` and `seek(n_unread_in_buffer)` to read the unread samples in the circular buffer. Second, it will call `peek(n_remaining)` and `seek(n_remaining)` to trigger a flush -- which should no longer cause an overflow -- then read the remaining requested samples and stitch them together. ### Advanced Pointer Manipulation The previous section describes how `read`, `peek`, `seek`, and `peek_at` function in normal use cases. It is also possible to call `seek` with a negative value, which will attempt to move the tail pointer backwards over previously-read (or previously sought-over) data by that many samples. `seek` returns the number of samples that were actually moved, which may be less than the requested value if there was insufficient room. Negative seeks can only rewind into previously read data, and positive seeks can only advance into unread data, possibly including data that gets flushed from the deque. ## HybridAxisBuffer The `HybridAxisBuffer` carries the semantics of the `HybridBuffer` but it is designed to handle either a `LinearAxis` or a `CoordinateAxis`. Its `write` method expects an axis object and its `peek` and `read` methods return an axis, not just the data. For a `LinearAxis`, the `HybridAxisBuffer` simply maintains the `gain`, the `offset`, and the 'number of samples available'. Since this does not store actual data, it has no capacity. If this object is intended to be synchronized with another `HybridBuffer`-using object that does have a capacity, then the other object should be manipulated first and then the number of samples actually moved should be used to call the `HybridAxisBuffer`'s methods, otherwise these objects will be out of sync. For a `CoordinateAxis`, the `HybridAxisBuffer` maintains the `data` in a `HybridBuffer` and thus behaves like a `HybridBuffer` with respect to the capacity. The returned `CoordinateAxis` object might have its `.data` field as a view on the data in the buffer, so it should not be modified in place. ## HybridAxisArrayBuffer This is a convenience class that combines the `HybridAxisBuffer` and `HybridBuffer` into a single object that can be used to manage both axis and data in a single object. This class is particularly useful when you need to manage both the axis information and the data samples together, as is the case for an `AxisArray` object. Its `write` method expects an `AxisArary` object and its `peek` and `read` methods return an `AxisArray` object. Note that the return object's `.data` field might be a view on the data in the buffer so it should not be modified in place. Similarly so for the `CoordinateAxis` data.