In distributed systems, the unsung hero is often the unglamorous yet omnipresent queue. In this write-up, we get our hands dirty exploring the details of three queuing strategies: First-In, First-Out (FIFO); Last-In, First-Out (LIFO); and Random Access. We also introduce a novel, hybrid queuing strategy that we believe is well suited to AI inference scenarios.
The Landscape of Queuing Strategies
Queues form the backbone of inter-process communication in distributed systems, and the choice of queuing strategy has far-reaching implications for performance characteristics and trade-offs. We'll start with a high-level overview of the three most commonly used strategies.
- First-In, First-Out (FIFO)
- FIFO queues are the epitome of fairness in queuing theory: the task that enters the queue first is the first to be processed. This predictability makes for smooth processing times, but it comes at a cost in recovery time. During an operational incident or a sudden influx of requests, FIFO queues can struggle to regain balance, since the entire backlog must drain before fresh tasks are served.
- Last-In, First-Out (LIFO)
- In stark contrast, LIFO queues operate on the principle of recency: the latest task to enter the queue is the first to be processed. The result is higher variability in processing times but superior recovery performance after an operational incident. However, as of this writing, no major cloud provider offers a managed LIFO queuing service, so a custom implementation is required, for example on top of DynamoDB.
- Random Access Queues
- Random Access Queues represent a middle ground between FIFO and LIFO. They don't strictly adhere to a chronological processing order, and thus they offer moderate processing times. Managed services like Amazon SQS, whose standard queues provide best-effort rather than strict ordering, fit this model well.
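The three disciplines differ only in which task is removed next. A minimal sketch in Python (illustrative only; the task names and seed are ours):

```python
import random
from collections import deque

tasks = ["t1", "t2", "t3", "t4"]  # arrival order

# FIFO: remove from the front, oldest task first.
fifo = deque(tasks)
fifo_order = [fifo.popleft() for _ in range(len(tasks))]

# LIFO: remove from the back, newest task first.
lifo = deque(tasks)
lifo_order = [lifo.pop() for _ in range(len(tasks))]

# Random access: remove an arbitrary element each time.
rng = random.Random(0)
ra = list(tasks)
ra_order = [ra.pop(rng.randrange(len(ra))) for _ in range(len(tasks))]

print(fifo_order)  # ['t1', 't2', 't3', 't4']
print(lifo_order)  # ['t4', 't3', 't2', 't1']
print(ra_order)    # some permutation of the four tasks
```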
FIFO Queues: Benevolent Dictator of Orderliness
FIFO queues, while intuitively fair, impose a stringent order of operations, leading to more predictable processing times. The trade-off, however, comes in the form of long recovery times after an operational incident. When request volume suddenly skyrockets, or there's a need to rapidly scale up, FIFO queues often take longer to regain their equilibrium. This can be seen in the following simulation.
Overprovisioned FIFO Queue
Near Capacity FIFO Queue
Simulation notes: incoming tasks are modeled as a Poisson process, with each task having a uniformly distributed processing time with mean 1 and standard deviation 0.2. The processing time percentiles chart is based on the time taken to finish processing the last 200 tasks. The queue depth chart shows the number of tasks in the queue over the last 200 time steps. Tasks that are over SLA are colored in red.
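The setup described in the notes above can be reproduced with a few lines of discrete-event simulation. The sketch below uses our own parameter names; arrivals are Poisson (exponential inter-arrival times) and service times are uniform with mean 1 and standard deviation 0.2, i.e., uniform on [1 − 0.2√3, 1 + 0.2√3]:

```python
import math
import random

def simulate_fifo(arrival_rate=0.8, n_tasks=2000, seed=0):
    """Single-server FIFO queue: Poisson arrivals, uniform service times."""
    rng = random.Random(seed)
    half = 0.2 * math.sqrt(3)  # half-width of a uniform dist. with std dev 0.2
    t, free_at = 0.0, 0.0
    waits = []
    for _ in range(n_tasks):
        t += rng.expovariate(arrival_rate)         # next arrival time
        service = rng.uniform(1 - half, 1 + half)  # mean 1, std dev 0.2
        start = max(t, free_at)                    # wait if the server is busy
        free_at = start + service
        waits.append(free_at - t)                  # total time in the system
    return waits

waits = sorted(simulate_fifo())
p50, p99 = waits[len(waits) // 2], waits[int(len(waits) * 0.99)]
print(f"p50={p50:.2f}, p99={p99:.2f}")
```

Sweeping `arrival_rate` toward 1.0 (the service rate) reproduces the overprovisioned versus near-capacity contrast shown in the charts.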
LIFO Queues: Champion of Recency
LIFO queues, in contrast, prioritize the latest tasks, resulting in higher variability during normal operation. However, LIFO's key selling point is its resilience when recovering from an operational incident. The reasoning is simple: a LIFO queue doesn't need to empty its entire backlog before processing new requests, leading to quicker restoration of service. Keep in mind, though, that setting up a LIFO queue requires a more hands-on approach.
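LIFO's recovery advantage shows up in a back-of-the-envelope calculation (the numbers here are ours, assuming one server and unit service time):

```python
# After an incident, suppose 100 stale tasks have piled up and the server
# works through them at one unit of service time each.
backlog_depth = 100
service_time = 1.0

# FIFO: a fresh task arriving now is queued behind the entire backlog.
fifo_wait = backlog_depth * service_time

# LIFO: the fresh task is the newest, so it is served next.
lifo_wait = 0.0

print(f"fresh-task wait: FIFO={fifo_wait:.0f}, LIFO={lifo_wait:.0f}")
```

Under LIFO, the stale backlog absorbs the delay instead of the fresh traffic, which is exactly the recovery behavior the simulations below illustrate.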
Overprovisioned LIFO Queue
Near Capacity LIFO Queue
Random Access Queues: The Middle Ground
Random Access queues, unlike FIFO or LIFO, offer a mix of both worlds. By not enforcing a strict order, they deliver processing times that fall between the two extremes. They are also easy to adopt, since managed services like Amazon SQS offer robust solutions out of the box.
Overprovisioned Random Access Queue
Near Capacity Random Access Queue
Introducing the SLA-Aware Queue: a Hybrid Queuing Strategy
We propose a new hybrid strategy, which we call the SLA-aware queue. This innovative approach draws from the strengths of both FIFO and LIFO while mitigating their shortcomings.
So, how does this work? The hybrid approach maintains the orderly procession characteristic of FIFO queues, ensuring the processing of tasks within the Service Level Agreement (SLA) deadline, in the order they arrive. Meanwhile, tasks that have surpassed the SLA threshold are deferred, to be processed after the completion of tasks within the SLA timeline.
Contrary to pure LIFO, where the most recent task always takes precedence, the hybrid strategy puts the emphasis on handling tasks within their SLA first. Once these are cleared, any task past its SLA is attended to. This approach combines FIFO's fairness and predictability during regular operation with LIFO's advantage of faster recovery time following an operational incident.
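A minimal in-memory sketch of the policy described above (the class, method names, and `sla_seconds` parameter are ours; a production version would live in a datastore such as DynamoDB). Tasks still within SLA are served oldest-first; only when none remain does the queue fall back to tasks that have already blown their deadline:

```python
import time

class SLAAwareQueue:
    """Serve in-SLA tasks in FIFO order; defer over-SLA tasks until no
    in-SLA work remains. Illustrative sketch only (O(n) per dequeue)."""

    def __init__(self, sla_seconds):
        self.sla = sla_seconds
        self._entries = []  # (enqueue_time, task), in arrival order

    def put(self, task, now=None):
        now = time.monotonic() if now is None else now
        self._entries.append((now, task))

    def get(self, now=None):
        if not self._entries:
            raise IndexError("queue is empty")
        now = time.monotonic() if now is None else now
        # Oldest task still within its SLA wins (FIFO among in-SLA tasks).
        for i, (t_enq, task) in enumerate(self._entries):
            if now - t_enq <= self.sla:
                del self._entries[i]
                return task
        # Everything is over SLA: drain the deferred pool (the order among
        # over-SLA tasks is a design choice; we use oldest-first here).
        _, task = self._entries.pop(0)
        return task

q = SLAAwareQueue(sla_seconds=10)
q.put("a", now=0); q.put("b", now=5); q.put("c", now=20)
print(q.get(now=21))  # "c": the only in-SLA task jumps the stale backlog
print(q.get(now=21))  # "a": deferred tasks drain afterwards
```

Passing `now` explicitly keeps the example deterministic; in real use the default `time.monotonic()` clock would apply.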
However, this strategy doesn't come without its challenges. Unlike the readily available managed FIFO and Random Access queues, this approach will require a custom solution, similar to LIFO queues, likely leveraging a database system like DynamoDB. But, for systems with specific demands—like AI inference applications—it's worth the effort. The ability to provide low processing time and variability, as well as rapid recovery post-incident, makes the hybrid queuing strategy a compelling solution.
Overprovisioned SLA-Aware Queue
Near Capacity SLA-Aware Queue
Interactive Experimentation: Understand through Action
To gain an intuitive understanding of these queuing strategies, we provide an interactive widget. You can simulate different operational scenarios and visualize the performance characteristics of each queue under various conditions. Try simulating a surge in tasks or a loss of processing capacity and see how each queue responds.
Choosing a queuing strategy is not a one-size-fits-all affair. It's an intricate decision that heavily depends on your system's requirements, tolerances, and objectives. FIFO queues, the hallmark of order, can provide a stable and predictable environment at the cost of slower recovery times. LIFO queues, though resulting in higher variability, prove resilient in recovery scenarios. Random Access queues, while moderate in their approach, provide a balanced and easily implementable solution.
Our newly introduced SLA-aware queuing strategy, although challenging to implement, offers a potentially transformative approach. By combining the best traits of FIFO and LIFO, it provides a balanced solution, especially for high-demand, cost-sensitive applications such as AI inference.
As distributed systems continue to evolve, so will the need for effective and efficient queuing strategies. We hope this exploration has provided you with valuable insights to aid in making informed decisions about queuing in your distributed systems. Remember, every queue has a purpose; the key is to understand and align it with your system's needs.
We look forward to continuing this journey of exploration and discovery in the realm of distributed systems, and we are always here to engage in a deeper dialogue. If you have any comments, questions, or insights, don't hesitate to reach out! Until next time, happy queuing!