A New Theory for Scheduling: Beyond Idealized Policies
People
Adam Wierman
Mor Harchol-Balter
Takayuki Osogami
Bert Zwart
Misja Nuyens
Motivation
Computer system designers are often guided by analytic results about scheduling policies, but they rarely implement the idealized policies studied by theoreticians. For instance, recent designs of web servers and wireless access points have been motivated by the fact that Shortest-Remaining-Processing-Time (SRPT) optimizes mean response time. However, though the heuristic of “giving priority to small jobs” served as a guiding principle in these applications, in none of these cases was pure SRPT implemented. Instead, the designers needed to adjust SRPT due to a wide variety of real-world factors, such as fairness concerns and the overhead of maintaining the remaining sizes of jobs. To develop a theory that can provide results for the policies used in practice, one cannot focus on individual scheduling policies, as is done in traditional queueing theory. Instead, our goal is to introduce a new framework for studying scheduling policies in which the focus is on classifications of scheduling policies rather than individual policies. For example, instead of studying SRPT alone, we study classes that formalize the scheduling heuristic of “giving priority to small jobs” and the scheduling technique of “prioritizing based on remaining size.”
Results
To this point, we have introduced and analyzed a number of scheduling classifications. Some are based on scheduling techniques, such as “remaining size based policies” and “age based policies”, while others are based on scheduling heuristics, such as “prioritizing small/large jobs”. We have proven a number of results about these classes. As an example, the SMART class was introduced at Sigmetrics 2005 and captures the heuristic of “prioritizing small jobs.” It is defined by three simple axioms that formalize the intuitive notion of prioritizing small jobs in a way that includes a wide range of policies, but still allows tight performance guarantees to be proven for the class. In particular, we have proven that all SMART policies have a mean response time within a factor of 2 of optimal. Further, we have proven that the tail of the response time distribution under SMART policies is asymptotically equivalent to that of SRPT in both the large buffer and the many sources large deviations regimes. These results serve as bounds on the effect of the small tweaks made to SRPT in practice.
Figure 2. An illustration of the performance improvements of any SMART policy over PS. Thus, as long as you "prioritize small jobs" you obtain big performance gains, even if you don't use pure SRPT.
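The factor-of-2 guarantee can be explored empirically with a small simulation. The sketch below is only an illustration, not the analysis used in the papers: it compares two policies that the publications below name as SMART (SRPT and PSJF) on one randomly generated workload. The function names, the Poisson arrivals, the exponential job sizes, and the load of 0.8 are all assumptions chosen for the example, and FCFS stands in for PS as the size-blind baseline to keep the code short.

```python
import heapq
import random


def priority(policy, job_id, size, remaining):
    """Return the scheduling key for a job; a smaller key is served first."""
    if policy == "SRPT":   # Shortest-Remaining-Processing-Time
        return remaining
    if policy == "PSJF":   # Preemptive-Shortest-Job-First (original size)
        return size
    if policy == "FCFS":   # jobs are numbered in arrival order
        return job_id
    raise ValueError(f"unknown policy: {policy}")


def mean_response_time(arrivals, sizes, policy):
    """Simulate a preemptive single-server queue and return the mean response time.

    arrivals must be sorted in nondecreasing order; sizes[i] is the work of job i.
    """
    n = len(arrivals)
    queue = []      # heap of (priority key, job id, remaining work)
    t = 0.0         # current time
    nxt = 0         # index of the next job that has not yet arrived
    total = 0.0     # sum of response times of completed jobs
    while nxt < n or queue:
        if not queue:
            t = max(t, arrivals[nxt])   # server idles until the next arrival
        # Admit every job that has arrived by time t.
        while nxt < n and arrivals[nxt] <= t:
            key = priority(policy, nxt, sizes[nxt], sizes[nxt])
            heapq.heappush(queue, (key, nxt, sizes[nxt]))
            nxt += 1
        _, job, remaining = heapq.heappop(queue)
        next_arrival = arrivals[nxt] if nxt < n else float("inf")
        if remaining <= next_arrival - t:
            # The job finishes before anything else arrives.
            t += remaining
            total += t - arrivals[job]
        else:
            # Run until the next arrival, then let the scheduler decide again.
            remaining -= next_arrival - t
            t = next_arrival
            key = priority(policy, job, sizes[job], remaining)
            heapq.heappush(queue, (key, job, remaining))
    return total / n


def make_workload(n_jobs, load, seed):
    """Assumed workload: Poisson arrivals and exponential sizes with mean 1."""
    rng = random.Random(seed)
    t = 0.0
    arrivals, sizes = [], []
    for _ in range(n_jobs):
        t += rng.expovariate(load)   # mean size is 1, so arrival rate equals load
        arrivals.append(t)
        sizes.append(rng.expovariate(1.0))
    return arrivals, sizes


arrivals, sizes = make_workload(n_jobs=20000, load=0.8, seed=1)
results = {p: mean_response_time(arrivals, sizes, p) for p in ("SRPT", "PSJF", "FCFS")}
for name, value in results.items():
    print(f"{name}: mean response time = {value:.3f}")
```

Because the same arrival sequence is replayed under each policy, the differences in the printed means come only from the scheduling rule. Other priority rules can be tried by adding a branch to `priority`, though whether a given rule belongs to the SMART class has to be checked against the class definition in the Sigmetrics 2005 paper.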
Impact
Following the introduction of the SMART class, other researchers have also become interested in scheduling classifications. This led to collaborations with Bert Zwart, Misja Nuyens, and Sanjay Shakkottai on further analyses of the SMART class. In addition, other researchers have started to introduce their own scheduling classifications. For example, Friedman & Hurley; Feng, Misra, & Rubenstein; and Nunez-Queija & Kherani have all introduced interesting classifications of other scheduling techniques and heuristics.
Publications
-
Under submission.
Scheduling policies that are biased toward small jobs have received growing attention due to their superior mean delay performance. Such policies include Shortest-Remaining-Processing-Time (SRPT), Preemptive-Shortest-Job-First (PSJF), and Least-Attained-Service (LAS). In this paper, we study the delay distribution of LAS and of the class of scheduling policies called SMART-LD (SMAll-Response-Time for Large-Deviations), which includes SRPT, PSJF, and their variants, in order to understand policies that prioritize short jobs. We study the delay distribution (rate function) of the SMART-LD class and LAS in a discrete-time queueing system under the many sources large deviations regime. We prove that all SMART-LD policies have the same asymptotic delay distribution as SRPT and illustrate the improvements SMART-LD policies and LAS make over First-Come-First-Served (FCFS). Furthermore, we show that the delay distribution of SMART-LD policies stochastically improves upon the delay distribution of LAS across all job sizes.
-
Operations Research, 2008. 56(1):88-101.
Recently, the class of SMART scheduling policies (disciplines) has been introduced in order to formalize the common heuristic of “biasing toward small jobs.” We study the tail of the sojourn-time (response-time) distribution under both SMART policies and the Foreground-Background policy (FB) in the GI/GI/1 queue. We prove that these policies behave very well under heavy-tailed service times, but behave poorly under light-tailed service times. Specifically, for heavy-tailed service times, we show that the sojourn-time tail under FB and all SMART policies is equal to the service-time tail, up to a constant. In contrast, for light-tailed service times with no mass in the endpoint of the distribution, we show that, on a logarithmic scale, the sojourn-time tail of FB and all SMART policies is as large as possible.
-
Proceedings of ACM Sigmetrics, 2008.
Motivated by the optimality of Shortest Remaining Processing Time (SRPT) for mean response time, in recent years many computer systems have used the heuristic of favoring small jobs in order to dramatically reduce user response times. However, rarely do computer systems have knowledge of exact remaining sizes. In this paper, we introduce the class of epsilon-SMART policies, which formalizes the heuristic of favoring small jobs in a way that includes a wide range of policies that schedule using inexact job-size information. Examples of epsilon-SMART policies include (i) policies that use exact size information, e.g., SRPT and PSJF, (ii) policies that use job-size estimates, and (iii) policies that use a finite number of size-based priority levels. For many epsilon-SMART policies, e.g., SRPT with inexact job-size information, there are no analytic results available in the literature. In this work, we prove four main results: we derive upper and lower bounds on the mean response time, the mean slowdown, the response-time tail, and the conditional response time of epsilon-SMART policies. In each case, the results explicitly characterize the tradeoff between the accuracy of the job-size information used to prioritize and the performance of the resulting policy. Thus, the results provide designers an understanding of how accurate job-size information must be in order to achieve desired performance guarantees.
-
Invited paper in Performance Evaluation Review, 2007. 34(4):4-12.
The growing trend in computer systems towards using scheduling policies that prioritize jobs with small service requirements has resulted in a new focus on the fairness of such policies. In particular, researchers have been interested in whether prioritizing small job sizes results in large jobs being treated
-
Co-recipient of the CMU SCS Distinguished Dissertation Award
Finalist receiving Honorable Mention for the INFORMS OR in Telecommunications Dissertation Award
Ph.D. Thesis, 2007.
-
Performance Evaluation, 2007. 64:1009-1028. Also appeared in the Proceedings of IFIP Performance 2007.
We present a simple mean value analysis (MVA) framework for analyzing the effect of scheduling within queues in classical asymmetric polling systems with gated or exhaustive service. Scheduling in polling systems finds many applications in computer and communication systems. Our framework leads not only to unification but also to extension of the literature studying scheduling in polling systems. It illustrates that a large class of scheduling policies behaves similarly in the exhaustive polling model and the standard M/GI/1 model, whereas scheduling policies in the gated polling model behave very differently than in an M/GI/1.
-
Proceedings of Allerton, 2007.
In recent years, the response times experienced by large job sizes have been the focus of a growing number of papers. Though results about many scheduling disciplines have appeared, to this point, results characterizing the response time of large job sizes have been limited to either mean value analysis or law of large numbers scalings. We will present a novel framework that unifies these results and provides new results characterizing the distributional behavior of large job sizes. Also, we will illustrate the impact these new results have (i) for the analysis of the response time tail, (ii) for the analysis of busy periods, and (iii) for predicting response times of customers upon arrival.
-
Performance Evaluation Review, 2006. 34(3):21-23.
-
Proceedings of ACM Sigmetrics, 2006.
Scheduling policies that prioritize short jobs have received growing attention in recent years. The class of SMART policies includes many such disciplines, e.g. Shortest Remaining Processing Time (SRPT) and Preemptive Shortest Job First (PSJF). In this work, we study the delay distribution of SMART policies and contrast this distribution with that of the Least-Attained-Service (LAS) policy, which indirectly favors short jobs by prioritizing jobs with the least attained service (age). We study the delay distribution (rate function) of LAS and the SMART class in a discrete-time queueing system under the many flows regime. Our analysis in this regime (large capacity and large number of flows) is enabled by the introduction of a two dimensional queue representation, which creates tie-break rules. These additional rules do not alter the policies, but greatly simplify their analysis. We demonstrate that the queue evolution of all the above policies can be described under this single two dimensional framework. We prove that all SMART policies have the same delay distribution as SRPT and illustrate the improvements SMART policies make over First-Come-First-Served (FCFS). Furthermore, we show that the delay distribution of SMART policies stochastically improves upon the delay distribution of LAS. However, the delay distribution under LAS is not too bad -- the distribution of delay under LAS for most job sizes still provides improvement over FCFS. Our results are complementary to prior work that studies delay-tail behavior in the large buffer regime under a single flow.
-
Proceedings of ACM Sigmetrics, 2005.
We define the class of SMART scheduling policies. These are policies that bias towards jobs with small remaining service times, jobs with small original sizes, or both, with the motivation of minimizing mean response time and/or mean slowdown. Examples of SMART policies include PSJF, SRPT, and hybrid policies such as RS (which biases according to the product of the remaining size and the original size of a job). For many policies in the SMART class, the mean response time and mean slowdown are not known or have complex representations involving multiple nested integrals, making evaluation difficult. In this work, we prove three main results. First, for all policies in the SMART class, we prove simple upper and lower bounds on mean response time. Second, we show that all policies in the SMART class, surprisingly, have very similar mean response times. Third, we show that the response times of SMART policies are largely insensitive to the variability of the job size distribution. In particular, we focus on the SRPT and PSJF policies and prove insensitive bounds in these cases.
-
Proceedings of ACM Sigmetrics, 2005.
In addition to providing small mean response times, modern applications seek to provide users predictable service and, in some cases, Quality of Service (QoS) guarantees. In order to understand the predictability of response times under a range of scheduling policies, we study the conditional variance in response times seen by jobs of different sizes. We define a metric and a criterion that distinguish between contrasting functional behaviors of conditional variance, and we then classify large groups of scheduling policies. In addition to studying the conditional variance of response times, we also derive metrics appropriate for comparing higher conditional moments of response time across job sizes. We illustrate that common statistics such as raw and central moments are not appropriate when comparing higher conditional moments of response time. Instead, we find that cumulant moments should be used.
-
Performance Evaluation Review, 2004. 32(2):12-13.
-
Best Student Paper Award recipient
Proceedings of ACM Sigmetrics, 2003.
It is common to evaluate scheduling policies based on their mean response times. Another important, but sometimes opposing, performance metric is a scheduling policy's fairness. For example, a policy that biases towards small job sizes so as to minimize mean response time may end up being unfair to large job sizes. In this paper we define three types of unfairness and demonstrate large classes of scheduling policies that fall into each type. We end with a discussion on which jobs are the ones being treated unfairly.
Figure 1. A diagram of the scheduling classifications that we have defined and studied so far.