Capacity Planning for Server Farms:
How many servers are optimal?
People
Adam Wierman
Takayuki Osogami
Mor Harchol-Balter
Alan Scheller-Wolf
| |
| Figure 1. An diagram of a multiserver priority queue. |
Motivation
A fundamental design decision in server farms is whether to use a large number of slow servers or a few fast servers. This question is difficult to answer even when arriving jobs are scheduled using the simplest policy: First Come First Served (FCFS). However, more commonly, jobs have different priority levels, e.g. High/Low, and jobs with higher priority levels are allowed to serve first. The analysis of these multiserver priority queues is a long standing open problem. The difficulty is that the Markov chain underlying the model of this system is infinite in m-dimensions when there are m priority classes, and while one-dimensionally infinite Markov chains with repeating structure are analytically tractable, two (and higher) dimensionally infinite Markov chains are typically not.
| |
| Figure 2. A diagram illustrating the optimal server configuration (given a fixed total service capacity) for high and low priority customers in a dual priority server farm as a function of the load and variability of the workload. Notice that low priority customers always prefer to have more servers than high priority customers. |
Results
We provide the first near-exact analysis of the m-class priority queue with k servers under general (phase-type) job sizes. The analysis uses a new technique called Recursive Dimensionality Reduction (RDR), which serves to recursively use the solution to an n−1 dimensional Markov chain in order to solve an n dimensional Markov chain. Though the technique is approximate, it can be made as accurate as desired. Using this new analysis, we address (for the first time) important design questions about how prioritization behaves in server farm architectures, e.g. “how does the impact of prioritization in multiserver systems compare to that in single server systems?” Further, we provide a detailed exploration of the question of how many servers are optimal given a fixed total service capacity.
Impact
In addition to its usefulness in studying multiserver systems, RDR has turned out to be useful in many other application domains, and has led to more than a dozen papers from our research group at Carnegie Mellon. The papers study a wide variety of different multiserver systems including: cycle stealing and task assignment. You can find a thorough summary in Taka's thesis.
Publications
-
Proceedings of INFOCOM, 2009.[show/hide abstract]Load balancing is a common approach to task assignment in distributed architectures. In this paper, we show that the degree of inefficiency in load balancing designs is highly dependent on the scheduling discipline used at each of the back-end servers. Traditionally, the back-end scheduler can be modeled as Processor Sharing (PS), in which case the degree of inefficiency grows linearly with the number of servers. However, if the back-end scheduler is changed to Shortest Remaining Processing Time (SRPT), the degree of inefficiency can be independent of the number of servers, instead depending only on the heterogeneity of the speeds of the servers. Further, switching the back-end scheduler to SRPT can provide significant improvements in the overall mean response time of the system as long as the heterogeneity of the server speeds is small.
-
Performance Evaluation Review, 2008. 36(2):110-112.[show/hide abstract]Load balancing is a common approach to distributed task assignment. In this paper we study the degree of inefficiency in load balancing designs. We show that the degree of inefficiency is highly dependent on the back-end scheduler.
-
Co-recipient of the CMU SCS Distinguished Dissertation Award
Finalist receiving Honorable Mention for the INFORMS OR in Telecommunications Dissertation Award Ph.D. Thesis, 2007. -
Performance Evaluation, 2006. 63(12):1253-1272.[show/hide abstract]We ask the question, ``for minimizing mean response time, which is preferable: one fast server of speed 1, or k slow servers each of speed 1/k?'' Our setting is the M/PH/k system with two priority classes of customers, high priority and low priority, where PH is a phase-type distribution. We find that multiple slow servers are often preferable, and we demonstrate exactly how many servers are preferable as a function of the load and service time distribution. In addition, we find that the optimal number of servers with respect to the high priority jobs may be very different from that preferred by low priority jobs, and we characterize these preferences. We also study the optimal number of servers with respect to overall mean response time, averaged over high and low priority jobs. Lastly, we ascertain the effect of the service demand variability of high priority jobs on low priority jobs.
-
Queueing Systems, 2005. 51:331-360.[show/hide abstract]We present the first near-exact analysis of an M/PH/k queue with m > 2 preemptive-resume priority classes. Our analysis introduces a new technique, which we refer to as Recursive Dimensionality Reduction (RDR). The key idea in RDR is that the m-dimensionally infinite Markov chain, representing the m class state space, is recursively reduced to a 1-dimensionally infinite Markov chain, that is easily and quickly solved. RDR involves no truncation and results in only small inaccuracy when compared with simulation, for a wide range of loads and variability in the job size distribution. Our analytic methods are then used to derive insights on how multi-server systems with prioritization compare with their single server counterparts with respect to response time. Multi-server systems are also compared with single server systems with respect to the effect of different prioritization schemes -- ``smart'' prioritization (giving priority to the smaller jobs) versus ``stupid'' prioritization (giving priority to the larger jobs). We also study the effect of approximating m class performance by collapsing the $m$ classes into just two classes.
-
Performance Evaluation Review, 2004. 32(2):3-5.

