My Research
I am a part of the Rigorous System Design Group (RSRG, pronounced "resurge") at Caltech. RSRG is not your ordinary systems group -- it is distinguished by it's rigorous/analytic approach to design. Everyone in the group both derives new theory and builds systems.
The particular focus of my research is:
"Better design through modeling and measurement"
My research focuses on using mathematical models to provide insight into the design of computer systems. I apply analytic models and tools that are traditionally used in the operations research community in order to evaluate the impact of design decisions in systems such as web servers, routers, databases, wireless networks, and beyond. My work applies and often extends techniques in stochastic modeling, queueing theory, scheduling theory, and game theory.
Some examples of questions that I like to think about are:
-
What mechanisms cause the web and social networks to look so much alike?
There are many important characteristics that are common across the web and a wide variety of social networks. For example, in many cases, power law degree distributions and eigenvalues have been observed, as have small world properties such as high clustering and small diameters (which can be found by decentralized agents). Amazingly, despite decades of research we still don't have a simple model that captures all of these properties! Further, the usefulness of these properties for routing algorithms and information propagation are only beginning to be understood. Our answers are coming soon... -
Can we benefit from taking a non-cooperative approach to cooperative control problems?
Distributed cooperative control problems are everywhere, and are especially important in wireless and sensor network applications. Designing robust distributed control algorithms is especially challenging. We propose an inherently robust approach to distributed control that is based on formulating the distributed agents as selfish players in a non-cooperative, engineered game. But, then the question is how to design the game so that the performance of the system is near-optimal? Our answers... -
When balancing energy usage and delay, what is the optimal speed to run the server?
No longer is "faster" always "better." Modern systems must trade off speed and energy concerns: running faster lowers delays but increases power usage. So, the question is, how fast should the system run? and how much improvement does dynamic speed scaling provide? Our answers... -
How do the load balancer and the back-end scheduler interact in server farms? ...and should this impact the system architecture?
Server farms are now ubiquitous and almost all use some form of load balancing dispatcher. But, how far is load balancing from optimal dispatching, and how does the answer depend on the back-end scheduler? Further, how should the server farm architecture change depending on the back-end scheduler? Our answers... -
Can a scheduler optimize the response-time tail across all workloads?
We have long known that SRPT optimizes the mean response time. But, when it comes to the tail of the response time distribution, SRPT is optimal when the job size distribution is heavy-tailed, but is bad when job sizes are light-tailed. In contrast, FCFS is optimal when job sizes are light-tailed, but is bad when job sizes are heavy-tailed. Is it possible for a policy to be optimal across all workloads? Our answers are coming soon... -
How unfairly are large jobs treated when small jobs are given priority?
Modern designs for web servers, routers, etc often improve user response times by giving priority to small job sizes. But, this leads to worries about whether large job sizes receive fair performance. So, we need to ask: How much does giving priority to requests for small files starve large files? Our answers... Do analytical results about scheduling hold in practice?
The policies used in real systems often differ significantly from the idealized policies studied analytically. Thus, the results proven about idealized policies cannot be immediately applied to real systems. So, we need to understand: How close is the performance of the policies used in practice to the performance of the idealized policies studied analytically? Our answers...How many servers are optimal in a multiserver system?
Given the choice between many (cheap) slow servers and a few (expensive) fast servers, how can you decide which is better? What properties of the workload determine the answer? Our answers...Should I use an open or closed workload generator?
Workload generators are an invaluable resource for evaluating the performance of new designs, and you probably use one all the time. Do you know whether the one you use is open or closed? How big a difference does this make? And, how can you decide whether to use an open, closed, or partly-open model? Our answers...