Traffic optimizations (TOs, e.g. flow scheduling, load balancing) in datacenters are difficult online decision-making problems. Previously, they were handled with heuristics that rely on operators’ understanding of the workload and environment. Designing and implementing proper TO algorithms thus takes weeks at the least. Encouraged by recent successes in applying deep reinforcement learning (DRL) techniques to complex online control problems, and leveraging the long-tail distribution of datacenter traffic, we develop a two-level DRL system, AuTO, that mimics the Peripheral and Central Nervous Systems in animals to solve the scalability problem. Peripheral systems (PSs) reside on end-hosts, collect flow information, and make TO decisions locally with minimal delay for short flows. PSs’ decisions are informed by a central system (CS), where global traffic information is aggregated and processed. The CS further makes individual TO decisions for long flows. With the CS and PSs, AuTO is an end-to-end automatic TO system that can collect network information, learn from past decisions, and perform actions to achieve operator-defined goals. We implement AuTO with popular machine learning frameworks and commodity servers, and deploy it on a 32-server testbed. Extensive experimental results show that AuTO significantly reduces TO turn-around time and achieves up to a 48.14% reduction in average flow completion time (FCT).
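To make the two-level split concrete, the following is a minimal Python sketch of how a peripheral system might handle short flows locally while deferring long flows to the central system. It is an illustration under stated assumptions, not AuTO's implementation: the class names, the byte threshold used to separate short from long flows, and the string "decisions" are all hypothetical, and the DRL agents are replaced by fixed placeholder policies.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Flow:
    flow_id: int
    bytes_sent: int  # bytes observed so far on the end-host


@dataclass
class CentralSystem:
    """Hypothetical stand-in for the CS: aggregates reports, handles long flows."""
    reports: List[Flow] = field(default_factory=list)

    def collect(self, flow: Flow) -> None:
        # Global traffic information is aggregated here.
        self.reports.append(flow)

    def long_flow_threshold(self) -> int:
        # Assumption: a fixed cut-off in bytes. A learned system would
        # derive the information it sends to the PSs from a DRL agent.
        return 100_000

    def decide_long_flow(self, flow: Flow) -> str:
        # Placeholder for an individual, per-flow decision made centrally.
        return f"cs-decision-for-flow-{flow.flow_id}"


@dataclass
class PeripheralSystem:
    """Hypothetical stand-in for a PS running on one end-host."""
    cs: CentralSystem
    threshold: int = 0

    def sync(self) -> None:
        # The PS's local behaviour is informed by the CS.
        self.threshold = self.cs.long_flow_threshold()

    def handle(self, flow: Flow) -> str:
        self.cs.collect(flow)  # report flow information upward
        if flow.bytes_sent < self.threshold:
            # Short flow: decide locally, with no round trip to the CS.
            return "local-default"
        # Long flow: defer to the CS for an individual decision.
        return self.cs.decide_long_flow(flow)


cs = CentralSystem()
ps = PeripheralSystem(cs)
ps.sync()
short_result = ps.handle(Flow(flow_id=1, bytes_sent=2_000))
long_result = ps.handle(Flow(flow_id=2, bytes_sent=5_000_000))
```

The point of the split is that the many short flows never wait on a central decision, while the few long flows, which can tolerate the extra latency, receive individual treatment.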