Reinforcement Learning for Cloud Resource Management

Diploma thesis submitted by user N. Simakin.

Abstract

Optimizing the use of computing resources in the cloud matters to both customers and cloud providers. Customers rent virtual machines, and the cloud provider must honor the obligations of a service level agreement (SLA). At the same time, data centers seek to minimize electricity consumption in order to cut operating costs. This gives rise to the problem of consolidating virtual machines onto physical machines. Packing them too densely leads to too many SLA violations, and customers become dissatisfied with the service. The provider therefore has to find a balance between these two factors.

In this work, we consider the standard formulation of the virtual machine consolidation problem (vector bin packing) and extend it, bringing the new formulation closer to a real cloud environment. We then describe an approach to solving it based on reinforcement learning with deep learning (Deep Q-network), as well as alternative heuristic approaches.

We compare the implemented heuristics with the reinforcement learning algorithm and show that in some experiments the algorithm reduces the number of migrations by a factor of 2.5 while achieving the same consolidation quality as the best of the implemented alternatives. We also provide an open-source implementation of all experiments and of the infrastructure for running them and for further research.

Keywords

Cloud Resource Management; Reinforcement Learning; Deep Q-learning Networks; Vector bin-packing problem; Cloud scheduling algorithms; Virtual machines consolidation problem; Cloud power consumption

Introduction

Cloud providers are becoming increasingly popular, replacing the traditional practice of managing and using manually configured computer clusters, because remote management is more flexible and resources can easily be scaled on demand. This elasticity results from a complex internal cloud system, supported by many engineers and a large infrastructure composed of tens of data centers across the globe. Complicated scheduling algorithms decide where and how to execute clients' jobs. A service level agreement (SLA) between the client and the cloud platform guarantees certain qualities of the provided resources and bounds the chance of violations. Cloud platforms usually try to minimize such violations in order to provide a better service. However, electricity bills must also be minimized to keep operation economical, so there is a trade-off between service quality and support cost. Many studies estimate that cloud data centers consume about 1% of electricity worldwide.
Machine learning and reinforcement learning have been applied successfully in many fields. In reinforcement learning, the AlphaZero algorithm is famous for its ability to beat the world's best players in chess and Go. Reinforcement learning is now a cutting-edge approach in modern robotics, which shows that the concept is useful beyond board games. These ideas could be applied in other compute-intensive fields, especially those that can be modeled as dynamic environments with a Markov Decision Process (MDP).
A complex cloud computing system can be decomposed into many components, or smaller problems. One component of resource management is the virtual machine consolidation problem. Given the state of the entire cloud environment, we need to reconfigure the mapping of virtual machines to physical machines so as to improve resource utilization metrics, loading idle machines or unloading overloaded ones. The problem can be defined rigorously by introducing an optimization objective. Such an objective can be a linear combination of the number of service level agreement violations and the number of unloaded physical machines. Physical machines that serve no virtual machines can be safely powered off, reducing power consumption.
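To make the idea of a linear-combination objective concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the formulation used later in this work: the function name `consolidation_cost`, the weights `alpha` and `beta`, the two-resource (CPU, memory) model, and the use of "overloaded hosts" as a proxy for SLA violations are all hypothetical. It counts active hosts, which is equivalent to rewarding unloaded hosts when the total number of hosts is fixed.

```python
from typing import Dict, List, Tuple

# Assumption: each VM demand and host capacity is a (cpu, memory) pair.
Vec = Tuple[float, float]


def consolidation_cost(
    placement: Dict[str, int],   # VM name -> index of the host it runs on
    demands: Dict[str, Vec],     # VM name -> resource demand
    capacities: List[Vec],       # per-host resource capacity
    alpha: float = 1.0,          # hypothetical weight of SLA-violation proxy
    beta: float = 1.0,           # hypothetical weight of active hosts
) -> float:
    """Weighted sum of overloaded hosts and active (non-empty) hosts."""
    # Accumulate the load placed on every host, per resource dimension.
    load = [[0.0, 0.0] for _ in capacities]
    for vm, host in placement.items():
        for r, d in enumerate(demands[vm]):
            load[host][r] += d

    # A host is active if it carries any load; empty hosts can be powered off.
    active = sum(1 for l in load if any(x > 0 for x in l))

    # A host is overloaded if any resource exceeds its capacity.
    overloaded = sum(
        1
        for l, cap in zip(load, capacities)
        if any(x > c for x, c in zip(l, cap))
    )
    return alpha * overloaded + beta * active
```

With such a cost, packing all virtual machines onto one host lowers the second term but may raise the first, which is exactly the trade-off between power consumption and service quality described above.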
The virtual machine consolidation problem is NP-hard, and many works have tried to solve it approximately using genetic algorithms [1], heuristic algorithms [2], [4], and reinforcement learning algorithms [3]. Genetic algorithms generally achieve better results than heuristic-based ones, but reinforcement learning algorithms seem much more promising, as they exploit the same idea in a more general way and do not depend on predefined genetic rules. Paper [3] (2014) showed an improvement of a few percent in both energy consumption and SLA violations, but this improvement is negligible compared to the results achieved by applying reinforcement learning to online placement algorithms such as DRL-Cloud (2017) [5].
DRL-Cloud formulates a more general problem than virtual machine consolidation. However, its formulation is overcomplicated by dependencies between jobs (cloud clients usually request independent virtual machines, such as web servers). Moreover, the DRL-Cloud neural network architecture is unknown, and its results are not reproducible.
Algorithms that capture more general problem formulations can use more information about the cloud environment and behave better under real workloads. We therefore decided to design a new framework for the consolidation problem that accounts for SLA violations as well as power consumption.
We propose a new 'consolidation and placement' problem formulation and a new RL-based algorithm for it. We implement this algorithm together with a simulator, evaluate it on synthetic load, and compare it with simple heuristics. Our experiments show that the RL algorithm can capture dynamic load changes and exploit these patterns to reduce the optimization objective, using a mixed, non-trivial strategy between consolidation and optimal placement.
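For readers unfamiliar with the kind of simple heuristic an RL algorithm is typically compared against, the following is a minimal sketch of first-fit placement for vector bin packing. It is an assumption for illustration only: this introduction does not say which heuristics were implemented, and the function name `first_fit` and the identical-host capacity model are hypothetical.

```python
from typing import List, Sequence


def first_fit(
    demands: Sequence[Sequence[float]],  # one resource vector per VM
    capacity: Sequence[float],           # assumed identical for every host
) -> List[int]:
    """Place each VM on the first host that has room in every dimension.

    Returns the host index chosen for each VM; new hosts are opened on demand.
    """
    free: List[List[float]] = []   # remaining capacity of each opened host
    assignment: List[int] = []
    for d in demands:
        if any(x > c for x, c in zip(d, capacity)):
            raise ValueError("demand exceeds the capacity of an empty host")
        for h, f in enumerate(free):
            # The VM fits only if every resource dimension fits.
            if all(x <= r for x, r in zip(d, f)):
                break
        else:
            # No opened host can take the VM, so open a new one.
            free.append(list(capacity))
            h = len(free) - 1
        free[h] = [r - x for r, x in zip(free[h], d)]
        assignment.append(h)
    return assignment
```

A heuristic of this kind looks only at the current demands and ignores how the load will change over time, which is the gap a learned policy can try to exploit.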
The work is organized as follows. Section 3 describes related work: other approaches to similar problems, and uses of reinforcement learning in cloud resource optimization. Section 4 briefly states the goals of the work. Section 5 rigorously describes the extended consolidation problem statement and the optimization objective. Section 6 describes the reinforcement learning-based solution to the formulated problem and explains all aspects of the algorithm in detail. Section 7 covers implementation details: the structure of the implemented heuristics, Deep Q-network training, and optimizations. Section 8 presents the motivation behind the chosen framework. Section 9 reports several experiments and comparisons.

References

[1] H. Hallawi, J. Mehnen, H. He, 'Multi-Capacity Combinatorial Ordering GA in Application to Cloud Resources Allocation and Efficient Virtual Machines Consolidation', Future Generation Computer Systems, Elsevier, 2017.

[2] A. Marotta, S. Avallone, 'A Simulated Annealing Based Approach for Power Efficient Virtual Machines Consolidation', IEEE 8th International Conference on Cloud Computing, 2015.

[3] F. Farahnakian, P. Liljeberg, J. Plosila, 'Energy-Efficient Virtual Machines Consolidation in Cloud Data Centers using Reinforcement Learning', 22nd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, 2014.

[4] X. Sun, Y. Liu, W. Wei, W. Jing, C. Zhao, 'Based on QoS and energy efficiency virtual machines consolidation techniques in cloud', Journal of Internet Technology, 2019.

[5] M. Cheng, J. Li, S. Nazarian, 'DRL-cloud: Deep reinforcement learning-based resource provisioning and task scheduling for cloud service providers', 23rd Asia and South Pacific Design Automation Conference, 2018.

[6] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, D. Hassabis, 'Human-level control through deep reinforcement learning', Nature, 518(7540):529–533, 2015.

[7] R. Shaw, E. Howley, E. Barrett, 'An Advanced Reinforcement Learning Approach for Energy-Aware Virtual Machine Consolidation in Cloud Data Centers', 2017, https://www.researchgate.net/publication/323139482
