In-memory transactional data grids have revealed extremely suited for cloud based environments, given that they well fit elasticity requirements imposed by the pay-as-you-go cost model. Particularly, the non-reliance on stable storage devices simplifies dynamic resize of these platforms, which typically only involves setting up (or shutting down) some data-cache instance. On the other hand, defining the well suited amount of cache servers to be deployed, and the degree of replication of slices of data, in order to optimize reliability/availability and performance tradeoffs, is far from being a trivial task. As a example, scaling up/down the size of the underlying infrastructure might give rise to scarcely predictable secondary effects on the side of the synchronization protocol adopted to guarantee data consistency while supporting transactional accesses. In this paper we investigate on the usage of machine learning approaches with the aim at providing a means for automatically tuning the data grid configuration, which is achieved via dynamic selection of both the well suited amount of cache servers, and the well suited degree of replication of the data-objects. The final target is to determine configurations that are able to guarantee specific throughput or latency values (such as those established by some SLA), under some specific workload profile/intensity, while minimizing at the same time the cost for the cloud infrastructure. Our proposal has been integrated within an operating environment relying on the well known Infinispan data grid, namely a mainstream open source product by the Red Had JBoss division. Some experimental data are also provided supporting the effectiveness of our proposal, which have been achieved by deploying the data platform on top of Amazon EC2. © 2012 IEEE.
Auto-tuning of Cloud-based In-memory Transactional Data Grids via Machine Learning
DI SANZO, PIERANGELO;
2012-01-01
Abstract
In-memory transactional data grids have revealed extremely suited for cloud based environments, given that they well fit elasticity requirements imposed by the pay-as-you-go cost model. Particularly, the non-reliance on stable storage devices simplifies dynamic resize of these platforms, which typically only involves setting up (or shutting down) some data-cache instance. On the other hand, defining the well suited amount of cache servers to be deployed, and the degree of replication of slices of data, in order to optimize reliability/availability and performance tradeoffs, is far from being a trivial task. As a example, scaling up/down the size of the underlying infrastructure might give rise to scarcely predictable secondary effects on the side of the synchronization protocol adopted to guarantee data consistency while supporting transactional accesses. In this paper we investigate on the usage of machine learning approaches with the aim at providing a means for automatically tuning the data grid configuration, which is achieved via dynamic selection of both the well suited amount of cache servers, and the well suited degree of replication of the data-objects. The final target is to determine configurations that are able to guarantee specific throughput or latency values (such as those established by some SLA), under some specific workload profile/intensity, while minimizing at the same time the cost for the cloud infrastructure. Our proposal has been integrated within an operating environment relying on the well known Infinispan data grid, namely a mainstream open source product by the Red Had JBoss division. Some experimental data are also provided supporting the effectiveness of our proposal, which have been achieved by deploying the data platform on top of Amazon EC2. © 2012 IEEE.Pubblicazioni consigliate
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.