In-memory (transactional) data stores, also referred to as data grids, are recognized as a first-class data management technology for cloud platforms, thanks to their ability to match the elasticity requirements imposed by the pay-as-you-go cost model. On the other hand, determining how performance and reliability/availability of these systems vary as a function of configuration parameters, such as the amount of cache servers to be deployed, and the degree of in-memory replication of slices of data, is far from being a trivial task. Yet, it is an essential aspect of the provisioning process of cloud platforms, given that it has an impact on the amount of cloud resources that are planned for usage. To cope with the issue of predicting/analysing the behavior of different configurations of cloud in-memory data stores, in this article we present a flexible simulation framework offering skeleton simulation models that can be easily specialized in order to capture the dynamics of diverse data grid systems, such as those related to the specific (distributed) protocol used to provide data consistency and/or transactional guarantees. Besides its flexibility, another peculiar aspect of the framework lies in that it integrates simulation and machine-learning (black-box) techniques, the latter being used to capture the dynamics of the data-exchange layer (e.g. the message passing layer) across the cache servers. This is a relevant aspect when considering that the actual data-transport/networking infrastructure on top of which the data grid is deployed might be unknown, hence being not feasible to be modeled via white-box (namely purely simulative) approaches. We also provide an extended experimental study aimed at validating instances of simulation models supported by our framework against execution dynamics of real data grid systems deployed on top of either private or public cloud infrastructures. Particularly, our validation test-bed has been based on an industrial-grade open-source data grid, namely Infinispan by JBoss/Red-Hat, and a de-facto standard benchmark for NoSQL platforms, namely YCSB by Yahoo. The validation study has been conducted by relying on both public and private cloud systems, scaling the underlying infrastructure up to 100 (resp. 140) Virtual Machines for the public (resp. private) cloud case. Further, we provide some experimental data related to a scenario where our framework is used for on-line capacity planning and reconfiguration of the data grid system.

A flexible framework for accurate simulation of cloud in-memory data stores

DI SANZO, PIERANGELO;
2015

Abstract

In-memory (transactional) data stores, also referred to as data grids, are recognized as a first-class data management technology for cloud platforms, thanks to their ability to match the elasticity requirements imposed by the pay-as-you-go cost model. On the other hand, determining how performance and reliability/availability of these systems vary as a function of configuration parameters, such as the amount of cache servers to be deployed, and the degree of in-memory replication of slices of data, is far from being a trivial task. Yet, it is an essential aspect of the provisioning process of cloud platforms, given that it has an impact on the amount of cloud resources that are planned for usage. To cope with the issue of predicting/analysing the behavior of different configurations of cloud in-memory data stores, in this article we present a flexible simulation framework offering skeleton simulation models that can be easily specialized in order to capture the dynamics of diverse data grid systems, such as those related to the specific (distributed) protocol used to provide data consistency and/or transactional guarantees. Besides its flexibility, another peculiar aspect of the framework lies in that it integrates simulation and machine-learning (black-box) techniques, the latter being used to capture the dynamics of the data-exchange layer (e.g. the message passing layer) across the cache servers. This is a relevant aspect when considering that the actual data-transport/networking infrastructure on top of which the data grid is deployed might be unknown, hence being not feasible to be modeled via white-box (namely purely simulative) approaches. We also provide an extended experimental study aimed at validating instances of simulation models supported by our framework against execution dynamics of real data grid systems deployed on top of either private or public cloud infrastructures. Particularly, our validation test-bed has been based on an industrial-grade open-source data grid, namely Infinispan by JBoss/Red-Hat, and a de-facto standard benchmark for NoSQL platforms, namely YCSB by Yahoo. The validation study has been conducted by relying on both public and private cloud systems, scaling the underlying infrastructure up to 100 (resp. 140) Virtual Machines for the public (resp. private) cloud case. Further, we provide some experimental data related to a scenario where our framework is used for on-line capacity planning and reconfiguration of the data grid system.
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11697/160348
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 11
  • ???jsp.display-item.citation.isi??? 8
social impact