VMWARE CAPACITY PLANNING: WE'RE CONVERTING 100 MILLION RECORDS A DAY INTO OPTIMAL CONFIGURATION OF VIRTUAL MACHINES
RAIFFEISENBANK CZ & ORBIT
Czech Raiffeisenbank needed to gain maximum control over the capacity planning of its VMware platform. On the one hand, it needed to find savings across individual clusters and virtual machines; on the other, it had to ensure high availability. How did we handle it, and what did it bring to the customer?
“ORBIT’s capacity planning service solves two obligations for us – in addition to capacity planning and resource optimization, we also have to have a good current and historical overview of computing platforms.”
Jiří Koutník, Head of System Administration
Time for capacity planning
As mentioned, Raiffeisenbank’s VMware farm configuration did not produce detailed reports that would reveal savings opportunities within clusters and virtual machines.
The bank also wanted to look for savings through well-considered licensing of core software. Finally, it was concerned about an audit finding that pointed to missing capacity planning and missing regular reporting of available resources in relation to high availability.
Raiffeisenbank therefore decided to take advantage of ORBIT’s experience in optimizing and consolidating virtual platforms and our expert knowledge of the VMware platform. They could also count on our detailed knowledge of the bank’s internal systems, which we had gained in previous projects.
The first phase required several months of collecting load metrics and configurations from the VMware platform. At the same time, we tuned the engine that generates the configuration recommendations.
Graphical representation of one of the collected metrics as a statistical analysis of behaviour
It was only after three months that we saw the real long-term situation. This allowed us to start inferring workload trends at the hardware and virtual system level and generate initial recommendations.
Throughout the process, we were able to rely on the ORBIT vResControl tool. We developed it ourselves to help us determine optimal virtual machine configurations based on statistical resource load behavior.
Sample statistical evaluation of the load for basic metrics
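To make the idea of a load-based recommendation concrete, here is a minimal sketch of percentile-based right-sizing. It is not the vResControl algorithm, which is not described here. The function name `recommend_vcpus`, the 95th percentile and the 20 % headroom are all assumptions chosen for illustration.

```python
import math
import statistics


def recommend_vcpus(cpu_usage_pct, current_vcpus, percentile=95, headroom=0.2):
    """Suggest a vCPU count from CPU utilisation samples (0-100 %).

    Hypothetical helper: the percentile and headroom are illustrative
    assumptions, not values taken from vResControl.
    """
    if len(cpu_usage_pct) < 2:
        # Too little data for a statistical statement: keep the current size.
        return current_vcpus
    # quantiles(n=100) returns 99 cut points; index p-1 is the p-th percentile.
    cut_points = statistics.quantiles(cpu_usage_pct, n=100, method="inclusive")
    busy_pct = cut_points[percentile - 1]
    # vCPUs actually busy at that percentile, plus a safety margin.
    needed = current_vcpus * (busy_pct / 100.0) * (1.0 + headroom)
    return max(1, math.ceil(needed))
```

For a virtual machine with 8 vCPUs whose samples sit at a steady 25 %, this sketch returns 3. Using a high percentile rather than the mean keeps short peaks from being averaged away, which is one reason a long observation window matters.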
What to watch out for
When optimizing a virtual machine configuration, we consider one important thing: is the virtual machine part of a cluster with another machine at the operating system or application middleware level? If so, does the cluster run in active/passive or active/active mode? And how many nodes does the cluster contain in total?
The risk is that we judge passive machines, which stand ready to take over the load, to be over-provisioned, reconfigure them to lower values, and so destroy the cluster’s ability to achieve high availability. Therefore, information about the application clusters had to be collected and supplied to the vResControl tool.
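The guard described above can be sketched as a small post-processing step on a load-based recommendation. Everything here is hypothetical: the `ClusterMembership` record, the `ha_aware_vcpus` function and the sizing rules (a passive node is sized for the load of its active peer; active/active nodes are sized so the survivors can absorb one failed node under evenly balanced load) are assumptions for illustration, not the rules vResControl applies.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClusterMembership:
    """Hypothetical metadata record, e.g. sourced from a configuration database."""
    mode: str        # "active/passive" or "active/active"
    role: str        # "active" or "passive" (relevant for active/passive only)
    node_count: int  # total number of nodes in the application cluster


def ha_aware_vcpus(load_based: int, current: int,
                   membership: Optional[ClusterMembership],
                   active_peer_need: Optional[int] = None) -> int:
    """Adjust a load-based vCPU recommendation so failover still works."""
    if membership is None:
        # Standalone machine: the load-based value can be used directly.
        return load_based
    if membership.mode == "active/passive":
        if membership.role == "passive":
            # An idle standby looks oversized; size it for the load it would
            # take over, or leave it untouched if that load is unknown.
            return active_peer_need if active_peer_need is not None else current
        return load_based
    if membership.mode == "active/active" and membership.node_count >= 2:
        # Assumption: load is evenly balanced and one node may fail,
        # so each survivor carries n / (n - 1) of its normal share.
        n = membership.node_count
        return math.ceil(load_based * n / (n - 1))
    # Unknown or inconsistent metadata: do not shrink anything.
    return current
```

The conservative fallbacks reflect the point of this section: when the cluster metadata is missing or unclear, the safe choice is to leave the machine as it is rather than trust the load figures alone.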
Overall view of virtual farm resource utilization percentage under high total load
However, the biggest challenge with a service like this is not the technical part or the collection of data from the platforms. The key is understanding the logic of high availability at the virtual platform or application cluster level. Without that knowledge, the technical data is useless. In a project like this, it pays to push the customer to continuously deliver quality metadata (e.g. from the configuration database) right from the start.
“With the bank’s ever-growing needs and ongoing acquisitions, a good prediction of the virtual platform load is key information for our decision-making.”
Jiří Koutník, Head of System Administration
Capacity planning in numbers
Raiffeisenbank is now able to optimally configure its 3,500 virtual machines based on 1,500 recommendations per month according to the actual load. And with a growing database of hundreds of millions of load records, it now anticipates and plans well in advance for the additional capacity needed.
If you were to store one load record every second in the database, you would need 3 years, 2 months and 1.5 days of continuous collection to reach 100 million records.
It takes us 24 hours.
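The comparison above is simple arithmetic and can be checked with a few lines. The sketch assumes a 365-day year and an average month of 365/12 days. The helper names are made up for this example.

```python
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400
DAYS_PER_YEAR = 365              # assumption: leap years ignored
DAYS_PER_MONTH = DAYS_PER_YEAR / 12

RECORDS = 100_000_000


def duration_at_one_record_per_second(records):
    """Split the collection time into whole years, whole months and days."""
    days = records / SECONDS_PER_DAY
    years, rest = divmod(days, DAYS_PER_YEAR)
    months, rest_days = divmod(rest, DAYS_PER_MONTH)
    return int(years), int(months), rest_days


def records_per_second_in_one_day(records):
    """Average ingest rate needed to store the same volume in 24 hours."""
    return records / SECONDS_PER_DAY


years, months, days = duration_at_one_record_per_second(RECORDS)
rate = records_per_second_in_one_day(RECORDS)
print(f"{years} years, {months} months, {days:.1f} days; {rate:.0f} records/s")
```

Under these assumptions the result is 3 years, 2 months and roughly 1.5 days, and collecting the same volume in 24 hours corresponds to an average of about 1,157 records per second.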
The ongoing VMware platform capacity planning service was thus deployed to the required extent and within the pre-agreed deadline. We are now extending the service for the customer with reporting for other platforms.