VCAP-CID Study Notes: Objective 3.3
October 10, 2014
This is Objective 3.3 in the VCAP-CID blueprint Guide 2.8. The rest of the sections/objectives can be found here.
Bold items have higher importance, and copied text is in italic.
Knowledge
- Identify allocation model characteristics
- Explained in detail in Objective 2.4.
- Explain the impact of providing a dedicated cluster to a Provider VDC.
- It’s better to use a dedicated cluster because then you don’t have to carve up the same DRS cluster into smaller resource pools, each with a different allocation model that has its own way of guaranteeing resources to its workloads.
- This is explained in detail in this post from Frank Denneman:
Skills and Abilities
- Determine the impact of a chosen allocation model on cluster scalability.
- Reservation Pool: you can only scale to 32 ESXi hosts per cluster, since a Reservation Pool can only use the resources available in a single provider vDC.
- Allocation Pool: there are two options, either the organization vDC is elastic or it’s not.
- An elastic organization vDC can scale across multiple clusters within the same provider vDC.
- A non-elastic organization vDC only allows Allocation Pool workloads to consume resources from a single cluster.
- Pay-as-you-Go: these are elastic, as they only work on per-VM reservations.
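As a rough way to compare the three models, the scaling rules above can be written down as a small function. This is only a sketch under the assumptions in these notes (32 hosts per cluster, Reservation Pool limited to one cluster, Allocation Pool spanning clusters only when elastic); the function and constant names are made up for illustration and are not part of any VMware API.

```python
# Assumption taken from the notes above: a cluster tops out at 32 ESXi hosts.
MAX_HOSTS_PER_CLUSTER = 32


def max_hosts(model: str, clusters_in_pvdc: int, elastic: bool = False) -> int:
    """Upper bound on the hosts an organization vDC can draw resources from."""
    if clusters_in_pvdc < 1:
        raise ValueError("a provider vDC needs at least one cluster")
    if model == "reservation":
        spans_clusters = False      # limited to a single cluster
    elif model == "allocation":
        spans_clusters = elastic    # only when the organization vDC is elastic
    elif model == "payg":
        spans_clusters = True       # per-VM reservations, always elastic
    else:
        raise ValueError(f"unknown allocation model: {model}")
    clusters = clusters_in_pvdc if spans_clusters else 1
    return clusters * MAX_HOSTS_PER_CLUSTER


# Hypothetical provider vDC backed by three clusters.
print(max_hosts("reservation", 3))               # 32
print(max_hosts("allocation", 3, elastic=True))  # 96
print(max_hosts("payg", 3))                      # 96
```

The point of the comparison is that a Reservation Pool design has to fit its largest tenant into one cluster, while the elastic models can grow by adding clusters to the provider vDC.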
- Given application performance requirements, determine an appropriate CPU and memory over-commitment configuration.
- CPU Overcommitment
- CPU overcommitment ratios are mostly based on vCPU:pCPU ratios. 6:1 vCPU to pCPU is considered a maximum in most cases, but this is just a recommendation from VMware. CPUs vary greatly in performance, so keep that in mind when calculating these ratios.
- The number of vCPUs that can run on the available physical cores determines the number of VMs an environment can run. This ratio is determined in the design phase as a technical requirement, with a design decision similar to this one: “The risk associated with an increase in CPU overcommitment is mainly degraded overall performance that can result in higher than acceptable vCPU ready times.”
- This ratio can be used as part of a performance service agreement to make sure certain Tier-1 workloads will not have potential CPU contention.
- Memory Overcommitment
- Memory overcommitment means configuring the VMs running on a host with more memory than the host has to offer. This is commonly used in VMware vSphere environments, as most workloads do not use all their memory at the same time. vSphere ESXi has various features to help with overcommitting memory:
- TPS – Transparent page sharing
- Memory ballooning
- Memory Compression
- Swapping to disk
- Many of the allocation models have memory reservations built in, but unlike CPU reservations, reserved memory is not available to other VMs.
- Reservation Pool creates a static memory pool where the workloads compete for the resources, but certain workloads can have individual reservations (and limits and shares).
- Allocation Pool creates a resource pool with a % of memory reserved for running VMs. These VMs then compete both for that reserved portion and for the remaining resources configured for the pool.
- PAYG creates a per-VM reservation (and a resource pool as well), and VMs can use the rest of the resources available in the cluster.
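The overcommitment ratios discussed above are simple divisions, and a small sketch can make them concrete. The 6:1 ceiling comes from the notes; the host size and VM counts are hypothetical figures chosen for illustration, and the helper names are not from any VMware tool.

```python
# Ceiling quoted in the notes above; a recommendation, not a hard limit.
MAX_VCPU_PER_PCPU = 6.0


def vcpu_to_pcpu_ratio(total_vcpus: int, physical_cores: int) -> float:
    """vCPU:pCPU ratio for a host or cluster."""
    if physical_cores <= 0:
        raise ValueError("physical_cores must be positive")
    return total_vcpus / physical_cores


def cpu_ratio_ok(total_vcpus: int, physical_cores: int,
                 limit: float = MAX_VCPU_PER_PCPU) -> bool:
    """True when the ratio stays at or below the agreed ceiling."""
    return vcpu_to_pcpu_ratio(total_vcpus, physical_cores) <= limit


def memory_overcommit_ratio(configured_gb: float, physical_gb: float) -> float:
    """Configured VM memory divided by host memory; above 1.0 is overcommitted."""
    if physical_gb <= 0:
        raise ValueError("physical_gb must be positive")
    return configured_gb / physical_gb


# Hypothetical host: 16 physical cores and 128 GB of RAM.
print(vcpu_to_pcpu_ratio(80, 16))         # 5.0
print(cpu_ratio_ok(80, 16))               # True
print(cpu_ratio_ok(112, 16))              # False (7:1)
print(memory_overcommit_ratio(192, 128))  # 1.5
```

For a tier with a performance agreement, the `limit` argument would be lowered to whatever ratio the design decision states.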
- Given service level requirements, determine appropriate failover capacity.
- Failover capacity is most likely based on disaster recovery scenarios.
- SRM is not vCloud aware so storage replication is the only way to “protect” consumer workloads.
- The steps required could be automated using the storage system’s API in addition to the automation features in vSphere (PowerCLI, Orchestrator).
- Operationally, recovery point objectives support must be determined for consumer workloads and included in any consumer Service Level Agreements (SLAs). Along with the distance between the protected and recovery sites, this helps determine the type of storage replication to use for consumer workloads: synchronous or asynchronous.
- For more information about vCloud management cluster disaster recovery, see http://www.vmware.com/files/pdf/techpaper/vcloud-director-infrastructure-resiliency.pdf.
- Appendix C in the vCAT Architecting a VMware vCloud document is something that is worth reading.
- To determine the appropriate failover capacity, you first need to determine the SLA for the DR service and which service offering it maps to. A Gold provider vDC cluster might have that as a feature.
- That gives you the capacity required on the recovery site, as it will be based on the resources used by the organizations consuming the Provider vDC performance tier that has DR features.
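The last step above, adding up what the DR-enabled tier consumes, can be sketched as follows. The organization vDC names, tiers and figures are invented for illustration, and the assumption that only a "gold" tier carries the DR feature is just the example used in these notes.

```python
from dataclasses import dataclass


@dataclass
class OrgVdc:
    name: str
    tier: str          # performance tier of the backing provider vDC
    cpu_ghz: float     # CPU currently allocated to the organization vDC
    memory_gb: float   # memory currently allocated to the organization vDC


def recovery_capacity(org_vdcs, dr_tiers=("gold",)):
    """Sum CPU and memory of organization vDCs whose tier includes DR."""
    protected = [v for v in org_vdcs if v.tier in dr_tiers]
    cpu = sum(v.cpu_ghz for v in protected)
    memory = sum(v.memory_gb for v in protected)
    return cpu, memory


# Hypothetical consumers of two provider vDC tiers.
org_vdcs = [
    OrgVdc("finance", "gold", 40.0, 256.0),
    OrgVdc("hr", "gold", 20.0, 128.0),
    OrgVdc("dev", "silver", 60.0, 512.0),
]
print(recovery_capacity(org_vdcs))  # (60.0, 384.0)
```

The result is a lower bound for the recovery site; headroom for growth and HA on that site would be added on top.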
- Given a Provider VDC design, determine the appropriate vCloud compute requirements.
- A provider vDC design will include allocation models, with requirements for availability SLAs and other related requirements (DR, DRS, HA).
- This information is used to determine the appropriate host logical design, with the number of CPUs and cores, memory, HBAs, NICs, local storage and additional information if needed (boot from USB etc.).
- A great example of the process is on pages 32-34 in the Private VMware vCloud Implementation Example document
- Given workload requirements, determine appropriate vCloud compute requirements for a vApp
- Compute requirements for a vApp involve deciding how many vCPUs will be used, how much memory is allocated and which allocation model would fit the application in question. A Tier-1 application should run on a Reservation Pool, Tier-2-3 on Allocation Pool and Dev/Test/Transient workloads on Pay-as-you-Go.
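The tier-to-model rule of thumb above can be captured as a lookup table. This is a sketch of that rule only; the tier labels are hypothetical spellings, and a real design would weigh more than the tier name.

```python
# Mapping taken from the rule of thumb in the notes above.
ALLOCATION_MODEL_BY_TIER = {
    "tier-1": "Reservation Pool",
    "tier-2": "Allocation Pool",
    "tier-3": "Allocation Pool",
    "dev": "Pay-as-you-Go",
    "test": "Pay-as-you-Go",
    "transient": "Pay-as-you-Go",
}


def allocation_model_for(tier: str) -> str:
    """Return the suggested allocation model for a workload tier."""
    key = tier.strip().lower()
    if key not in ALLOCATION_MODEL_BY_TIER:
        raise ValueError(f"no allocation model defined for tier: {tier!r}")
    return ALLOCATION_MODEL_BY_TIER[key]


print(allocation_model_for("Tier-1"))  # Reservation Pool
print(allocation_model_for("tier-3"))  # Allocation Pool
print(allocation_model_for("Dev"))     # Pay-as-you-Go
```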
- Given capacity, workload and failover requirements, create a private vCloud compute design.
- The private vCloud compute design includes both Management and Resource clusters, but most Management clusters are similar for both private and public clouds, so we will just design the Resource cluster (since the workloads will reside there).
- Let’s use an example:
- Capacity requirements are 700 VMs and 500 vApps.
- Workloads include Development, Pre-production, Demonstration, Training, Tier-2-3 IT Infrastructure applications and Tier-1 Database applications.
- Failover requirements are to support failover of essential Tier-2-3 workloads and all Tier-1 workloads.
- Create two tiers of compute clusters, Gold and Silver.
- Gold will include 8 hosts with an N+2 HA configuration at 26% resource reservation. DRS is configured as Fully automated with a Moderate migration threshold.
- Silver will include 8 hosts with an N+1 HA configuration at 13% resource reservation. DRS is configured as Fully automated with a Moderate migration threshold.
- Storage will include 3 separate tiers: Tier-1, Tier-2 and Tier-3.
- Tier-1 is based on 10K SAS disks and SSD disks, with an Easy-tiering solution to move hot blocks to SSD automatically. Workloads are replicated to a second site for disaster recovery.
- Tier-2 is based on 10K SAS disks.
- Tier-3 is based on 7.2K NL-SAS disks and 10K disks, with hot blocks moved to the 10K disks automatically.
- Allocation models used are:
- Gold compute cluster: Reservation Pool, running the Tier-1-2-3 workloads.
- Silver compute cluster: Allocation Pool, running the rest of the workloads with varying % reservations between workloads.
- This is just an example of how you would create a logical compute layout. I added both storage and consumer constructs, as I think you really need to have all the required information to get the idea.
- This information was gathered from the Private VMware vCloud Implementation Example document. Even though it is based on vCloud 1.5, it’s a great document to use as a reference for future designs.
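The HA reservation percentages for the Gold and Silver clusters in the example follow from the number of host failures to tolerate. A minimal sketch, assuming equally sized hosts and rounding up to a whole percent: it gives 13% for an 8-host N+1 cluster and 25% for an 8-host N+2 cluster. The example above uses 26% for Gold, one point above the plain ratio, and the sketch does not try to reproduce that extra margin.

```python
import math


def ha_reserved_percent(hosts: int, host_failures: int) -> int:
    """Percentage of cluster resources to reserve so that the cluster can
    tolerate `host_failures` failed hosts, rounded up to a whole percent.
    Assumes all hosts in the cluster are the same size."""
    if hosts <= 0 or not 0 <= host_failures < hosts:
        raise ValueError("need hosts > 0 and 0 <= host_failures < hosts")
    return math.ceil(100 * host_failures / hosts)


def usable_host_equivalents(hosts: int, host_failures: int) -> int:
    """Hosts' worth of capacity left for workloads after the HA reservation."""
    return hosts - host_failures


print(ha_reserved_percent(8, 1))      # 13  (Silver, N+1)
print(ha_reserved_percent(8, 2))      # 25  (Gold, N+2, before any extra margin)
print(usable_host_equivalents(8, 2))  # 6
```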
- Given capacity, workload and failover requirements, create a public vCloud compute design.
- The same applies to Public vCloud compute designs, but instead of Reservation or Allocation Pool you use Pay-as-you-Go (if the requirements are to charge each VM separately etc.)
- Capacity requirements in Public vCloud instances are really guess-work. You design around predicted capacity based on the number of VMs, their setup and the size of their storage, as seen in this picture from the Public VMware vCloud Implementation Example document.
- Most recommended public vCloud use cases are Test/Dev and other transient workloads, but this really depends on the requirements of the hosting company. It can be a mix of all the allocation models.
- Failover requirements of Public vClouds are most likely part of a more expensive offering for higher tier workloads, but you can of course fail over any kind of workload, even the “not-so-important” ones. All workloads are someone’s production workloads 🙂
- For creating a design for a public vCloud I recommend reading the Public VMware vCloud Implementation Example document as a great reference.