First Impressions of VMware CapacityIQ

October 25, 2010

I’ve always wondered how good a job I am doing with my virtualization project.  Yes, I know that I have saved my organization a few hundred thousand dollars by NOT having to purchase over 100 new servers.  But could I do better?  Am I sizing my hosts and guests correctly?  To answer that question, I downloaded an evaluation copy of VMware’s CapacityIQ and have been running it for a bit over a week now.

My overall impression is that CapacityIQ needs some work.  Visually, the product is fine.  The product is also easy to use.  I’m just a bit dubious of the results though.

Before I get into the specifics, here are some details about my virtual environment.

  • Hypervisor is vSphere 4.0 build 261974.
  • CapacityIQ version is CIQ-ovf-1.0.4.1091-276824
  • Hosts are Cisco B250-M2 blades with 96GB RAM, dual Xeon X5670 CPUs, and Palo

 

So what results do I see after one week’s run?  All my virtual servers are oversized.   It’s not that I don’t believe it; it’s just that I don’t believe it.

I read, and then re-read the documentation and noticed that using a 24hr time setting was not considered a best practice since all the evening idle time would be factored into the sizing calculations.  So I adjusted the time calculations to be based on a 6am – 6pm Mon-Thurs schedule, which are our core business hours.  All other settings were left at the defaults.

The first thing I noticed is that by doing this, I miss all peak usage events that occur at night for those individual servers that happen to be busy at night.  The “time” setting is a global setting, so I can’t set it on a per-VM basis.  Minus 1 point for this limitation.
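To make the blind spot concrete, here is a minimal Python sketch.  It is not CapacityIQ code and does not use any VMware API; the hourly CPU numbers are made up, and `in_window` and `make_samples` are hypothetical helpers I named for illustration.  It models one VM that is nearly idle all day but runs a heavy job from 1am to 3am, then applies a single Mon-Thurs, 6am to 6pm window to the samples.

```python
from datetime import datetime, timedelta

def in_window(ts, start_hour=6, end_hour=18, days=(0, 1, 2, 3)):
    """True if ts falls in the analysis window (Mon=0 .. Thu=3, 6am to 6pm)."""
    return ts.weekday() in days and start_hour <= ts.hour < end_hour

def make_samples():
    """Hypothetical hourly CPU % for one VM with a nightly 1am-3am job."""
    start = datetime(2010, 10, 18)  # a Monday
    samples = []
    for h in range(7 * 24):  # one week of hourly samples
        ts = start + timedelta(hours=h)
        cpu = 90.0 if ts.hour in (1, 2) else 10.0  # made-up values
        samples.append((ts, cpu))
    return samples

samples = make_samples()
windowed = [cpu for ts, cpu in samples if in_window(ts)]

peak_all = max(cpu for _, cpu in samples)   # peak over the full week
peak_windowed = max(windowed)               # peak the window can "see"

print(f"samples in window: {len(windowed)} of {len(samples)}")
print(f"peak over full week: {peak_all}%")
print(f"peak inside window:  {peak_windowed}%")
```

With these made-up numbers the window never sees the nightly job at all, which is exactly what a per-VM schedule would fix.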

The second item I noticed between reading the documentation, a few whitepapers, and posts on the VMware Communities forums is that CapacityIQ does not take peak usage into account (I’ll come back to this later).  The basic formula for sizing calculations is fairly simple.  No calculus used here.

The third thing I noticed is that the tool isn’t application aware.  It’s telling me that my Exchange mailbox cluster servers are way overprovisioned when I am pretty sure this isn’t the case.  We sized our Exchange mailbox cluster servers by running multiple stress tests and fiddling with various configuration values to get to something that was stable.  If I lower any of the settings (RAM and/or vCPU), I see failover events, customers can’t access email, and other chaos ensues.   CapacityIQ is telling me that I can get by with 1 vCPU and 4GB of RAM for a server hosting a bit over 4500 mailboxes.  That’s a fair-sized reduction (75% fewer vCPUs and 80% less RAM) from my current setting of 4 vCPU and 20GB of RAM.

It’s not that CapacityIQ is completely wrong with regard to my Exchange servers.  It’s just that the app occasionally wants all that memory and CPU, and if it doesn’t get it and has to swap, the nastiness begins.  This is where application awareness comes in handy.

Let’s get back to peak usage.  What is the overarching, ultimate litmus test of proper VM sizing?  In my book, the correct answer is “happy customers”.  If my customers are complaining, then something is not right.   Right or wrong, the biggest success factor for any virtualization initiative is customer satisfaction.  The metric used to determine customer satisfaction may change from organization to organization.  For some it may be dollars saved.  For my org, it’s a combination of dollars saved and customer experience.

Based on the whole customer experience imperative, I cannot noticeably degrade performance or I’ll end up with business units buying discrete servers again.  If peak usage is not taken into account, then it’s fairly obvious that CapacityIQ will recommend smaller-than-acceptable virtual server configurations.  It’s one thing to take an extra 5 seconds to run a report, quite another to add an hour or two, yet based on what I am seeing, that is exactly what CapacityIQ is telling me to do.
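Here is a rough Python sketch of why ignoring peaks shrinks a recommendation.  To be clear, this is NOT CapacityIQ’s actual formula, which I haven’t reproduced here.  The demand samples, the 2930 MHz per-core figure, and both sizing functions are assumptions of mine, picked only to show how an average-based rule and a peak-based rule can disagree about the same bursty server.

```python
import math

def size_by_mean(demand_mhz, core_mhz):
    """vCPUs needed to cover AVERAGE demand (hypothetical rule)."""
    mean = sum(demand_mhz) / len(demand_mhz)
    return max(1, math.ceil(mean / core_mhz))

def size_by_peak(demand_mhz, core_mhz):
    """vCPUs needed to cover PEAK demand (hypothetical rule)."""
    return max(1, math.ceil(max(demand_mhz) / core_mhz))

# Made-up workload: 22 quiet samples and 2 bursts, in MHz of CPU demand.
demand = [500] * 22 + [9000] * 2
CORE_MHZ = 2930  # assumed per-core clock, for illustration only

print("sized on average:", size_by_mean(demand, CORE_MHZ), "vCPU")
print("sized on peak:   ", size_by_peak(demand, CORE_MHZ), "vCPU")
```

On this made-up data the average-based rule lands on a single vCPU while the peak-based rule asks for four.  A real tool would more likely sit somewhere in between (a high percentile, say), but the gap shows why the bursts matter to the people waiting on that report.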

I realize that this is a new area for VMware so time will be needed for the product to mature.  In the meantime, I plan on taking a look at Hyper9.  I hear the sizing algorithms it uses are a bit more sophisticated so I may get more realistic results.

Anyone else have experience with CapacityIQ?  Let me know.  Am I off in what I am seeing?  I’ll tweak some of the threshold variables to see what effects they have on the results I am seeing.  Maybe the defaults are just impractical.