Before reading this post, please read the following two posts first:
So far I’ve talked a bit about methods for determining when to replace equipment, what equipment is to be used as the replacements, and what factors may go into the overall decision. I also mentioned that this post would cover some of our strategic initiatives and how they factored into the overall product choice. I lied. I missed an important piece of history in the background.
Let’s recap: I was prepared to order traditional rack-mounts in June 2009. Our management team asked for a capacity analysis. The process of getting this analysis took so long that we decided to look at blades. Since we are an HP shop, we definitely had HP on our short list. However, since we were taking the opportunity to rethink our architecture, we decided to step out a bit and look at other unique products in the blade market. The only real requirement was that we had to consider the ‘unique’ vendor viable from our perspective. Who came to mind? Cisco.
Cisco? Yes, Cisco. When we started looking at blades, Cisco had been shipping their UCS product for a few months already. Press was good, reviews were good, etc. Not only were we seeing positive news, Cisco offered a unique architecture. Look at all the differences between UCS and a traditional blade system. I am not going to list them here because they’ve already been listed a number of times out there in the blogosphere. Go check out Scott Lowe’s blog or the By the Bell blog. Both have excellent articles on UCS characteristics.
Moving along…We couldn’t just go and say, “Let’s buy UCS.” I don’t work that way. I am very happy with HP rack-mount servers and would not hesitate to recommend them to anyone if asked. If I am going to choose a different vendor, I need to have good reasons that I can objectively point to.
Thus began the epic saga of many months of research into HP and Cisco blade offerings. I can’t say it was enjoyable. Part of the problem stems from the vendor HP brought in. The sales rep, who represents all the tier 1 companies, didn’t/doesn’t believe in UCS. Every time we asked for something, we got massive amounts of FUD in response. Now to be fair to HP, the sales rep knows someone in our upper management. I am speculating that this sales rep approached management about performing a capacity analysis and that, since we already use HP equipment, they brought in HP to work with them.
So forward we go to develop a list of criteria that is as objective as we can get it. Items on the list: complexity, number of cables, number of management points, RAM capacity, etc. Some were just technical check-box items, others related to our strategic initiatives. When all was said and done, two of the criteria really stood out. One was complexity and the other was support for our strategic initiatives. I don’t mean to bag on HP, but their blade system is complex. We went back and forth with the vendor on developing a workable configuration for quite some time. It wasn’t the big items that tripped us up, but rather the little things. Unfortunately, it was these little things that would make or break the entire solution. I am guessing that a lot of the complexity in developing a configuration comes from the sheer breadth of accessories that HP offers. Which switches in the chassis are needed, which HBA, which this, which that…
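For anyone who wants to turn a criteria list like this into something comparable, here is a minimal sketch of a weighted scoring matrix. It is not the spreadsheet we actually used. The criteria names come from the list above, but the weights, the 1-to-5 scores, and the "Option X" / "Option Y" labels are all invented for illustration. Higher scores mean better, so a high complexity score means a simpler system.

```python
# Hypothetical weights: how much each criterion matters (assumed values).
WEIGHTS = {
    "complexity": 3,
    "cables": 2,
    "management_points": 2,
    "ram_capacity": 1,
    "strategic_fit": 3,
}

# Hypothetical 1-5 scores per option (higher is better). Not real vendor data.
SCORES = {
    "Option X": {"complexity": 2, "cables": 3, "management_points": 2,
                 "ram_capacity": 4, "strategic_fit": 3},
    "Option Y": {"complexity": 4, "cables": 5, "management_points": 4,
                 "ram_capacity": 3, "strategic_fit": 4},
}


def weighted_total(scores, weights):
    """Sum each criterion's score multiplied by its weight."""
    return sum(weights[name] * scores[name] for name in weights)


totals = {option: weighted_total(s, WEIGHTS) for option, s in SCORES.items()}
ranking = sorted(totals, key=totals.get, reverse=True)

for option in ranking:
    print(option, totals[option])
```

The useful part is less the final number than the argument over the weights. Agreeing up front on which criteria matter most is what keeps the comparison reasonably objective.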
The more we looked at Cisco, the more we liked their solution. Imagine being able to have 20/20 hindsight when developing a new product. That’s what Cisco had in this case. Cisco was able to look at all the other blade systems out there, see what was good and bad about them, and design a new solution. Think of all the bad that comes with most blade systems. I mentioned in a previous post that cable management was a pain point for us. Well, you can’t get much cleaner than UCS. How about complexity? I am not saying Cisco is perfect, but their solution is pretty easy to work with. Some of it has to do with the fact that there is no legacy equipment for them to be compatible with. Some of it has to do with the fact that UCS is managed at the Fabric Interconnect level rather than the chassis level.
Seems like a done deal then, doesn’t it? Cisco has a solution that meets our needs better than HP. Simple. Not really. Management wanted us to consider other vendors, notably IBM. Why IBM? They support multiple processor families (Intel, PowerPC, Sparc), have a good track record, and have a fair amount of market share. So in come the IBM folks to discuss their offerings. Personally, I wasn’t impressed. While there was some interesting technology there, it just seemed ‘old’. Judging by some other blog posts I have read, IBM agrees and will be coming out with some new offerings over the next few months…
Are we there yet, are we there yet, are we there yet? Nope. HP/vendor had one more trick up their sleeve. They managed to get some info on our criteria and then stated that they proposed the wrong product. Instead of just a blade system, they felt that they should have proposed the BladeSystem Matrix. Well if they weren’t complex before, they sure were then. We went through a demo of the Matrix software and all I can say is the complexity score shot through the roof (in a bad way). I don’t think bolting on software to SIM was the right way to go. Even then, it was obvious that some components were not tightly integrated and were just being launched by SIM. However, some of the new functionality did support our strategic initiatives more so than just the plain blade system as originally proposed.
In the end, we chose Cisco. Is it a done deal? No. There is still some jockeying going on. All I can say is that Cisco has stepped up to the plate and has taken on the challenge of proving to us that they offer the best solution to meet our needs.
And for those strategic initiatives…next post, maybe :)
Please read http://itvirtuality.wordpress.com/2010/04/01/the-hardware-refresh-cycle/ first.
So how does one go about choosing a blade system? I don’t have an answer for you. I can tell you what we did, but it may not be the proper methodology for your organization.
When all is said and done, most blade systems appear nearly identical. After all, what is a blade server? It’s a rack mount turned vertical. Yes, there is more to them but that is the essence of it. If you are 100% satisfied with your incumbent provider and they meet your needs, then stick with them. Or you can do what we did and take the opportunity to envision what your ideal data center environment would look like and try to find the provider that comes closest to that vision.
Now I am not going to pull one over on you and say that we knew what our ideal data center would look like from day one. We didn’t. Our vision evolved as we reviewed the various offerings from the tier 1 vendors. Our vision evolved as we learned the strategic plans of our various business units. Our vision evolved as …. Yes, our vision is ever changing. My team and I will never know 100% where our organization is going since we work in an ever-changing world. So what did we do? We courted multiple providers.
Yes, you read correctly. We came up with a basic configuration based on the Capacity Planner analysis (and other factors) and then asked multiple vendors to provide a Bill of Materials (BOM). Then we sent the BOMs to various resellers for pricing. The main reason we did this was to have all our paperwork ready for our purchasing department to take to the City Council for approval once we made a decision on product. Getting all our ducks in a row takes time, so even though we didn’t have a final choice of product, we could at least keep the process moving along. Besides, getting the BOM and pricing helped us develop a five-year cost model. Unless you are funded extremely well, price has to play a factor in the decision-making. The best product in the world is not going to find a spot in my data center if it is ten times more expensive than everyone else and does not make me ten times more productive, reduce other costs by 10x, etc.
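To give a rough idea of what a five-year cost model can look like, here is a minimal sketch. It is not our actual model. The cost categories (hardware, support, power) are my assumption of what typically goes into one, and the vendor names and dollar figures are made up.

```python
from dataclasses import dataclass


@dataclass
class Quote:
    vendor: str            # placeholder label, not a real reseller
    hardware: float        # one-time BOM price (hypothetical)
    annual_support: float  # yearly maintenance contract (hypothetical)
    annual_power: float    # yearly power and cooling estimate (hypothetical)


def five_year_cost(quote, years=5):
    """One-time hardware price plus recurring costs over the period."""
    return quote.hardware + years * (quote.annual_support + quote.annual_power)


quotes = [
    Quote("Vendor A", 400_000, 30_000, 20_000),
    Quote("Vendor B", 450_000, 20_000, 15_000),
]

totals = {q.vendor: five_year_cost(q) for q in quotes}
cheapest = min(totals, key=totals.get)

for vendor, total in totals.items():
    print(f"{vendor}: {total:,.0f}")
print("Lowest five-year cost:", cheapest)
```

With these invented numbers, the quote with the lower purchase price ends up costing more over five years, which is why the BOM price alone is not enough to compare on.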
While all this was taking place, we finally reached a point where we could articulate our data center vision and defend it. It’s one thing to say we want to reduce power consumption, reduce our data center footprint, blah, blah, blah. Everyone says that and all providers can meet those requirements. These bullet points were not going to help us decide on a product. Besides addressing the various strategic initiatives, we needed to address what causes my team the most pain: cabling and ease of hardware management.
Just for giggles, go do a Google image search on the phrase “a network cable is unplugged”. While our data center is nowhere near as bad as that one, we do have some cable management nightmares. When a rack is put in, everything is nice and neat. In a dynamic data center, the cabling becomes a nightmare one cable at a time. If I had to come up with a movie title for it, it would probably be: “Adds, moves, and changes: The bane of the data center.”
Ease of hardware management was our second greatest pain. We currently use HP servers so our primary server management tool is Systems Insight Manager (SIM). SIM isn’t bad, but it isn’t great either. It offers a fair amount of functionality for the price (free with server purchase). However, it has some niggling quirks which drive us crazy. For starters, it uses reverse DNS lookups to determine system name. What happens if a server has fourteen aliases? Thirteen are marked as down. Instead of querying the host’s primary name, it picks up whatever DNS spits out at the time of the reverse lookup. Sort of makes alerting/alarming harder than it has to be.
Of course, all this assumes it can discover every server. We’ve had times when a server had to be manually entered into SIM and still couldn’t be seen.
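To make the alias problem concrete, here is a toy model of the failure mode. It is only an illustration of the behavior described above, not how SIM is actually implemented. The hostnames, the address, and the two lookup tables are invented stand-ins for DNS. In a live environment the reverse lookup would be something like Python’s `socket.gethostbyaddr`.

```python
# Forward records: several names resolve to the same address (invented data).
FORWARD = {
    "web01.example.org": "10.0.0.5",
    "intranet.example.org": "10.0.0.5",
    "wiki.example.org": "10.0.0.5",
}

# Reverse (PTR) record: the address maps back to a single name,
# which may or may not be the host's primary name.
PTR = {"10.0.0.5": "wiki.example.org"}


def name_by_reverse_lookup(address):
    """Stand-in for a reverse DNS query; returns the one PTR name."""
    return PTR[address]


def status_by_name(monitored_names):
    """Mark a name 'up' only if it matches what the reverse lookup returned."""
    status = {}
    for name in monitored_names:
        discovered = name_by_reverse_lookup(FORWARD[name])
        status[name] = "up" if discovered == name else "down"
    return status


result = status_by_name(FORWARD)
for name, state in result.items():
    print(name, state)
```

All three names point at the same healthy server, yet only the one that happens to match the reverse lookup is reported as up. Scale that to fourteen aliases and you get thirteen false alarms.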
The final issue we have with SIM is its interface. It’s just not as friendly as it could be. To give you an idea what I am talking about… There are some blogs out there that seem to think that the HP BladeSystem Matrix management console can only manage 250 servers. The real answer is that it can manage over 1300 servers. The 250 number comes from the HP system engineers due to SIM’s interface. SIM just doesn’t visually handle large numbers of objects very well.
That’s it for this entry. My next post will cover some strategic initiatives and how they factored into our product choice.