
Archive for the ‘datacenter’ Category

Book Review: Administering VMware Site Recovery Manager 5.0 by Mike Laverick

February 23, 2012

I’ve decided that I am going to review at least one book per quarter. Four book reviews per year isn’t many when you consider that I probably read four books (on various subjects) per month.

So for my first book review of 2012, I am going to start with Administering VMware Site Recovery Manager 5.0 by Mike Laverick.  Many of you already know Mike (the guy is everywhere) or are at least familiar with his blog (RTFM-ed).

Let me start out by saying that I did not like this book after first reading it. I felt that it was missing something; something that I couldn’t quite put my finger on. Then it hit me. Concepts are not necessarily discussed as concepts. Yes, there are one- or two-page discussions of concepts, but most often they are presented as working knowledge. This should not have been a surprise to me because Mike clearly states (multiple times) that he expects his readers to have read the various manuals for detailed concept and background info on vSphere, Site Recovery Manager (SRM), your storage array, etc. He can’t teach everything needed to get SRM working, so you have to do some work on your own. In other words, RTFM. Once I came to grips with this, I re-evaluated the book in a new light and decided that I like it.

As for the book itself, it has an interesting layout. You get a little bit of history concerning vSphere’s DR and HA features and what SRM is, and is not. Then comes a little detour into setting up a number of different storage arrays from Dell, EMC, HP, and NetApp. This detour does serve a purpose in that it sets a baseline storage configuration for installing and configuring SRM, albeit the simplest configuration possible. It’s actually a smart move on his part because he is able to show how he set up his lab. It also prompts the reader to go check various items in order to ensure a successful install of SRM.

Then you get to the good stuff: installing, configuring, and using SRM. There are plenty of screenshots and step-by-step instructions for doing a lot of the configuration tasks.  In fact, you could think of this book along the lines of a cookbook.   Follow along and you should end up with a usable (in a lab) install of SRM.

After reading this book, it’s clear that Mike knows SRM. Peppered throughout the chapters are the various problems and errors he encountered, as well as what he did to fix them. In a few cases, he does a mea culpa for not following his own advice of RTFM. If he had, a few problems would have been avoided.

Mike also hits home on a few simple truths. For those involved with Active Directory in the early days, there was a truth that went something like this: “The question is irrelevant, the answer is DNS”. In the case of SRM, substitute “network and storage configuration” for “DNS”. So many of the problems you may encounter are the result of a network or storage configuration issue. vSwitches need to be set up correctly, hosts need to see storage, vCenter needs to see hosts, etc.
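Since so many SRM problems boil down to those basics, it can be worth scripting a quick sanity check before you even start the install. Here is a rough sketch of my own (not from the book) using pyVmomi, VMware’s Python SDK, with placeholder hostnames and credentials. It simply lists each host’s connection state and the datastores it can see, covering two of the truths above:

```python
# Rough SRM pre-flight sketch (placeholder host and credentials).
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host="vcenter.example.com", user="admin", pwd="secret")
content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.HostSystem], True)

for host in view.view:
    # vCenter needs to see hosts...
    state = host.runtime.connectionState
    # ...and hosts need to see their storage.
    datastores = [ds.name for ds in host.datastore]
    print(f"{host.name}: {state}, datastores: {datastores}")

Disconnect(si)
```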

I especially liked the bits of wisdom that he shares in regards to doing what I call “rookie maneuvers” (others call them stupid mistakes).  For example, once you have SRM up and running, it’s too easy to hit the button without realizing what it all really entails.  Mike warns you about this many times and prompts you to think about your actions ahead of time.

The later chapters of the book introduce customizations, scripting, more complex configurations, and how to reverse a failover. There is a lot going on here, and it’s worth re-reading a few times. A surprising amount of this info can be applied to basic disaster recovery principles regardless of whether or not SRM is in the picture.

Lastly, Mike walks you through upgrading from vSphere 4.1 to vSphere 5 and from SRM 4.1 to SRM 5.  Upgrading vSphere may sound a bit odd, but not when you take into account that it’s required in order to upgrade SRM.

All-in-all, this book is a worthy read and should be in your library if your shop uses (or plans to use) SRM.

Why I Am Going to VMworld

August 25, 2011

People go to VMworld for many reasons. Some go because it’s their job to “man the booth”. Others go to party. And still others go “just because”. However, the most common reason why people go to VMworld is to learn about VMware products and their ecosystem. If I were still in the position of IT Architect, that would have been my primary reason too. This year is different. I changed jobs at the beginning of 2011 and went from an IT position responsible for the care and feeding of the virtual infrastructure platform to a Product Management position. As such, my VMworld focus has changed from learning about VMware products to learning about VMware’s customers.


One of the basic tenets of Product Management/Development is to build products that customers want or need to buy. So how does one go about finding out what customers want and/or need? Simple. Ask them. I’ll be roaming the Solutions Exchange talking to attendees about their jobs, roadmaps, challenges, and desires (within the context of the datacenter). I want to gather as much information as I can to help me excel in my new-ish position. I want to collect contact info so that I can reach out to folks later and see how things change as time passes. I want to know if your efforts are successful or not. Basically, I want to “know” and “learn” about you.


So if you happen to see me, introduce yourself.  Tell me about your company, your datacenter challenges, and more.  Help me develop a better product.


If you can’t find me, send me a tweet – @ITVirtuality – and let’s schedule a time to meet.


What about Tintri?

August 2, 2011

I attended the Phoenix VMUG meeting this week. The two main sessions were about vSphere 5 and Tintri’s VMstore. While vSphere 5 is interesting, I have been working with it for over 5 months now, so it wasn’t a “must see” presentation for me. I was actually at the event to see Tintri, and I have to say that the Tintri VMstore product intrigues me quite a bit. For those who haven’t heard of this product, think of it as a purpose-built storage appliance for your VMware environment. This “appliance” is roughly 8.5TB (usable) and is only accessed via NFS. The entire device presents itself as one large datastore to your hosts. If you think about it, this really does simplify things quite a bit. There is no zoning, no LUN creation, no disk grouping, etc. Basically, all of your standard storage creation tasks have been removed. Time to add capacity? Just add another appliance and add it to your vCenter as another datastore. It’s that simple.
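To make “it’s that simple” concrete, here is roughly what adding an appliance boils down to if you script it with pyVmomi instead of clicking through vCenter. This is a sketch with made-up names; in particular, the /tintri export path is my guess, not anything Tintri documents:

```python
# Mount the appliance's NFS export on every host as a shared datastore.
# Hostnames, credentials, and the export path are placeholders.
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host="vcenter.example.com", user="admin", pwd="secret")
content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.HostSystem], True)

spec = vim.host.NasVolume.Specification(
    remoteHost="tintri01.example.com",  # the VMstore's data interface
    remotePath="/tintri",               # hypothetical NFS export path
    localPath="tintri01",               # datastore name as seen in vCenter
    accessMode="readWrite")

for host in view.view:
    # The same export mounted everywhere shows up as one shared datastore.
    host.configManager.datastoreSystem.CreateNasDatastore(spec)

Disconnect(si)
```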

Management of the appliance is performed through a web interface and via a vCenter Plug-in.  The bulk of what you expect in a management tool is there with a few notable exceptions (discussed later in this post).

One of the VMstore design goals is performance.  To that end, Tintri has equipped the VMstore with 1TB of SSD storage.  Through their own internally developed magic, the bulk of “hot” data is kept in SSD.  The rest is stored on SATA disks.  You can imagine the kind of IOPS possible given the heavy use of SSD.  BTW, the SSD is deduped so you get more bang for your buck.
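Tintri didn’t share the details of that magic, but the general shape of a hot-data tier is easy to picture. Here is a toy Python sketch (emphatically not Tintri’s algorithm, and it ignores dedupe entirely) just to illustrate the idea that recently touched blocks live in flash while everything else falls back to SATA:

```python
# Toy model of a flash hot tier in front of SATA: an LRU cache where
# recently touched blocks are "hot" and the rest live on slow disks.
from collections import OrderedDict

class TieredStore:
    def __init__(self, ssd_blocks):
        self.ssd = OrderedDict()   # block -> data, kept in LRU order
        self.sata = {}             # effectively unlimited capacity
        self.ssd_blocks = ssd_blocks

    def read(self, block):
        if block in self.ssd:              # "hot" hit: fast path
            self.ssd.move_to_end(block)
            return self.ssd[block]
        data = self.sata[block]            # "cold" miss: slow path
        self._promote(block, data)         # it just became hot
        return data

    def write(self, block, data):
        self.sata[block] = data            # SATA always holds the full copy
        self._promote(block, data)

    def _promote(self, block, data):
        self.ssd[block] = data
        self.ssd.move_to_end(block)
        if len(self.ssd) > self.ssd_blocks:
            self.ssd.popitem(last=False)   # evict the least recently used
```

Deduping the flash tier, as Tintri does, effectively stretches the ssd_blocks budget, which is why 1TB of SSD goes further than it sounds.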

The folks at Tintri gave the standard “who we are” and “why we are different” presentation that we all expect at open events like this. After talking about the product and walking us through the management interface, the Tintri folks took questions from the audience. All-in-all, a good showing.

There were no hard questions asked at the VMUG, but the after-meeting was completely different. I am also a member of the Advanced Technology Networking Group (ATNG), and we met up with the Tintri folks a few hours later. ATNG consists of hardcore techies, and since many of our members are responsible for acquisitions and managing data centers, our meetings with vendors tend to be “no holds barred”, but in a friendly way. Our goal is to get to know the product (warts and all) as much as we can during our meetings.

We questioned a lot of design choices and where the product is going. One area of particular interest to me was the use of SATA drives. Yes, the appliance uses RAID6 and has hot spares, but that did not alleviate my concern. Drive quality continues to improve, so only time will tell if this was a good design choice or not.

Another area questioned was the use of a single controller. The general rule of enterprise storage is to have two controllers. VMstore currently has one. Notice that I say “currently”. Future products will have two controllers.

There were a few questions and suggestions regarding the management interface. One suggestion was to rename the VMstore snapshot function. It is not the same snapshot feature as in vCenter. vCenter has no visibility into VMstore native snapshots and vice-versa. This can be a source of confusion if you consider that the target audience for this product is VM admins.

The lack of some enterprise features also came up in our discussions, notably the lack of SNMP support and the lack of replication support. The only way to get notified of something going wrong with the appliance is to either receive an email alert or see something in vCenter. As for replication, the only option available is to perform a standard VM backup and restore the data to another appliance or storage device of your choice.

However, all is not doom and gloom.  Tintri is working on updates and improvements.  SNMP support, replication capabilities, and more are coming soon.   Keep in mind that Tintri recently came out of stealth mode and is on 1.0 of their product.   For a 1.0 product, it’s pretty good.  Just to give an idea of the performance and quality of VMstore, Tintri has a reference customer that will attest that they have been running a beta version since November 2010 without any issues.  In fact, that customer is still on the beta code and has not upgraded.  That’s a pretty good reference if you ask me.

So what do I think of VMstore? I think Tintri is on the right track. Purpose-built storage for VMware is a great concept. It shows a laser-like focus on a particular market, and it lets the company focus on capabilities and features that are specific to that market. Generic storage has to cater to many masters and sometimes gets lost in the process.

I am going to predict that Tintri will either be copied by other storage vendors or be acquired by one of them. The product/concept is just too unique and spot-on to be ignored.


Does the Storage Technology Really Matter?

November 15, 2010

This article is really more of a rant.  Take it for what it is.  I’m just a frustrated infrastructure admin trying to choose a storage product.

I am not a storage admin (and I don’t play one on TV), but I am on our storage replacement project representing the server infrastructure area. In preparation for this project, I started reading a number of the more popular blogs and following some of the storage Tweeters. One thing I noticed is that all the banter seems to be about speeds and feeds as opposed to solving business problems. In the case of Twitter, I am guessing it’s due to the 140-character limit, but I would expect to see more in the blogs. Some of the back and forth reminds me of the old elementary school bravado along the lines of “My dad can beat up your dad”.

I must admit that I am learning a lot, but does it really matter if it supports iSCSI, FC, FCoE, NFS, or other acronyms? As long as it fits into my existing infrastructure with minimal disruption and provides me options for growth (capacity and features), should I care? If so, why should I care? We recently moved the bulk of our data center to Cisco UCS, so you would think that FCoE would be a highly valued feature of our new solution. But it’s not. We don’t run Cisco networking gear, and our current gear provider has no short-term plans for FCoE. Given that we just finished a network upgrade project, I don’t foresee FCoE in our environment for at least three years unless funding magically appears. It doesn’t mean that it isn’t on our radar; it just means that it won’t be here for at least three years. So stop trying to sell me on FCoE support.

So who has the better solution?  I am going to use EMC and NetApp in my example just because they blog/tweet a lot.

I think if one were to put a comparison chart together, either EMC or NetApp could sit at the top of any column. Their products look the same to me. Both have replication software, both support snapshots, both support multiple protocols, and so on and so on and so on. The features list is pages long, and each vendor seems to match the other.

There are technical differences in how these features are implemented and in how the back-end arrays work, but should I care? Tell me how these features will help my business grow. Tell me how these features will protect my business. Tell me how these features will save my business money. Tell me how they can integrate into my existing infrastructure without having to change my infrastructure. And when I say “tell me”, don’t just say “it can do this” or “it can do that”. Give me case studies more than six pages long, give me processes and procedures, and give me REAL metrics that I can replicate/validate (assuming I had the equipment and time) in a real-world scenario, along with information telling me how they affect my apps and customers.

This is an area where companies need to do a better job of marketing. EMC started down this path with the Vblock. Techies aren’t really interested because the specs are blasé. C-level folks love it because it’s marketed towards them, and the marketing focuses on the solution from a business perspective. NetApp is starting to do the same with its recently announced FlexPod. The main downside to these new initiatives is that they seem to forget about the SMB. I think it’s great from a techie POV that a FlexPod can handle 50,000 VDI sessions. But as an IT Architect for my organization, so what? We only have 4200 employees or so.

Right now, I’m sort of in between on what type of information I need: technical vs. business. I am technical at heart, but have been looking at things from a business perspective the last few years. I am in the process of trying to map what our management team wants to accomplish over the next few years to the storage feature sets out there in the market. This is where both types come together. Now if I can just get past the FUD.

Week Two of Cisco UCS Implementation Completed

Progress has been made!!

The first few days of the week involved a number of calls back to TAC, the UCS business unit, and various other Cisco resources without much progress. Then on Thursday I pressed the magic button and all of a sudden our fabric interconnects came alive in Fabric Manager (MDS control software). What did I do? I turned on SNMP. No one noticed that it was turned off (the default state). Pretty sad, actually, given the number of people involved in troubleshooting this.

This paragraph subject to change based on confirmation of accuracy from Cisco. So here’s the basic gist of what was going on.  We are running an older version of MDS firmware and the version of Fabric Manager that comes with this firmware is not really “UCS aware”.  It needs a method of communicating with the fabric interconnects to fully see all the WWNs.  The workaround is to use SNMP.   I created an SNMP user in UCS and our storage admin created the same username/password in Fabric Manager.  Of course having the accounts created does nothing if the protocol they need to use is not active.  Duh.
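In hindsight, a thirty-second scripted check would have caught this days earlier. Something like the pysnmp sketch below simply asks the fabric interconnect for sysDescr.0 over SNMPv3; no answer, no SNMP. The host, user, and passphrases are placeholders, and the auth/priv protocols have to match however you defined the UCS user:

```python
# Quick SNMPv3 liveness check against a fabric interconnect (placeholders).
from pysnmp.hlapi import (
    SnmpEngine, UsmUserData, UdpTransportTarget, ContextData,
    ObjectType, ObjectIdentity, getCmd,
    usmHMACSHAAuthProtocol, usmAesCfb128Protocol,
)

errorIndication, errorStatus, errorIndex, varBinds = next(getCmd(
    SnmpEngine(),
    UsmUserData("fmuser", "authpass", "privpass",
                authProtocol=usmHMACSHAAuthProtocol,
                privProtocol=usmAesCfb128Protocol),
    UdpTransportTarget(("ucs-fi-a.example.com", 161)),
    ContextData(),
    ObjectType(ObjectIdentity("1.3.6.1.2.1.1.1.0")),  # sysDescr.0
))

# Any varBinds back means SNMP is alive; an errorIndication means it isn't.
print(errorIndication or varBinds)
```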

The screenshot below shows the button I am talking about. The reason no one noticed that SNMP was turned off is that I was able to add traps and users without any warnings about SNMP not being active. Also, take a look at the HTTP and HTTPS services listed above SNMP. They are enabled by default. Easy to miss.

[Screenshot: UCS Manager communication services, showing HTTP and HTTPS enabled above the disabled SNMP service]

With storage now presented, we were able to complete some basic testing. I must say that UCS is pretty resilient if you have cabled all your equipment wisely. We pulled power plugs, fibre to Ethernet, fibre to storage, etc., and only a few times did we lose a ping (a singular PING!). All our data transfers kept transferring, pings kept pinging, RDP sessions stayed RDP’ing.
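For the curious, our “ping monitoring” was nothing fancy. A throwaway loop along these lines (the target address is a placeholder and the flags assume a Linux ping) is enough to timestamp every miss while you go pull cables:

```python
# Ping a target once a second and timestamp every miss.
import subprocess
import time
from datetime import datetime

TARGET = "10.0.0.50"  # placeholder: a VM or host behind the fabric

while True:
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "1", TARGET],  # one ping, 1s timeout
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    if result.returncode != 0:
        print(f"{datetime.now().isoformat()}  lost ping to {TARGET}")
    time.sleep(1)
```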

We did learn something interesting in regards to the Palo card and VMware. If you are using the basic Menlo card (a standard CNA), then failover works as expected. Palo is different. Suffice it to say that for every vNIC you think you need, add another one. In other words, you will need two vNICs per vSwitch. When creating vNICs, be sure to balance them across both fabrics and note down the MAC addresses. Then, when you are creating your vSwitches (or DVS) in VMware, apply two vNICs to each switch using one from fabric A and one from fabric B. This provides the failover capabilities. I can’t provide all the details because I don’t know them, but it was explained to me by one of the UCS developers that this is a difference in UCS hardware (Menlo vs Palo).
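If you would rather script that pattern than click through the VI Client, here is a pyVmomi sketch. The uplink names (vmnic2 for the fabric A vNIC, vmnic3 for fabric B) and the vSwitch name are placeholders, so match them against the MAC addresses you noted down:

```python
# Create a vSwitch backed by two Palo vNICs, one per fabric.
# vmnic names, vSwitch name, host, and credentials are placeholders.
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host="vcenter.example.com", user="admin", pwd="secret")
content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.HostSystem], True)

spec = vim.host.VirtualSwitch.Specification(
    numPorts=128,
    # Bond both uplinks so either fabric can carry the traffic.
    bridge=vim.host.VirtualSwitch.BondBridge(nicDevice=["vmnic2", "vmnic3"]))

for host in view.view:
    host.configManager.networkSystem.AddVirtualSwitch(
        vswitchName="vSwitch1", spec=spec)

Disconnect(si)
```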

Next up: testing, testing, and more testing, with some VLANing thrown in to help us connect to two disjoint L2 networks.


Week One of Cisco UCS Implementation Complete

July 5, 2010

The first week of Cisco UCS implementation has passed.  I wish I could say we were 100% successful, but I can’t.  We’ve encountered two sticking points which are requiring some rethinking on our part.

The first problem we have run into revolves around our SAN.  The firmware on our MDS switches is a bit out of date and we’ve encountered a display bug in the graphical SAN management tool (Fabric Manager).  This display bug won’t show our UCS components as “zoneable” addresses.  This means that all SAN configurations relating to UCS have to be done via command line.   Why don’t we update our SAN switch firmware?  That would also entail updating the firmware on our storage arrays and it is not something we are prepared to do right now.  It might end up occurring sooner rather than later if doing everything via command line is too cumbersome.
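Doing everything from the command line is tedious rather than hard, and it scripts well. Here is a hedged sketch of the zoning pattern pushed over SSH with paramiko; the switch name, VSAN number, WWPNs, and zone names are all invented, and the one-second sleeps are a crude stand-in for proper prompt handling:

```python
# Push zone config to an MDS switch over SSH. All names/WWPNs are made up.
import time
import paramiko

COMMANDS = [
    "configure terminal",
    "zone name ucs-esx01-hba0 vsan 10",
    "member pwwn 20:00:00:25:b5:00:00:0a",  # server vHBA (invented)
    "member pwwn 50:06:01:60:41:e0:00:00",  # array target port (invented)
    "zoneset name fabricA vsan 10",
    "member ucs-esx01-hba0",
    "zoneset activate name fabricA vsan 10",
    "end",
    "copy running-config startup-config",
]

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect("mds9509-a.example.com", username="admin", password="secret")

shell = client.invoke_shell()
for cmd in COMMANDS:
    shell.send(cmd + "\n")
    time.sleep(1)  # crude pacing; real code should read back the prompt
print(shell.recv(65535).decode())
client.close()
```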

The second problem involves connecting to two separate L2 networks.  This has been discussed on various blogs such as BradHedlund.com and the Unified Computing Blog.  Suffice it to say that we have proven that UCS was not designed to directly connect to two different L2 networks at the same time.  While there is a forthcoming firmware update that will address this, it does not help us now.  I should clarify that this is not a bug and that UCS is working as designed.  I am going to guess that either Cisco engineers did not think that customers would want to connect in to two L2 networks or that it was just a future roadmap feature.  Either way, we are working on methods to get around the problem.

For those who didn’t click the links to the other blogs, here’s a short synopsis: UCS basically treats all uplink ports equally. It doesn’t know about the different networks, so it will assume any VLAN can be on any uplink port. ARPs, broadcasts, and the rest of the usual L2 behaviors all come into play here. If you want a better description, please go click the links in the previous paragraph.
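Here is a toy model of the failure mode as I understand it. It is grossly simplified and not how the internals actually work, but it shows why traffic can end up on the wrong wire when the fabric interconnect pins a vNIC to an arbitrary uplink:

```python
# Toy model: UCS end-host mode pins each vNIC to *some* uplink, without
# knowing which L2 network that uplink is actually cabled to.
import random

UPLINKS = {"eth1/1": "network-A", "eth1/2": "network-B"}  # invented names

def pin_vnic():
    # UCS's view: all uplinks are equal, so any one will do.
    return random.choice(list(UPLINKS))

for vnic, needed in [("vnic-prod", "network-A"), ("vnic-dmz", "network-B")]:
    uplink = pin_vnic()
    attached = UPLINKS[uplink]
    status = "reachable" if attached == needed else "black-holed"
    print(f"{vnic} needs {needed}, pinned to {uplink} ({attached}): {status}")
```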

But the entire week was not wasted, and we managed to accomplish quite a bit. Once we get past the two hurdles mentioned above, we should be able to commence our testing. It’s actually quite a bit of work to get this far. Here’s how it pans out:

  1. Completed setup of policies
  2. Completed setup of Service Profile Templates
  3. Successfully deployed a number of different server types based on Service Profiles and Server Pool Policy Qualifications
  4. Configured our VM infrastructure to support Palo
  5. Configured UCS to support our VM infrastructure
  6. Successfully integrated UCS into our Windows Deployment system

Just getting past numbers 1 and 2 was a feat. There are a number of policies that you can set, so it is very easy to go overboard and create/modify way too many. The more you create, the more you have to manage, and we are trying to follow the K.I.S.S. principle as much as possible. We started out by having too many policies, but eventually came to our senses and whittled the number down.

One odd item to note: when creating vNIC templates, a corresponding port profile is created under the VM tab of UCS Manager.  Deleting vNIC templates does not delete the corresponding port profiles so you will have to manually delete them.  Consistency would be nice here.
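If the orphans pile up, you can at least find them programmatically. Below is a hedged sketch against the UCS XML API (the /nuova endpoint). vnicLanConnTempl is the vNIC template class, but I am assuming vnicProfile is the right class ID for the port profiles and that the names line up one-to-one, so verify both before trusting it:

```python
# Audit sketch: list port profiles with no matching vNIC template.
# Class ID "vnicProfile" is an assumption -- verify before trusting.
import ssl
import urllib.request
import xml.etree.ElementTree as ET

UCSM = "https://ucsm.example.com/nuova"  # placeholder UCS Manager address
CTX = ssl._create_unverified_context()   # lab-only: skip cert checks

def post(body):
    req = urllib.request.Request(
        UCSM, data=body.encode(), headers={"Content-Type": "application/xml"})
    return ET.fromstring(urllib.request.urlopen(req, context=CTX).read())

cookie = post('<aaaLogin inName="admin" inPassword="secret"/>').get("outCookie")

def names(class_id):
    resp = post(f'<configResolveClass cookie="{cookie}" classId="{class_id}"/>')
    return {mo.get("name") for mo in resp.iter() if mo.get("name")}

orphans = names("vnicProfile") - names("vnicLanConnTempl")
print("port profiles with no matching vNIC template:", sorted(orphans))
```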

And finally, now that we have a complete rack of UCS, I can show you just how “clean” the system looks.

Before

The cabling on a typical rack

After

A full rack of UCS - notice the clean cabling


Let’s hope week number two gets us into testing mode…..
