
Book Review: Enterprise Network Testing

May 6, 2011

For my next trick, I am going to review another Cisco Press book titled “Enterprise Network Testing”.  I think I can sum this up in two sentences:  “Holy Crap!” and  “This book has PLENTY of cowbell!”.

Now I am not currently a network guy by profession, but if I were, this book would be on my desk with copies on my teammates’ desks too. It is literally THE blueprint for how to test your network.

The journey into network testing begins with a discussion of why you need to test your network. Most people only think of one or two reasons. This book provides a few more to help you make your business case. BTW, the authors make it very clear that testing your network is not a one-time event. Testing should be done whenever changes are made, for compliance, when new technologies are introduced, etc. In other words, plan on testing regularly.

One area where this book and I completely agree is where testing should first take place: in the lab. There is a whole chapter devoted to lab strategy. Topics covered include staffing, facilities planning, test methodologies, power, and more. I must say that I was surprised at how good this chapter turned out to be. Most books give basic guidance on lab setup, but like I said at the beginning of this review, this book has plenty of cowbell.

So now that you have your lab set up, what are you going to do? Simple: read this book, because it provides guidance for “crafting the test approach” (actual chapter title). Briefly, this chapter discusses several reasons/objectives for testing and how to craft your strategies to set you up for success. This includes setting your test targets, choosing the tools you are going to use, writing a test plan, allocating resources, etc. It’s a very well-thought-out approach.

Business case approved? Check.  Lab resources allocated? Check.  Test plan created? Check.  Great, now go execute your plan.  Need help?  No problem, this book will walk you through a sample lab setup, finding the appropriate tools, and a few different methodologies for measuring different network characteristics.  This is the point in the book where the authors stress the need to understand what you are testing, the tools you are using, and how to interpret the results.  In other words, if you don’t know what you are doing you will not be successful.

Speaking of knowing your tools, this book does a credible job discussing network toolsets that are available for free and for purchase. Even non-Cisco products are covered, which is something I am not used to seeing in a Cisco Press book. Usually, these books are oblivious to other companies’ products. Kudos to the authors for being thorough.

The next six chapters are where you will find plenty of test case examples. There are individual chapters devoted to six types of testing: proof of concept testing, network readiness testing, design verification testing, migration plan testing, new platform and code certification testing, and network ready for use testing. They are written in a case study format and are quite readable.

Nerdgasm time. This is where the book gets hairy… Are you too lazy to develop your own plans from scratch? You want to cheat? Just borrow the DETAILED test plans that are in the next seven chapters. There is enough meat here that Cisco Press could copy and paste it into a shorter book to sell. We are talking over 200 pages of test plans covering seven areas. That’s a lot of cowbell!

The book ends on a high note. Since you went through the trouble of setting up a lab, why not use it for training/learning purposes? Step-by-step instructions are provided for setting up such a lab. This chapter may not be useful to a large number of folks since the equipment covered is pure Cisco, including UCS. In fact, many of the directions provided center around setting up a UCS environment. I happen to like this chapter because one of my last major implementations before joining VCE was installing UCS for the organization at which I worked. It sort of brings back memories.

To sum this review up:  If you are in the network field, you need this book.

Use Cases for Cisco UCS Network Isolation

October 4, 2010

Based on my last post, a couple of people have emailed me asking, “What is the value of keeping UCS links alive when the network has gone down?” The answer is: It Depends. It depends on your applications and environment. In my case, I have a number of multi-tiered apps that are session oriented, apps that do batch processing, etc.

The easiest use case to describe involves batch processing. We have a few applications that do batch processing late at night. It just so happens that “late at night” is also the window for performing network maintenance. When the two bump into each other (batch jobs and maintenance), we either reschedule something (batch or maintenance), take down the application, or move forward and hope nothing goes awry. Having these applications in UCS and taking advantage of the configuration in my previous post means I can do network maintenance without having to reschedule batch jobs or take down the application.

I could probably achieve similar functionality outside of UCS by having a complex setup that makes use of multiple switches and running NIC teaming drivers at the OS level. However, some of my servers are using all of their physical NICs for different purposes, with different IP addresses. In these cases, teaming drivers may add unnecessary complexity. Not to mention that the premise of this use case is the ability to do network maintenance. Any way to avoid relying on the network is a plus in my book with regard to this use case.

Now let’s consider session-oriented applications. In our case, we have a multi-tiered app that requires that open sessions be maintained from one tier to the next. If there is a hiccup in the connection, the session closes and the app has to be restarted. Typically, this means rebooting. Fabric failover prevents the session from closing so the app keeps running. In this particular case, UCS isolation would prevent this app from doing any work since no clients will be able to get to it. Where it helps us is in restoring service faster when the network comes back, since it removes the need for a reboot.

I am going to guess that this can be done with other vendors’ blade systems, but with additional equipment. What I mean is that with other blade systems, the unit of measure is the chassis. You can probably configure the internal switches to pass traffic from one blade to another without having to go to another set of switches. But if you need a blade in chassis A to talk to a blade in chassis B, you will probably need to involve an additional switch, or two, mounted either top-of-rack or end-of-row. In the case of UCS, the unit of measure is the fabric. Any blade can communicate with any other blade, provided they are in the same VLAN and assuming EHV mode. Switch mode may offer more features, but I am not versed in it.
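
To make the “unit of measure” point concrete, here is a toy Python sketch (made-up names and data, not any vendor’s API or behavior) of blade-to-blade reachability in the two designs: a chassis-based system needs an external switch once traffic crosses a chassis boundary, while a UCS fabric interconnect covers every blade in the same VLAN regardless of the uplinks.

```python
# Toy model only: illustrates the "unit of measure" argument, not real vendor behavior.

def can_talk_chassis_based(blade_a, blade_b, external_switch_up):
    """Chassis-based blades: intra-chassis traffic stays on the internal switch,
    inter-chassis traffic must traverse a ToR/EoR switch."""
    same_chassis = blade_a["chassis"] == blade_b["chassis"]
    return same_chassis or external_switch_up

def can_talk_ucs(blade_a, blade_b, northbound_up):
    """UCS: any two blades on the same fabric interconnect and the same VLAN can
    talk; northbound_up is deliberately ignored because blade-to-blade traffic
    does not depend on the uplinks."""
    return blade_a["vlan"] == blade_b["vlan"]

a = {"chassis": "A", "vlan": 100}
b = {"chassis": "B", "vlan": 100}

print(can_talk_chassis_based(a, b, external_switch_up=False))  # False: needs a ToR/EoR switch
print(can_talk_ucs(a, b, northbound_up=False))                 # True: the fabric is the unit
```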

I hope this post answers your questions.  I am still thinking over the possibilities that UCS isolation can bring to the table.  BTW, I made up the term “UCS isolation”.  If anyone has an official term, or one that better describes the situation, please let me know.


Can UCS Survive a Network Outage?

September 29, 2010

Part of our UCS implementation involved the use of Cisco Advanced Services (AS) to help with the initial configuration and testing. Due to our integration issues, time ran out and we never completed some items related to our implementation plan. AS was back out this week for a few days in order to complete their portion of the plan. Due to timing, we worked with a different AS engineer this time. He performed a health check of our UCS environment and suggested a vSphere configuration change to help improve performance.

Before I get into what we changed, let me give a quick background on our vSphere configuration. We are using the B250-M2 blade with a single Palo adapter. We are not taking advantage of the advanced vNIC capabilities of the Palo adapter. What I mean by that is that we are not assigning a vNIC to each guest and using dvSwitches. Instead, we are presenting two vNICs for the Service Console, two vNICs for the VMkernel, and two vNICs for virtual machines, and using them as we would if we were on a standard rackmount server. Each vSwitch is configured with one vNIC from fabric A and one vNIC from fabric B, teamed together in an active/active configuration.

Recommended Change: Instead of active/active teaming, set the service console and VMkernel ports to active/standby.  When doing this, ensure that the active NICs are all on the same fabric interconnect.  This will keep service console/VMkernel traffic from having to hit our northbound switches and keep the traffic isolated to a single fabric interconnect.
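
For illustration, here is a small Python sketch (plain data structures, not the vSphere API; the port group and vNIC names are placeholders) of the before/after teaming policy, with a helper that shows which fabric can carry each port group’s traffic when both vNICs are healthy.

```python
# Hypothetical sketch of the vSwitch teaming change; names are placeholders.

# Before: both vNICs active, so traffic can land on either fabric.
before = {
    "Service Console": {"active": ["vnic-a (fabric A)", "vnic-b (fabric B)"], "standby": []},
    "VMkernel":        {"active": ["vnic-a (fabric A)", "vnic-b (fabric B)"], "standby": []},
}

# After: one active vNIC per port group, all actives pinned to the same fabric.
after = {
    "Service Console": {"active": ["vnic-a (fabric A)"], "standby": ["vnic-b (fabric B)"]},
    "VMkernel":        {"active": ["vnic-a (fabric A)"], "standby": ["vnic-b (fabric B)"]},
}

def fabrics_in_use(portgroups):
    """Which fabrics can carry traffic while all vNICs are healthy?"""
    return {pg: [nic.split("(")[1].rstrip(")") for nic in cfg["active"]]
            for pg, cfg in portgroups.items()}

print(fabrics_in_use(before))  # each port group may use fabric A or B -> traffic can cross northbound
print(fabrics_in_use(after))   # fabric A only -> SC/VMkernel traffic stays on one interconnect
```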


Here is where it gets interesting.

Once this was done, possibilities came to mind and I asked the $64,000 question: “Is there a way to keep everything in UCS up and running properly in the event we lose all our northbound links?” It was more of a theoretical question, but we spent the next six hours working on it anyway. 🙂

Disclaimer: not all of what you are about to read is fully tested.  This was a theoretical exercise that we didn’t finish testing due to time constraints.  We did test this with two hosts on the same subnet and it worked as theorized.

Here’s what we came up with:

First of all, when UCS loses its northbound links it can behave in two ways. Via the Network Control Policy, the ports can be marked either “link-down” or “warning”. When northbound ports are marked “link-down”, the various vNICs presented to the blades go down. This will also kick in fabric failover if it is enabled at the vNIC level. If you are not using the Fabric Failover feature on a particular vNIC, you can achieve the same functionality by running NIC teaming drivers at the operating system level. We are using NIC teaming at the vSwitch level in vSphere and Fabric Failover for bare metal operating systems.

Setting the Network Control Policy to “warning” keeps the ports alive as far as the blades are concerned and no failovers take place. The beauty of this policy is that it can be applied on a per-vNIC basis, so you can cherry-pick which vNIC is affected by which policy (link-down or warning). Using a combination of Network Control Policy settings and vSwitch configurations, it’s possible to keep workloads on UCS up and running, with all servers (virtual or otherwise) communicating, without having any external connectivity. This could be used to prevent massive outages, boot storms due to outages, etc. In our case, since the bulk of our data center will be on UCS, it basically prevents me from having to restart my data center in the event of a massive network switch outage.
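
A minimal Python sketch of the per-vNIC logic just described, assuming the two policy values behave as stated (link-down propagates a total uplink failure to the blade, warning hides it):

```python
def vnic_state(ncp, uplinks_up):
    """What a blade sees on a vNIC, given its Network Control Policy and
    whether the fabric still has any working northbound uplinks."""
    if uplinks_up:
        return "up"
    # All uplinks lost: 'link-down' marks the vNIC down, 'warning' keeps it up
    # so no teaming or fabric failover is triggered.
    return "down" if ncp == "link-down" else "up"

print(vnic_state("link-down", uplinks_up=False))  # down -> teaming/fabric failover kicks in
print(vnic_state("warning",   uplinks_up=False))  # up   -> blade keeps using this vNIC
```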

Here is a table detailing our vSphere switch configuration:

Port Group / NIC        Fabric   Teaming Config   Network Control Policy (UCS)   Network Failover Detection (vSwitch)
Service Console NIC1    A        Active           Link-Down                      Link Status Only
Service Console NIC2    B        Standby          Warning                        Link Status Only
VMkernel NIC1           A        Active           Link-Down                      Link Status Only
VMkernel NIC2           B        Standby          Warning                        Link Status Only
Virtual Machine NIC1    A        Active           Link-Down                      Link Status Only
Virtual Machine NIC2    B        Active           Warning                        Link Status Only
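
To tie the table together, here is a hedged Python sketch that encodes it and simulates losing every northbound link: the fabric A vNICs (policy link-down) drop, the vSwitches fail over to the fabric B vNICs (policy warning), and every port group keeps a working NIC. This is only a model of the behavior we saw with two test hosts, not a substitute for real testing.

```python
# Encodes the vSphere table above, one entry per NIC.
PORT_GROUPS = {
    "Service Console": [
        {"nic": "NIC1", "fabric": "A", "teaming": "active",  "ncp": "link-down"},
        {"nic": "NIC2", "fabric": "B", "teaming": "standby", "ncp": "warning"},
    ],
    "VMkernel": [
        {"nic": "NIC1", "fabric": "A", "teaming": "active",  "ncp": "link-down"},
        {"nic": "NIC2", "fabric": "B", "teaming": "standby", "ncp": "warning"},
    ],
    "Virtual Machine": [
        {"nic": "NIC1", "fabric": "A", "teaming": "active", "ncp": "link-down"},
        {"nic": "NIC2", "fabric": "B", "teaming": "active", "ncp": "warning"},
    ],
}

def nic_is_up(nic, all_uplinks_down):
    """Link Status Only detection: the NIC only looks down if UCS marks it down."""
    return not (all_uplinks_down and nic["ncp"] == "link-down")

def nics_in_use(all_uplinks_down):
    """Which NIC(s) each port group actually uses, preferring active over standby."""
    result = {}
    for pg, nics in PORT_GROUPS.items():
        up = [n for n in nics if nic_is_up(n, all_uplinks_down)]
        active = [n for n in up if n["teaming"] == "active"]
        chosen = active or up  # fall back to standby NICs if no active NIC is up
        result[pg] = [f'{n["nic"]} (fabric {n["fabric"]})' for n in chosen]
    return result

print(nics_in_use(all_uplinks_down=False))  # normal day: actives on fabric A (plus B for VMs)
print(nics_in_use(all_uplinks_down=True))   # total uplink loss: everything rides fabric B
```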

As far as bare metal blades go:

NIC    Fabric   Teaming Config                       Network Control Policy (UCS)
NIC1   A        Active                               Link-Down
NIC2   B        Active or Standby (depends on app)   Warning

Digression: This looks like we are heavily loading up fabric A, which is true from an overall placement point of view. However, most of our workloads are virtual machines, which are configured for active/active, thus providing some semblance of load balancing. We could go active/active for bare metal blades since the operative feature for them is the Network Control Policy. With vSphere, we are trying to keep the Service Console and VMkernel vNICs operating on the same fabric interconnect in order to reduce northbound traffic. Not so with bare metal systems.

Back on track: As previously stated (before the tables), what all this does in effect is force all my blade traffic onto a single fabric interconnect if I lose ALL my northbound links. Since the ports on fabric B are not marked “link-down”, the blades do not see any network issues and continue communicating normally.


And now the “BUT”: this won’t work completely in my environment due to the fact that I am connected to two disjoint L2 networks. See Brad Hedlund’s blog and The Unified Computing blog for more details. In order for this to completely work, I will need to put in a software router of some sort to span the two different networks (VLANs in this case).


So what do you think? Anyone out there with a lab that can fully test this? If so, I would be interested in seeing your results.


VMworld voting has begun

The folks who run the VMworld tradeshow have taken a new path to deciding session content this year.  Instead of a panel making all the choices, the powers-that-be have decided to open it up to the public to vote on those topics that interest them.  I’ve submitted a proposal to speak about our soon-to-be UCS implementation and how it is going to transform our business.  If interested, please go here:  http://vmworld.com/community/conferences/2010/cfpvote/tarchitecture and vote for my session.  My session is entitled:

Session ID: TA7081, Session Title: Case Study: vSphere on Cisco UCS – How the City of Mesa changed strategic direction



Prepping for our Cisco UCS Implementation

The purchase order has finally been sent in. This means our implementation is really going to happen. We’ve been told there is a three-week lead time to get the product, but Cisco is looking to reduce it to two weeks. A lot has to happen before the first package arrives. Two logistical items of note are:

  • Stockroom prep
  • Datacenter prep

What do I mean by “stockroom prep”? A lot, actually. While not a large UCS implementation by many standards, we are purchasing a fair amount of equipment. We’ve contacted Cisco for various pieces of logistical information such as box dimensions and the number of boxes we can expect to receive. Once it gets here, we have to store it.

Our stockroom is maybe 30×40 and houses all our non-deployed IT equipment. It also houses all our physical layer products (think cabling) too. A quick look at the area dedicated to servers reveals parts for servers going back almost ten years. Yes, I have running servers that are nearly ten years old <sigh>. Throw in generic equipment such as KVMs, rackmount monitors, rackmount keyboards, etc., and it adds up. Our plan is to review our existing inventory of deployed equipment and their service histories. We’ll then compare that info against our stockroom inventory to see what can be sent to disposal. Since we don’t have a lot of room, we’ll be really cutting down to the bone, which introduces an element of risk. If we plan correctly, we’ll have a minimum number of parts in our stockroom to get us through our migration. If we are wrong and something fails, I guess we’ll be buying some really old parts off eBay…

As for prepping the data center, it’s a bit less labor but a lot more complex. Our data center PDUs are almost full, so we’ll be doing some re-wiring. As a side note, the rack PDU recommended by our Cisco SE has an interesting connector, to say the least. These puppies run about $250 each. The PDUs run over $1200 each. Since we’ll be running two 42U racks of equipment, that equals four of each component. That’s almost $6K in power equipment!!

As another data center prep task, we will need to do some server shuffling. Servers in rack A will need to move to a different rack. No biggie, but it takes some effort to pre-cable, schedule the downtime, and then execute the move.

All-in-all, a fair amount of work to do in a short time-frame.

A Trend in Technology Provider Marketing Techniques

The recession has brought about a few major changes in sales/marketing techniques in the technology industry.  There was a time when only executive management was wined and dined and the common man was left out in the cold.  Well my friends, that time is no more.

Over the last 18 months or so, I have been invited to more lunches and activity-based events than I have in my 20+ years in the IT industry. The two (lunches and activity-based events) can be broken down into two categories of providers: those selling really expensive products and those with not-so-expensive products.

Those in the really expensive product category are usually storage providers. Since these systems can easily reach into the hundreds of thousands of dollars, the sales/marketing experience has to be equally impressive. As such, the event most often chosen is lunch at an upscale steak restaurant such as Ruth’s Chris or Fleming’s. The typical event consists of a small presentation (usually under 30 minutes) followed by a lunch from a scaled-down menu. Even though the menu is scaled down, the quality of the food is not; the reputation of the restaurant is still on display.

In the not-so-expensive category, we typically find VARs and small product vendors. The event of choice in this category is entrance to something with mass appeal, such as a blockbuster movie’s opening day. As with the lunches, the event begins with a 30-minute presentation and then the movie begins. This type of event has become so pervasive that I recently had three invitations to see Iron Man 2 at the same theater on the same day (all at different times).

I don’t go to the lunches very often because I feel it is disingenuous to take advantage of something so expensive for no return.   I only attend when I have a budgeted project.  I’m also careful to keep track of the “promoter”.  Some promoters are very good at setting up the presentations so that real information is imparted.  Others are there just to get butts in the seats and the presentations tend to suffer for it.  While I enjoy a good meal, I don’t want to waste my time.  However, I do partake in some of the movies since they usually take place on a Friday (my day off) and I use them to network with the VAR and other IT professionals.

Other events in the expensive category:

  • Tickets to major golf tournaments
  • Tickets to basketball games
  • Tickets to concerts

Other events in the not-so-expensive category:

  • Tickets to baseball games (many can be bought in volume for under $10 each)
  • Kart racing (fast go-karts)
  • Lunch and games at a large entertainment venue such as Dave & Busters

What else have you seen?   Anything outrageous?


Battle of the Blades – Part III

April 27, 2010

So earlier on I posted that I would list our strategic initiatives and how they led us down the path of choosing Cisco UCS as our new server platform as opposed to HP or IBM blades. Before I begin, let me state that all three vendors have good, reliable equipment and that all three will meet our needs to some degree. Another item of note is that some of our strategies/strategic direction may really be tactical in nature. We just sort of lumped both together as one item in our decision matrix (really a fancy spreadsheet). The last item to note is that all facts and figures (numbers) are based on our proposed configurations (vendor provided), so don’t get hung up on all the specifics. With that out of the way, let’s begin…

If we go way back, our initial plan was just to purchase more HP rack-mount servers. I have to say that the DL380 server is amazing. Rock solid. But given our change in strategic direction, which was to move from rack-mounts to blades, we were given the option of going “pie-in-the-sky” and developing a wish list. It’s this wish list, plus some specific initiatives, that started us down the path of looking at Cisco UCS (hereafter referred to as UCS).

Item 1: Cabling. Now, all blade systems have the potential to reduce the number of cables needed when compared to rack-mount systems. Overall, UCS requires the fewest cables outside of the equipment rack because the only cables to leave the rack are the uplinks from the fabric interconnects. With HP and IBM, each chassis is cabled back to your switch of choice. That’s roughly 16 cables per chassis leaving the rack. With UCS, we have a TOTAL of 16 cables leaving the rack. Now you might say that a difference of 32 cables per rack (assume 3 HP or IBM chassis in a rack) might not be much, but for us it is. Cable management is a nightmare for us. Not because we are bad at it; we just don’t like doing it, so less cabling is a plus for us. We could mitigate the cable issue by adding top-of-rack switches (which is sort of what a fabric interconnect is), but we would need a lot more of them and they would add more management points, which leads us to item 2.
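
The cabling arithmetic, as a quick sanity check using the rough figures above (a per-chassis count of about 16 and three chassis per rack are the assumptions here):

```python
# Rough cable-count comparison from the paragraph above (approximate figures).
cables_per_chassis = 16   # HP/IBM: each chassis cabled back to the upstream switches
chassis_per_rack = 3

hp_ibm_per_rack = cables_per_chassis * chassis_per_rack  # 48 cables leave the rack
ucs_per_rack = 16                                        # fabric interconnect uplinks only

print(hp_ibm_per_rack - ucs_per_rack)  # 32 fewer cables per rack with UCS
```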

Item 2: Number of management points. Unless you have some really stringent, bizarre, outlandish requirements, the chances are that you will spend your time managing the UCS system at the fabric interconnect level. I know we will. If we went with HP or IBM, we would have to manage down to the chassis level and then some. Not only is each chassis managed separately, think of all the networking/storage gear that is installed in each chassis. Each of those is a separate item to manage. Great, let’s just add in X more network switches and X more SAN switches that need to be managed, updated, secured, audited, etc. Not the best way to make friends with other operational teams.

Item 3: Complexity. This was a major item for us. Our goal is to simplify where possible. We had a lot of back and forth getting a VALID configuration for the HP and IBM blade systems. This was primarily the fault of the very large VAR representing both HP and IBM. We would receive a config, question it, get a white paper from the VAR in rebuttal, point the VAR to the same white paper showing that we were correct, and then finally get a corrected config. If the “experts” were having trouble configuring the systems, what could we look forward to as “non-experts”?

Talking specifically about HP, let’s add in HP SIM as the management tool. As our HP rep is fond of stating, he has thousands of references that use SIM. Of course he does, it’s free! We use it too because we can’t afford OpenView or Tivoli. And for basic monitoring functions, it works fine, albeit with a few quirks. Add in BladeSystem Matrix on top of it, and you have a fairly complex management tool set. We spent a few hours in a demo of the Matrix in which the demoer, who does this every day, had trouble showing certain basic tasks. The demoer had to do the old tech standby: click around until you find what you are looking for.

Item 4: Multi-tenancy. We plan on becoming a service provider, of sorts. If you read my brief bio, you would remember that I work for a municipal government. We want to enter into various relationships with other municipalities and school districts in which we will host their hardware, apps, DR, etc., and vice versa. So we need a system that easily handles multiple organizations in the management tool set. Since we are an HP shop, we gave a very strong looksy to how HP SIM would handle this. It’s not pretty. Add in the Matrix software and it’s even uglier. Now don’t get me wrong. HP’s product offerings can do what they claim, but it’s not drag-and-drop to set up for multi-tenancy.

Item 5: Converged Architecture. When we made our initial decision to go with UCS, it was the only converged architecture in town. I know we are not going to be totally converged end-to-end for a few years, but UCS gets us moving in the right direction, starting with item 1: cabling. All the other vendors seemed to think it was the wrong way to go, and once they saw the interest out there, they decided to change direction and move toward convergence too.

Item 6: Abstraction. You could also call this identity, configuration, or, in UCS parlance, service profiles. We really like the idea of a blade just being a compute node, with all the properties that give it an identity (MAC, WWN, etc.) abstracted and portable. It’s virtualization taken to the next level. Yes, HP and IBM have this capability too, but it’s more elegant with UCS. It’s this abstraction that will open up a number of possibilities in the high-availability and DR realms for us further down the road. We have plans…
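
A toy Python sketch of the service profile idea (hypothetical fields and values, not the actual UCS object model): the identity lives in the profile, and the blade is just a compute node that the profile happens to be associated with at the moment.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ServiceProfile:
    """Toy stand-in for a UCS service profile: the portable identity."""
    name: str
    mac: str
    wwn: str
    boot_target: str

@dataclass
class Blade:
    """Just a compute node; it has no identity of its own until a profile is associated."""
    slot: str
    profile: Optional[ServiceProfile] = None

web01 = ServiceProfile("web01", mac="00:25:B5:00:00:01",
                       wwn="20:00:00:25:B5:00:00:01", boot_target="san-lun-0")

# Associate the profile with a blade; later, re-associate the same identity with
# different hardware. MAC/WWN/boot settings travel with the profile, not the blade.
old_blade = Blade(slot="chassis-1/blade-3", profile=web01)
old_blade.profile = None
new_blade = Blade(slot="chassis-2/blade-5", profile=web01)
print(new_blade.profile.mac)  # identity is unchanged on the new compute node
```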

So there you have it. Nothing earth-shattering as far as tactics and strategy go. UCS happened to come out ahead because Cisco got to start with a clean slate when developing the product. They also didn’t design it for today, but for tomorrow.

Questions, comments?