Fourth in a series on EMC’s new Durham Cloud Data Center. Click here to read part three.
If you are struggling to sort out decades of intertwined databases and mission-critical applications so you can move them to a brand new data center, take heart: you’re not alone. In this blog I’ll discuss our struggles to come up with a migration plan.
As soon as EMC’s Durham Data Center Migration Program to move six petabytes of data and hundreds of applications to our new cloud data center was underway, we initiated the discovery and planning efforts. These work streams ran in parallel to our Architecture Design (Part 1) and our First 90 Days (Part 2) work streams.
I had never migrated a data center before and I had no idea how complex the effort would be. Discovery? Why would we need to do that? We know what’s running where… right?
As I said in my last blog, IT has historically been a “village of artisans,” a group of highly skilled individuals producing incredible works of art: the applications that run our companies and today’s modern economy. Typically, the original design documentation is nearly as elaborate as the resulting architecture: 40, 50, even 60 pages filled with flowcharts and network topologies.
Unfortunately, the systems and processes to record this information and keep up with change have been sorely lacking. Also, in IT, we need to build only ONE production environment. In contrast, the engineering discipline and systems required by high-volume, complex manufacturing operations, such as those for automobiles, airplanes and smartphones, are much more sophisticated and mature. I’m pretty sure Apple has a Bill of Materials for the iPhone that is under strict Engineering Change Control.
Why don’t we have that in IT? It’s a function of the low volume and rate of production. As a result, without disciplined systems and robust processes, personnel changes and infrastructure changes slowly added uncertainty to our IT tracking systems over time.
We thought we were in pretty good shape. We had a system tracking every device in the data center. We had another system tracking what devices made up each application. We had discovery running in the background populating our Configuration Management Database (CMDB).
The first step in the discovery process was to compare all of the systems to each other. The results were shocking: many devices in the data center were orphans. We didn’t know which application or business service they were supporting. What were they?
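The comparison described above boils down to a set difference between tracking systems. Here is a minimal sketch of that idea, not our actual tooling: the device names, application names and the two helper functions are all hypothetical, and a real reconciliation would also have to cope with inconsistent naming between systems.

```python
# Hypothetical export from the system that tracks every device in the data center.
device_inventory = {"srv-001", "srv-002", "srv-003", "srv-004"}

# Hypothetical export from the system that tracks which devices make up each application.
app_devices = {
    "order-entry": {"srv-001", "srv-002"},
    "billing": {"srv-002", "srv-005"},
}


def claimed_devices(app_map):
    """Collect every device that at least one application lists."""
    return set().union(*app_map.values()) if app_map else set()


def find_orphans(inventory, app_map):
    """Devices that exist in the inventory but belong to no known application."""
    return inventory - claimed_devices(app_map)


def find_unknown(inventory, app_map):
    """Devices an application lists that the inventory does not contain."""
    return claimed_devices(app_map) - inventory


orphans = find_orphans(device_inventory, app_devices)
unknown = find_unknown(device_inventory, app_devices)

print("Orphans:", sorted(orphans))
print("Missing from inventory:", sorted(unknown))
```

Both directions matter: orphans are devices nobody claims, while the second list flags application records that point at devices the inventory has lost track of.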
We had no standard infrastructure stack. Each application was unique and could be using parts and pieces from different clusters for each tier based on the OS and development languages deployed on each application farm.
The most important step in the discovery process was to interview application owners to validate their infrastructures. The interview process was extensive. EMC Consulting led that effort with a team of Migration Analysts and Project Managers. Data from our legacy tracking systems was extracted, compared and cleaned up as much as possible prior to each interview. The application owners then reviewed their lists of applications and servers. As expected, the interviews uncovered even more interdependencies.
Migrating database grids
Now what? A move event of 1,000+ servers and hundreds of applications could never work. It would be chaos. So we kept looking at the data, and kept looking at the data. After several heated debates, we decided that the database was the heart and lungs for every application. We would plan the migrations around the database grids.
We would remediate the application and web tier multi-tenancy by building new VMs (virtual machines) that aligned with each database grid. We would just copy the application code onto the new servers before the migration. The remaining applications that didn’t have multi-tenant infrastructures would be grouped together and migrated in other move events.
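To illustrate the grouping idea, here is a small sketch that buckets applications by the database grid they depend on. Everything in it is an assumption made for illustration: the application names, the grid names, the `db_grid` field and the "standalone" label for applications that share no grid.

```python
from collections import defaultdict

# Hypothetical application records; db_grid is None when the
# application does not depend on a shared database grid.
applications = [
    {"name": "order-entry", "db_grid": "grid-a"},
    {"name": "billing", "db_grid": "grid-a"},
    {"name": "hr-portal", "db_grid": "grid-b"},
    {"name": "wiki", "db_grid": None},
]


def group_by_grid(apps):
    """Group application names by database grid; the rest go to 'standalone'."""
    groups = defaultdict(list)
    for app in apps:
        key = app["db_grid"] or "standalone"
        groups[key].append(app["name"])
    return dict(groups)


move_groups = group_by_grid(applications)

for grid, names in move_groups.items():
    print(grid, "->", names)
```

Each grid's bucket is a candidate move event, and the "standalone" bucket corresponds to the remaining applications that would be grouped into other move events.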
How many move events? Here, we leaned heavily on EMC Consulting. Based on their experience, they advised us that each move event would require 10 weeks of detailed planning and execution, and that we shouldn’t have more than three in process at any given time. We also decided that we wanted to move Production on weekends and Dev/Test during the week. And we decided to move Dev/Test well before Production, so that any bumps in the road would be experienced on the less critical environments.
In parallel to the migration, EMC IT was also aggressively executing an ERP upgrade to SAP. We created a pause in the schedule for several weeks around the go-live to minimize resource constraints and risk.
We drafted the first iteration of the plan and then communicated it to the application owners and senior management. As expected, there were some additional discussions and negotiations, resulting in some shuffling of the plan. The final result was 20 move events with eight contingencies.
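As a rough sanity check on those numbers, the sketch below lays out 20 move events of 10 weeks each with no more than three in process at once. It is a deliberately simplified model, not our planning method: the `schedule` function is hypothetical, and it ignores the pause around the SAP go-live, the contingency slots, and the weekend versus weekday split.

```python
import heapq


def schedule(num_events, weeks_per_event=10, max_concurrent=3):
    """Assign each event the earliest start week that keeps concurrency within the limit."""
    # Each "lane" holds the week at which it next becomes free.
    lanes = [0] * max_concurrent
    heapq.heapify(lanes)
    plan = []
    for _ in range(num_events):
        start = heapq.heappop(lanes)      # earliest available lane
        end = start + weeks_per_event
        plan.append((start, end))
        heapq.heappush(lanes, end)        # lane is busy until the event ends
    return plan


plan = schedule(20)
total_weeks = max(end for _, end in plan)

print("Total weeks:", total_weeks)
```

Under these simplifying assumptions the 20 events need about 70 weeks end to end, before any pauses or contingencies are added back in.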
DURHAM MOVE EVENT PLAN
It took more than a year to complete the discovery and planning. In that time, the new facility was completed, and the new network, compute, storage and backup infrastructure was designed, procured and installed. We were also wrapping up the installation of new instances of about 50 foundation applications into Durham on new VMs. Applications like Network Time Protocol (NTP), Active Directory, and Domain Name System (DNS) would be required before any business applications could be migrated and run.
It had been an incredibly productive year. I have a vivid memory of the day we signed off on the plan. Late that evening I was flying home to North Carolina. I leaned back in the seat and began drifting off to sleep, filled with pride in how much our team had accomplished. A year earlier, getting to this point had seemed like an impossible task, and now we were ready to start migrating. Heck, we were just about on plan. We had six and a half quarters left before we needed to start decommissioning the old data center. I bolted upright in the seat, startling the woman next to me.
“Are you alright, dear?” she asked.
“Yes. Thank you. Sorry, I just remembered I have something to do,” I replied.
Six and a half quarters to go, hundreds of applications, more than 2,000 servers. How were we going to get that done? Typically it takes a quarter or two to build an application infrastructure. In six and a half quarters we had to build and migrate everything! It was a long flight.
I’ll cover the migration technology and challenges, failures, breakthroughs and ultimate success in my following blogs.