Engineering projects that are either directly technology migrations or shaped like them are exceedingly common, so learning to do them well is an important skill as an engineering leader. These projects tend to generate a lot of detail-oriented work and often lack the glamour of shipping product features. Connecting the importance of migrations to business impact and demonstrating their value can be a challenge. Supporting teams through these projects can be even harder. In this post, I’ll cover some framing techniques and tools I’ve learned for making migrations more successful and their business case clearer, and I’ll also talk about supporting teams and individuals through them.
I support the Compute team at Stripe, and we often have multiple infrastructure migrations in progress at the same time. A previous audit of ongoing migrations in the Compute team by Charles Hooper noted that we had 5 simultaneous migrations in flight at various stages of completion (the team had fewer than 20 people). As an infrastructure team, our migrations can be planned or unplanned and range from regular OS and software upgrades to responses to critical security vulnerabilities, e.g. Spectre-Meltdown. In my experience these migrations are common and necessary on infrastructure teams.
Framing and communicating
Migration projects are often critical to the business. However, because they appear somewhat predictable and controlled relative to shipping new products, they can be perceived as less important, for a few reasons. The complexity, particularly lower in the stack, e.g. infrastructure migrations, is often hard to communicate. As a metaphor, they are seen as traversing an existing well-trodden path vs exploring brand new territory. Many are long running and run counter to the popular “shipping quickly to deliver value” narrative, which adds to the drag and lack of glamour.
As leaders, it is our job to fill the gap in perception and draw clear lines between the work, its impact and business needs. Some tactical ways to do this:
- Always be clear on the “why” connecting the business goals and the migration. Our team started a trend, led by Julia Evans, when we created a landing page explaining why we were migrating to Envoy a few years ago. It was literally titled “Why Envoy?” and detailed all the reasons we were embarking on the project. It was an immensely helpful way to describe a relatively large migration to a broad audience in a very succinct manner, and we now have such documents for most large migrations.
- Use metrics that are tied to team and business level metrics. The effort made towards the migration should show up in long-running metrics that are tracked by the team and used to gauge their success. These metrics can in turn be useful in driving the progress of the migration. An example of such a metric would be a breakdown of the percentage of the fleet on a particular version and its change through the migration. Metrics can also be an important tool when you need other teams to do work on your behalf. E.g. dashboards showing individual teams’ progress towards migrations (essentially leaderboards) enable organization leaders to incentivize the right teams to unblock critical migrations. Some examples of good migration metrics:
- Separate dashboards showing the overall progress allow communication of progress.
- Goal-based dashboards showing time saved by developers post-migration incentivize teams to adopt new tooling.
- Engineering hours saved as part of adopting a tooling migration.
To summarize, carefully chosen metrics can communicate impact much better than any other tool.
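As a rough illustration of the fleet-version breakdown and per-team leaderboard described above, here is a minimal sketch. The inventory format, the host and team names, and the function names are all hypothetical; a real version would read from whatever inventory or metrics system your organization already has.

```python
from collections import Counter, defaultdict

# Hypothetical inventory: (host, owning team, version currently running).
FLEET = [
    ("host-01", "payments", "v2"),
    ("host-02", "payments", "v2"),
    ("host-03", "payments", "v1"),
    ("host-04", "search", "v1"),
    ("host-05", "search", "v1"),
    ("host-06", "billing", "v2"),
]


def version_breakdown(fleet):
    """Return the percentage of hosts running each version."""
    total = len(fleet)
    if total == 0:
        return {}
    counts = Counter(version for _, _, version in fleet)
    return {version: 100.0 * n / total for version, n in counts.items()}


def team_progress(fleet, target_version):
    """Return (team, percent migrated) pairs, most complete team first."""
    totals = defaultdict(int)
    migrated = defaultdict(int)
    for _, team, version in fleet:
        totals[team] += 1
        if version == target_version:
            migrated[team] += 1
    progress = [(team, 100.0 * migrated[team] / totals[team]) for team in totals]
    # Sort by completion (descending), then by team name for stable output.
    return sorted(progress, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    print(version_breakdown(FLEET))
    for team, pct in team_progress(FLEET, "v2"):
        print(f"{team}: {pct:.0f}% migrated")
```

Recording the output of `version_breakdown` at regular intervals gives the "change through the migration" trend, and `team_progress` is the raw material for a leaderboard-style dashboard.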
Communicate progress in multiple ways. Maintain multiple communication channels with the organization and provide updates in different formats. This signals the importance of the work and keeps it top of mind for everyone and not just the team working on it. Tailoring updates to different audiences is important. E.g. engineers need information on how the migration impacts them, leadership teams need business impact and progress, line managers need to know their own team’s progress etc.
- Learn your organization’s preferred style and use it, e.g. email, Slack, dashboards, meetings are all good channels and can be used in various ways.
- For particularly important migrations, an update at a company-wide all-hands meeting is a tool to use infrequently for high impact.
- Communication channels can also be a feedback loop for the team and can be leveraged for recognition for the team doing the migration e.g. a tap back from the head of engineering in response to a milestone email can go a long way in helping the team feel recognized.
The previously mentioned metrics make communications easier and more impactful as well.
Executing on migrations
Some strategies I’ve found useful for the nitty-gritty of actually running migrations.
Having a kick-off checklist
My current opinion is that too often teams embark on large projects without asking the right questions. These questions vary by the nature of the project but some generalizable ones I’ve found for migration projects are:
- Why are we doing this? The previously mentioned “why” document is a good artifact to create here.
- What does “done” look like? This is a useful point at which to evaluate the business impact and trade-offs.
- What is a reasonable point at which we’d stop and still declare success? E.g. is it acceptable to have a certain percentage complete that is less than 100%?
- What is the priority of this migration relative to other business needs?
- How will we measure and track the different states? A good answer to this would have some early metrics.
- What are the constraints? Some examples of constraints are:
- First degree constraints can be external deadlines for security vulnerabilities, compliance information, regulations etc.
- Second degree constraints include needing other teams to participate, waiting on vendors etc.
Seek dedicated expertise to manage the complexity and interactions across teams. Having dedicated people with specialized skills, such as Technical Program Managers, can help facilitate large and critical migrations. This gives engineers additional help, and a good partnership can help with organizational road-blocks.
If you have repeated migrations, each incremental migration should take less effort. This is obvious but often easier said than done, hence worth repeating. Often, a combination of data, communications and dedicated expertise can help make a business case for investing in such efforts.
Supporting teams and maintaining morale
Something often overlooked on projects that are long running is the morale of the team. It doesn’t have to be a slog that everyone grins and bears. Being aware of the team mood is important on teams where there are many such migrations or you will inevitably create a culture where they are pushed to the side until they are a burning issue.
Reduce toilsome work and ensure the remaining is equally distributed
It is important to ensure that toilsome and unglamorous work is not always done by the same set of people for any reason, e.g. lack of social capital, lack of organizational tenure, or because they happen to notice the work. This is table stakes for a healthy and well-run team. My framework for toilsome work as a manager:
- Question the necessity of toilsome work, e.g. should it be automated away? Prioritize efforts to reduce the toil.
- Spread out the load so it’s not always the same individuals doing it. Institute rotations if necessary.
Fewer, shorter work streams are a faster path to delivering impact, and this is particularly true for migrations. It is easier to maintain team morale for bounded, shorter projects when the end state is clear. Ian’s team experimented with an approach a year ago where they focused on finishing all on-going migrations to the exclusion of other work. In some cases this meant finishing work that had been on-going for multiple years. My current principles are:
- Based on team size, there should be a cap on the number of on-going migrations at any given point. Essentially, limit work-in-progress.
- All migrations should be a team-wide effort with sizable staffing and definitely not single person projects.
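The work-in-progress cap can be made concrete with a small sketch. The post does not prescribe a ratio, so the default of four engineers per migration below is purely an assumption for illustration, as are the function names; pick whatever number reflects "sizable staffing" on your team.

```python
def migration_cap(team_size, engineers_per_migration=4):
    """Maximum concurrent migrations for a team of the given size.

    engineers_per_migration is an assumed staffing ratio, not a rule
    from the post. The cap is never below one.
    """
    if team_size < 1 or engineers_per_migration < 1:
        raise ValueError("team_size and engineers_per_migration must be positive")
    return max(1, team_size // engineers_per_migration)


def can_start_migration(in_flight, team_size, engineers_per_migration=4):
    """Return True if starting one more migration stays within the cap."""
    return len(in_flight) < migration_cap(team_size, engineers_per_migration)


if __name__ == "__main__":
    # Hypothetical team of 18 with five migrations already in flight.
    in_flight = ["os-upgrade", "proxy", "scheduler", "kernel-patch", "logging"]
    print(migration_cap(18))                   # cap under the assumed ratio
    print(can_start_migration(in_flight, 18))  # whether a sixth may start
```

Under this assumed ratio, a team of 18 with five migrations in flight is already over its cap, which argues for finishing something before starting anything new.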
Recognize and reward working on migrations
For the same reasons that migration projects are seen as less glamorous, it can also be harder to leverage work done on them for career progression. A common perception is that these are a necessary evil but not really “promotion projects”. Given the pervasiveness of migrations in infrastructure teams, it is important for managers leading these teams to recognize, reward and communicate the impact of this work. Leading and contributing to migration projects should count towards promotions and career progression to the same degree as shipping features.
If you are an individual contributor, it might sometimes feel as if recognition rests solely with your manager and the organization, and it can be frustrating if they aren’t equipped with the tools to recognize you. In this position you can help them by doing some or all of the following:
- Use the frameworks mentioned above, particularly the metrics and communication tools, to describe the work and its impact to the business and communicate it to the organization on an on-going basis.
- Communicating work publicly, e.g. on LinkedIn or on a resume, can be a challenge because, unlike a product feature, there is often no external impact. Referencing the previously mentioned business impact and metrics in a publicly acceptable format is easier when you have the internal frameworks described.
- Most importantly, if you are leading the migration, own it and don’t be afraid to take credit for its impact.
To summarize, leadership is key. Migrations require strategy, planning and communication. Done thoughtfully, they can be an important source of leverage for the business and can be executed so that the people working on them also find them rewarding. This is important because there will always be another migration.
Thanks to https://twitter.com/ircri and https://twitter.com/djhoffma for many thoughtful and valuable reviews and edits on this post.