Will Larson on Systems of Engineering Management (Scaling Tech Podcast Ep19)

May 2, 2023 | Developers, Team Management

How did you learn to become an engineering manager? If you’re like most people, you were largely self-taught. Engineering managers are often promoted from senior engineering roles, and most have never been taught best practices for managing teams. In this episode of the Scaling Tech Podcast, we talk with engineering leader and author Will Larson, whose books will go a long way towards helping you become an effective engineering manager.

Will Larson has been a software engineering leader at Calm, Stripe, Uber, and Digg. He is the author of An Elegant Puzzle and Staff Engineer. Before moving to San Francisco, he grew up in North Carolina and studied Computer Science at Centre College in Kentucky. Will’s books cover all sorts of challenges that engineering managers face, from how to size your teams, to managing technical debt, to hiring and even replacing yourself. They are a very good read, free of fluff and filler stories. Instead, they are full of clear explanations and actionable advice, and they make a great reference text to keep at your desk.

This is a must-listen episode for all of those who are interested in engineering management!

Listen on Spotify
Listen on Apple Podcasts

Watch the video:
Show notes with links to jump ahead are below

Show Notes from Episode 19 – Will Larson on Systems of Engineering Management
Timestamp links will open that part of the show in YouTube in a new window

  • 00:00 The opening clip from Will is about how good external leaders who are new to a team will listen and learn at first. Bad leaders will try to force a big migration immediately, bring in their friends, and blame the system and the current team when the architecture is in shambles a year later. The reality is that too many leaders are not self-aware, and leadership roles do not necessarily reward self-awareness.
  • 01:22 Arin and David talk about their impressions of this episode, and note how thought-provoking it was to speak with Will. David points out how relevant this was to our business at AgilityFeat, where we staff US companies with engineers in Latin American countries. Because we’ve been doing this for 12 years now, we are seeing our team members grow and be promoted with their clients. This comes with challenges, though: how do you promote an engineer into a staff engineer type of role? Will helped us to answer that question today. Arin agrees that this distinction between engineering manager growth paths and staff engineer growth paths is very valuable, especially for those engineers who want to grow in their role and compensation without taking on traditional management responsibilities.
  • 04:26 Arin introduces Will Larson, who has been a software engineering leader at Calm, Stripe, Uber, and Digg, and is the author of An Elegant Puzzle and Staff Engineer.
  • Becoming an Engineering Manager
  • 05:08 Arin asks Will why engineering management is so self-taught, and how Will entered that space. Will talks about how management education often doesn’t address the issues you face when leading software teams, and software engineering education does not address management skills. Will gravitated towards engineering management over his career because that was where he found the most interesting problems to solve. In most companies, the hard part is really the people management side, and improving that is where you can really make progress. There should be a more structured way to learn, but most effective engineering managers started out technical and then had to teach themselves to make the switch to engineering management. Arin talks about his own journey to engineering management: learning about agile, and then getting an MS in Management of IT after leading a death march project that was a miserable introduction to engineering management.
  • 08:40 David talks about how, early in his engineering management career, he did not have a mentor, and his organization was not willing to mentor leaders. David had to teach himself and use his ambition to overcome those challenges. The leaders in that company at the time were not really good role models anyway, so he asks Will to explore that problem, which is common in small companies. Will agrees that this is a tough cycle in small companies, because the managers you want to learn from were probably also self-taught and so have biases based on that. Will says that finding mentors is hard, and he personally did not rely on them, both for that reason and because he was at small companies. By the time he joined Digg, the company had already done layoffs, so even in a larger company it was hard to find mentors in that situation. The problem of mentorship really is an industry problem, across companies of all sizes. Arin talks about how easy it is for companies to forget about mentoring their team during difficult times, so mentoring tends to only be interesting to companies during good times. Will talks about how management has changed over the last decade: the industry has become more about outcome management instead of people management, and that has made the mentorship problem worse. But it was never good to begin with, even when we talked more about people management.
  • Hiring and Promoting Engineering Managers
  • 13:30 As companies grow, they face a challenge with hiring managers. Do you hire externally or promote from within your existing engineering talent? David says he is partial to promoting engineers rather than hiring from outside, because of the importance of understanding the values and principles of a company. Internal promotions will have that understanding already and can better mentor others in those values and ways of working. David asks how to strike a balance between external and internal manager hires. Will says that the industry suggestion is to have a 50/50 split between internally promoted managers and external hires. In rapidly growing companies it’s too hard to foster people, so you try to bring in people who have seen the growth problems before. But that carries a risk too. Will talks about the risks of external leaders: good ones listen and learn, and bad ones will try to replace things right away. They will initially complain about the existing systems, then force a big change, start hiring their friends, and start pushing out the existing leadership structure. A year later everything is in shambles, and they will blame the previous leaders, without being self-aware about their impact on that or their own performance. That’s the nightmare scenario, but you can usually avoid it with a mix of hires, and bringing in new leaders can be a very additive experience. For instance, a more experienced leader may know where ML/AI tools really add value, and where they don’t. A less experienced manager might assume that “ML can solve anything” because they lack the wisdom of experience. But you also need to make sure the internal team feels valued and that their knowledge of the culture is rewarded by promotions too. So a mix is important, and good leaders should both hire externally and promote internally.
  • The Four States of a Team
  • 20:14 Arin asks about the Four States of a Team, from Will’s book “An Elegant Puzzle.” Will wrote about this in 2019, based on how often he saw people negotiate for the wrong things at the wrong time when trying to help their team. For example, if no one is working on internal development tooling, then the development process will become too slow. Adding team members to work on those tools will have a huge impact for a few months, but then the returns on the time spent decrease, since all the low-hanging fruit has been captured. So based on the stage of your team, you need to know when to add capacity and when to focus on efficiency and reducing work in progress. In the initial stage, adding more people to a project will help. But then you get to a place of “treading water” where you are saturated with incoming requests, and in this stage, it’s more about figuring out what not to do. You have to focus on finishing pieces of work, and adding new members doesn’t help move the needle. Will gives the example of flaky automated tests: solving that underlying issue will improve overall efficiency, if management can protect the team long enough to get that work done and reduce the burden on the team. Leadership must absorb the pain from other teams in protecting their team during this phase. After that phase, teams reach a very nice state (in theory) of no incoming toil, but this is hard to maintain. The final phase to arrive at is having the space to allow the team to try new things and innovate more. This could mean innovating with AI tools and determining the most valuable way to use them in our teams; teams need to experiment to decide how a tool like GitHub Copilot can be most applicable. Arin notes how interesting it is that only that first stage of “Falling Behind” has the solution of adding more people, yet a new manager is tempted to just add more people and build their empire.
  • 28:25 Leaders sometimes look for ways to be stuck that are not their fault. For example, by asking for more headcount that they know the company won’t give them, they get to say that the current situation is not their fault. It’s not always the wrong response, but it is too often just a way to avoid accountability, and it rarely ends well. Arin asks how you reduce WIP in the Treading Water phase: how can a leader convince the business counterparts and the technical team that it’s more important to look at team productivity than individual productivity? Will talks about one of his favorite metaphors for motion versus progress. If you drive a car with the windows down, it feels like you are going faster even though you are not. Many management decisions create this impression, but you might actually be increasing drag on the team (just like rolling down the windows increases drag on your car and makes it less efficient). As an example, Will talks about how you might migrate to architectural best practices within a monolith, instead of jumping straight to a more service-oriented approach. A team might feel more free when they are unleashed to create lots of new services, but it’s not really helping the business initially because they are still stuck on the monolith application. It may be more efficient to start enforcing new best practices and incremental improvements in the monolith before moving to the more modular architecture. We have to focus on overall efficiency, not just our desire to feel productive.
  • Designing for Scale
  • 33:40 Arin asks about a portion of An Elegant Puzzle where Will describes how any system can only survive one order of magnitude of growth. Will writes that if you’re designing systems to last one order of magnitude of growth, and your system is doubling every six months, then you’ll have to reimplement every system twice every 3 years. This creates a lot of risk and means that critical scaling projects are going on constantly across the organization. Arin notes that’s a lot of change: how do you balance the change involved in system rewrites with the necessity of doing them? Can you design for more than one order of magnitude of growth? Will answers that every company is different, and not all companies grow as fast as the rate he described in that part of the book. He talks about different scaling examples from his experiences at Stripe and Uber, which were very different products. So it’s hard to generalize, but leaders have to acknowledge that it’s hard to see the future, and so most architectural decisions involve meaningful tradeoffs. If you try to plan too far into the future, you become wed to architectural designs that may not work out the way you expected, and these infrastructural decisions can slow down the company’s ability to implement change over time. The most impactful architectural improvements give you scaling ability without impacting product flexibility. As cloud vendors have matured since Will originally wrote that around 2016, it has become easier to scale without doing full rewrites as often. However, there’s still an upper bound to any commodity solution, beyond which you have to redesign your system for new orders of magnitude. Trying to design for that too far in advance is bad; it will hurt your productivity and delivery. Will talks about a time (around 2010) when Digg tried to move from Postgres to Cassandra too early. Cassandra was at too early a version at that time, and it ended up restricting some of the features they could offer to users.
  • Managing Through Change
  • 41:12 David asks about teams: how can engineering managers prepare their teams to survive and thrive during rapid growth and change? Will says this is a challenging question to answer directly. Unlike a product change, most people process changes don’t deliver a good improvement out of the box. People process changes are difficult to implement, and they work better when they incorporate the aggregate wisdom of the team. For example, an incident response process that worked for a 5-person team won’t work for a 50-person organization. Will mentions a book about finite and infinite games. Finite games have rules, winners, and losers. Infinite games are when participants decide on the rules, and the goal is simply to keep playing. School and sports teach us that most things are finite games: get that promotion, get that score. But in growing companies, the game is changing constantly, and if you focus on “winning” then you are misunderstanding what you are supposed to be doing. The rules of the game, the people processes, have to constantly change.
  • 45:30 David talks about how important it is that our clients have a good onboarding process and a new provisioning system for new developers, so they know exactly how to get started, the tooling used, the policies and security procedures, etc. In talking with Will, though, David realizes that a company in a rapid growth phase should see onboarding as more like a mentorship process.
  • 47:13 Arin asks about the downsides, when companies do reorganizations and downsizing. What is your experience with running a reorg? How do you prevent loss of knowledge, and how do you avoid killing the morale of remaining team members? Will notes how cyclical layoffs are. He joined the industry in 2008, and Yahoo kept freezing their offer to him as they got through the initial phases of that economic downturn, so Will had to wait to join. At Digg, he also saw multiple rounds of layoffs, but then money became more available in the industry and there were no layoffs for about 10 years. Now, since the pandemic, it’s become more chaotic with ups and downs. The reality is you will always have some loss of knowledge in the organization. Will tries to focus on why people are still at the company, and to make sure the reasons that they stayed are still in effect. Employees worry about leaving if they have a lot of potential wealth locked up in things like stock, so leadership has to help them find value in the work they do and the mission of the organization. You can’t assume that what kept them here in the past will still apply now. They have to rediscover why they are here, or they will be unhappy and feel trapped after a changeover.
  • 52:00 Will talks about organizational math. Teams should have 6-8 members. Every manager should have 6-8 direct reports, and every senior leader should have 4-6 managers reporting to them. In a reorg it’s important to stick to this sort of organizational math in order to make sure the organization still makes sense. If you don’t consider that in the original layoff, then you’ll find that more changes are necessary later. It’s better to make all the necessary changes at once, even though this requires more thought up front.
  • “Staff Plus” path and Staff Engineers
  • 54:05 Arin brings up the other path that Will discusses in his second book, Staff Engineer. The “Staff Plus” path leads to non-management roles like Staff Engineer and Principal Engineer. This path is for those who are not interested in management roles, but still want to advance. Unfortunately, those people often initially assume that this path means they get uncapped personal growth without having to work with people or deal with people issues. Will’s book emphasizes that there are very meaningful paths for engineers that are not management roles, but it’s important to understand there will still be people problems you have to learn to deal with, even though you don’t have to be the person doing performance reviews. Even an architectural change is composed of technical problems and people problems: you have to sell others on the technical changes you want to make. It’s not just about being a faster engineer; it’s about problem solving skills, specific domain knowledge, and those communication and collaboration skills. Staff Engineer is fundamentally a different job, not just a more Senior Engineer (although it’s confusing because some companies treat the title that way).
  • 1:03:30 To learn more about Will’s work, visit his blog “Irrational Exuberance” at lethain.com.
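
The order-of-magnitude arithmetic from the 33:40 segment can be checked with a short calculation. This is only a sketch and not something from the episode: the function names are hypothetical, and it assumes steady exponential growth with a fixed doubling period.

```python
import math


def months_per_order_of_magnitude(doubling_months: float) -> float:
    """Months for load to grow 10x if it doubles every `doubling_months`."""
    # Growing 10x takes log2(10) doublings (a little over three).
    return doubling_months * math.log2(10)


def redesigns_needed(horizon_months: float, doubling_months: float) -> float:
    """Orders of magnitude crossed within the horizon.

    Assumption: each design survives exactly one 10x of growth, so this is
    also the number of redesigns needed in that time.
    """
    return horizon_months / months_per_order_of_magnitude(doubling_months)


if __name__ == "__main__":
    print(round(months_per_order_of_magnitude(6), 1))  # months per 10x
    print(round(redesigns_needed(36, 6), 1))           # redesigns in 3 years
```

With a six-month doubling period, a system crosses one order of magnitude in roughly 20 months, which is about 1.8 redesigns in three years. That is consistent with the "twice every 3 years" rule of thumb.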
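
The organizational math from the 52:00 segment can also be sanity-checked in code. The sketch below is an illustration, not a tool from the episode: the helper names are hypothetical, and it assumes everyone at a given level is spread as evenly as possible across the level above.

```python
import math


def count_range(people: int, min_span: int, max_span: int) -> tuple[int, int]:
    """Fewest and most groups so each holds min_span..max_span people.

    If there are fewer people than min_span, a single undersized group
    is returned for both bounds.
    """
    fewest = math.ceil(people / max_span)    # every group as full as allowed
    most = max(fewest, people // min_span)   # every group as small as allowed
    return fewest, most


def org_shape(engineers: int) -> dict[str, tuple[int, int]]:
    """Manager and senior-leader counts using the spans from the show notes."""
    mgr_lo, mgr_hi = count_range(engineers, 6, 8)   # 6-8 direct reports
    lead_lo = count_range(mgr_lo, 4, 6)[0]          # 4-6 managers per leader
    lead_hi = count_range(mgr_hi, 4, 6)[1]
    return {"managers": (mgr_lo, mgr_hi), "senior_leaders": (lead_lo, lead_hi)}


if __name__ == "__main__":
    print(org_shape(100))
```

For 100 engineers this gives 13 to 16 managers and 3 to 4 senior leaders. Running the same check on the headcount left after a layoff shows whether the remaining structure still fits those spans or whether further changes will be needed.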

Links from Episode 19 – Will Larson on Systems of Engineering Management