Software Engineering at Google

engineering
management
software development

1 Listen to Software Engineering at Google Summary

2 Book Summary: Software Engineering at Google

This book defines software engineering as “programming integrated over time.” It explores three fundamental principles that shape Google’s engineering culture, processes, and tools, focusing on how to create a sustainable and healthy codebase that can evolve over decades.

2.1 Principle 1: Time and Sustainability

Software isn’t static; it must adapt to changing requirements, technologies, and dependencies over its entire life cycle. A project is sustainable if you are capable of reacting to valuable changes as they arise.

  1. Programming vs. Software Engineering: Programming is the immediate act of writing code. Software engineering is the entire life cycle, including development, modification, and maintenance over time.
  2. Code is a Liability: All code carries a maintenance cost. The goal is to maximise the functionality delivered per unit of code, which often means removing obsolete systems rather than just adding new ones.
  3. Plan for Change: Assume that underlying dependencies (libraries, OS, hardware) will change over a project’s lifetime. An organisation must be capable of deploying patches and upgrades. Stagnation is a risky and often expensive choice.

“With a sufficient number of users of an API, it does not matter what you promise in the contract: all observable behaviours of your system will be depended on by somebody.” (Hyrum’s Law)

This is a dominant factor in changing software over time. Even the most innocuous change will break something for someone. This means any analysis of a change’s value must include the cost of identifying and resolving those breakages.
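A minimal sketch of how such a dependency can arise, assuming a hypothetical `list_users` function whose documented contract promises no particular ordering. All names here are invented for illustration and do not come from the book.

```python
def list_users_v1(users):
    """Documented contract: returns all user names. Order is NOT promised."""
    # Implementation detail: this version happens to return sorted names.
    return sorted(users)


def list_users_v2(users):
    """Same documented contract, after a 'harmless' change that drops the sort."""
    return list(users)


def first_admin(list_users, users):
    """Hypothetical caller that silently relies on the sorted order."""
    names = list_users(users)
    # Assumes 'admin' sorts first: true under v1, never promised by the contract.
    return names[0]


users = ["zoe", "admin", "bob"]
result_v1 = first_admin(list_users_v1, users)  # 'admin'
result_v2 = first_admin(list_users_v2, users)  # 'zoe'
```

Both versions honour the written contract, yet the caller breaks on the second one. The cost of finding and fixing callers like `first_admin` is part of the real cost of the change.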

2.2 Principle 2: Scale and Growth

As an organisation and its codebase grow, processes and tools must scale efficiently in terms of both human effort and computational resources. Policies that work for a small team often fail at a large scale.

  1. Identify Non-Scalable Policies: Policies that require work proportional to the size of the organisation or codebase are not scalable. For example, asking every consumer of a library to manually migrate to a new version doesn’t scale. Instead, the team making the change should do the migration work, benefiting from economies of scale and expertise.
  2. Automate and Optimise: Every task your organisation does repeatedly should be scalable, ideally through automation. Code formatters, static analysis, and automated large-scale change tools are critical.
  3. The Beyoncé Rule: “If you liked it, you should have put a CI test on it.” Infrastructure teams are not at fault for breakages if a product’s critical behaviour wasn’t covered by tests in the Continuous Integration (CI) system. This policy scales because it makes teams responsible for their own stability, rather than requiring infrastructure teams to know the testing specifics of every project.
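As a rough illustration of the Beyoncé Rule, a product team might pin the behaviour it depends on in a test that runs in CI. The `format_invoice_id` helper and its format are hypothetical, chosen only to make the sketch concrete.

```python
import unittest


def format_invoice_id(number: int) -> str:
    """Hypothetical product helper: IDs are zero-padded to at least 8 digits."""
    return f"INV-{number:08d}"


class InvoiceIdContractTest(unittest.TestCase):
    """Pins the behaviour that downstream systems are assumed to rely on."""

    def test_id_is_prefixed_and_zero_padded(self):
        self.assertEqual(format_invoice_id(42), "INV-00000042")

    def test_large_numbers_are_not_truncated(self):
        self.assertEqual(format_invoice_id(123456789), "INV-123456789")
```

If an infrastructure change alters this formatting, the failing test surfaces the breakage before the change lands, without the infrastructure team needing to know anything about invoices.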

2.3 Principle 3: Trade-offs and Costs

Good engineering involves making rational, data-driven decisions by evaluating trade-offs. The goal is to move away from “because I said so” and toward evidence-based choices.

  1. Evaluate All Costs: Decisions must consider financial, resource, personnel, opportunity, and societal costs. In software engineering, personnel cost (engineering effort and happiness) often dominates.
  2. Data-Informed, Not Data-Obsessed: While data is crucial, not everything is measurable. Decisions are a mix of data, assumption, precedent, and argument. Leaders must exercise judgement when data is incomplete.
  3. Revisit Decisions: Data and contexts change over time. A decision that was correct a year ago might not be today. It’s critical to be able to revisit decisions and admit mistakes when new evidence emerges.

2.4 Key Ideas from Culture, Processes, and Tools

Culture

  1. Software Development is a Team Endeavour: The “genius myth” of the lone programmer is harmful. High-functioning teams are the key to success.
  2. The Three Pillars: All healthy collaboration is based on Humility (you are not your code), Respect (genuinely care for your colleagues), and Trust (believe others are competent).
  3. Blameless Post-mortems: Failure is an opportunity to learn. Focus on understanding the root cause and implementing preventative measures, not on assigning blame.
  4. Psychological Safety: Create an environment where it is safe to ask questions, admit ignorance, and take risks without fear of punishment. This is the most important factor for an effective team.

Processes

  1. Code Review is about Code Health: Beyond correctness, reviews ensure code is comprehensible, consistent, and maintainable. It’s a key mechanism for knowledge sharing and maintaining a collective standard.
  2. Test for the Future: The primary purpose of testing isn’t just to find bugs, but to enable change. A robust, automated test suite allows engineers to refactor and add features with confidence. Favour small, narrow-scoped, fast, and deterministic tests (unit tests).
  3. Deprecation Requires Active Effort: Simply marking something as “deprecated” is not enough. Obsolete systems must be actively removed by dedicated teams to prevent them from becoming a long-term drag on productivity. This is often done via Large-Scale Changes (LSCs).

Tools

  1. Artifact-Based Build Systems: Task-based systems (like Ant, Maven) don’t scale well because they offer too much flexibility. Artifact-based systems (like Bazel) are declarative, allowing the system to guarantee correctness, parallelism, and caching, enabling fast, reproducible builds at massive scale.
  2. The Monorepo and One-Version Rule: Google uses a monorepo where all developers work at head. This is coupled with a “One-Version Rule”: there is only one version of any third-party dependency in the repository. This prevents diamond dependency conflicts and simplifies the dependency graph.
  3. Tooling for Large-Scale Changes (LSCs): To manage a huge codebase, Google has invested heavily in tools that can safely automate changes across thousands or millions of files. This allows infrastructure teams to evolve the platform without placing the migration burden on every product team.
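To give a feel for what an automated large-scale change involves, here is a toy sketch that renames a deprecated call across an in-memory "codebase". It is not Google’s tooling: the function names `old_fetch` and `new_fetch`, the file paths, and the regex-based approach are all assumptions made for illustration. A production tool would more likely operate on syntax trees, split the change into independently reviewable pieces, and run the affected tests.

```python
import re
from typing import Dict

# Hypothetical rename: `old_fetch(...)` is deprecated in favour of `new_fetch(...)`.
# The word boundary stops the pattern matching inside longer identifiers.
DEPRECATED_CALL = re.compile(r"\bold_fetch\(")


def migrate_source(source: str) -> str:
    """Rewrite every call to the deprecated function in one file's text."""
    return DEPRECATED_CALL.sub("new_fetch(", source)


def migrate_codebase(files: Dict[str, str]) -> Dict[str, str]:
    """Return only the files whose contents changed, keyed by path."""
    changed = {}
    for path, source in files.items():
        migrated = migrate_source(source)
        if migrated != source:
            changed[path] = migrated
    return changed


codebase = {
    "billing/api.py": "data = old_fetch(url)\n",
    "search/index.py": "rows = threshold_fetch(q)\n",  # different function, untouched
    "ads/report.py": "a = old_fetch(x); b = old_fetch(y)\n",
}
changes = migrate_codebase(codebase)
```

Returning only the changed files mirrors the idea that the infrastructure team produces the edits centrally, while each owning team merely reviews the small part that touches its code.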

2.5 Key Principles and Mantras

  • Software engineering is programming integrated over time.
  • Code is a liability, not an asset. Its value is the functionality it provides.
  • With a sufficient number of users, every observable behaviour of your system will be depended upon by somebody (Hyrum’s Law).
  • If you liked it, then you shoulda put a CI test on it (The Beyoncé Rule).
  • Faster is safer. Releasing frequently in small batches reduces risk.
  • Don’t Repeat Yourself (DRY) is for production code. Tests should be Descriptive And Meaningful Phrases (DAMP).
  • Treat your servers like cattle, not pets. They should be easily and automatically replaceable.
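The DAMP mantra for tests can be sketched as follows. The `Account` class is a hypothetical class under test; the point is only the shape of the tests, where each one spells out its own setup, action, and expectation even though that means repeating a line.

```python
import unittest


class Account:
    """Hypothetical class under test."""

    def __init__(self, balance: int):
        self.balance = balance

    def withdraw(self, amount: int) -> None:
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount


class AccountTest(unittest.TestCase):
    # DAMP: the setup is repeated in each test so that a reader can
    # understand a failure without chasing shared helpers or fixtures.

    def test_withdraw_reduces_balance(self):
        account = Account(balance=100)
        account.withdraw(30)
        self.assertEqual(account.balance, 70)

    def test_withdraw_more_than_balance_is_rejected(self):
        account = Account(balance=100)
        with self.assertRaises(ValueError):
            account.withdraw(150)
        self.assertEqual(account.balance, 100)
```

A DRY version would hide `Account(balance=100)` in a shared helper, which saves a line per test but forces the reader to look elsewhere to learn the starting balance.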

3 Summary Video

4 Practise

The book emphasises spotting processes that don’t scale. Think about a recurring task in your own team or organisation.

  1. Identify the Task: e.g., onboarding a new team member, requesting a new test environment, or migrating users of an old API.
  2. Analyse its Scalability: How much human effort does this task require? If your team or company grew by 10x, would the total effort for this task grow by 10x (linear) or more (superlinear)?
  3. Brainstorm a Scalable Solution: How could this task be automated or redesigned to require sublinear human effort? Could an expert team own it? Could a self-service tool be built?

This exercise helps apply the “Scale and Growth” principle.
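The scalability question can be made concrete with a back-of-the-envelope model. The sketch below compares three ways of running an API migration; every figure (hours to learn the new API, hours to migrate, cost of building a tool, review time) is an invented assumption, not data from the book.

```python
def consumer_driven(teams: int, learn: float, migrate: float) -> float:
    """Every consuming team learns the new API, then migrates itself."""
    return teams * (learn + migrate)


def expert_driven(teams: int, learn: float, migrate: float) -> float:
    """One expert team learns once, then migrates every consumer by hand."""
    return learn + teams * migrate


def tool_driven(teams: int, build_tool: float, review: float) -> float:
    """Experts build a migration tool once; each team only reviews the result."""
    return build_tool + teams * review


# Hypothetical costs in engineer-hours.
LEARN, MIGRATE, BUILD_TOOL, REVIEW = 8.0, 4.0, 200.0, 0.5

for teams in (10, 100):
    print(
        teams,
        consumer_driven(teams, LEARN, MIGRATE),
        expert_driven(teams, LEARN, MIGRATE),
        tool_driven(teams, BUILD_TOOL, REVIEW),
    )
```

Under these assumed numbers the tool is the most expensive option at 10 teams (205 hours against 120 and 48) but the cheapest at 100 teams (250 hours against 1200 and 408). Growing the organisation tenfold multiplies the consumer-driven cost by ten, while the tool-driven cost rises by only about a fifth.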

5 Learn More
