What I Learned from “Data Privacy: A Runbook for Engineers”
Before reading this book, I believed that data privacy was a matter of securing the data collected from users. I thought that following security best practices was enough to guarantee it. I was wrong.
Privacy changes the way we think about data. It’s less about protecting the data we collect than about deciding what data we actually need, how long to keep it, and how to anonymize it. Security asks, “How do we protect this data?” Privacy asks whether we should have it at all.
The concept that shifted my thinking the most was data classification. Without it, there’s no way to understand the data spread across a company’s many services: how it’s collected, why, and what decisions should be made about it. Classification, especially with machine-readable tags based on privacy level, is what makes those decisions possible. It’s also the easiest place to start.
Take a hospital system. It collects identity data (name, CPF, contact details), sensitive health data (diagnoses, prescriptions), payment data, demographic data, transaction history, and authentication logs. Each of those categories carries a different level of risk and a different legal obligation. Without classification, though, they all end up treated the same: stored the same way, retained for the same length of time, accessible to the same teams. The question “Do we really need to keep the full demographic profile of a patient who visited once three years ago?” never gets asked, because nobody has a map of what exists or why. Data classification creates that map. When an engineer tags a field as sensitive/health, it triggers a set of questions: Who can access this? How long do we keep it? Can we anonymize it after the appointment closes? The tag doesn’t answer those questions, but it makes sure they get asked.
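To make the idea concrete, here is a minimal Python sketch, assuming a tag vocabulary of the form category/subcategory. Only sensitive/health comes from the example above; the other tags, the field names, and the question lists are hypothetical. Each tag maps to the review questions it should trigger, so tagging a field is enough to surface what still needs an answer.

```python
from dataclasses import dataclass
from enum import Enum


class Classification(Enum):
    # Hypothetical tag set: only "sensitive/health" comes from the hospital example.
    PUBLIC = "public"
    IDENTITY = "personal/identity"
    DEMOGRAPHIC = "personal/demographic"
    HEALTH = "sensitive/health"


# Hypothetical policy: the review questions each tag should force a team to answer.
REVIEW_QUESTIONS: dict[Classification, list[str]] = {
    Classification.PUBLIC: [],
    Classification.IDENTITY: [
        "Who can access this?",
        "How long do we keep it?",
    ],
    Classification.DEMOGRAPHIC: [
        "Who can access this?",
        "Do we need the full profile after the visit?",
    ],
    Classification.HEALTH: [
        "Who can access this?",
        "How long do we keep it?",
        "Can we anonymize it after the appointment closes?",
    ],
}


@dataclass(frozen=True)
class DataField:
    name: str
    classification: Classification


def open_questions(fields: list[DataField]) -> dict[str, list[str]]:
    """Map each field name to the questions its tag triggers; fields that trigger none are omitted."""
    return {
        f.name: REVIEW_QUESTIONS[f.classification]
        for f in fields
        if REVIEW_QUESTIONS[f.classification]
    }


# Hypothetical patient-record schema for the hospital example.
patient_fields = [
    DataField("full_name", Classification.IDENTITY),
    DataField("cpf", Classification.IDENTITY),
    DataField("birth_year", Classification.DEMOGRAPHIC),
    DataField("diagnosis", Classification.HEALTH),
    DataField("clinic_name", Classification.PUBLIC),
]

for name, questions in open_questions(patient_fields).items():
    print(name)
    for question in questions:
        print("  -", question)
```

The sketch deliberately returns questions rather than decisions: the tag guarantees the questions surface, and the team still has to answer them.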
What Else the Book Taught Me
Data Inventory is the natural next step after classification. Once you have tags, you need a place to aggregate them — a catalog that shows, across all services, what data exists, where it lives, and who owns it. In a healthcare context, this matters enormously: the same patient identifier might appear in the billing system, the scheduling service, the lab results API, and the authentication logs. Without an inventory, you can’t answer a basic LGPD or GDPR question like “what data do we hold about this person?” With one, you can. The inventory doesn’t have to be perfect to be useful — even a partial map is better than no map at all.
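A toy version of such a catalog can be sketched in Python. It assumes each service publishes entries with the field name, its classification tag, the owning team, and the identifier (if any) that links the field to a person; all service names, owners, fields, and the `patient_id` key are hypothetical. The query answers “what data do we hold about this person?” at the catalog level, listing which services hold which kinds of data, not the values themselves.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InventoryEntry:
    service: str                # where the data lives
    owner: str                  # team accountable for it
    field: str                  # column or attribute name
    classification: str         # tag assigned during classification
    subject_key: Optional[str]  # identifier linking the field to a person, or None


# Hypothetical catalog entries; a real inventory would be collected from each
# service's classification tags rather than written by hand.
INVENTORY = [
    InventoryEntry("billing", "billing-team", "amount_due", "personal/payment", "patient_id"),
    InventoryEntry("scheduling", "scheduling-team", "appointment_time", "personal/identity", "patient_id"),
    InventoryEntry("scheduling", "scheduling-team", "room_number", "internal", None),
    InventoryEntry("lab-results-api", "lab-team", "test_result", "sensitive/health", "patient_id"),
    InventoryEntry("auth-logs", "platform-team", "login_ip", "personal/identity", "patient_id"),
]


def data_held_about_subject(inventory: list[InventoryEntry], key: str) -> dict[str, list[str]]:
    """Group every field linked to `key` by the service that stores it."""
    by_service: dict[str, list[str]] = defaultdict(list)
    for entry in inventory:
        if entry.subject_key == key:
            by_service[entry.service].append(
                f"{entry.field} [{entry.classification}, owner: {entry.owner}]"
            )
    return dict(by_service)


for service, fields in data_held_about_subject(INVENTORY, "patient_id").items():
    print(service, fields)
```

Retrieving the actual records for a data subject request would be a separate step; the inventory’s job is to tell you where to look and whom to ask.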
Data Protection Impact Assessments (DPIAs), or Privacy Impact Assessments (PIAs), are the book’s answer to the question of when to slow down and think. A DPIA is a structured review you run before launching a new feature or system that processes sensitive data. It forces a team to document what data they’re collecting, why, what the risks are, and what mitigations are in place. In practice, this often feels like friction, just another form to fill out before shipping. But the book reframes it well: a DPIA is not a compliance checkbox but a design tool. Running one early enough actually changes what gets built. For engineers working in healthcare, where almost everything touches sensitive data, building a lightweight version of this process into the development workflow is one of the most practical things the book recommends.
The Privacy Maturity Model gave me a way to locate where a company actually stands. The model describes a progression from reactive (“we handle privacy issues when they become legal problems”) to proactive (“privacy is designed in from the start”). Most companies, in my experience, sit somewhere in the middle — they have some policies, maybe a DPO, but privacy is still largely an afterthought at the engineering level. The book doesn’t shame that position; it treats it as a starting point. Progress looks like moving from ad-hoc decisions to documented processes, from “we think we’re compliant” to “we can demonstrate it.” That shift doesn’t happen all at once, and it doesn’t require a perfect inventory or a mature DPIA process before you start. It starts with classification.
Who Should Read This
This book is for engineers who work with user data and have started wondering if there’s more to privacy than encryption and access controls. You don’t need a legal background — the book is written by engineers, for engineers, and it stays practical throughout. It’s especially relevant for those working in regulated industries like healthcare or finance, where privacy obligations are real and the cost of getting it wrong is high. Senior engineers, tech leads, and architects will get the most out of it, but any developer who has ever designed a database schema or built an API that handles personal data will find something actionable here.
Thanks to Yaso for sparking my interest in this subject.