Absolutely awesome article.
As far as I can tell, what happens at these companies is that they start by concentrating almost totally on product growth. That’s completely reasonable, because companies are worth approximately zero when they’re founded; they don’t bother with things that protect them from losses, like good ops practices or actually having security, because there’s nothing to lose (well, except for user data when the inevitable security breach happens, and if you talk to security folks at unicorns you’ll know that these happen).
The result is a culture where people are hyper-focused on growth and ignore risk. That culture tends to stick even after the company has grown to be worth well over a billion dollars and has something to lose. Anyone who comes into one of these companies from Google, Amazon, or another place with solid ops practices is shocked. Often, they try to fix things, and then leave when they can’t make a dent.
…
Google didn’t go from adding z to the end of names to having the world’s best security because someone gave a rousing speech or wrote a convincing essay. They did it after getting embarrassed a few times, which gave people who wanted to do things “right” the leverage to fix fundamental process issues. It’s the same story at almost every company I know of that has good practices. Microsoft was a joke in the security world for years, until multiple disastrously bad exploits forced them to get serious about security. That makes it sound simple, but if you talk to people who were there at the time, the change was brutal. Despite a mandate from the top, there was vicious political pushback from people whose position was that the company got to where it was in 2003 without wasting time on practices like security. Why change what’s worked?
…
The data are clear that humans are really bad at taking the time to do things that are well understood to incontrovertibly reduce the risk of rare but catastrophic events. We will rationalize that taking shortcuts is the right, reasonable thing to do. There’s a term for this: the normalization of deviance. It’s well studied in a number of other contexts, including healthcare, aviation, mechanical engineering, aerospace engineering, and civil engineering, but we don’t see it discussed in the context of software; in fact, I’ve never seen the term used in a software context at all.
More often, the natural inertia against doing anything, new or old, is exacerbated by the fact that dealing with such risks usually requires spending money. It is easy to say that the risk is low enough that remediation is not worthwhile, but there is seldom, if ever, an honest evaluation of the cost of the identified risk, how that cost relates to the odds of the risked event, and the cost of the remediation. The same inertia applies to doing that honest evaluation. Ostriches are everywhere.
Many years ago, I had a computer installation where dirty power was causing the loss of, on average, one $20,000 disk per month. We needed a $15,000 power filter to permanently solve the problem, but the money had to come out of a different budget line item than the disks, so I was refused the filter. Going on the principle that it is easier to be forgiven than to get permission, I went ahead and ordered the filter. By the time the budget guys had figured out their problem, we had already saved over $100,000, and this was in 1984. I was called up before the board and had to promise never to do any such thing again, but the next time they needed a problem solved, they called on me anyway.
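To make the arithmetic behind that story explicit, here is a minimal back-of-the-envelope sketch. The $20,000-per-month disk loss and the $15,000 filter cost are the figures from the anecdote; treating “saved” as avoided disk losses net of the filter cost, and the six-month window, are my own assumptions, chosen only because they are consistent with the “over $100,000” figure.

```python
# Back-of-the-envelope cost comparison for the dirty-power story above.
# The dollar figures come from the anecdote; the six-month horizon is an
# assumption that happens to line up with "over $100,000" in savings.

MONTHLY_DISK_LOSS = 20_000  # average cost of disks destroyed per month
FILTER_COST = 15_000        # one-time cost of the power filter

def net_savings(months: int) -> int:
    """Cumulative avoided disk losses minus the one-time filter cost."""
    return MONTHLY_DISK_LOSS * months - FILTER_COST

# The filter pays for itself in under a month.
print(f"break-even: {FILTER_COST / MONTHLY_DISK_LOSS:.2f} months")  # 0.75

# After roughly six months, net savings exceed $100,000 (in 1984 dollars).
for month in range(1, 7):
    print(f"month {month}: net savings ${net_savings(month):,}")
```

The point of running the numbers isn’t the exact total; it’s that the honest evaluation described above takes about five lines, and it still lost to a budget line item.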