Meltdown and Spectre will wreak havoc for years to come
Last week, in between presidential tweets about nuclear buttons and record cold across the country, the tech industry quietly absorbed what could become the biggest computing catastrophe since the Pentium FDIV bug of 1994. The embargo was lifted on a flaw in Intel’s chips that requires fundamental changes to operating systems everywhere.
Operating system vendors, including Red Hat, Oracle, Microsoft, and Apple, were all forced to quickly release patches that work around the chip flaw. They did a nice job of pushing the updates out quickly, not only to enterprises and consumers but also through the rapidly growing cloud providers, including AWS, Azure, Google, and many others.
By some counts there are 10 million servers in the public cloud and over 2 billion PCs of various types in the world, most of them Intel-based. Over the last few days the primary concern has been plugging the security hole caused by the “Meltdown” bug before hackers can take advantage of the now publicly disclosed vulnerability. The Spectre flaw is more complex than Meltdown and is likely a design flaw rather than a bug. Most processors, including ARM and AMD chips, suffer from Spectre.
Spectre is likely a result of the race among CPU manufacturers to improve performance with various types of optimizations, such as speculative execution, sometimes at the expense of security.
What are Meltdown and Spectre – in layman’s terms
The two vulnerabilities are flaws in the way that the chip keeps one program’s memory separate from another’s. The kernel is the privileged part of the operating system that manages the multiple programs running on a computer as well as communication with all external devices (network, disk, peripherals, etc.). Applications typically run in what is called “user space,” where each application is given memory dedicated to it. When an application has to communicate with a device, it asks the kernel to do the work, and the kernel runs in a special protected memory area called “kernel space.” Due to these flaws, a malevolent application could possibly peek into kernel space and see things like passwords and encryption keys, or peek into another application and see data not intended for it.
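To make the boundary between user space and kernel space a little more concrete, here is a minimal Python sketch. It is an illustration only, not code from any vendor. It reads the same file twice with os.read, which asks the kernel to perform the read on the program’s behalf, and counts how many such requests are made. The function names count_kernel_reads and demo and the chunk sizes are assumptions chosen for the example.

```python
import os
import tempfile


def count_kernel_reads(path, chunk_size):
    """Read a file with os.read and count how many requests go to the kernel.

    Returns (total_bytes, calls). The final call that returns no data
    (end of file) is counted too.
    """
    fd = os.open(path, os.O_RDONLY)
    total_bytes = 0
    calls = 0
    try:
        while True:
            chunk = os.read(fd, chunk_size)  # request handled in kernel space
            calls += 1
            if not chunk:  # an empty result means end of file
                break
            total_bytes += len(chunk)
    finally:
        os.close(fd)
    return total_bytes, calls


def demo():
    # Write 64 KiB of sample data to a temporary file, then read it back
    # with small and large chunks to compare the number of kernel requests.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"x" * 64 * 1024)
        path = tmp.name
    try:
        small = count_kernel_reads(path, 512)
        large = count_kernel_reads(path, 64 * 1024)
    finally:
        os.remove(path)
    return small, large


if __name__ == "__main__":
    small, large = demo()
    print("512-byte chunks (bytes, calls):", small)
    print("64 KiB chunks (bytes, calls):  ", large)
```

The same number of bytes arrives either way, but the small-chunk version hands control to the kernel many more times. Every one of those hand-offs crosses the boundary that these flaws undermine.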
One of several concerns is the impact of Meltdown and Spectre on cloud computing. Public cloud computing relies on multi-tenancy at several levels. IaaS providers use virtualization and containerization to keep each customer separate from other customers, even when they share the underlying hardware. However, there are scenarios where exploits will allow an application to look across the virtualization boundary and see data stored in another application, even when the two are owned by separate tenants. The patches, if thoroughly applied, could protect against these types of exploits (certainly for Meltdown if not for Spectre).
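Whether the patches have been “thoroughly applied” is something an administrator can at least spot-check. The sketch below assumes a Linux system whose kernel exposes the directory /sys/devices/system/cpu/vulnerabilities, which patched kernels use to report their mitigation status; older kernels and other operating systems will not have it. The function name read_mitigation_status is made up for this example. Inside a cloud virtual machine, the report describes only the guest kernel, not the provider’s host.

```python
from pathlib import Path

# Directory where recent Linux kernels report CPU vulnerability status.
# Assumption: it is missing on older kernels and on other operating systems.
SYSFS_VULN_DIR = "/sys/devices/system/cpu/vulnerabilities"


def read_mitigation_status(base_dir=SYSFS_VULN_DIR):
    """Return {vulnerability_name: status_text}, or {} if nothing is reported."""
    base = Path(base_dir)
    if not base.is_dir():
        return {}
    status = {}
    for entry in sorted(base.iterdir()):
        if entry.is_file():
            try:
                status[entry.name] = entry.read_text().strip()
            except OSError:
                # Keep going if a single entry cannot be read.
                status[entry.name] = "unreadable"
    return status


if __name__ == "__main__":
    report = read_mitigation_status()
    if not report:
        print("No vulnerability report available on this system.")
    for name, text in report.items():
        print(f"{name}: {text}")
```

An empty result does not mean a machine is safe. It only means the kernel offers no report, in which case the vendor’s own advisory is the place to look.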
But there are at least two side effects we will feel for years. First, hackers will be encouraged to find other similar vulnerabilities, and there is no guarantee that Meltdown and Spectre are the only problems of their type. Second, despite the tremendous momentum in the CIO and CISO communities toward accepting 21st-century approaches to computing (“IaaS, PaaS, and SaaS are just as secure as my data center”), they might start backing off. We might see a reluctance to leverage the various forms of multi-tenancy due to these types of vulnerabilities.
There is a lot of FUD around the exact magnitude of the performance slowdown expected from the operating system patches put out by the OS vendors (Red Hat, Microsoft, Apple, etc.). In principle, most operations that require communication between user space and kernel space can be significantly slowed down, which could include almost all I/O (both disk and network). Estimates suggest that many applications can be slowed down by up to 20-25%. Imagine that: up to 25% of the world’s CPU resources gone overnight. Epic Games published a chart after applying the patches and notified its users to expect downtime and timeouts as a result.
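A rough back-of-the-envelope model shows why the estimates depend so heavily on the workload. The Python sketch below is a simplification of my own, not a vendor formula. It assumes a program spends some fraction of its time crossing between user space and kernel space, and that the patches multiply the cost of each crossing by a fixed factor. The names patched_runtime and slowdown_percent and all of the input numbers are hypothetical.

```python
def patched_runtime(kernel_fraction, cost_factor):
    """Relative run time after patching, where 1.0 is the unpatched run time.

    kernel_fraction: share of unpatched run time spent crossing between
        user space and kernel space (0.0 to 1.0).
    cost_factor: how many times more expensive each crossing becomes
        (1.0 means no change).
    """
    if not 0.0 <= kernel_fraction <= 1.0:
        raise ValueError("kernel_fraction must be between 0 and 1")
    if cost_factor < 1.0:
        raise ValueError("cost_factor must be at least 1")
    # Time outside the crossings is unchanged; time inside them is scaled.
    return (1.0 - kernel_fraction) + kernel_fraction * cost_factor


def slowdown_percent(kernel_fraction, cost_factor):
    """Extra run time as a percentage of the unpatched run time."""
    return (patched_runtime(kernel_fraction, cost_factor) - 1.0) * 100.0


if __name__ == "__main__":
    # Illustrative inputs only; these are not measurements.
    for label, fraction in [("compute-heavy", 0.02), ("I/O-heavy", 0.20)]:
        print(f"{label}: {slowdown_percent(fraction, 2.0):.1f}% slower")
```

Under these made-up inputs, a program that spends 20% of its time crossing the boundary runs about 20% longer if each crossing doubles in cost, while one that spends 2% there barely notices. That spread may be one reason the published numbers vary so widely.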
Finally, as with most low-level fixes, there are concerns that these hurried patches will expose other bugs that up to now have been hidden. Whenever you change something low-level in the operating system, everything is affected. While it is unlikely that these patches create new bugs, they might expose bugs in various software (device drivers, infrastructure, etc.) that were hidden by the way user space and kernel space memory was being managed before this fix.
Stay tuned for Part II – a deeper dive into the architectural issues causing these chip flaws.
[This article was contributed by Jeffrey Vogel, Managing Director]