Everything is Distributed

System design is key in determining the software engineer's level. Distributed systems are the present and the future of our discipline.


Fifteen years ago, I joined Microsoft as part of the mailbox migration team for Exchange. Today, this job doesn't exist. What happened, and what can it tell us about the present and future of software engineering?

Deceptive Simplicity

Every day, billions of people around the world check their email. Most users don't stop to think about where their email is beyond "on Gmail" or "my work email." Of course, software engineers know that those emails are stored on servers and accessed by a remote client with a token tied to verified credentials (username and password). In practical terms, every message, every attachment, every saved contact is a piece of data occupying physical storage. What happens when a mailbox outgrows the available storage on its server?

For Exchange, migrating mailboxes was a manual process. When a user's mailbox approached the limit of its on-premises server, IT managers would flag the mailbox to a team at Microsoft for migration - the Move Mailbox team, my first job. We had a coded solution to preserve the data, clone the mailbox, and make a steady-state migration to the new storage location; the mailbox had to be frozen temporarily to make final changes and reroute delivery to the new target, and was then re-enabled. In the earliest days of Exchange and managing physical servers, this could take hours. By the time I moved from the Exchange team to Azure data services, it was down to a few minutes through various clever engineering tricks, including live migration with only a temporary disconnect for processing the last delta to the new inbox.
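The shape of that process (bulk copy while the mailbox stays live, then a brief freeze to copy the last delta and reroute delivery) can be sketched in a few lines of Python. This is only an illustration under loud assumptions, not how Exchange actually did it: `Mailbox`, `copy_delta`, `migrate`, and the `routing` table are all hypothetical names, and the mailboxes are plain in-memory dictionaries.

```python
from dataclasses import dataclass, field


@dataclass
class Mailbox:
    """Hypothetical in-memory stand-in for a mailbox stored on one server."""
    messages: dict = field(default_factory=dict)
    frozen: bool = False

    def deliver(self, msg_id, body):
        # Delivery is refused only while the mailbox is frozen.
        if self.frozen:
            return False
        self.messages[msg_id] = body
        return True


def copy_delta(source, target):
    """Copy messages the target does not have yet; return how many were copied."""
    missing = {k: v for k, v in source.messages.items() if k not in target.messages}
    target.messages.update(missing)
    return len(missing)


def migrate(source, target, routing, user, arrivals_during_copy):
    """Two-phase move: live bulk copy, then a short freeze for the final delta."""
    # Phase 1: bulk copy while the source keeps accepting mail.
    bulk = copy_delta(source, target)

    # Simulated mail that shows up on the source during the bulk copy.
    for msg_id, body in arrivals_during_copy:
        source.deliver(msg_id, body)

    # Phase 2: freeze the source, copy only what changed, reroute delivery.
    source.frozen = True
    delta = copy_delta(source, target)
    routing[user] = target  # new mail now lands on the target, which is not frozen
    return {"bulk_copied": bulk, "delta_copied": delta}


old = Mailbox({1: "hello", 2: "invoice"})
new = Mailbox()
routing = {"alice": old}

stats = migrate(old, new, routing, "alice", [(3, "late arrival")])
print(stats)
print(sorted(new.messages))
```

The point of the two phases is that the user-visible freeze only has to cover the small final delta rather than the whole mailbox, which is one plausible reading of how the downtime shrank from hours to minutes.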

To the end user, this was mostly invisible except for the short downtime of migration, which could be scheduled for overnight hours, rendering it practically seamless. Today, the overwhelming majority of mailboxes are stored in a cloud environment and regularly move between physical servers with no active engineering input; the process is invisible, with no downtime at all. Even a seemingly simple task like copying a database (an email inbox) has many moving parts that are easy to overlook unless you're actually working on them.

The Education Gap

When I was in undergrad, I took one course on distributed systems. It was an elective, and it was almost entirely theoretical. This is where computer science education was in the early 2000s: Facebook hadn't launched beyond Harvard's campus, and Amazon's Simple Storage Service (S3) and Simple Queue Service (SQS) were still in the future, as were Microsoft Azure and Google Cloud. The only major organization doing SaaS was Salesforce(.com), which had launched with the dot-com boom and weathered the storm to IPO in 2004. (Concur, a B2B solution for expense management, went SaaS in 2001, but was founded as a local installation sold at retail.) There were web mail services, but almost no one was building scalable, distributed systems. Neither big tech companies nor startups were looking for the system design discipline.

Today, of course, hundreds of companies (from startups to giants like Adobe, Samsung, and eBay) are running distributed systems hosted on scalable frameworks offered by Amazon, Google, Microsoft, DigitalOcean, or Oracle. Facebook, Salesforce, and Netflix are effectively planet-scale systems with their own clouds. But the disciplines for scalable cloud were theoretical just 15 years ago; best practices have evolved since then based on the amazing work by these teams (of which I was fortunate enough to be a member).

What this means is that only now are most undergrads getting distributed systems education beyond theory. Education is naturally always a few years behind practice, as companies are developing, deploying, and iterating solutions that then need to be formalized and put into a curriculum before being taught. By the time students are learning system design, there's a new best practice uniquely implemented - and one they'll have to learn and unlearn over and over again once at a target company.

Every Company is a Tech Company; All Products are Distributed Systems

I have about $50 on my Starbucks card. Well, Starbucks calls it a card, but it's really just the app. I refilled it a few weeks ago as my account was running low. There was no cash involved. The app ran a payment process tied to Apple Pay, updated the balance displayed to me, and allowed me to buy a latte.

Starbucks is a coffee company. But one of their largest revenue drivers is a distributed system that no one thinks about. It's tied to product and inventory, point of sale systems, user accounts (over 18 million), payment processing, and promotions. Almost every other fast food and cafe chain has mimicked that approach, along with retail stalwarts like Walmart, Target, and Macy's. Mobile apps are one touch point, websites are another, and for any chain that supports in-store pickup, that's yet another endpoint, all of them backed by one shared system.

On the enterprise side, the move toward cloud solutions (SaaS, PaaS, IaaS) means traditional local installations have migrated to the cloud, from Microsoft Office to Intuit's accounting and tax suites. Your desktop window is just an editor for data that lives on the back end, validated by your user account. Web browsers and mobile apps can touch the same data.

If you're a front-end developer, your interface hooks are driving to back ends that are distributed. If you're a back-end developer, you're hosted somewhere. Whether on mobile, desktop, or web, almost every developer today is working on a distributed system. Only narrow cases, like hardware and OS development, run purely locally (though in the case of the OS, even then there are hooks for things like network time synchronization).

The Present and Future of Software Engineering

We live in a world of permanently connected systems now - for work, for personal, for enterprise - and the demand on software engineers is to develop across multiple platforms, secure user accounts, and connect various components based on different authentications in one application. The field is more multi-disciplinary now than it has ever been.

With the rise of the Internet of Things, the amount of data passing through our networks is only going to grow. Data scientists will be sifting through stores in data lakes in the cloud, surfacing information in warm storage, and trying to make sense of it with machine learning algorithms that - you guessed it - live in the cloud. To navigate and optimize these processes, it's crucial to learn machine learning techniques and applications.

System design is now a key factor in determining a software engineer's level. The further you progress in your career, the more you'll need to be able to develop, leverage, and implement APIs to connect various services within one application. This is the present and the future of our discipline.
