Important topics to focus on to ace your next system design interview


Rakesh Mothukuri

Reading Time: 13 minutes


Topics and tools for acing your system design interview

Photo by Agnieszka Boeske on Unsplash

If you followed my previous blog, where I explained the mindset and preparation method for your next system design interview, you are in the right place: as promised, this post continues from there.

I will list the important topics and provide a brief overview, along with additional reading material, of the fundamentals that will improve your system design and problem-solving. Although it's a big list, it is by no means exhaustive. What I'm going to do is go through them one by one and give a short explanation of each concept. Obviously, I cannot go into too much detail because, frankly speaking, each of them could make up a story on its own (maybe I will write a blog on each if you would like to hear more about these in detail). So let's start.

Vertical vs horizontal scale

If you need to scale up your system, you can either do vertical scaling, which means adding more memory, CPU, and disk to an existing host, or horizontal scaling, which means keeping each host small but adding more hosts.

Vertical scaling can be expensive, and there is a limit to how much memory and CPU you can add to a single host, but it avoids distributed-systems problems. With horizontal scaling, on the other hand, you can keep adding more hosts almost indefinitely, but you have to deal with all the distributed-systems challenges. So, in practice, horizontal scaling is usually preferred over vertical scaling.

CAP theorem

CAP stands for consistency, availability, partition tolerance.

  1. Consistency means that every read sees the most recent write.
  2. Availability means that you will always get a response back, but it might or might not reflect the most recent write.
  3. Partition tolerance means the system keeps working even when network packets are dropped between two nodes.

The CAP theorem says that you can only achieve two of these three properties, and partition tolerance is inevitable because networks do drop packets. So, in practice, you are choosing between consistency and availability.

Traditional relational databases choose consistency over availability, which means they could be less available, but their data is always consistent. Non-relational databases, on the other hand, prefer availability over consistency.


ACID vs BASE

ACID stands for Atomicity, Consistency, Isolation, Durability, and BASE stands for Basically Available, Soft state, Eventual consistency. ACID applies more to relational databases, while BASE applies more to NoSQL databases. You need to know the differences, because once you start using a database you need to understand which of the ACID properties you are willing to sacrifice.
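Atomicity is the easiest of the four to see in code. Here is a small sketch using Python's built-in sqlite3 module (the table and amounts are made up for illustration): either every statement in the transaction commits, or none of them do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 50)])
conn.commit()

try:
    with conn:  # opens a transaction: commits on success, rolls back on error
        conn.execute("UPDATE accounts SET balance = balance - 70 "
                     "WHERE name = 'alice'")
        # Simulate a failure mid-transaction, e.g. a violated business rule
        raise RuntimeError("insufficient funds check failed")
except RuntimeError:
    pass

# The partial debit was rolled back, so Alice still has her full balance
balance = conn.execute(
    "SELECT balance FROM accounts WHERE name = 'alice'").fetchone()[0]
print(balance)  # 100
```

If the exception is removed, both the debit and any matching credit would commit together; with it, neither does. That all-or-nothing behaviour is the "A" in ACID.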

Partitioning/Sharding Data

Let's suppose you have trillions of records. There is no way you can store all of them on one database node, so you need to spread them across many nodes, and that's where sharding comes into the picture. It answers questions such as: how do you split the data so that every database node is responsible for some part of those trillions of records? One technique used heavily here is consistent hashing, and you definitely need to know how consistent hashing works, what some of its advantages are, and what guarantees it offers.
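To make that concrete, here is a minimal consistent-hash ring in Python. This is an illustrative sketch, not production code, and the node names are hypothetical: each node is placed at several points on the ring via "virtual nodes" so keys spread evenly, and a key maps to the first node clockwise from its hash.

```python
import bisect
import hashlib

class HashRing:
    def __init__(self, nodes, vnodes=100):
        # Each (hash, node) pair is one virtual node on the ring
        self.ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes for i in range(vnodes)
        )

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def get_node(self, key):
        # First ring entry clockwise from the key's hash (wrapping around)
        h = self._hash(key)
        idx = bisect.bisect(self.ring, (h, "")) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing(["db-1", "db-2", "db-3"])
before = {k: ring.get_node(k) for k in ("user:1", "user:2", "user:3")}

# Adding a node only remaps the keys that now fall on the new node;
# most keys keep their old owner -- the main advantage over hash(key) % N,
# where adding a node would remap almost every key.
bigger = HashRing(["db-1", "db-2", "db-3", "db-4"])
moved = sum(1 for k, n in before.items() if bigger.get_node(k) != n)
print(moved, "of", len(before), "keys moved")
```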

Optimistic vs Pessimistic locks

Let's suppose you're doing a database transaction in an optimistic way: you do not acquire any locks, but when you are ready to commit, you check that no other transaction has updated the record you were working on. With pessimistic locking, on the other hand, you acquire all the locks up front and then commit your transaction. Both approaches have advantages and disadvantages, and you do need to understand which one to use for each type of scenario.
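A common way to implement the optimistic check is a version number on each record: a write succeeds only if the version is unchanged since it was read. Here is a toy in-memory sketch of the idea (the `Store` class and key names are made up for illustration):

```python
class VersionConflict(Exception):
    pass

class Store:
    def __init__(self):
        self.rows = {}  # key -> (value, version)

    def read(self, key):
        return self.rows.get(key, (None, 0))

    def write(self, key, value, expected_version):
        # Commit-time check: only write if nobody else bumped the version
        _, current = self.rows.get(key, (None, 0))
        if current != expected_version:
            raise VersionConflict(f"expected v{expected_version}, found v{current}")
        self.rows[key] = (value, current + 1)

store = Store()
value, version = store.read("balance")
store.write("balance", 100, version)  # succeeds, version becomes 1

# A stale writer still holding the old version now gets a conflict
# instead of silently overwriting the newer value.
try:
    store.write("balance", 42, version)
    print("ok")
except VersionConflict:
    print("conflict")  # conflict
```

Under low contention this avoids lock overhead entirely; under high contention, the retries caused by conflicts can make pessimistic locking the better choice.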

Strong vs Eventual Consistency

Strong consistency means that your reads will always see the latest write, while eventual consistency means that your reads may see an older write at first but will eventually see the latest one. Strong consistency is what relational databases give you; with NoSQL databases, you have to decide whether you want strong or eventual consistency. The benefit eventual consistency provides is higher availability, and this all goes back to the CAP theorem.

Relational DB vs No SQL DB

These days I see that most people prefer to use NoSQL, and that's fine, but do not discard relational databases just yet. Remember, a relational database provides all those good ACID properties, while NoSQL databases scale a little better and have higher availability. So, depending on the situation and the problem, try to see which of the two fits better.

Types of NoSQL databases.

  1. Key-Value: These are the simplest kind, where you have a key and a value and the database stores the key-value pair.
  2. Wide-Column database: Wide-column stores such as Bigtable and Apache Cassandra are not column stores in the original sense of the term, since their two-level structures do not use a columnar data layout.
  3. Document-based database: If you have semi-structured data, such as XML or JSON, and you want to put it into a database, then you would use a document-based database.
  4. Graph database: A graph database is designed to treat the relationships between data as equally important to the data itself. It holds data without constricting it to a pre-defined model; instead, the data is stored the way we would first draw it out, showing how each individual entity connects with or relates to the others.


Caching

Caching is used to speed up requests. If you know that some data is going to be accessed frequently, you store it in the cache so that it can be served quickly.

Caching is of two types:

  1. Local cache, where every node does its own caching and the cache is not shared between nodes.
  2. Distributed (shared) cache, where the cache is shared between different nodes.

If you're doing caching, you have to consider a few things. First, a cache cannot be the source of truth. Second, the cache has to be fairly small, because caches tend to keep all their data in memory, so you have to consider eviction policies for the cache.
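One of the most common eviction policies is LRU (least recently used). Here is a small sketch in Python using the stdlib `OrderedDict`: when the cache is full, the entry that was touched longest ago gets dropped.

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()  # insertion order doubles as recency order

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as recently used
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently used

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")         # touch "a", so "b" is now least recently used
cache.put("c", 3)      # cache is full: evicts "b"
print(cache.get("b"))  # None
print(cache.get("a"))  # 1
```

Real caches like Redis and Memcached offer several eviction policies (LRU variants, TTL-based expiry, and so on); the principle is the same.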

HTTP vs HTTP/2 vs WebSockets

HTTP is a request-and-response architecture between client and server, and pretty much the entire web runs on it. HTTP/2 is an improvement on HTTP that multiplexes multiple requests over a single connection, and WebSockets provide a fully bidirectional connection between client and server. It would be good to know some of the differences between them and some of their inner workings.

TCP/IP Model

The TCP/IP model has four layers, and it's good to know what each layer does.

IPv4 vs IPv6

IPv4 has 32-bit addresses while IPv6 has 128-bit addresses, and we are running out of IPv4 addresses, so the world is migrating towards IPv6. It's good to know some of the details around that, and also how IP routing works.
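The 32-bit vs 128-bit difference is easy to make concrete with Python's stdlib `ipaddress` module (the example addresses are arbitrary; `2001:db8::/32` is the standard documentation range):

```python
import ipaddress

v4 = ipaddress.ip_address("192.168.0.1")
v6 = ipaddress.ip_address("2001:db8::1")

# max_prefixlen is the address width in bits
print(v4.version, v4.max_prefixlen)  # 4 32
print(v6.version, v6.max_prefixlen)  # 6 128

# Address-space sizes: about 4.3 billion IPv4 addresses vs ~3.4e38 for IPv6
print(2 ** 32)   # 4294967296
print(2 ** 128)  # 340282366920938463463374607431768211456
```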


TCP vs UDP

TCP is a connection-oriented, reliable protocol, while UDP is unreliable. So if you are streaming video, you are better off using UDP, because even though it is unreliable, it is super fast. On the other hand, if you are sending documents, you are better off using TCP.


DNS

DNS stands for Domain Name System. When you type a domain name into a browser, a DNS request goes out to translate that name into an IP address. It's good to know how this works: what the hierarchy of DNS servers looks like, how they do caching, and things like that.


TLS

TLS stands for Transport Layer Security. It is used to secure communication between client and server, both in terms of privacy and data integrity. When used with HTTP, it is essentially what makes HTTPS.

Symmetric and asymmetric encryption

Asymmetric encryption is computationally more expensive, so it should only be used to encrypt a small amount of data, typically a symmetric key that the two parties then use for the bulk of the traffic. An example of asymmetric encryption is public/private-key encryption, and an example of symmetric encryption is AES.


Load Balancers

Load balancers sit in front of a service and delegate client requests to one of the nodes behind it. This delegation could be on a round-robin basis or based on the load average of the nodes behind the service. Load balancers can operate at L4 or L7, which are layers of the OSI model.

An L4 load balancer considers both client and destination IP addresses and port numbers to do the routing, while an L7 load balancer, which operates at the HTTP level, uses the HTTP URI to do the routing. Most load balancers operate at L7.
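Two of the most common delegation strategies can be sketched in a few lines of Python (node names and connection counts are made up for illustration): round-robin cycles through the nodes in order, while least-connections picks the node with the fewest active requests.

```python
import itertools

nodes = ["node-1", "node-2", "node-3"]

# Round-robin: hand out nodes in a repeating cycle
rr = itertools.cycle(nodes)
picks = [next(rr) for _ in range(4)]
print(picks)  # ['node-1', 'node-2', 'node-3', 'node-1']

# Least-connections: pick the node with the fewest active requests
active = {"node-1": 7, "node-2": 2, "node-3": 5}
print(min(active, key=active.get))  # node-2
```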

CDN & Edge

Let's suppose you are watching Netflix from London. What Netflix will do is put the movies and series in a content delivery network close to you, in London, so when you stream a movie it can be served right there from the nearby CDN instead of all the way from the data centre. This helps both performance and latency for the end-user. Edge is a very similar concept, where you do processing close to the end-user. Another advantage of edge is that it has a dedicated network running from the edge all the way to the data centre, so your request can be routed over this dedicated network instead of going over the public internet.

Bloom filters and Count-min sketch

Bloom filters and count-min sketches are space-efficient, probabilistic data structures. A Bloom filter is used to decide whether an element is a member of a set. Bloom filters can have false positives, but they never have false negatives. So if your design can tolerate false positives, you should consider using a Bloom filter, because it is very space-efficient.
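Here is a minimal Bloom filter sketch in Python (the sizes and the way the k hash functions are derived from SHA-256 are illustrative choices, not the only ones): adding an element sets k bits, and a lookup reports "maybe present" only if all k bits are set.

```python
import hashlib

class BloomFilter:
    def __init__(self, size=1024, hashes=3):
        self.size, self.hashes = size, hashes
        self.bits = [False] * size

    def _positions(self, item):
        # Derive k bit positions by salting the item with the hash index
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item):
        return all(self.bits[pos] for pos in self._positions(item))

bf = BloomFilter()
for user in ("alice", "bob"):
    bf.add(user)

print(bf.might_contain("alice"))    # True: added items are always found
print(bf.might_contain("mallory"))  # very likely False; false positives can occur
```

The space saving comes from never storing the elements themselves, only the bit array, which is why the structure can only answer "definitely not present" or "probably present".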

A count-min sketch is a similar data structure, but it is used to count the frequency of events. Let's suppose you have millions of events and you want to keep track of the top events: you can use a count-min sketch instead of keeping the counts of all the events, for a fraction of the space. It will give you an answer that is close enough to the actual one, with some error rate.
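A tiny count-min sketch sketch in the same spirit (width, depth, and the SHA-256-based row hashes are illustrative choices): each of d rows hashes the event into one of w counters, and the estimate is the minimum across rows, so collisions can only overcount, never undercount.

```python
import hashlib

class CountMinSketch:
    def __init__(self, width=256, depth=4):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _cols(self, item):
        # One column index per row, derived by salting with the row number
        for row in range(self.depth):
            digest = hashlib.sha256(f"{row}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.width

    def add(self, item):
        for row, col in enumerate(self._cols(item)):
            self.table[row][col] += 1

    def estimate(self, item):
        # Minimum across rows: the least-collided counter is the best guess
        return min(self.table[row][col]
                   for row, col in enumerate(self._cols(item)))

cms = CountMinSketch()
for _ in range(5):
    cms.add("page:/home")
cms.add("page:/about")

print(cms.estimate("page:/home") >= 5)  # True: it never undercounts
```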


Paxos

Paxos is used to reach consensus among distributed hosts. Before Paxos came along, finding consensus was a very hard problem. An example of consensus is running a leader election among distributed hosts. It's good to know some of the use cases that Paxos solves.

Design Patterns and Object-oriented design

For design patterns, it's worth knowing things like factory methods and singletons, and for object-oriented design, things like abstraction and inheritance are among the things you should know.
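As a quick refresher, here is a sketch of those two patterns in Python (the class names are made up for illustration): a singleton that always hands back the same instance, and a factory method that hides which concrete class gets built.

```python
class Config:
    """Singleton: every call to Config() returns the same instance."""
    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

class JsonSerializer:
    def serialize(self, data):
        return f"json:{data}"

class XmlSerializer:
    def serialize(self, data):
        return f"xml:{data}"

def serializer_factory(fmt):
    """Factory method: callers ask for a format name, not a concrete class."""
    return {"json": JsonSerializer, "xml": XmlSerializer}[fmt]()

print(Config() is Config())                     # True: one shared instance
print(serializer_factory("json").serialize(1))  # json:1
```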

Virtual Machines & Containers

Virtual machines are a way of giving you an operating system on top of shared hardware such that you feel like the exclusive owner of that hardware, while in reality it is shared between different isolated operating systems. Containers, on the other hand, are a way of running your applications and their dependencies in isolated environments. Containers have become extremely important and are everywhere in production environments these days.

Publisher-Subscriber / Queue

A publisher publishes a message to a queue, and a subscriber receives that message from the queue. This pattern has become extremely important in system design these days, and you should definitely use it whenever you have the opportunity.
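The pattern can be sketched in-process with the stdlib `queue` module (the broker and subscriber names are made up for illustration): the broker keeps one queue per subscriber, so each subscriber receives its own copy of every published message.

```python
import queue

class Broker:
    def __init__(self):
        self.subscribers = {}

    def subscribe(self, name):
        # Each subscriber gets its own queue, decoupling it from the publisher
        q = queue.Queue()
        self.subscribers[name] = q
        return q

    def publish(self, message):
        # Fan the message out to every subscriber's queue
        for q in self.subscribers.values():
            q.put(message)

broker = Broker()
emails = broker.subscribe("email-service")
audit = broker.subscribe("audit-service")

broker.publish("order-created")

print(emails.get())  # order-created
print(audit.get())   # order-created
```

The key property is decoupling: the publisher never knows who, or how many, consumers exist, which is exactly what systems like Kafka or RabbitMQ provide at data-centre scale.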

Below are some tools and techniques that are useful not just for the system design interview but also in real life, if you are going to work on a high-scale system. Obviously this is a very small list and there are many, many other tools out there, but in the interest of the article's length I have kept it restricted to this small number.


Cassandra

Cassandra is a wide-column, highly scalable database. It is used for different use cases, such as simple key-value storage, storing time-series data, or more traditional rows with many columns. Cassandra can provide both eventual and strong consistency. Under the hood, Cassandra uses consistent hashing to shard your data and gossiping to keep all the nodes informed about the cluster.


MongoDB

If you have a JSON-like structure and you want to persist it, then MongoDB will work perfectly fine. It provides ACID properties at the document level, and it also scales pretty well.

MySQL

If you have a more traditional use case with many tables and relationships between those tables, and you want the full set of ACID properties, then I would go ahead and use a MySQL database. MySQL also has a master/slave architecture, so it also scales pretty well.

Memcached and Redis

Memcached and Redis are distributed caches, and they hold data in memory. Memcached is a simple, fast key-value store; Redis can also do key-value storage, and in addition it can be set up as a cluster, which gives you things like higher availability and data replication. Redis can also persist data to disk if you configure it to do so. The main points to remember when using a distributed cache are, first, that it should never be the source of truth, and second, that it can only hold a limited amount of data, bounded by the amount of memory on the host.


Zookeeper

Zookeeper is a centralized configuration management tool. It is also used for things like leader election and distributed locking. Zookeeper scales very well for reads but does not scale well for writes. Also, since Zookeeper keeps all its data in memory, you can't store too much data in it. If you want to store a small amount of data that must be highly available and has tons of reads, then Zookeeper is what you should be using.


Kafka

Kafka is a fault-tolerant, highly available queue used for publish-subscribe or streaming applications. Depending on your use case, it can deliver each message exactly once, and it keeps all messages ordered inside a partition of a topic.
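Because ordering is only guaranteed within a partition, producers that need per-key ordering hash the message key to pick the partition. Here is a sketch of that idea in Python (this mirrors the concept behind Kafka's default partitioner, not its exact murmur2 hash; the topic layout and event names are made up):

```python
import hashlib

NUM_PARTITIONS = 3

def partition_for(key):
    # Same key always hashes to the same partition
    digest = hashlib.sha256(key.encode()).hexdigest()
    return int(digest, 16) % NUM_PARTITIONS

# All events for one user carry the same key, so they land on one
# partition and their relative order is preserved.
events = ["user-42:login", "user-42:add-to-cart", "user-42:checkout"]
partitions = {partition_for(e.split(":")[0]) for e in events}
print(len(partitions))  # 1: same key, same partition, order preserved
```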

NGINX and HAProxy

NGINX and HAProxy are load balancers and are very efficient; for example, NGINX can manage thousands or even tens of thousands of client connections on a single instance.

Solr and Elasticsearch

Solr and Elasticsearch are both search platforms built on top of Lucene. Both are highly available, very scalable, and fault-tolerant, and they provide things like full-text search.


Docker, Kubernetes & Mesos

Docker is a software platform for containers, inside which you can develop and run your distributed applications. These containers can run on your laptop, in the data centre, or in the cloud. Kubernetes and Mesos are software tools used to manage and coordinate these containers.

That's it: this is my introduction to system design and the topics you need to prepare before attending system design interviews. I know I went through the concepts at a high level, but my intention was not to give you too many details; instead, I wanted to introduce them so that you can read about them in your own time. I am putting all the additional reading material below for you to go through.

Reading reference links: