Short URL — System Design

Ruthreshwar Gajendiran
6 min readJun 6, 2021

--

This is one of the most frequently asked or the basic system design question that we all have come over. There are lot of examples and design diagrams stacked over google search. Either there are too many details or it is too precise.

Let’s see the system design of Short Url in detail.

What is a Short URL?

Short URL is like alias of a long URL. Say if a long URL has to be shared via SMS or via phone call then it becomes tricky to share long URL’s. In these cases we can use some short URL as alias.

eg) https://www.example.com/home/nav/send/money?sender=Jonh&receiver=Kim&t=asdfdfadfafdadfadfdf

The above URL when shortened gets mapped to

https://www.shorturl.com/dfsghij

The value dfsghij is like hash code or unique code which will retrieve the long URL.

Requirements for Short URL

a. Functional Requirements

  1. Short URL application can be hosted as a REST service. It enables other developers and application to use service directly.
  2. Short URL should have 2 operations

URL shortener — Given a long url this operation should return short url

URL expand — Given a short url this operation will return corresponding long url

3. Short URL should give option to users to have their own custom url.

4. There should be some expiry time for the URL. Users should also be able to set the expiration date (not beyond a specific threshold).

5. Authentication — This can be kept optional. Since this is going to be open service any one should be able to hit our service with long URL.

6. RateLimiting — There are chances that hackers may spoof our system with too many requests (DDOS attack). There should be some mechanism to avoid this.

7. For duplicate long URL the short url service should return the same short URL for that user. If different user submits the same URL in that case both the short URL’s should be unique.

I guess we have listed down all the requirements to build the short URL. Now let’s move on to technical aspect of short URL.

b. Non Functional Requirements

8. Minimum latency/response time — Service should respond quickly (short & expand).

9. High availability — If in case this service is down all the operations will start failing and will result in huge impact for the users.

10. Data eviction — Evict or archive old data

11. Short URL’s should not follow any pattern basically short URL should not be guessed by hackers.

Capacity Planning

Data modelling and system components design mainly depends on the usage of the service. So we will first figure out the usage details first before diving in deeper on data modelling.

Assume that we have 1 Million requests per day. It drills down to 30 Million requests per month.

a. Data capacity planning

Let’s say we store the data for 3 years.

Below is the calculation for how much data we will store in 3 years.

Data Modelling
Data calculation for 3 years

Apart from the above we will have user data too.

b. Choosing Database

As mentioned earlier the write and access should be very fast.

Also for this use case there will be lot of reads over writes.

Option 1: Using RDBMS

Advantages:

  1. Provides ACID properties
  2. Supports faster access based on indexing
  3. High availability

Disadvantages:

  1. RDBMS is not scalable for the larger number of data for this use case
  2. Indexing based on hash (short url) makes read & write slower for larger amount of data

If we are choosing RDBMS then we have to DB partitioning and use extra layer for caching. We can consider using Redis distributed cache or LRU cache for this purpose.

Option 2: Using NOSQL

Advantages:

  1. Provides eventual consistency
  2. Easy to scale
  3. High availability

We can consider using any Key — Value data store like Dynamo DB or Oracle NOSQL DB

System Design

a. High Level Design

Our initial design of the system will look like below. Let’s see what can we tune here based on the functional and non functional requirements.

High Level Design

b. Low Level Design

URL Short Service

Let’s now bring back the requirements that we listed in first place.

Requirement 1: Short URL should be hosted as a REST service

POST — https://api.shorturl.com/v1/short

GET — https://api.shorturl.com/v1/expand/dfsghij

Requirement 2: Authentication

If authentication is needed we can use OAuth 2. But mostly short url will be open service without any authentication.

Requirement 3: RateLimiting

We need to prevent our service from DDOS attack. We need to introduce rate limiting based on Client IP address.

There are chances that the hacker may be from distributed system and invokes service from different IP address. In those cases observing traffic trend change could help in early prevention.

Once DDOS attack is identified add the IP address to a play book which will help in taking decision earlier when the attack repeats.

Requirement 4: User based URL generator

We cannot do search in database based on Long URL. So one way to find the duplicate long url is to create short url in such a way that it expands to long url with some hashing technique.

Let’s say we are using MD5 hash technique to calculate hash for the long url. MD5 hash generates 128-bit long output so out of 128 bit we will take 43 bit to generate a short URL of 10 characters.

MD5 can create a lot of collisions. For two or many different long URL inputs we may get the same unique id for short URL and that could cause data corruption. So we need to perform some checks to ensure that this unique id doesn’t exist in the database already.

In order to do this we can append the User ID to the end of the long URL before calculating hash. This will make sure every user has unique short url.

Let’s now look at Non Functional Requirements

Requirement 5: Minimum Latency/Response time

To achieve this we can introduce a distributed cache which will have faster access time.

We should decide on how much data we need to store in cache and cache eviction policy and choose cache storage accordingly.

Requirement 6: High availability

Below are the failure points

  1. Application server
  2. Database

We can add multiple application server against a load balancer and monitor the heart beat of the application. If server is down we should replace the server with a new server.

For database we can introduce a back up/replica DB. There are multiple replication technique which can be chose based on the use case.

After considering all the requirements now our system looks like below

Final System Design

Below are some questions that may come out of the design

  1. What happens when there are concurrent requests → This is handled using hashing. During insert we should make sure we don’t have duplicate keys.
  2. How to get metrics on location, users etc.. → We should get this clarified at initial requirement phase. Based on the input we can add columns to the User table.
  3. What happens if the connection to DB is down → Our application should fall back to the Replica DB in case of original DB failure
  4. How writes are handled → Writes to DB will happen synchronously. For replicas the writes may happen async based on the quorum configuration.

Thanks for reading!!!

--

--