Short URL — System Design
This is one of the most frequently asked or the basic system design question that we all have come over. There are lot of examples and design diagrams stacked over google search. Either there are too many details or it is too precise.
Let’s see the system design of Short Url in detail.
What is a Short URL?
Short URL is like alias of a long URL. Say if a long URL has to be shared via SMS or via phone call then it becomes tricky to share long URL’s. In these cases we can use some short URL as alias.
eg) https://www.example.com/home/nav/send/money?sender=Jonh&receiver=Kim&t=asdfdfadfafdadfadfdf
The above URL when shortened gets mapped to
https://www.shorturl.com/dfsghij
The value dfsghij is like hash code or unique code which will retrieve the long URL.
Requirements for Short URL
a. Functional Requirements
- Short URL application can be hosted as a REST service. It enables other developers and application to use service directly.
- Short URL should have 2 operations
URL shortener — Given a long url this operation should return short url
URL expand — Given a short url this operation will return corresponding long url
3. Short URL should give option to users to have their own custom url.
4. There should be some expiry time for the URL. Users should also be able to set the expiration date (not beyond a specific threshold).
5. Authentication — This can be kept optional. Since this is going to be open service any one should be able to hit our service with long URL.
6. RateLimiting — There are chances that hackers may spoof our system with too many requests (DDOS attack). There should be some mechanism to avoid this.
7. For duplicate long URL the short url service should return the same short URL for that user. If different user submits the same URL in that case both the short URL’s should be unique.
I guess we have listed down all the requirements to build the short URL. Now let’s move on to technical aspect of short URL.
b. Non Functional Requirements
8. Minimum latency/response time — Service should respond quickly (short & expand).
9. High availability — If in case this service is down all the operations will start failing and will result in huge impact for the users.
10. Data eviction — Evict or archive old data
11. Short URL’s should not follow any pattern basically short URL should not be guessed by hackers.
Capacity Planning
Data modelling and system components design mainly depends on the usage of the service. So we will first figure out the usage details first before diving in deeper on data modelling.
Assume that we have 1 Million requests per day. It drills down to 30 Million requests per month.
a. Data capacity planning
Let’s say we store the data for 3 years.
Below is the calculation for how much data we will store in 3 years.
Apart from the above we will have user data too.
b. Choosing Database
As mentioned earlier the write and access should be very fast.
Also for this use case there will be lot of reads over writes.
Option 1: Using RDBMS
Advantages:
- Provides ACID properties
- Supports faster access based on indexing
- High availability
Disadvantages:
- RDBMS is not scalable for the larger number of data for this use case
- Indexing based on hash (short url) makes read & write slower for larger amount of data
If we are choosing RDBMS then we have to DB partitioning and use extra layer for caching. We can consider using Redis distributed cache or LRU cache for this purpose.
Option 2: Using NOSQL
Advantages:
- Provides eventual consistency
- Easy to scale
- High availability
We can consider using any Key — Value data store like Dynamo DB or Oracle NOSQL DB
System Design
a. High Level Design
Our initial design of the system will look like below. Let’s see what can we tune here based on the functional and non functional requirements.
b. Low Level Design
URL Short Service
Let’s now bring back the requirements that we listed in first place.
Requirement 1: Short URL should be hosted as a REST service
POST — https://api.shorturl.com/v1/short
GET — https://api.shorturl.com/v1/expand/dfsghij
Requirement 2: Authentication
If authentication is needed we can use OAuth 2. But mostly short url will be open service without any authentication.
Requirement 3: RateLimiting
We need to prevent our service from DDOS attack. We need to introduce rate limiting based on Client IP address.
There are chances that the hacker may be from distributed system and invokes service from different IP address. In those cases observing traffic trend change could help in early prevention.
Once DDOS attack is identified add the IP address to a play book which will help in taking decision earlier when the attack repeats.
Requirement 4: User based URL generator
We cannot do search in database based on Long URL. So one way to find the duplicate long url is to create short url in such a way that it expands to long url with some hashing technique.
Let’s say we are using MD5 hash technique to calculate hash for the long url. MD5 hash generates 128-bit long output so out of 128 bit we will take 43 bit to generate a short URL of 10 characters.
MD5 can create a lot of collisions. For two or many different long URL inputs we may get the same unique id for short URL and that could cause data corruption. So we need to perform some checks to ensure that this unique id doesn’t exist in the database already.
In order to do this we can append the User ID to the end of the long URL before calculating hash. This will make sure every user has unique short url.
Let’s now look at Non Functional Requirements
Requirement 5: Minimum Latency/Response time
To achieve this we can introduce a distributed cache which will have faster access time.
We should decide on how much data we need to store in cache and cache eviction policy and choose cache storage accordingly.
Requirement 6: High availability
Below are the failure points
- Application server
- Database
We can add multiple application server against a load balancer and monitor the heart beat of the application. If server is down we should replace the server with a new server.
For database we can introduce a back up/replica DB. There are multiple replication technique which can be chose based on the use case.
After considering all the requirements now our system looks like below
Below are some questions that may come out of the design
- What happens when there are concurrent requests → This is handled using hashing. During insert we should make sure we don’t have duplicate keys.
- How to get metrics on location, users etc.. → We should get this clarified at initial requirement phase. Based on the input we can add columns to the User table.
- What happens if the connection to DB is down → Our application should fall back to the Replica DB in case of original DB failure
- How writes are handled → Writes to DB will happen synchronously. For replicas the writes may happen async based on the quorum configuration.
Thanks for reading!!!