# URL shortener system - Abby Aalinia (#104)

Binary file added .DS_Store
Binary file not shown.
8 changes: 8 additions & 0 deletions assumptions.md
@@ -1 +1,9 @@
# Assumptions
- In this phase I designed only a web API, without any web UI.
- In this phase, I decided not to consider authentication.
- The same long URL can be submitted for shortening more than once at different times. To save request time, I decided to allow duplicates based on the long URL in storage instead of searching the DB on every request.
- Analytics for short links are not considered.
- It's not possible for a user to choose the dynamic part of a short URL.
- Expired links will be removed from the DB. This requires a service or job that runs at a fixed interval, for instance once a day.
- Only the characters 0-9, a-z, and A-Z can be used in a short URL, for better readability and writability (in case someone has to type the link in). To handle the number of short URLs the system can produce, we use a 7-character string for the hash-key (the auto-generated part of the short link), with 62 possible choices per slot (0-9, a-z, A-Z = 62). A quick calculation (shown after this list) gives roughly 3.5 trillion possible hash-keys, which allows about 96 million new links per day for 100 years.
- There will be automated tests, CI, CD, and so on, but they won't be discussed here.
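
A quick back-of-the-envelope check of the key-space claim above, as a minimal Python sketch:

```python
# 7-character keys over a 62-character alphabet (0-9, a-z, A-Z)
key_space = 62 ** 7
print(f"{key_space:,}")           # 3,521,614,606,208 (~3.5 trillion)

# How many new links per day would exhaust the key space in 100 years?
per_day = key_space // (100 * 365)
print(f"{per_day:,}")             # 96,482,591 (~96 million per day)
```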
10 changes: 10 additions & 0 deletions questions.md
@@ -1 +1,11 @@
# Questions
- Is it a full web application with a UI (like goo.gl or TinyURL), or is an API alone enough?
- Does it need authentication (user accounts, API keys, ...)?
- Do we need analytics (for the short links as well as the actual URLs)?
- Can users choose their own short links too, or will all of them be auto-generated? (e.g.: www.trav.ix/abby)
- Can users modify or delete their links?
- Are there any software/hardware limitations?
- What is the strategy for dealing with outdated links (including one-shot links after usage) or even very old links? Do we need to keep them?
- What is the strategy for dealing with duplicate long URLs? (e.g.: if n requests come in at different times to shorten "https://www.travix.com/contact")
- Is there any standard for designing APIs at Travix?
- What about security concerns? (e.g.: DDoS attacks, or someone trying to abuse the service)
9 changes: 9 additions & 0 deletions scale.md
@@ -1 +1,10 @@
# Scale
To cover scalability concerns I considered several points to approach the best result:
- Multiple instances (nodes) of workers are used. The services can be implemented as containers, which makes it easier to add a new node to handle more traffic (horizontal scaling). Obviously, with multiple servers/nodes, a load balancer is needed to distribute requests across the nodes.
- A microservice architecture is a good option for scaling the application in the future.
- Using a queuing system like RabbitMQ for writing data to the database makes requests faster: the "write to DB" action, which becomes a bottleneck under heavy traffic, is taken off the request path (see the sketch after this list).
- Sharding the database also helps and decreases read time, especially as the number of records grows over the years into millions and billions.
- Using a cache decreases the number of database reads and makes requests faster.
- My approach to generating a new hash-key (the dynamic part of the short link) is fast (because it is derived from a unique number) and never produces duplicates, so we don't need to check whether a generated key already exists. That means faster requests as well.
- Using a CDN also helps and decreases response time (everyone gets a response from the closest server).
- Cloud services like Amazon AWS, Google Cloud, or MS Azure provide many built-in tools and options to cover scalability concerns; for example, AWS Elastic Beanstalk can auto-scale a service/application according to the number of requests.
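
A minimal sketch of the queued write path, using Python's standard-library queue and a worker thread as an in-process stand-in for a real broker like RabbitMQ (the `save_to_db` function is a hypothetical persistence call, not part of the design above):

```python
import queue
import threading

# The request thread enqueues the new record and returns immediately;
# a background worker drains the queue and performs the slow DB write.
write_queue: "queue.Queue[dict]" = queue.Queue()

def save_to_db(record: dict) -> None:
    # Hypothetical persistence call; a real service would write to the DB here.
    print("persisted", record["Hashkey"])

def db_writer() -> None:
    while True:
        record = write_queue.get()
        save_to_db(record)
        write_queue.task_done()

threading.Thread(target=db_writer, daemon=True).start()

# Request handler: enqueue and respond without waiting for the database.
write_queue.put({"Hashkey": "0000001", "ActualUrl": "https://www.travix.com/contact"})
write_queue.join()  # demo only: block until the background write completes
```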
20 changes: 20 additions & 0 deletions solution.md
@@ -1 +1,21 @@
# Solution
- My design includes these components: a core service (REST API), a load balancer, a range keeper, a cache, a queuing system, and a database.
- The core service includes 2 APIs:
URL shortener service: it takes the actual URL and an optional expirationTime, and returns a short URL.
To generate hash-keys, we fetch a "counter", convert the unique number represented by the counter value into its base62 expression, and then add 1 to the counter for the next hash-key. This method solves the collision problem.
If we store the "counter" on one server, that "counter host" becomes both a single point of failure and a single point of bottleneck. Because of that, we divide the whole number space from 0 to roughly 3.5 trillion into ranges (e.g.: 0 to 1,000,000, then 1,000,001 to 2,000,000, and so on) and put the ranges in a highly available, distributed service like ZooKeeper. Each worker simply asks ZooKeeper for a range that is not flagged as "used" and generates its hash-keys from it (see the sketch below).
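
A minimal Python sketch of this scheme. The `RangeKeeper` class is an in-process stand-in for ZooKeeper, and the range size is an illustrative assumption, not a real ZooKeeper client:

```python
import string

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase  # 62 chars
KEY_LENGTH = 7

def encode_base62(n: int) -> str:
    """Convert a counter value to a zero-padded 7-character base62 hash-key."""
    chars = []
    for _ in range(KEY_LENGTH):
        n, r = divmod(n, 62)
        chars.append(ALPHABET[r])
    return "".join(reversed(chars))

class RangeKeeper:
    """In-process stand-in for ZooKeeper: hands out non-overlapping counter ranges."""
    def __init__(self, range_size: int = 1_000_000):
        self.range_size = range_size
        self.next_start = 0  # ZooKeeper would persist this and flag ranges as "used"

    def acquire_range(self) -> range:
        start = self.next_start
        self.next_start += self.range_size
        return range(start, start + self.range_size)

# Each worker holds its own range, so two workers can never produce the same key.
keeper = RangeKeeper()
counter = iter(keeper.acquire_range())
print(encode_base62(next(counter)))  # "0000000"
print(encode_base62(next(counter)))  # "0000001"
```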

URL resolver service: it takes a short URL and returns the appropriate response or redirects to the target URL.
It extracts the hash-key from the short URL and fetches the actual URL, then:
1- if the link does not exist or is invalid (expired), it returns the proper response code without redirecting
2- if the link exists, is valid, and is a "get", it redirects to the target (if it is one-shot, it also sets "ExpirationDateTime" so the link cannot be used again)
3- if the link exists, is valid, and is a "post", it forwards the request to the target page, gets the response, and returns it to the client
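
A minimal sketch of this resolution flow in Python, using a plain dict as a stand-in for the database and the record fields from the model below (the `resolve` function and its return shape are illustrative assumptions):

```python
from datetime import datetime, timezone

def resolve(hash_key: str, links: dict) -> tuple[int, str | None]:
    """Return (HTTP status, redirect target or None) for a short link."""
    record = links.get(hash_key)
    now = datetime.now(timezone.utc)
    # 1) unknown or expired link: proper error status, no redirect
    if record is None or (record["ExpirationDateTime"] and record["ExpirationDateTime"] <= now):
        return 404, None
    # 2) valid "get" link: redirect; a one-shot link expires on first use
    if record["WebVerb"] == "get":
        if record["IsOneShot"]:
            record["ExpirationDateTime"] = now
        return 301, record["ActualUrl"]
    # 3) valid "post" link: the service would forward the request body to the
    #    target URL and relay the response (the forwarding itself is not shown)
    return 200, record["ActualUrl"]

# Demo with an in-memory "database"
links = {"0000001": {"ActualUrl": "https://www.travix.com/contact",
                     "IsOneShot": True, "ExpirationDateTime": None, "WebVerb": "get"}}
print(resolve("0000001", links))  # (301, 'https://www.travix.com/contact')
print(resolve("0000001", links))  # (404, None): the one-shot link has expired
```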
- Because there are no relations in the model, it is better to choose a NoSQL database for this situation (faster fetching). The model could look like this:
```
{
    Hashkey,            -- to save some storage we store only the hash-key and drop "trav.ix/"
    ActualUrl,
    IsOneShot,
    ExpirationDateTime,
    WebVerb,            -- get or post
}
```
- "need to be quickly accessible across the world" for this matter we use some geographical distribution like CDN(content delivery network)
Binary file added system context diagram.png
Binary file removed system-context-diagram.png
Binary file not shown.