diff --git a/.DS_Store b/.DS_Store
new file mode 100644
index 0000000..5008ddf
Binary files /dev/null and b/.DS_Store differ
diff --git a/assumptions.md b/assumptions.md
index c47b198..67deddf 100644
--- a/assumptions.md
+++ b/assumptions.md
@@ -1 +1,9 @@
 # Assumptions
+- In this case I only considered designing a web API, without any web UI.
+- In this phase, I decided not to consider authentication.
+- There can be more than one request to shorten the same long URL at different times. To save time, I decided to allow duplicates of a long URL in storage instead of searching the DB first.
+- Analytics for short links is not considered.
+- It is not possible for the user to choose the dynamic part of a short URL.
+- Expired links will be removed from the DB. For this there should be a service or job that runs at a specific interval, for instance once a day.
+- Only the characters 0-9, a-z, and A-Z can be used in a short URL, for better readability and writability (in case someone has to type the link in). To handle the number of short URLs the system can produce, we use a 7-character string for the hash-key (the auto-generated part of the short link) with 62 possible choices per slot (0-9, a-z, A-Z = 62). A quick calculation (62^7) gives about 3.5 trillion possible hash-keys, which is roughly 96 million new short URLs per day for 100 years.
+- There will be automated tests, CI, CD, etc., but they won't be discussed here.
diff --git a/questions.md b/questions.md
index 6b2a3b9..e3441b7 100644
--- a/questions.md
+++ b/questions.md
@@ -1 +1,11 @@
 # Questions
+- Is it a full web application with a UI (like goo.gl or TinyURL), or is just an API enough?
+- Does it need authentication (user account, API key, ...)?
+- Do you need analytics (for short links as well as actual URLs)?
+- Can users choose their short links too, or will all of them be auto-generated? (e.g. www.trav.ix/abby)
+- Can users modify or delete their links?
+- Are there any software/hardware limitations?
+- What is the strategy for dealing with outdated links (including one-shot links after usage) or even very old links? Do we need to keep them?
+- What is the strategy for dealing with duplicate long URLs? (e.g. if n requests come in to shorten "https://www.travix.com/contact" at different times)
+- Is there any standard for designing APIs at Travix?
+- What about security concerns? (e.g. DDoS attacks, or someone trying to abuse the service)
diff --git a/scale.md b/scale.md
index 6896f84..93975f4 100644
--- a/scale.md
+++ b/scale.md
@@ -1 +1,10 @@
 # Scale
+To cover scalability concerns, I considered several points to reach the best result:
+- Multiple instances (nodes) of workers are used. The services can also be implemented as containers, which makes it easier to add a new node to handle more traffic (horizontal scaling). With multiple servers/nodes, a load balancer is obviously needed to route requests to the nodes.
+- A microservice architecture is a good option for the future scalability of the application.
+- Using a queuing system like RabbitMQ for writes to the database makes requests faster. This takes care of the "write to DB" action, which would otherwise become a bottleneck under heavy traffic.
+- Sharding the database will also help by decreasing database read times, especially as the number of records grows over the years into the millions and billions.
+- Using a cache decreases the number of database reads and makes requests faster.
+- My approach to generating a new hash-key (the dynamic part of the short link) is fast (because a unique number is used) and never produces duplicates, so we do not need to check whether a generated key already exists. That also means faster requests.
+- Using a CDN also helps here and decreases response time (everyone gets a response from the closest server).
+- Cloud services like Amazon AWS, Google Cloud, or Microsoft Azure also provide a lot of built-in tools and options to cover scalability concerns. For example, AWS Elastic Beanstalk can auto-scale the service/application according to the number of requests.
\ No newline at end of file
diff --git a/solution.md b/solution.md
index 02bb42e..e099711 100644
--- a/solution.md
+++ b/solution.md
@@ -1 +1,85 @@
 # Solution.md
+- My design includes these components: a core service (a REST API), a load balancer, a range keeper, a cache, a queuing system, and a database.
+- The core service exposes 2 APIs:
+URL shortener service: it gets the actual URL and an optional expirationTime, and returns the short URL.
+For generating hash-keys, we fetch the "counter" value, convert the unique number it represents to its expression in base62, and then add 1 to the counter for the next hash-key. By this method we solve the collision problem (a sketch of this scheme appears at the end of this file).
+If we store the "counter" on one server, the counter host becomes both a single point of failure and a single point of bottleneck. Because of that, we divide the whole range of numbers from 0 to about 3.5 trillion (62^7) into ranges (e.g. 0 to 1,000,000; 1,000,001 to 2,000,000; and so on) and put the ranges in a highly available, distributed service like ZooKeeper. The workers just ask ZooKeeper for a range that is not flagged as "used" and generate their hash-keys from it.
+
+URL resolver service: it gets a short URL and returns the related response or redirects to the target URL.
+It takes the hash-key (the short URL's dynamic part) and fetches the actual URL, then (the second sketch at the end of this file mirrors these cases):
+ 1. If the link does not exist or is invalid (expired), return the proper response code without redirecting.
+ 2. If the link exists, is valid, and is a GET, redirect to it (if it is one-shot, set its ExpirationDateTime so it cannot be used again).
+ 3. If the link exists, is valid, and is a POST, forward the request to the target page, get the response, and return it to the client.
+- Because there are no relations in the model, it is better to choose a NoSQL database for this situation (faster fetching). The model could look like this:
+{
+ HashKey, -- to save some storage we store only the hash-key and drop "trav.ix/"
+ ActualUrl,
+ IsOneShot,
+ ExpirationDateTime,
+ WebVerb, -- GET or POST
+}
+- "Need to be quickly accessible across the world": for this we use geographical distribution such as a CDN (content delivery network).
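+
+A minimal sketch of the hash-key generation (Python is used only for illustration; the RangeCounter stub and the range bounds are assumptions, since a real worker would claim its range from ZooKeeper and flag it as used):
+
+```python
+ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"  # 62 characters
+
+def encode_base62(number: int, length: int = 7) -> str:
+    """Convert a unique counter value into a fixed-length base62 hash-key."""
+    chars = []
+    while number > 0:
+        number, remainder = divmod(number, 62)
+        chars.append(ALPHABET[remainder])
+    return "".join(reversed(chars)).rjust(length, ALPHABET[0])
+
+class RangeCounter:
+    """Hands out unique counter values from a range claimed by this worker."""
+
+    def __init__(self, start: int, end: int):  # in production, fetched from ZooKeeper
+        self.next_value, self.end = start, end
+
+    def next(self) -> int:
+        if self.next_value > self.end:
+            raise RuntimeError("range exhausted; claim a new range")
+        value = self.next_value
+        self.next_value += 1
+        return value
+
+counter = RangeCounter(1_000_001, 2_000_000)  # e.g. the second range
+print(encode_base62(counter.next()))          # -> "0004c93"
+```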
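+
+And a minimal sketch of the resolver's three cases (also illustrative: the Link dataclass mirrors the model above, and persisting the one-shot expiration is left to the real DB layer):
+
+```python
+from dataclasses import dataclass
+from datetime import datetime, timezone
+from typing import Optional
+
+@dataclass
+class Link:
+    actual_url: str
+    is_one_shot: bool
+    expiration: Optional[datetime]
+    web_verb: str  # "GET" or "POST"
+
+def resolve(link: Optional[Link], method: str) -> tuple[int, Optional[str]]:
+    """Map a looked-up link and request verb to (status code, target URL)."""
+    now = datetime.now(timezone.utc)
+    # Case 1: unknown or expired link -> proper status code, no redirect.
+    if link is None or (link.expiration is not None and link.expiration <= now):
+        return 404, None
+    # Case 2: valid GET link -> redirect (a one-shot link expires on first use).
+    if method == "GET" and link.web_verb == "GET":
+        if link.is_one_shot:
+            link.expiration = now  # would be persisted to the DB in the real service
+        return 301, link.actual_url
+    # Case 3: valid POST link -> the service forwards the request to the target
+    # URL and relays the response; here we just return the URL to forward to.
+    if method == "POST" and link.web_verb == "POST":
+        return 200, link.actual_url
+    return 405, None  # verb does not match the stored link
+```
+
+Keeping `resolve` free of I/O is a deliberate choice: the same logic can sit behind any web framework and be unit-tested without a database.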
\ No newline at end of file
diff --git a/system context diagram.png b/system context diagram.png
new file mode 100644
index 0000000..eea4bcf
Binary files /dev/null and b/system context diagram.png differ
diff --git a/system-context-diagram.png b/system-context-diagram.png
deleted file mode 100644
index ea14a92..0000000
Binary files a/system-context-diagram.png and /dev/null differ