A lightweight crawler and news classifier
### Crawler
Collects links (nodes), builds the edges between them, and downloads the web documents.
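
As a rough illustration of the link and edge step, here is a minimal sketch using only the Python standard library. The names `LinkExtractor` and `build_edges` are hypothetical and not taken from this project, and the pages are passed in as already-downloaded HTML strings, so no network access is involved.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect absolute URLs from <a href="..."> tags (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page URL.
                    self.links.append(urljoin(self.base_url, value))


def build_edges(pages):
    """pages maps url -> html; returns a sorted list of (source, target) edges."""
    edges = set()
    for url, html in pages.items():
        parser = LinkExtractor(url)
        parser.feed(html)
        for target in parser.links:
            edges.add((url, target))
    return sorted(edges)
```

A real crawler would also need a fetch queue, politeness delays, and `robots.txt` handling, which this sketch leaves out.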
### Indexer
Indexes web documents to speed up search queries.
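
One common way to speed up queries is an inverted index that maps each term to the documents containing it. The sketch below assumes that approach; the README does not say which index structure the project uses, and `build_index` and `search` are hypothetical names.

```python
from collections import defaultdict


def build_index(docs):
    """docs maps doc_id -> text; returns term -> set of doc_ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result
```

A lookup then touches only the posting sets of the query terms instead of scanning every document.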
### Ranker
Ranks documents using the PageRank algorithm.
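
For reference, PageRank can be sketched as a power iteration over the crawler's edge list. The damping factor of 0.85 and the fixed iteration count below are conventional defaults chosen for illustration, not values taken from this project, and `pagerank` is a hypothetical function name.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """edges is a list of (source, target) pairs; returns node -> rank."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by pages without outgoing links is spread over all pages.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {
            node: (1 - damping) / n + damping * dangling / n for node in nodes
        }
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank
```

The ranks sum to 1, so they can be read as the probability that a random surfer is on each page.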
### News Monitor
Monitors the news sources and picks up the latest news, either by re-crawling or by using RSS.
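
The RSS route might look like the following sketch, which parses a feed that has already been fetched and reports only items not seen before. It assumes a plain RSS 2.0 feed with `<item>`, `<title>`, and `<link>` elements; `new_items` and the `seen_links` set are hypothetical names.

```python
import xml.etree.ElementTree as ET


def new_items(rss_xml, seen_links):
    """Return (title, link) pairs for items whose link is not in seen_links.

    seen_links is updated in place so the next poll skips known items.
    """
    root = ET.fromstring(rss_xml)
    fresh = []
    for item in root.iter("item"):
        link = item.findtext("link", default="").strip()
        title = item.findtext("title", default="").strip()
        if link and link not in seen_links:
            seen_links.add(link)
            fresh.append((title, link))
    return fresh
```

Calling this on a schedule, with `seen_links` persisted between runs, gives the "update latest news" behaviour without re-crawling the whole site.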
### Tokenizer
Extracts features (tokens) from web documents for classification.
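
A minimal sketch of token extraction, assuming the features are lowercase word tokens with markup, scripts, and a few stop words removed. The stop-word list here is a tiny illustrative sample, and `TextExtractor` and `tokenize` are hypothetical names.

```python
import re
from html.parser import HTMLParser

# Illustrative sample only; a real list would be much longer.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "on", "is"}


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def tokenize(html):
    """Return the list of lowercase tokens found in an HTML document."""
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    text = " ".join(extractor.parts).lower()
    return [t for t in re.findall(r"[a-z0-9]+", text) if t not in STOP_WORDS]
```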
### Classifier
Classifies newly crawled web pages using the Naïve Bayes algorithm.
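
To make the classification step concrete, here is a small multinomial Naïve Bayes sketch with add-one (Laplace) smoothing. Whether the project uses this exact variant is an assumption; the class name `NaiveBayes` and its methods are hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over token lists, with add-one smoothing."""

    def __init__(self):
        self.class_docs = Counter()               # label -> number of documents
        self.token_counts = defaultdict(Counter)  # label -> token -> count
        self.vocab = set()

    def train(self, tokens, label):
        self.class_docs[label] += 1
        self.token_counts[label].update(tokens)
        self.vocab.update(tokens)

    def classify(self, tokens):
        """Return the most likely label, or None if nothing was trained."""
        total_docs = sum(self.class_docs.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_docs.items():
            # Work in log space to avoid underflow on long documents.
            score = math.log(doc_count / total_docs)
            total_tokens = sum(self.token_counts[label].values())
            for token in tokens:
                count = self.token_counts[label][token]
                score += math.log((count + 1) / (total_tokens + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Usage follows the pipeline above: train on token lists from labelled pages, then call `classify` on the tokens of each newly crawled page.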