duytd/blackspider


Blackspider

A lightweight crawler and news classifier

Blackspider components

### Crawler

Collects links (nodes), builds the edges between them, and downloads web documents.
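As a rough illustration of the link and edge step, the sketch below pulls anchor targets out of a downloaded page and turns them into directed edges. `LinkExtractor`, `links`, and `edges` are hypothetical names, not taken from the Blackspider source, and the regular expression is a deliberate simplification of real HTML parsing.

```scala
import java.net.URI
import scala.util.Try

// Hypothetical helper, not part of the Blackspider code base.
object LinkExtractor {
  // Captures the href value of an anchor tag, dropping any #fragment.
  // Assumption: a regex is good enough for a sketch; a real crawler would use an HTML parser.
  private val Href = """(?i)<a\s[^>]*?href\s*=\s*["']([^"'#]+)[^"']*["']""".r

  /** Absolute http(s) links found in `html`, resolved against the page URL `base`. */
  def links(base: String, html: String): List[String] = {
    val baseUri = URI.create(base)
    Href.findAllMatchIn(html)
      .map(_.group(1).trim)
      .flatMap(href => Try(baseUri.resolve(href)).toOption) // skip malformed hrefs
      .filter(u => u.getScheme == "http" || u.getScheme == "https")
      .map(_.toString)
      .toList
      .distinct
  }

  /** Directed edges (source page -> linked page) for the link graph. */
  def edges(base: String, html: String): List[(String, String)] =
    links(base, html).map(target => base -> target)
}
```

Calling `LinkExtractor.edges(pageUrl, pageHtml)` for each downloaded page would yield the edge list that a ranking step could consume.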

### Indexer

Indexes web documents to speed up search queries.
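One common way to speed up queries is an inverted index, which maps each term to the documents that contain it. The sketch below assumes that approach; the README does not say which index structure Blackspider uses, and `InvertedIndex`, `build`, and `search` are hypothetical names.

```scala
// Hypothetical helper, not part of the Blackspider code base.
object InvertedIndex {
  // term -> ids of the documents containing that term
  type Index = Map[String, Set[String]]

  // Lower-cases the text and splits it on anything that is not a letter or digit.
  private def terms(text: String): Set[String] =
    text.toLowerCase.split("[^\\p{L}\\p{Nd}]+").filter(_.nonEmpty).toSet

  /** Builds the index from a map of document id -> document text. */
  def build(docs: Map[String, String]): Index =
    docs.toSeq
      .flatMap { case (id, text) => terms(text).toSeq.map(_ -> id) }
      .groupBy(_._1)
      .map { case (term, pairs) => term -> pairs.map(_._2).toSet }

  /** Ids of the documents that contain every term of the query (AND semantics). */
  def search(index: Index, query: String): Set[String] = {
    val postings = terms(query).toSeq.map(t => index.getOrElse(t, Set.empty[String]))
    if (postings.isEmpty) Set.empty else postings.reduce(_ intersect _)
  }
}
```

A lookup then costs one map access per query term instead of a scan over every stored document.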

### Ranker

Ranks documents using the PageRank algorithm.
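For readers unfamiliar with PageRank, here is a minimal power-iteration sketch over a link graph. The damping factor of 0.85, the fixed iteration count, and the even redistribution of rank from pages without out-links are conventional choices assumed here, not settings confirmed by the Blackspider source; the `PageRank` object and `rank` method are hypothetical names.

```scala
// Hypothetical helper, not part of the Blackspider code base.
object PageRank {
  /**
   * graph: page -> pages it links to.
   * Returns a score per page; the scores sum to 1.
   */
  def rank(
      graph: Map[String, Set[String]],
      damping: Double = 0.85, // assumed conventional value
      iterations: Int = 50    // assumed fixed number of iterations
  ): Map[String, Double] = {
    val nodes = (graph.keySet ++ graph.values.flatten).toSeq
    val n = nodes.size
    val initial = nodes.map(_ -> 1.0 / n).toMap

    (1 to iterations).foldLeft(initial) { (ranks, _) =>
      // Rank held by pages without out-links is spread evenly over all pages.
      val dangling =
        nodes.filter(p => graph.getOrElse(p, Set.empty[String]).isEmpty).map(ranks).sum
      // Each page passes an equal share of its rank to every page it links to.
      val incoming = scala.collection.mutable.Map.empty[String, Double].withDefaultValue(0.0)
      for ((src, targets) <- graph if targets.nonEmpty; t <- targets)
        incoming(t) += ranks(src) / targets.size
      nodes.map { p =>
        p -> ((1 - damping) / n + damping * (incoming(p) + dangling / n))
      }.toMap
    }
  }
}
```

The edge list produced by a crawler can be grouped by source page to form the `graph` argument.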

### News Monitor

Monitors the news sources and picks up the latest news, either by re-crawling or by using RSS.
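The RSS route can be sketched with the XML parser that ships with the JDK: read the feed, collect the `<link>` of each `<item>`, and keep only the links that have not been seen yet. `RssMonitor`, `itemLinks`, and `newLinks` are hypothetical names, and the sketch assumes a plain RSS 2.0 feed that has already been downloaded into a string.

```scala
import java.io.ByteArrayInputStream
import java.nio.charset.StandardCharsets
import javax.xml.parsers.DocumentBuilderFactory
import org.w3c.dom.Element

// Hypothetical helper, not part of the Blackspider code base.
object RssMonitor {
  /** The <link> of every <item> in an RSS 2.0 document, in feed order. */
  def itemLinks(rssXml: String): List[String] = {
    val factory = DocumentBuilderFactory.newInstance()
    // Reject DOCTYPE declarations, since feeds come from untrusted sources.
    factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true)
    val doc = factory.newDocumentBuilder()
      .parse(new ByteArrayInputStream(rssXml.getBytes(StandardCharsets.UTF_8)))
    val items = doc.getElementsByTagName("item")
    (0 until items.getLength).toList.flatMap { i =>
      val links = items.item(i).asInstanceOf[Element].getElementsByTagName("link")
      if (links.getLength > 0) Some(links.item(0).getTextContent.trim) else None
    }
  }

  /** Links from the feed that are not in `seen`, i.e. candidates for crawling. */
  def newLinks(rssXml: String, seen: Set[String]): List[String] =
    itemLinks(rssXml).filterNot(seen)
}
```

Polling a feed on a schedule and passing the result of `newLinks` to the crawler would avoid re-downloading pages that are already known.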

### Tokenizer

Extracts features (tokens) from web documents for classification.
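A minimal sketch of token extraction, assuming a bag-of-words feature model: strip the markup, lower-case the text, split on non-alphanumeric characters, and drop stop words. The object name `Tokenizer` here, its methods, and the tiny stop-word list are all illustrative assumptions rather than the project's actual implementation.

```scala
// Hypothetical helper, not part of the Blackspider code base.
object Tokenizer {
  // A tiny illustrative stop-word list (assumption; a real list would be much longer).
  private val StopWords = Set("a", "an", "and", "in", "of", "on", "the", "to")

  /** Lower-cased word tokens of an HTML document, without tags and stop words. */
  def tokens(html: String): List[String] =
    html
      .replaceAll("(?s)<[^>]*>", " ")  // crude tag removal, enough for a sketch
      .toLowerCase
      .split("[^\\p{L}\\p{Nd}]+")      // split on anything but letters and digits
      .toList
      .filter(t => t.nonEmpty && !StopWords(t))

  /** Bag-of-words features: how often each token occurs in the document. */
  def termCounts(html: String): Map[String, Int] =
    tokens(html).groupBy(identity).map { case (t, ts) => t -> ts.size }
}
```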

### Classifier

Classifies newly crawled web pages using the Naïve Bayes algorithm.
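To make the classification step concrete, the following sketch implements a multinomial Naïve Bayes classifier with add-one (Laplace) smoothing over token lists. The multinomial variant and the smoothing are assumptions, since the README names only the algorithm family; `NaiveBayes`, `Model`, `train`, and `classify` are hypothetical names.

```scala
// Hypothetical helper, not part of the Blackspider code base.
object NaiveBayes {
  final case class Model(
      docCounts: Map[String, Int],               // label -> number of training documents
      termCounts: Map[String, Map[String, Int]], // label -> term -> occurrences
      vocabulary: Set[String]                    // all terms seen during training
  )

  /** Trains on (label, tokens) pairs. */
  def train(examples: Seq[(String, Seq[String])]): Model = {
    val byLabel = examples.groupBy(_._1)
    val docCounts = byLabel.map { case (label, docs) => label -> docs.size }
    val termCounts = byLabel.map { case (label, docs) =>
      label -> docs.flatMap(_._2).groupBy(identity).map { case (t, ts) => t -> ts.size }
    }
    Model(docCounts, termCounts, examples.flatMap(_._2).toSet)
  }

  /** Most probable label for the tokens; the model must contain at least one label. */
  def classify(model: Model, tokens: Seq[String]): String = {
    val totalDocs = model.docCounts.values.sum.toDouble
    val scores = model.docCounts.map { case (label, n) =>
      val counts = model.termCounts(label)
      val totalTerms = counts.values.sum
      // Log-probabilities avoid underflow; the +1 is add-one (Laplace) smoothing.
      val logLikelihood = tokens.map { t =>
        math.log((counts.getOrElse(t, 0) + 1.0) / (totalTerms + model.vocabulary.size))
      }.sum
      label -> (math.log(n / totalDocs) + logLikelihood)
    }
    scores.maxBy(_._2)._1
  }
}
```

Training examples would pair a news category with the tokens extracted from a page of that category, and `classify` would then label each newly crawled page.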

Blackspider architecture

Overall Architecture

About

A lightweight Scala web crawler and news classifier
