bbripper

Project to scrape/rip certain content from the web

Requirements:

  • Jython
  • Java 7 JRE
  • Mozilla Firefox (current version)
  • Scrapy 0.16 or later
  • Sikuli 1.0.0 or later
  • ImageMagick + textcleaner
  • tesseract-ocr
  • pdfocr (modified) + option-modifier script
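
A quick way to see which of the command-line requirements above are already installed is to look them up on the PATH. The following is a minimal sketch, not part of the project: the executable names (`jython`, `java`, `firefox`, `scrapy`, `convert`, `tesseract`) are assumptions about a typical Linux install, and the helper `find_missing` is hypothetical. Sikuli, textcleaner, and the modified pdfocr are left out because their install locations vary.

```python
import shutil

# Assumed executable names for a typical Linux install; adjust as needed.
# For example, ImageMagick 7 ships `magick` rather than `convert`.
REQUIRED_TOOLS = {
    "Jython": "jython",
    "Java JRE": "java",
    "Mozilla Firefox": "firefox",
    "Scrapy": "scrapy",
    "ImageMagick": "convert",
    "tesseract-ocr": "tesseract",
}


def find_missing(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the labels of tools whose executable is not found on PATH."""
    return [label for label, exe in tools.items() if which(exe) is None]
```

Calling `find_missing()` returns a list of labels such as `["Scrapy"]` for anything not found; an empty list means every checked executable is on the PATH. It does not check versions.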

Hardware/OS requirements:

  • Initial support is for Linux (Ubuntu/Unity); it should also work on Windows and Mac
  • approximately 200 GB of disk space (possibly more)
  • possible integration with VPS/cloud servers

Objective:

Running:

  • from ./sikuli_api/, run: ./sikuli-script -r ../workspace/bbripper/sikuli.sikuli
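
The launch step above can also be wrapped in a small Python script so it can be started from any directory. This is a sketch under assumptions, not part of the project: the function names `build_command` and `run_ripper` are hypothetical, and the paths are simply the ones given in the command above.

```python
import subprocess

# Paths taken from the run command above.
SIKULI_DIR = "./sikuli_api"
SCRIPT_PATH = "../workspace/bbripper/sikuli.sikuli"


def build_command(script_path=SCRIPT_PATH):
    """Build the argument list for sikuli-script with the -r (run) option."""
    return ["./sikuli-script", "-r", script_path]


def run_ripper(sikuli_dir=SIKULI_DIR, script_path=SCRIPT_PATH):
    """Run the Sikuli script from inside sikuli_dir and return its exit code."""
    # cwd makes the relative script path resolve as it does in the manual command.
    return subprocess.call(build_command(script_path), cwd=sikuli_dir)
```

Calling `run_ripper()` is then equivalent to changing into ./sikuli_api/ and running the command by hand; the return value is the exit code of sikuli-script.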