Added file support and fixed uri typo #60
base: master
Add support for reading trackers from files. Use it at initialization via scraper.Scraper(trackerfile="filepaths"), or add files later by calling scraper.Addtrackfile("filepaths"). "filepaths" is a comma-separated list of paths; for a single file, pass just that one path.
Also, _connect_request can raise ConnectionResetError (for various reasons, such as the URL being blocked by the ISP). Since the error is raised in _connect_request and not in scrape_tracker, I build the error in _connect_request and pass it back to scrape_tracker in the connection_id parameter. Not the cleanest approach, but it is a way to preserve the error.
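For illustration only, here is a minimal sketch of one alternative way to preserve the error without overloading connection_id: return an explicit (value, error) pair. This is not the code in this PR. The function names, the `send` callable, and the result dictionaries are hypothetical stand-ins, not the library's actual API.

```python
from typing import Callable, Optional, Tuple


def connect_request(send: Callable[[], bytes]) -> Tuple[Optional[bytes], Optional[Exception]]:
    """Hypothetical stand-in for _connect_request: returns (connection_id, error)."""
    try:
        return send(), None
    except ConnectionResetError as exc:
        # Keep the exception as a value instead of hiding it in the id.
        return None, exc


def scrape_tracker(send: Callable[[], bytes]) -> dict:
    """Hypothetical stand-in for scrape_tracker: reports the preserved error."""
    connection_id, error = connect_request(send)
    if error is not None:
        return {"error": "Connection reset: %s" % error}
    return {"connection_id": connection_id}


def reset_by_peer() -> bytes:
    # Simulates a tracker whose connection is reset, e.g. blocked by the ISP.
    raise ConnectionResetError("connection reset by peer")


print(scrape_tracker(lambda: b"\x00\x01"))
print(scrape_tracker(reset_by_peer))
```

The caller can then tell a real connection id from a failure by checking the second element, rather than inspecting the type of connection_id.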
Thank you for the improvements. Is it okay if I change the base to develop?
Nice feature! But we could add tests to cover it
@@ -243,6 +270,9 @@ def scrape_tracker(self, tracker):
        results += _bad_infohashes
        return {"tracker": tracker_url, "results": results}

    def Addtrackfile(self, filename): #comma seperated lists of files to read trackers from
I can't find references to this method anywhere. Where is it used?
@@ -115,9 +118,27 @@ def get_good_infohashes(self) -> list:
        )
        return good_infohashes

    def get_trackers_viafile(self,trackers,filename):
Would you mind writing a test case to cover this? Thanks.
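A test along these lines could be a starting point. It is only a sketch: `load_trackers` is a hypothetical stand-in for the file-reading logic (it is not the method in this PR), the one-tracker-per-line file format is an assumption, and the tracker URLs are made-up examples.

```python
import os
import tempfile
import unittest


def load_trackers(filepaths: str) -> list:
    """Hypothetical helper: read tracker URLs from comma-separated file paths.

    Assumes one tracker URL per line; blank lines are skipped.
    """
    trackers = []
    for path in filepaths.split(","):
        path = path.strip()
        if not path:
            continue
        with open(path, "r") as handle:
            trackers.extend(line.strip() for line in handle if line.strip())
    return trackers


class LoadTrackersTest(unittest.TestCase):
    def test_reads_comma_separated_files(self):
        with tempfile.TemporaryDirectory() as tmp:
            first = os.path.join(tmp, "a.txt")
            second = os.path.join(tmp, "b.txt")
            with open(first, "w") as handle:
                handle.write("udp://tracker.example.org:1337/announce\n\n")
            with open(second, "w") as handle:
                handle.write("udp://tracker.example.net:6969/announce\n")
            self.assertEqual(
                load_trackers(first + ", " + second),
                [
                    "udp://tracker.example.org:1337/announce",
                    "udp://tracker.example.net:6969/announce",
                ],
            )

    def test_empty_argument_yields_no_trackers(self):
        self.assertEqual(load_trackers(""), [])
```

The temporary directory keeps the test independent of any tracker files on the developer's machine.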
@@ -83,7 +85,7 @@ def connect(self, timeout):

 class Scraper:
     def __init__(
-        self, trackers: List = [], infohashes: Tuple[List, str] = [], timeout: int = 10
+        self, trackerfile: str = "", trackers: List = [], infohashes: Tuple[List, str] = [], timeout: int = 10
     ):
         """
         Launches a scraper bound to a particular tracker
A docstring update would be good
I will try to push an update to the docstring soon.
    logger.error("External tracker file not found: %s", e)
    #raise Exception("External tracker file not found: %s" % e)
else:
    file1 = open(filename, 'r')
I think we can use Path().open() to also open and read the file, so file1 = my_file.open()
https://docs.python.org/3/library/pathlib.html#pathlib.Path.open
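As a rough sketch of that suggestion: the function name `read_tracker_file`, the return type, and the one-tracker-per-line format are assumptions for illustration, not code from this PR. Only `Path.is_file()` and `Path.open()` are taken from the standard library as documented.

```python
import logging
from pathlib import Path
from typing import List

logger = logging.getLogger(__name__)


def read_tracker_file(filename: str) -> List[str]:
    """Hypothetical helper: return the non-empty lines of one tracker file."""
    my_file = Path(filename)
    if not my_file.is_file():
        logger.error("External tracker file not found: %s", filename)
        return []
    # Path.open() mirrors the built-in open(); the with-block closes the file.
    with my_file.open("r") as file1:
        return [line.strip() for line in file1 if line.strip()]
```

Using the context manager also avoids leaving file1 open if reading fails partway through.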
Fix URIs that have a colon after the two slashes.
Add support for reading trackers from files.
Use it at initialization via scraper.Scraper(trackerfile="filepaths"), or add files later by calling scraper.Addtrackfile("filepaths").
"filepaths" is a comma-separated list of paths. For a single file, pass just that one path.