Radical suggestion: Deprecate this repo and break up pieces #206
Comments
Note there is another concern not well covered here: scraping (#174) and monitoring (#172). That said, they could maybe be lumped in with a future “Python implementation of the Ruby server.” My 2¢: I’m not sure I see as much value in splitting it up so granularly. I feel like I see two major areas here…
I can see the actual diffing algorithms living in either place. From a practical standpoint so far, they’ve proven to have zero applicability outside our problem space and thus outside the diffing server. The diffing server is also nothing but a wrapper for them (and, more importantly, at least one past experience showed not having the server to go along with the algorithm to be a real barrier). On the other hand, I can at least clearly envision our analysts using the algorithms in their scripts.

It seems like we are finally semi-serious about a Python rewrite of the DB server. I think the scraping and monitoring tools would eventually move there, but they could continue to live here pretty comfortably until that becomes real enough.
From last week’s dev meeting, the short-term plan is to break this repo in two:
With a potential future of splitting up the toolbox stuff here when we see it’s useful to do so. @danielballan please edit if I got this wrong.
To surface a conversation from private channels:
Updating this from some recent discussions… the amount of work and oddball logic involved in #174, combined with questions I’ve seen from people on Wayback’s Slack, has convinced me of @danielballan’s original point that a package dedicated to Wayback would be good. (I’d love to answer some questions I’ve seen with “just use this,” but it’s impractical given how buried it is in this repo.) Also marking this as
I’ve also made sure that the new analyst task sheet script is in a separate repo (https://github.com/edgi-govdata-archiving/web-monitoring-task-sheets/), which puts me in a bit of a mind to at least reorganize things internally here a bit for sanity’s sake. I’m thinking:
I don’t think this organization magically solves anything, but I hope it makes the “grab-bag” feeling of this repo a little more navigable, and maybe helps us see where we can or should excise bits into separate places. Also, to be clear, I’m not suggesting we make each of these folders separate packages. This is just a little bit of organization within the
I like this layout. 👍 I agree it will make it easier to grok what the various pieces of this project are.
This reorganizes the content of web_monitoring into a hierarchy of modules for easier management and comprehension. See the discussion in #206 for more.

    scripts/                 # Stubs for things in web_monitoring/cli
        annotations_import
        ia_healthcheck
        wm
        wm-diffing-server
    web_monitoring/
        tests/
            [same as today]
        diff/
            content_type.py
            differs.py
            diff_errors.py
            html_diff_render.py
            links_diff.py
        diff_server/
            server.py
        cli/
            cli.py
            ia_healthcheck.py
            ia_import.py
            annotations_import.py
        __init__.py
        _version.py
        utils.py
        db.py

This also drops `filtering.py`, which was vestigial and no longer used.
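For anyone with scripts that import from the old locations, here is a rough sketch of how the move could be turned into an old-to-new import map. It rests on two assumptions that the thread does not confirm: that every module listed above previously sat directly under `web_monitoring/`, and that file names did not change in the move. `NEW_LAYOUT`, `build_import_map`, and `rewrite_import` are hypothetical names made up for this illustration, not part of the package.

```python
# Hypothetical sketch, not part of web_monitoring.
# ASSUMPTION: modules used to live flat under `web_monitoring/` and kept
# their file names when they moved into the subpackages listed in the PR.
NEW_LAYOUT = {
    "diff": ["content_type", "differs", "diff_errors",
             "html_diff_render", "links_diff"],
    "diff_server": ["server"],
    "cli": ["cli", "ia_healthcheck", "ia_import", "annotations_import"],
}

PACKAGE = "web_monitoring"


def build_import_map(layout, package=PACKAGE):
    """Map assumed old flat module paths to their new nested paths."""
    mapping = {}
    for subpackage, modules in layout.items():
        for module in modules:
            old_path = f"{package}.{module}"
            new_path = f"{package}.{subpackage}.{module}"
            mapping[old_path] = new_path
    return mapping


def rewrite_import(line, mapping):
    """Rewrite `import x` or `from x import y` when module x has moved.

    Only handles fully dotted module paths; a line such as
    `from web_monitoring import differs` is left unchanged.
    """
    indent = line[:len(line) - len(line.lstrip())]
    tokens = line.split()
    if len(tokens) >= 2 and tokens[0] in ("import", "from"):
        if tokens[1] in mapping:
            tokens[1] = mapping[tokens[1]]
            return indent + " ".join(tokens)
    return line


if __name__ == "__main__":
    import_map = build_import_map(NEW_LAYOUT)
    for old, new in sorted(import_map.items()):
        print(f"{old} -> {new}")
```

Modules that stay at the top level (`utils.py`, `db.py`) are deliberately left out of the map, since their import paths would not change.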
I’m going ahead and closing this issue. I think we’ll continue to pull things apart or reorganize where it seems reasonable, but the above reorganization (plus the upcoming reduction in developer time for this project) probably gets us as close as is currently reasonable to the main goals of this issue.
I am tempted to propose deprecating web-monitoring-processing and creating several separate repos in its place.
Arguments in favor:
Arguments against: