Improve crawl configuration and launch management #84

Open
3 tasks
anjackson opened this issue Apr 13, 2022 · 0 comments
Assignees
Labels
enhancement New feature or request

anjackson commented Apr 13, 2022

Building on #83, improve crawl management:

  • Make crawl launches 'back-fillable' so we can re-run launches if they don't happen:
    • Needs date-stamped crawl feed files.
    • Needs a separate task that is dependent on the data export, or the current w3act_export needs to be made back-fillable.
  • Blocks, seeds and scope files in use by the crawlers need to be updated:
    • Blocks and scope are managed via Watched Files; it is less clear if/how seeds should be blanket-updated.
    • Not clear how best to do that. Probably push rather than pull, as this means Airflow is always in charge of things. But how: a shared volume updated directly by Airflow, or files made available and a remote task or service prompted to pull them down?
  • Launch metrics need to be posted to Prometheus.
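
As a rough illustration of the back-fill idea, here is a minimal sketch, assuming one feed file per crawl frequency per date. The directory layout, file names, function names and metric name are all hypothetical, not what the current export produces. The point is that a launch for a given date only ever reads that date's feed file, so re-running a missed launch sees the same inputs the original run would have. The last function renders a launch count in the Prometheus text exposition format, without deciding how it gets posted.

```python
from datetime import date
from pathlib import Path

# Hypothetical root directory for date-stamped feed files (an assumption).
FEED_ROOT = Path("crawl-feeds")


def feed_path(logical_date: date, frequency: str, root: Path = FEED_ROOT) -> Path:
    """Return the date-stamped feed file for one launch date and frequency."""
    return root / logical_date.isoformat() / f"crawl-feed.{frequency}.jsonl"


def targets_to_launch(logical_date: date, frequency: str, root: Path = FEED_ROOT) -> list[str]:
    """Read the non-empty lines of the feed file for the given date.

    Raises FileNotFoundError if the export for that date has not run yet,
    so a launch can never silently fall back to a different day's feed.
    """
    path = feed_path(logical_date, frequency, root)
    if not path.exists():
        raise FileNotFoundError(f"No feed file for {logical_date}: {path}")
    return [line for line in path.read_text().splitlines() if line.strip()]


def launch_metrics_text(logical_date: date, frequency: str, launched: int) -> str:
    """Render a launch count in the Prometheus text exposition format."""
    name = "crawl_launch_targets"  # hypothetical metric name
    labels = f'frequency="{frequency}",feed_date="{logical_date.isoformat()}"'
    return f"# TYPE {name} gauge\n{name}{{{labels}}} {launched}\n"
```

In an Airflow setting the `logical_date` would presumably come from the task's own run date rather than from the wall clock, which is what would make a re-run for a past date pick up the right file.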
@anjackson anjackson self-assigned this Apr 13, 2022
@anjackson anjackson added the enhancement New feature or request label Apr 13, 2022