Add GitHub Action to Generate a WARC of Hosted Site #66
base: master
- Generate only on the master branch.
- Add Gemfile and Gemfile.lock, which are necessary for a local Jekyll build.
- Add a local copy of jQuery instead of loading it from a CDN, so that the WARC is more self-contained.
- Use 'iipc-warc-specification.warc.gz' as the name (note that GitHub still zips the file).
Co-authored-by: Sawood Alam <[email protected]>
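Bundling jQuery locally matters because warcit archives only the files in the built site, so anything still loaded from another host would be fetched from the live web on replay. A minimal sketch, using only the standard library, for scanning the built HTML for such external subresources; the host name `iipc.github.io` and the `_site` directory in the usage note are assumptions, and only `<script>`, `<img>`, and stylesheet `<link>` tags are checked.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urlparse


class ExternalRefFinder(HTMLParser):
    """Collect script, image, and stylesheet URLs that point at another host."""

    def __init__(self, site_host):
        super().__init__()
        self.site_host = site_host
        self.external = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        url = None
        if tag in ("script", "img"):
            url = attrs.get("src")
        elif tag == "link" and "stylesheet" in (attrs.get("rel") or "").lower().split():
            url = attrs.get("href")
        if url:
            # Relative URLs have an empty netloc and resolve inside the archived site.
            host = urlparse(url).netloc
            if host and host != self.site_host:
                self.external.append(url)


def find_external_refs(html, site_host):
    """Return subresource URLs in `html` that would not be captured in the WARC."""
    finder = ExternalRefFinder(site_host)
    finder.feed(html)
    finder.close()
    return finder.external


def scan_site(site_dir, site_host):
    """Yield (file, url) for every external subresource in the built site."""
    for path in Path(site_dir).rglob("*.html"):
        for url in find_external_refs(path.read_text(encoding="utf-8"), site_host):
            yield path, url
```

For example, `list(scan_site("_site", "iipc.github.io"))` should come back empty once jQuery is served locally.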
It is worth noting that these artifacts will not be preserved forever: they expire automatically after 90 days, and there may be a storage quota associated with them. If we had external storage where these WARCs could be pushed as the next step after the artifacts are built, that would be great. Also, it would be better to add a timestamp to the filename.
Yeah, a timestamp is a good idea. Maybe there should be a separate workflow for turning the artifacts into releases, which would be permanent, perhaps triggered on a version change?
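A minimal sketch of the timestamped filename, assuming a `<base>-<YYYYMMDDhhmmss>.warc.gz` pattern (the same 14-digit UTC form used in Wayback Machine URLs); the naming pattern is a suggestion, not something the workflow currently does.

```python
from datetime import datetime, timezone


def timestamped_warc_name(base, when=None):
    """Append a 14-digit UTC timestamp (YYYYMMDDhhmmss) to a WARC base name."""
    when = when or datetime.now(timezone.utc)
    return f"{base}-{when.astimezone(timezone.utc):%Y%m%d%H%M%S}.warc.gz"
```

For example, `timestamped_warc_name("iipc-warc-specification", datetime(2020, 8, 1, 12, 30, 5, tzinfo=timezone.utc))` gives `iipc-warc-specification-20200801123005.warc.gz`. Inside a workflow step, `date -u +%Y%m%d%H%M%S` yields the same format.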
I don't think this repository is tagged or versioned, but if we plan to tag it every now and then after major changes, uploading workflow artifacts as release artifacts would be a good idea. On the other hand, one can always recreate a WARC file of a prior state by checking out the code at a specific commit, building the site, and running warcit on it.
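The manual recreation described above can be sketched as a short list of commands. This assumes the checked-out commit has a Gemfile, that Jekyll's default `_site` output directory is used, and that warcit is invoked as `warcit <url-prefix> <directory>`; the site URL and the `warc-build` directory are illustrative assumptions. The runner only prints by default.

```python
import shlex
import subprocess


def rebuild_steps(ref, site_url, workdir="warc-build"):
    """Return (cwd, argv) steps that rebuild the site at `ref` and archive it."""
    return [
        # Check out `ref` (commit, branch, or tag) in a separate, detached worktree.
        (".", ["git", "worktree", "add", "--detach", workdir, ref]),
        # Install the Ruby gems from that commit's Gemfile (assumes one exists).
        (workdir, ["bundle", "install"]),
        # Jekyll writes the generated site to _site/ by default.
        (workdir, ["bundle", "exec", "jekyll", "build"]),
        # warcit: URL prefix the files should be archived under, then the directory.
        (workdir, ["warcit", site_url, "_site"]),
    ]


def run_steps(steps, dry_run=True):
    """Print each step; execute it only when dry_run is False."""
    for cwd, argv in steps:
        print(f"(cd {shlex.quote(cwd)} && {shlex.join(argv)})")
        if not dry_run:
            subprocess.run(argv, cwd=cwd, check=True)
```

For example, `run_steps(rebuild_steps("<commit-sha>", "https://iipc.github.io/warc-specifications/"))` prints the commands for review before running them with `dry_run=False`.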
Latest update uses
These changes add the ability to manually trigger the workflow and create a WARC file on demand at a supplied Git reference.
Unfortunately, this will not work on prior commits, as the Gemfile will be missing. However, if we do not require the Gemfile to be in the repo and instead install all the Ruby dependencies inline in this workflow file, it should work on historical versions as well.
Co-authored-by: Sawood Alam <[email protected]>
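A manually triggered workflow (GitHub's `workflow_dispatch` event) can also be started through the REST API. A minimal sketch that builds, without sending, such a request; the workflow file name `warc.yml` and the input name `git_ref` are hypothetical, standing in for whatever the workflow actually declares.

```python
import json
import urllib.request


def dispatch_request(owner, repo, workflow_file, token, workflow_ref="master", inputs=None):
    """Build (without sending) a workflow_dispatch request for a GitHub Actions workflow."""
    url = (f"https://api.github.com/repos/{owner}/{repo}"
           f"/actions/workflows/{workflow_file}/dispatches")
    # "ref" selects the branch/tag whose copy of the workflow file runs;
    # "inputs" carries the workflow's own declared inputs (e.g. the Git ref to archive).
    body = {"ref": workflow_ref, "inputs": inputs or {}}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )
```

Passing the result to `urllib.request.urlopen` sends it, and GitHub answers `204 No Content` on success. Keeping `ref` on master while passing the target commit as an input means the current workflow definition is used even when archiving an old commit.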
I suppose you could check whether the Gemfile exists and, if not, create it on the fly.
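A minimal sketch of the "create it on the fly" idea, assuming a fallback Gemfile that depends on the `github-pages` gem (the usual way to match the GitHub Pages builder locally). Without a Gemfile.lock, Bundler resolves the newest release, so an old commit is only approximately reproduced.

```python
from pathlib import Path

# Assumed fallback Gemfile that pulls in the same gem set as GitHub Pages' builder.
DEFAULT_GEMFILE = (
    'source "https://rubygems.org"\n'
    'gem "github-pages", group: :jekyll_plugins\n'
)


def ensure_gemfile(repo_dir):
    """Write DEFAULT_GEMFILE if `repo_dir` has no Gemfile; return True if one was created."""
    gemfile = Path(repo_dir) / "Gemfile"
    if gemfile.exists():
        return False  # keep the commit's own Gemfile untouched
    gemfile.write_text(DEFAULT_GEMFILE, encoding="utf-8")
    return True
```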
There may be more variations too, like if the GitHub Pages root is in the
If we were to do that, it would be simpler not to rely on a Gemfile at all and instead install all the packages needed to replicate the default GitHub Pages builder.
In that case, you could ask users to provide input variables that identify which category their site falls under, with a sensible, common default. There are a handful of reusable actions for hosting static sites on GitHub Pages, built with many different static site generators.
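The "input variable with a sensible default" idea can be sketched as a lookup from a hypothetical `site_type` input to a build command and the directory warcit should archive. The set of site types is illustrative; Jekyll's `_site` and Hugo's `public` are those tools' default output directories.

```python
# Hypothetical mapping from a `site_type` workflow input to (build command, output dir).
BUILDERS = {
    "jekyll": (["bundle", "exec", "jekyll", "build"], "_site"),
    "hugo": (["hugo"], "public"),
    "static": (None, "."),  # pre-built HTML: nothing to build, archive the checkout
}


def resolve_builder(site_type=None, default="jekyll"):
    """Return (argv or None, output_dir) for `site_type`, falling back to `default`."""
    key = (site_type or default).strip().lower()
    try:
        argv, out_dir = BUILDERS[key]
    except KeyError:
        raise ValueError(
            f"unsupported site_type {key!r}; expected one of {sorted(BUILDERS)}"
        ) from None
    # Return a copy so callers can't mutate the shared table.
    return (list(argv) if argv else None), out_dir
```

Defaulting to Jekyll keeps the common GitHub Pages case zero-configuration, while an unknown value fails loudly instead of archiving the wrong directory.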
Inspired by @anjackson's tweet, here's a GitHub Action that generates a WARC of the GitHub Pages site after every commit to master.
I figured having a WARC for every commit of the WARC specification might be a good test case for this idea!
This PR adds an action that builds the site via Jekyll, then generates a WARC using warcit and uploads it to GitHub as an artifact, like this: https://github.com/ikreymer/warc-specifications/actions/runs/170823529
(Note that, due to a GitHub limitation, the artifact is always zipped, so the WARC file ends up inside a zip file; this can't be changed for now.)
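Because the artifact arrives zipped, the WARC has to be pulled out of the zip before use. A minimal sketch with the standard library, assuming the artifact has already been downloaded and contains `iipc-warc-specification.warc.gz`; the `.warc.gz` bytes are returned unchanged, since WARC tools expect the gzipped form.

```python
import gzip
import io
import zipfile


def extract_warc(artifact_zip, warc_name="iipc-warc-specification.warc.gz"):
    """Return the .warc.gz bytes stored inside a downloaded Actions artifact zip."""
    with zipfile.ZipFile(artifact_zip) as zf:
        return zf.read(warc_name)


def warc_version_line(warc_gz):
    """Read the first line of the first record, e.g. b'WARC/1.0', as a sanity check."""
    with gzip.open(io.BytesIO(warc_gz)) as fh:
        return fh.readline().rstrip(b"\r\n")
```

For example, `Path("iipc-warc-specification.warc.gz").write_bytes(extract_warc("artifact.zip"))` (with a hypothetical download path and `from pathlib import Path`) writes out a WARC that replay tools can open directly.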
This PR also adds:
(The GitHub API used to list active issues, and of course the active issues themselves, are not included; that might be a nice future extension.)
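As a starting point for that extension, a minimal sketch that lists the GitHub REST API URLs for a repository's open issues, which a separate capture step (not shown) could then record into the WARC. The fixed page count is an assumption; a fuller version would follow the `Link` response header for pagination. This endpoint also returns pull requests, since GitHub treats them as issues.

```python
from urllib.parse import urlencode


def open_issue_api_urls(owner, repo, pages, per_page=100):
    """REST API URLs for the first `pages` pages of a repository's open issues."""
    base = f"https://api.github.com/repos/{owner}/{repo}/issues"
    return [
        f"{base}?{urlencode({'state': 'open', 'per_page': per_page, 'page': page})}"
        for page in range(1, pages + 1)
    ]
```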