-
Notifications
You must be signed in to change notification settings - Fork 49
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add image building and tagged testing to CI #350
base: main
Are you sure you want to change the base?
Changes from 6 commits
96bbc24
42683d3
36facda
79cb6d6
f9d266f
6e5122d
49536db
bb69c94
db82883
fe3ad44
1783048
6122bde
2ed3ffa
82fb8e6
1666e46
fdd90b9
bbbb1c1
f1247c7
67f958f
cab5ebb
bc6d094
4b34844
2cb85e1
6091289
fed8b44
3e0c027
a9eba5c
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,30 +1,52 @@ | ||
#!/bin/bash | ||
|
||
TAG="$1" | ||
[[ -z "${TAG}" ]] && TAG="latest_rc" | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I would do TAG="integration_tests" by default. |
||
|
||
NOW="$(date +"%Y-%m-%d_%H_%M")" | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. No seconds? At the very least it could be good for debugging. |
||
ARCH="amd64" | ||
[[ "$(uname -m)" = "arm64" ]] && ARCH="arm64" | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is it bad to just do There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I want |
||
|
||
RUNDIR="apps/integration/runs/${NOW}" | ||
GIT_LOGFILE="${RUNDIR}/git.log" | ||
DOCKER_LOGFILE="${RUNDIR}/docker.log" | ||
POETRY_LOGFILE="${RUNDIR}/poetry.log" | ||
PYTEST_LOGFILE="${RUNDIR}/pytest.log" | ||
QUERY_LOGFILE="${RUNDIR}/test_queries.log" | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'd just rename these to |
||
|
||
main() { | ||
if [[ ! -d ".git" ]]; then | ||
echo "Error: please run this script from sycamore root!" >&2 | ||
exit 1 | ||
fi | ||
mkdir -p "${RUNDIR}" | ||
echo "Building/testing tag ${TAG}" >&2 | ||
echo "Get the newest git commits" >&2 | ||
checkout_main_if_new | ||
local should_run=$? | ||
if [[ $should_run ]]; then | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You can shorten this to if [[ checkout_main_if_new ]]; then There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You can ditch the square brackets entirely.
|
||
echo "Changes detected. Running Tests" >&2 | ||
poetry install | ||
build_containers | ||
runtests | ||
handle_outputs | ||
poetry install > "${POETRY_LOGFILE}" 2>&1 | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You're going to need || die "poetry install failed" There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. well, I always want to get to the poetry install > "${POETRY_LOGFILE}" 2>&1 \
&& build_containers > "${DOCKER_LOGFILE}" 2>&1 \
&& runtests > "${PYTEST_LOGFILE}" 2>&1 |
||
build_containers > "${DOCKER_LOGFILE}" 2>&1 | ||
runtests > "${PYTEST_LOGFILE}" 2>&1 | ||
local passed_tests=$? | ||
handle_outputs passed_tests | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Missing a There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. you know it! (exit codes don't do anything weird when they become variables, right?) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I believe the number just carries across the assignment faithfully. It's when used as a conditional that zero becomes true. |
||
else | ||
echo "No changes detected. Skipping integration tests" >&2 | ||
fi | ||
} | ||
|
||
error() { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We should call this There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. leaving as error but changing so it doesn't exit. I basically never want this script to exit early There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If it's just cleanup you're worried about, you could also look into |
||
echo "ERROR: $@" | ||
exit 1 | ||
} | ||
|
||
checkout_main_if_new() { | ||
old_sha="$(git rev-parse HEAD)" | ||
git fetch origin main >&2 | ||
git fetch origin main > "${GIT_LOGFILE}" | ||
new_sha="$(git rev-parse FETCH_HEAD)" | ||
if [[ "${old_sha}" != "${new_sha}" ]]; then | ||
git pull origin main >&2 | ||
git pull --rebase origin main >> "${GIT_LOGFILE}" | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Check that git status is clean.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. since I dont want to die, does There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hmm, I need it to be okay with untracked files, bc I create them in apps/integration/runs. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Let's not rely on specific English-language messages. How about running There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. alternatively I can just add apps/integration/runs to the gitignore and then take the simpler -z git status --procelain you gave me earlier |
||
return 0 | ||
else | ||
return 1 | ||
|
@@ -33,20 +55,73 @@ checkout_main_if_new() { | |
|
||
build_containers() { | ||
echo "Yep, definitely building containers. That's what this function does" >&2 | ||
docker-build-hub apps/crawler/crawler/http/Dockerfile | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do we want to build and push to docker hub or only build to local? This seems like it's going to generate a lot of transit data. Moreover, given the way you're doing the testing, if we want to test on arm it would be a separate build/test run and then the two builds would clash. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Personally, I'd find it convenient that reasonably up-to-date images are easily available to test with. Having them in DockerHub seems like the lowest-friction way. |
||
docker-build-hub apps/crawler/crawler/s3/Dockerfile | ||
docker-build-hub apps/importer/Dockerfile.buildx | ||
docker-build-hub apps/opensearch/Dockerfile | ||
docker-build-hub apps/jupyter/Dockerfile.buildx --build-arg=TAG="${TAG}" | ||
docker-build-hub apps/demo-ui/Dockerfile.buildx | ||
docker-build-hub apps/remote-processor-service/Dockerfile.buildx | ||
} | ||
|
||
handle_outputs() { | ||
echo "Yep, definitely handling test outputs. That's what this function does" >&2 | ||
local passed_tests="$1" | ||
mv test-output.log "${QUERY_LOGFILE}" | ||
[[ ${passed_tests} = 0 ]] && touch "${RUNDIR}/passed" | ||
[[ ${passed_tests} != 0 ]] && touch "${RUNDIR}/failed" | ||
aws s3 cp -r "${RUNDIR}" "s3://sycamore-ci/${ARCH}" | ||
} | ||
|
||
runtests() { | ||
docker system prune -f --volumes | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This seems risky since if someone runs it it will prune other stuff they might want; It also means that we get rid of the old volumes; it's also not guaranteed to clean things up if it's still in use. I'd instead suggest that we generate a unique run id (use the date stamp?). and do the docker compose -p up. If we tag the images in the same way, then if there's a problem someone could go back and debug against the exact set of images. This would imply we don't want to push to docker hub since it's a lot of stuff. |
||
docker compose up reset | ||
poetry run pytest apps/integration/ -p integration.conftest --noconftest --docker-tag latest_rc | ||
poetry run pytest apps/integration/ -p integration.conftest --noconftest --docker-tag "${TAG}" | ||
# this is a complicated command, so: ^ ^ ^ test against containers tagged latest_rc | ||
# | don't load conftest at pytest runtime; it's already loaded | ||
# load conftest with plugins, to capture the custom command line arg --docker-tag | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I would just reformat this as:
Then you don't need to keep updating the ASCII art if the above code changes. And, you get more horizontal space for the explanations. |
||
return $? | ||
} | ||
|
||
docker-build-hub() { | ||
local docker_file="$1" | ||
[[ -n "${docker_file}" ]] || error "missing ${docker_file}" | ||
local repo_name="$(_docker-repo-name "${docker_file}")" | ||
[[ -n "${repo_name}" ]] || error "empty repo name" | ||
shift | ||
|
||
local platform=linux/amd64,linux/arm64 | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do we want this? I would think we would build/test for the local platform. |
||
echo | ||
echo "Building in sycamore and pushing to docker hub with repo name '${repo_name}'" | ||
docker buildx build "$(_docker-build-args)" -t "${repo_name}:${TAG}" -f "${docker_file}" --platform ${platform} \ | ||
--cache-to type=registry,ref="${repo_name}:build-cache",mode=max \ | ||
--cache-from type=registry,ref="${repo_name}:build-cache" \ | ||
"$@" --push . || error "buildx failed" | ||
echo "Successfully built in $dir using docker file $docker_file" | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Where does |
||
} | ||
|
||
_docker-repo-name() { | ||
local docker_file="$1" | ||
echo "Finding repo name in: ${docker_file}" >&2 | ||
local repo_name="$(grep '^# Repo name: ' "${docker_file}" | awk '{print $4}')" | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. A more robust approach might be:
It would save you from multiple copies and variations in formatting. It goes well with your check that the word count is one, below. In assignment context, you don't need the quotes around There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Actually, my pipe to There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We don't need to be very robust here. We control the input file. |
||
if (( $(wc -w <<< ${repo_name}) != 1 )); then | ||
echo "Unable to find repo name in ${docker_file}" 1>&2 | ||
exit 1 | ||
fi | ||
echo "${repo_name}" | ||
} | ||
|
||
_docker-build-args() { | ||
local branch="$(git status | head -1 | grep 'On branch ' | awk '{print $3}')" | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. head -n1 is better (fewer warnings). I'd also use grep -i just in case (get it?). There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
|
||
local rev="$(git rev-parse --short HEAD)" | ||
local date="$(git show -s --format=%ci HEAD | sed 's/ /_/g')" | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's preferred to have There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ...why? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Mostly hygiene. If the program happens to have something that might get interpreted as a command-line option, -e protects it. Also, you can do multi-expression scripts with multiple -e directives. |
||
local diff=unknown | ||
if [[ $(git status | grep -c 'nothing to commit, working tree clean') = 1 ]]; then | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We want to check for the specific message to guarantee the tree is clean (no unexpected files also). There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The definition of clean is zero modified files and zero untracked files. The future-proof way to do this is:
|
||
diff=clean | ||
else | ||
diff="pending_changes_$(git diff HEAD | shasum | awk '{print $1}')" | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is the SHA important? If so, SHA-1 is considered obsolete; so, you'd want |
||
fi | ||
echo "--build-arg=GIT_BRANCH=${branch} --build-arg=GIT_COMMIT=${rev}--${date} --build-arg=GIT_DIFF=${diff}" | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why double-delimiters between rev and date? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Because date has -'s in it; and -- makes it easier to read. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There are better formats for date available, for instance ISO 8601 basic. |
||
} | ||
|
||
main |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Up to you, but quotes aren't necessary in square brackets. This file isn't consistent, if you care.