This repository contains automation tools to simplify the use of the new experimental Bioconductor Hubs Ingest stack. These tools streamline the process of creating and managing temporary endpoints for ingesting data for the Bioconductor Hubs.
The easiest way to manage endpoints is through our provided GitHub Actions workflows.
The following repository secret must be configured by an administrator:
- `KUBECONFIG`: Kubernetes configuration for cluster access
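If you use the GitHub CLI, one way to set this secret is to pipe the kubeconfig file in (a sketch; run it from a clone of this repository, or pass `--repo` explicitly):

```
# Set the KUBECONFIG repository secret from a local kubeconfig file.
# Run inside a clone of the repository, or add: --repo Bioconductor/hubsingest
gh secret set KUBECONFIG < ~/.kube/config
```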
Two distinct types of secrets are used in this system:
- User S3 Keys (`S3KEY_<USERUSER>`):
  - One unique key per data submitter
  - Used only for their specific endpoint
  - Should be randomly generated for security
- Admin Access Passwords (`ADMINPASS_<ADMINUSER>`):
  - One password per administrator
  - Used for ALL RStudio instances launched by that admin
  - Should be a secure, memorable password you'll reuse
  - Same password works on any endpoint you examine
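To verify which secrets are already configured, the GitHub CLI can list them (names only; GitHub never reveals secret values):

```
# List configured repository secrets; expect entries such as
# KUBECONFIG, S3KEY_<USERUSER>, and ADMINPASS_<ADMINUSER>.
gh secret list
```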
Generate a random key for each data submitter using one of these methods:
- Using OpenSSL (recommended): `openssl rand -hex 32`
- Using /dev/urandom: `cat /dev/urandom | tr -dc 'a-zA-Z0-9' | fold -w 32 | head -n 1`
- Manual method: randomly type at least 32 letters and numbers on your keyboard
Add the generated key as a GitHub secret:
- Name it `S3KEY_<USERUSER>` (e.g., `S3KEY_DATAOWNER`)
- Share this random key securely with the data submitter
- They'll need it for S3 endpoint access
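Generation and registration can be combined in one snippet (a sketch; `DATAOWNER` is a placeholder for the submitter's username in uppercase):

```
# Generate a random key and register it as a repository secret.
KEY=$(openssl rand -hex 32)
gh secret set "S3KEY_DATAOWNER" --body "$KEY"
# Print the key so it can be shared securely with the submitter.
echo "$KEY"
```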
As an administrator:
- Choose a secure password you want to reuse
- Create a secret named `ADMINPASS_<ADMINUSER>`
  - Where ADMINUSER is YOUR GitHub username in uppercase
  - Example: GitHub user 'almahmoud' creates `ADMINPASS_ALMAHMOUD`
- This will be your password for ALL RStudio instances you launch
  - Username will always be `rstudio`
  - Password will always be your ADMINPASS value
  - Works on any RStudio endpoint when you run the launch workflow
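A minimal GitHub CLI sketch, assuming your GitHub username is 'almahmoud':

```
# Store the admin password as a repository secret.
# Replace ALMAHMOUD with your own GitHub username in uppercase.
gh secret set ADMINPASS_ALMAHMOUD --body 'a-secure-password-you-will-reuse'
```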
To create an endpoint:
- Navigate to the "Actions" tab
- Select the "Create Hub Ingest Endpoint" workflow
- Click "Run workflow"
- Fill in the parameters:
  - Username: Your username (must match the `S3KEY_<USERUSER>` secret)
  - Size: Storage size (e.g., "50Gi")
- Click "Run workflow"
The workflow will:
- Create your endpoint with the specified storage
- Automatically test the endpoint by:
  - Creating a test bucket
  - Uploading a test file
  - Retrieving the file
- Confirm the S3 credentials and endpoint are working properly
These tools are for administrators to examine data that contributors have uploaded to their endpoints:
Note: This will stop the contributor's ingestion endpoint. Only run these steps after confirming they have completed their data uploads.
Run a virus scan on a contributor's uploaded data:
- Navigate to the "Actions" tab
- Select the "Scan Data for Viruses" workflow
- Enter the contributor's username
- Click "Run workflow"
The scan results are displayed directly in the GitHub Actions workflow log, clearly marked between separator lines for easy viewing. To view them:
- Click on the job
- Expand the "Run virus scan" step
- Find and review the "Virus Scan Report"
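If you prefer the command line, the same scan can be triggered and its log fetched with the GitHub CLI (a sketch; the `username` field name is an assumption):

```
# Trigger the "Scan Data for Viruses" workflow for a contributor.
gh workflow run "Scan Data for Viruses" -f username=dataowner
# After it completes, locate the run and print its log:
gh run list --workflow "Scan Data for Viruses" --limit 1
gh run view <run-id> --log
```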
Launch an RStudio instance to examine a contributor's data:
- First, ensure you have set up your admin password:
  - Secret name: `ADMINPASS_<ADMINUSER>` where ADMINUSER is YOUR GitHub username in uppercase
  - Example: GitHub user 'almahmoud' needs secret `ADMINPASS_ALMAHMOUD`
- Launch RStudio:
  - Enter the CONTRIBUTOR'S username
  - The workflow will use YOUR admin password for RStudio access
  - Example: Admin 'almahmoud' examining contributor 'dataowner's data:
    - Username parameter: dataowner
    - RStudio password: value from `ADMINPASS_ALMAHMOUD`
- Access RStudio:
  - URL: `https://<contributor>-rstudio.hubsingest.bioconductor.org`
  - Example: `https://dataowner-rstudio.hubsingest.bioconductor.org`
  - Login with:
    - Username: Always `rstudio`
    - Password: Your `ADMINPASS_<ADMINUSER>` value
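A command-line sketch of the same flow (the workflow name "Launch RStudio" and the `username` field are assumptions; use the actual workflow name shown in the Actions tab):

```
# Launch an RStudio instance to examine contributor "dataowner"'s data.
gh workflow run "Launch RStudio" -f username=dataowner
# When the run completes, open the instance in a browser:
#   https://dataowner-rstudio.hubsingest.bioconductor.org
# Log in as user "rstudio" with your ADMINPASS_<ADMINUSER> value.
```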
To delete an endpoint:
- Navigate to the "Actions" tab
- Select the "Delete Hub Endpoint" workflow
- Click "Run workflow"
- Enter your username
- Click "Run workflow"
To use the command-line tools directly, you will need:
- Kubernetes configuration file (kubeconfig)
  - Contact your Kubernetes administrator to obtain this file
  - The file should be placed at `~/.kube/config`
  - The configuration must have permissions to create namespaces and deploy resources
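Before installing, you can sanity-check those permissions with `kubectl` (a quick sketch):

```
# Verify the kubeconfig grants the permissions the tools rely on.
export KUBECONFIG=~/.kube/config
kubectl auth can-i create namespaces
kubectl auth can-i create deployments --all-namespaces
```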
Note: You can customize the installation path by exporting the `BIOC_HUBSINGEST_PATH` environment variable before running the installation command. If not specified, the tools will be installed in the default directory (`/usr/local/bin/hubsingest`).

```
curl https://raw.githubusercontent.com/Bioconductor/hubsingest/refs/heads/devel/install_hubsingest.sh | sudo bash
```
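For example, to install under your home directory instead (a sketch; note that `sudo` does not preserve exported environment variables by default, so a user-writable path sidesteps that):

```
# Install into a custom, user-writable location (no sudo needed).
export BIOC_HUBSINGEST_PATH="$HOME/.local/bin/hubsingest"
curl https://raw.githubusercontent.com/Bioconductor/hubsingest/refs/heads/devel/install_hubsingest.sh | bash
```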
This script will:
- Create a directory for the tools (default: `/usr/local/bin/hubsingest`)
- Download the necessary scripts
- Make the main script executable
- Provide instructions for updating your PATH
If you want to see exactly what the installer does, examine the installation script before running it.
To create an endpoint:
```
hubsingest create_endpoint <username> <size> [<password>]
```
Example:
```
# With auto-generated password
hubsingest create_endpoint testuser 50Gi

# With specific password
hubsingest create_endpoint testuser 50Gi myspecificpassword
```
To delete an endpoint:
```
hubsingest delete_endpoint <username>
```
Example:
```
hubsingest delete_endpoint testuser
```
To scan a contributor's data for viruses:
```
hubsingest scan_data <username>
```
To launch an RStudio instance for examining data:
```
hubsingest launch_rstudio <username> <password> [bioc_version]
```
Example:
```
hubsingest launch_rstudio dataowner mypassword 3.18
```
After creating an endpoint, you can test it using the built-in test function or manually using the AWS CLI.
You will need:
- AWS CLI installed (`aws` command available in your terminal)
- Your S3 access key (username) and secret key (password)
Run the built-in test with:
```
hubsingest test_endpoint <username>
```
This will automatically:
- Create a test bucket
- Upload a test file
- Verify the file exists
- Clean up the test bucket
For manual testing or data upload, configure an AWS profile:
```
aws configure --profile hubsingestusername
# Enter your access key (username) when prompted
# Enter your secret key (password) when prompted
# Leave region blank (just press Enter)
# Leave output format blank (just press Enter)
```
When running AWS CLI commands manually, you must include both the profile and the endpoint URL:
```
aws --profile hubsingestusername --endpoint-url https://<username>.hubsingest.bioconductor.org s3 <command>
```
Example commands:
```
# Make a bucket and upload a file
aws --profile hubsingestusername --endpoint-url https://username.hubsingest.bioconductor.org s3 mb s3://mybucket
aws --profile hubsingestusername --endpoint-url https://username.hubsingest.bioconductor.org s3 cp myfile.txt s3://mybucket/

# List buckets
aws --profile hubsingestusername --endpoint-url https://username.hubsingest.bioconductor.org s3 ls

# Download a file
aws --profile hubsingestusername --endpoint-url https://username.hubsingest.bioconductor.org s3 cp s3://mybucket/myfile.txt ./
```
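For uploading whole directories, `aws s3 sync` can be more convenient than copying files one at a time (a sketch; the bucket and paths are placeholders):

```
# Recursively upload a local directory to the endpoint.
aws --profile hubsingestusername \
    --endpoint-url https://username.hubsingest.bioconductor.org \
    s3 sync ./mydata s3://mybucket/mydata
```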