Jobs should NEVER be submitted from `/staging`. All HTCondor job submissions
must be performed from your `/home` directory on the submit server, and job
log, error, and output files should never be written to `/staging`.

CHTC accounts do not automatically include access to `/staging`. If you think
you need a `/staging` directory, please contact us.

<a name="transfer"></a>

![Use Transfer Server](images/use-transfer-staging.png)

Our dedicated transfer server, `transfer.chtc.wisc.edu`, should be used to
upload and/or download your files to/from `/staging`.

The transfer server is a separate server that is independent of the submit
server you otherwise use for job submission. By using the transfer server
for `/staging` data upload and download, you are helping to reduce network
bottlenecks on the submit server that could otherwise negatively impact
the submit server's performance and ability to manage and submit jobs.

**Users should not use their submit server to upload or download files
to/from `/staging` unless otherwise directed by CHTC staff.**

*How do I connect to the transfer server?*
Users can access the transfer server the same way they access their
submit server (i.e. via the Terminal app or PuTTY) using the following
hostname: `transfer.chtc.wisc.edu`.

*How do I upload/download files to/from `/staging`?*
Several options exist for moving data to/from `/staging`, including:

- `scp` and `rsync` can be used from the terminal to move data
to/from your own computer or *another server*; an `rsync` sketch follows
this list. For example:

    ```
    $ scp large.file username@transfer.chtc.wisc.edu:/staging/username/
    $ scp username@serverhostname:/path/to/large.file username@transfer.chtc.wisc.edu:/staging/username/
    ```
    {:.term}

    **When uploading a large file from another server, as in the second command
    above, be sure to use the username assigned to you on that server in the
    first argument.**

- GUI-based file transfer clients like WinSCP, FileZilla, and Cyberduck
can be used to move files to/from your personal computer. Be
sure to use `transfer.chtc.wisc.edu` when setting up the connection.

- Globus file transfer can be used to transfer files to/from a Globus Endpoint.
See our guide [Using Globus To Transfer Files To and From CHTC](globus.html)
for more details.

- `smbclient` is available for managing file transfers to/from file
servers that have `smbclient` installed, like DoIT's ResearchDrive. See our guide
[Transferring Files Between CHTC and ResearchDrive](transfer-data-researchdrive.html)
for more details.
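
As noted in the first item above, `rsync` can be used in much the same way as
`scp`. A minimal sketch, assuming a file named `large.file` on your computer
and your own username in place of `username`:

```
$ rsync -av large.file username@transfer.chtc.wisc.edu:/staging/username/
```
{:.term}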

[Return to top of page](#data-transfer-solutions-by-file-size)

![Staging File Transfer](images/staging-file-transfer.png)

`/staging` is a distinct location for temporarily hosting your
individual larger input files (>100MB in size) or input for jobs
that will need >500MB of total input. First, a copy of
the appropriate input files must be uploaded to your `/staging` directory
before your jobs can be submitted. As a reminder, individual input files <100MB
in size should be hosted in your `/home` directory.

Both your submit file and executable bash script
must include the necessary information to ensure successful completion of
jobs that will use input files from `/staging`. The sections below
provide details for the following steps:

1. Prepare your input before uploading it to `/staging`
2. Prepare your submit file for jobs that will use large input
files hosted in `/staging`
3. Prepare your executable bash script to access and use large input
files hosted in `/staging`, and delete the large input before the job ends

## Prepare Large Input Files For `/staging`

**Organize and prepare your large input such that each job will use a single
large input file, or as few as possible.**

As described in our policies, data placed in `/staging` should be
stored in as few files as possible. Before uploading input files
to `/staging`, use file compression (`zip`, `gzip`, `bzip2`) and `tar` to reduce
file sizes and total file counts such that only a single input file, or as few
as possible, will be needed per job.

If your large input will be uploaded from your personal computer, Mac and Linux
users can create input tarballs with the `tar -czf` command
in the Terminal. Windows users may also use a terminal if one is installed;
otherwise, several GUI-based `tar` applications are available, or ZIP can be used
in place of `tar`.

The following examples demonstrate how to make a compressed tarball
from the terminal for two large input files named `file1.lrg` and
`file2.lrg` which will be used for a single job:

```
my-computer username$ tar -czf large_input.tar.gz file1.lrg file2.lrg
my-computer username$ ls
file1.lrg
file2.lrg
large_input.tar.gz
```
{: .term}

Alternatively, files can first be moved to a directory which can then
be compressed:

```
my-computer username$ mkdir large_input
my-computer username$ mv file1.lrg file2.lrg large_input/
my-computer username$ tar -czf large_input.tar.gz large_input
my-computer username$ ls -F
large_input/
large_input.tar.gz
```
{: .term}
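
Before uploading, you can optionally check that a tarball contains what you
expect by listing its contents. A quick sketch, assuming the archive created
in the first example above:

```
my-computer username$ tar -tzf large_input.tar.gz
file1.lrg
file2.lrg
```
{: .term}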

After preparing your input,
use the transfer server to upload the tarballs to `/staging`. Instructions for
using the transfer server are provided in the above section
[Use The Transfer Server To Move Large Files To/From Staging](#transfer).
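
For example, a tarball prepared on your own computer could be uploaded to your
`/staging` directory via the transfer server like this (replace `username` with
your own username):

```
my-computer username$ scp large_input.tar.gz username@transfer.chtc.wisc.edu:/staging/username/
```
{:.term}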

## Prepare Submit File For Jobs With Input In `/staging`

Not all CHTC execute servers have access to `/staging`. If your job needs access
to files in `/staging`, you must tell HTCondor to run your jobs on the appropriate servers
via the `requirements` submit file attribute. **Be sure to request sufficient disk
space for your jobs in order to accommodate all job input and output files.**

An example submit file for submitting a job that requires access to `/staging`
and which will also transfer a smaller (<100MB) input file from `/home`:

```{.sub}
# job with files in staging and input in home example

log = my_job.$(Cluster).$(Process).log
error = my_job.$(Cluster).$(Process).err
output = my_job.$(Cluster).$(Process).out

...other submit file details...

# transfer small files from home
transfer_input_files = my_smaller_file

requirements = (HasCHTCStaging =?= true)

queue
```
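
The `...other submit file details...` placeholder above is where your executable,
arguments, and resource requests would go. As a rough sketch of how disk space
might be requested to accommodate large input and output (the values below are
placeholders, not recommendations, so size them to your own job):

```{.sub}
# hypothetical resource requests; adjust to your job's actual needs
request_cpus = 1
request_memory = 4GB
request_disk = 20GB
```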

**Remember:** If your job has any other requirements defined in
the submit file, you should combine them into a single `requirements` statement:

```{.sub}
requirements = (HasCHTCStaging =?= true) && other requirements
```
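
For instance, if your job also had to be matched to machines with a particular
operating system version, the combined statement might look like the following
(the second clause is purely illustrative, not something you need to add):

```{.sub}
# combine the staging requirement with any other requirements in one statement
requirements = (HasCHTCStaging =?= true) && (OpSysMajorVer == 8)
```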

## Use Job Bash Script To Access Input In `/staging`

Unlike smaller (<100MB) files that are transferred from your home directory
using `transfer_input_files`, files placed in `/staging` should **NEVER**
be listed in the submit file. Instead, you must include additional
commands in the job's executable bash script that copy (via `cp` or `rsync`)
your input from `/staging` to the job's working directory and then extract
("untar") and uncompress the data.

**Additional commands should be included in your bash script to remove
any input files copied from `/staging` before the job terminates.**
Otherwise, HTCondor will treat the files copied from `/staging` as newly generated
output files and will likely transfer them back
to your home directory along with your real output. This can cause your `/home`
directory to quickly exceed its disk quota, causing your jobs to
go on hold with all progress lost.

Continuing our example, a bash script to copy and extract
`large_input.tar.gz` from `/staging`:

```
#!/bin/bash

# copy tarball from staging to current working dir
cp /staging/username/large_input.tar.gz ./

# extract tarball
tar -xzf large_input.tar.gz

...additional commands to be executed by job...

# delete large input to prevent
# HTCondor from transferring it back to the submit server
rm large_input.tar.gz file1.lrg file2.lrg

# END
```
{:.file}

As shown in the example above, *both* the original tarball, `large_input.tar.gz`, and
the extracted files are deleted as a final step in the script. If untarring
`large_input.tar.gz` instead creates a new subdirectory, then only the original tarball
needs to be deleted.
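
For that case, a minimal sketch of the copy, extract, and cleanup steps, assuming
the archive unpacks into a `large_input/` subdirectory:

```
# copy and extract input that unpacks into the subdirectory large_input/
cp /staging/username/large_input.tar.gz ./
tar -xzf large_input.tar.gz

# only the tarball itself needs to be removed; the extracted files live in a
# subdirectory, which HTCondor ignores when transferring output back
rm large_input.tar.gz
```
{:.file}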

If your job will transfer >20GB of input files, using `rm` to remove these
files before the job terminates can take a while to complete, which adds
unnecessary runtime to your job. In this case, you can create a
subdirectory and move (`mv`) the large input into it - this will complete almost
instantaneously. Because these files will be in a subdirectory, HTCondor will
ignore them when determining which output files to transfer back to the submit server.

For example:

```
# prevent HTCondor from transferring input file(s) back to submit server
mkdir ignore/
mv large_input.tar.gz file1.lrg file2.lrg ignore/
```
{:.file}

Want to speed up jobs with larger input data?

![Staging File Transfer](images/staging-file-transfer.png)

`/staging` is a distinct location for temporarily hosting
individual output files >4GB in size or in cases when >4GB
of output is produced by a single job.

Both your submit file and job bash script
must include the necessary information to ensure successful completion of
jobs that will host output in `/staging`. The sections below
provide details for the following steps:

1. Prepare your submit file for jobs that will host output in `/staging`
2. Prepare your executable bash script to tar output and move it to `/staging`

## Prepare Submit File For Jobs That Will Host Output In `/staging`

Not all CHTC execute servers have access to `/staging`. If your job
will host output files in `/staging`, you must tell HTCondor to run
your jobs on the appropriate servers via the `requirements` submit
file attribute:

```{.sub}
# job that needs access to staging

log = my_job.$(Cluster).$(Process).log
error = my_job.$(Cluster).$(Process).err
output = my_job.$(Cluster).$(Process).out

...other submit file details...

requirements = (HasCHTCStaging =?= true)

queue
```

**Remember:** If your job has any other requirements defined in
the submit file, you should combine them into a single `requirements` statement:

```{.sub}
requirements = (HasCHTCStaging =?= true) && other requirements
```

## Use Job Bash Script To Move Output To `/staging`

Output generated by your job is written to the execute server
where the job runs. For output that is large enough (>4GB) to warrant use
of `/staging`, you must include steps in the executable bash script of
your job that will package the output into a tarball and relocate it
to your `/staging` directory before the job completes. **This can be
achieved with a single `tar` command that writes the tarball directly
to your `/staging` directory!** It is IMPORTANT that no other files be written
directly to your `/staging` directory during job execution except for
the `tar` example below.

For example, if a job writes a large amount of output to
a subdirectory `output_dir/` along with an additional
large output file `output.lrg`, the following steps will
package all of the output into a single tarball that is
then moved to `/staging`. **Note:** `output.lrg` will still exist
in the job's working directory after creating the tarball and thus
must be deleted before the job completes.

```
#!/bin/bash

# Commands to execute job

...

# create tarball located in staging containing >4GB output
tar -czf /staging/username/large_output.tar.gz output_dir/ output.lrg

# delete any remaining large files
rm output.lrg

# END
```
{: .file}

If a job generates a single large file that will not shrink much when
compressed, it can be moved directly to `/staging`. If a job generates
multiple files in a directory, or the files can be made substantially smaller
by zipping them, the above example should be followed.

```
#!/bin/bash

# Commands to execute job

...

# move single large output file to staging
mv output.lrg /staging/username/

# END
```
{: .file}

## Managing Larger `stdout` From Jobs

Does your software produce a large amount of output that gets
saved to the HTCondor `output` file?
Some software is written to
"stream" output directly to the terminal screen during interactive execution, but
when the software is executed non-interactively via HTCondor, the output is
instead saved in the `output` file designated in the HTCondor submit file.

Because HTCondor will transfer `output` back to your home directory, if your
jobs produce HTCondor `output` files >4GB it is important to move this
data to `/staging` by redirecting the output of your job commands to a
separate file that gets packaged into a compressed tarball and relocated
to `/staging`:

```
#!/bin/bash

# redirect standard output to a file in the working directory
./myprogram myinput.txt > large.stdout

# create tarball located in staging containing >4GB output
tar -czf /staging/username/large.stdout.tar.gz large.stdout

# delete large.stdout file
rm large.stdout

# END
```
{: .file}

[Return to top of page](#data-transfer-solutions-by-file-size)
-