To get help, use the gogolist “--help” option:
user@hostname:~$ gogolist.py --help
Usage: gogolist.py [options]
Options:
--version show program's version number and exit
-h, --help show this help message and exit
-w FOLDER, --workspace=FOLDER
path to workspace directory [REQUIRED]
--stdin use stdin to get listing to process (instead of using
args). See also --split-max-lines
-e EXECUTE_CMD, --execute=EXECUTE_CMD
command to execute [REQUIRED]
--streaming execute command giving listing with a pipe on stdin
--qsub-options=QSUB_OPTIONS
options we will give to qsub
--qsub-max-running-jobs=QSUB_MAX_RUNNING_JOBS
nbr jobs running at same time for qsub jobarrays
(modulo option in jobarray)
--batch-manager=BATCH_MANAGER
batch manager : torque | pbspro | local
[default=torque]
--split-max-lines=SPLIT_MAX_LINES
max lines for internal listing [required when using
--stdin]. (note : '0' will generate a temporary
complete listing)
--split-max-jobs=SPLIT_MAX_JOBS
max jobs to run (used to get the number of lines for
internal listing)
-v, --verbose activate verbose output
-d, --debug run in debug mode
--dry-run prepare workspace and listings, but do not execute the
command (=> no log, no report)
--reporting prepare workspace and listings, but do not execute the
command (=> no log, no report)
--no-register do not register job in monitor [url=http://cercloudweb
/jobsmonitor/api/v1/gogolistjob/]
-c CONFIG_FILE, --config=CONFIG_FILE
config file (json syntax)
To launch a task, gogolist needs the following information, described in the sections below: a workspace, an input listing, how to split it, a batch manager and the command to execute.
For each execution, gogolist creates a workspace folder containing:
- the job configuration
- the input listings
- the error listings
- the reporting and monitoring information
- the executable output logs
Using the “-w” (or “--workspace”) option, the user specifies the root of the workspaces (called [UserRootWorkspaceDir] in this doc). Gogolist then creates a hierarchical structure based on the current date and an execution ID. For example, for the third execution on 2012/08/02, the following structure is created:
[UserRootWorkspaceDir]/20120802/000003/{report,input,monitor,logs,output}
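The directory layout above can be reproduced with a short POSIX shell sketch. It assumes that the execution ID is the largest existing six-digit ID for the current day plus one; the function name next_workspace is hypothetical and this is not gogolist's own code.

```shell
#!/bin/sh
# Hypothetical sketch (not gogolist's code): print the next workspace path
# [UserRootWorkspaceDir]/YYYYMMDD/NNNNNN, assuming IDs are zero-padded to
# six digits and incremented per day.
next_workspace() {
    root=$1
    day=$(date +%Y%m%d)
    last=0
    for d in "$root/$day"/[0-9][0-9][0-9][0-9][0-9][0-9]; do
        [ -d "$d" ] || continue          # the glob matched nothing
        # strip leading zeros so "000010" is read as decimal 10
        n=$(basename "$d" | sed 's/^0*//')
        n=${n:-0}
        if [ "$n" -gt "$last" ]; then
            last=$n
        fi
    done
    printf '%s/%s/%06d\n' "$root" "$day" $((last + 1))
}
```

In bash, ws=$(next_workspace ~/gogo_ws) && mkdir -p "$ws"/{report,input,monitor,logs,output} would then create the five sub-folders listed above (~/gogo_ws is only an example root).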
Note
The user listing can be sent to Gogolist in two different ways:
the listing file as a Gogolist argument (default mode)
gogolist.py [options] /path/to/userlisting.txt
using a pipe (“--stdin” option, which also requires “--split-max-lines”)
cat /path/to/userlisting.txt | gogolist.py --stdin [options]
Tip
This is a practical way to provide a listing to Gogolist without using intermediate files. Here are some examples:
- To launch the process on the first five lines of your listing:
head -n 5 /path/to/userlisting.txt | gogolist.py --stdin ...
- To create your listing dynamically and send it to Gogolist:
~user/listing_generator.sh | gogolist.py --stdin ...
find /path/to/data -type f | gogolist.py --stdin ...
- To relaunch only the listing lines that failed:
cat [UserRootWorkspaceDir]/YYYYMMDD/XXXXXX/output/listing.err* | gogolist.py --stdin ...
To parallelize the job, Gogolist splits the user listing into sub-listings, which are then processed in parallel. The following options define how the user listing is split.
Note
The number of sub-listings is the maximum number of tasks which can run in parallel.
“--split-max-lines”: the maximum number of lines in each sub-listing gogolist creates.
Example: for an input listing of 100 lines, --split-max-lines=5 creates 20 sub-listings.
“--split-max-jobs”: the maximum number of sub-listings gogolist creates (gogolist derives the number of lines per sub-listing from it).
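The relation between the two options and the number of sub-listings is a ceiling division. The sketch below assumes that gogolist rounds up so that no line is dropped (the help only says that --split-max-jobs is "used to get the number of lines for internal listing"); the function names are hypothetical. The standard split utility then reproduces the documented 100-line example.

```shell
#!/bin/sh
# Hypothetical helpers illustrating the splitting arithmetic (rounding up assumed).

# Number of sub-listings produced by --split-max-lines=MAX_LINES.
sublistings_for_max_lines() {
    total=$1; max_lines=$2
    echo $(( (total + max_lines - 1) / max_lines ))
}

# Lines per sub-listing derived from --split-max-jobs=MAX_JOBS.
lines_for_max_jobs() {
    total=$1; max_jobs=$2
    echo $(( (total + max_jobs - 1) / max_jobs ))
}

# Reproduce the documented example with the POSIX split utility:
# 100 lines, 5 lines per file, gives 20 sub-listings.
tmpdir=$(mktemp -d)
awk 'BEGIN { for (i = 1; i <= 100; i++) print i }' > "$tmpdir/listing.txt"
split -l 5 "$tmpdir/listing.txt" "$tmpdir/sub_"
ls "$tmpdir"/sub_* | wc -l          # 20 files
rm -rf "$tmpdir"
```

Under the same assumption, 100 lines with --split-max-jobs=8 would give 13 lines per sub-listing, that is 8 sub-listings, the last one holding 9 lines.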
Gogolist can connect to several batch schedulers (torque, pbspro, or local execution; torque is the default). To choose yours, use the “--batch-manager” option.
You may have to pass specific options to the scheduler with “--qsub-options”. Common needs are setting the cluster name and the resource reservations, for example:
--qsub-options="-l nodes=1:cercloudcluster,mem=2gb"
In “default” mode, each sub-listing is processed sequentially: the executable is launched once per line, with that line as its argument.
myexecutable sublisting_lineX
In “streaming” mode (“--streaming” option), the sub-listing is sent to the executable's standard input through a Unix pipe (“|”).
cat /path/to/sublistingY.txt | myexecutable
In “streaming-in-exec” mode, the executable is launched once and the sub-listing path is passed to it as an argument.
myexecutable /path/to/sublistingY.txt
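To make the three modes concrete, here is a POSIX shell sketch of what is run for one sub-listing in each mode. The function names are hypothetical, the command is assumed to be a single executable name without arguments, and gogolist's actual job scripts may differ.

```shell
#!/bin/sh
# Hypothetical illustration of the three execution modes for one sub-listing.
# $1 = executable (stands for "myexecutable"), $2 = sub-listing file.

# default: one process per line, the line is the only argument
run_default() {
    cmd=$1; sublisting=$2
    while IFS= read -r line || [ -n "$line" ]; do
        # </dev/null keeps the executable from consuming the listing itself
        "$cmd" "$line" </dev/null
    done <"$sublisting"
}

# streaming: one process, the sub-listing arrives on stdin
# (same effect as: cat "$sublisting" | "$cmd")
run_streaming() {
    cmd=$1; sublisting=$2
    "$cmd" <"$sublisting"
}

# streaming-in-exec: one process, the sub-listing path is the argument
run_streaming_in_exec() {
    cmd=$1; sublisting=$2
    "$cmd" "$sublisting"
}
```

On a three-line sub-listing, run_default starts the executable three times, while both streaming variants start it once.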
Tip
The streaming modes can be much more efficient when launching n executables on a single line each takes much longer than launching a single executable on n lines. Typical reasons are:
- Initialization time of the executable
- Post-processing
- Network connection or session setup
- ...
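A simple cost model shows where the gain comes from. Assuming a fixed start-up cost t_init per launch and a per-line processing cost t_line (both idealized, ignoring scheduler overhead), a sub-listing of n lines takes:

```latex
T_{\text{default}} = n\,(t_{\text{init}} + t_{\text{line}}), \qquad
T_{\text{streaming}} = t_{\text{init}} + n\,t_{\text{line}}, \qquad
T_{\text{default}} - T_{\text{streaming}} = (n-1)\,t_{\text{init}}
```

For example, with n = 1000, t_init = 2 s and t_line = 0.1 s, the default mode takes 2100 s and the streaming modes 102 s.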
Gogolist provides monitoring tools that let you follow the processing progress and get reports on it.
To activate the progress report, use the “--reporting” option.
By default, Gogolist registers each new job in a web-based monitoring tool (http://cercloudweb/jobsmonitor/api/v1/gogolistjob/). To disable this, use the “--no-register” option.
The configuration file (“-c” or “--config” option, JSON syntax) is optional. Its aim is to override some default parameters.
Gogolist looks for the configuration file in this order:
to be continued...