Changes between Version 11 and Version 12 of ComputeResources/UMCGCluster
Timestamp: Jan 8, 2013 11:08:48 AM
ComputeResources/UMCGCluster
First of all, here are a few '''important''' things to know about the cluster and using it efficiently:

 * '''Head Node''': The head node should NOT be used to run any job directly, nor any intensive process. Here, intensive means CPU, RAM or I/O intensive (i.e. large copy jobs should be run as jobs and queued appropriately). Note that overloading the head node can cause the scheduler to crash, which is harmful for everyone running jobs on the cluster!
 * '''Storage''': The block size on the storage is 6 MB, which means that each file, regardless of its real size, will occupy at least 6 MB on the file system. Data should therefore be kept in big files rather than in a multitude of small files whenever possible. Typically, things like logs, old submit scripts, etc. should be compressed into one file for archiving.
 * '''I/O''': While a 10 Gb network connection per node is fast, typical GoNL jobs use large files and consume lots of I/O. Therefore, I/O should be kept to a minimum, and if a job can be parallelized over multiple cores (i.e. load data once into memory, process it on multiple cores, push it back), that is typically preferred to having separate processes all loading the same data into memory.

…

 * Removes a job from the queue, killing the process if it was already started
 * "qdel all" can be used to purge all of your jobs

== Available queues ==
In order to quickly test jobs you are allowed to run them directly on cluster.gcc.rug.nl outside the scheduler. Please think twice though before you hit enter: if you crash cluster.gcc.rug.nl, others can no longer submit or monitor their jobs, which is pretty annoying. On the other hand, it is not a disaster, as the scheduler and execution daemons run on physically different servers and are hence not affected by a crash of cluster.gcc.rug.nl.
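The storage advice above is worth quantifying. The sketch below estimates the on-disk cost of many small files versus one archive; only the 6 MB block size comes from the text above, the file count and file size are illustrative assumptions:

```shell
#!/bin/sh
# Back-of-the-envelope cost of small files on a file system with a
# 6 MB minimum allocation per file (the block size stated above).
BLOCK_MB=6
N_FILES=10000          # hypothetical: 10,000 small log files of ~2 KB each
DATA_MB=20             # their combined real size: 10,000 x 2 KB = ~20 MB

# Kept as separate files: every file occupies at least one full block.
scattered_mb=$((N_FILES * BLOCK_MB))

# Concatenated into a single archive: size rounds up to whole blocks.
archived_mb=$(( (DATA_MB + BLOCK_MB - 1) / BLOCK_MB * BLOCK_MB ))

echo "as 10,000 separate files: ${scattered_mb} MB on disk"   # 60000 MB
echo "as one archive:           ${archived_mb} MB on disk"    # 24 MB
```

The ~2500x difference is why logs and old submit scripts should be bundled into a single compressed file before archiving.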
To test how your jobs perform on an execution node, and to get an idea of the typical resource requirements for your analysis, you should submit a few jobs to the test queues first. The test queues run on a dedicated execution node, so if your jobs accidentally make that server run out of disk space, run out of memory or do other nasty things, the production queues and nodes will not be affected.

Once you have tested your job scripts and are sure they will behave nicely and perform well, you can submit jobs to the production queue named ''gcc''.

||**Queue**||**Job type**||**Limits**||
||test-short||debugging||10 minutes max. walltime per job; limited to a single test node / 48 cores||
||test-long||debugging||max. 4 jobs running simultaneously per user; limited to half the test node / 24 cores||
||gcc||production - default priority||none||
||gaf||production - high priority||only available to users from the gaf group||

=== qsub options ===
Jobs submitted via PBS qsub can specify a number of options to claim resources, report status, etc. These options can be specified either on the qsub command line or in your job script. The latter is usually preferred, as all information about the job, including memory requirements etc., stays with the script. Below is an example header with some commonly used options, followed by a list of some commonly used flags and their meaning.
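As a minimal sketch of such a header, assuming standard Torque/PBS directives (the job name, queue choice and resource values here are illustrative, not prescribed by this page):

```shell
#!/bin/bash
#PBS -N myjob                  # job name as shown in qstat
#PBS -q test-short             # target queue; pick one from the table above
#PBS -l nodes=1:ppn=1          # claim 1 core on 1 node
#PBS -l mem=4gb                # claim 4 GB of memory (hypothetical value)
#PBS -l walltime=00:10:00      # max. wall clock time (test-short allows 10 minutes)
#PBS -e myjob.err              # file to capture stderr
#PBS -o myjob.out              # file to capture stdout

# The actual work goes below the #PBS header; a placeholder payload:
msg="Job ran on host $(hostname)"
echo "$msg"
```

Since the #PBS lines are shell comments, the script also runs unchanged outside the scheduler, which is convenient for a quick syntax check before submitting it with qsub.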