Condor

Schedule and run remote CPU intensive applications using the idle cycles of distributively owned workstations

Condor Ranking & Summary

  • License: Apache
  • Price: FREE
  • Publisher Name: Condor Authors
  • Publisher web site: http://www.cs.wisc.edu/condor/
  • Operating Systems: Mac OS X 10.4 or later
  • File Size: 322.3 MB

Condor Description

Condor is a specialized workload management system for compute-intensive jobs. Like other full-featured batch systems, Condor provides a job queueing mechanism, scheduling policy, priority scheme, resource monitoring, and resource management. Users submit their serial or parallel jobs to Condor; Condor places them into a queue, chooses when and where to run the jobs based upon a policy, carefully monitors their progress, and ultimately informs the user upon completion.

While providing functionality similar to that of a more traditional batch queueing system, Condor's novel architecture allows it to succeed in areas where traditional scheduling systems fail. Condor can be used to manage a cluster of dedicated compute nodes (such as a "Beowulf" cluster). In addition, unique mechanisms enable Condor to effectively harness wasted CPU power from otherwise idle desktop workstations. For instance, Condor can be configured to use only desktop machines where the keyboard and mouse are idle. Should Condor detect that a machine is no longer available (for example, because a key press was detected), in many circumstances Condor is able to transparently produce a checkpoint and migrate a job to a different machine which would otherwise be idle.

Condor does not require a shared file system across machines. If no shared file system is available, Condor can transfer the job's data files on behalf of the user, or Condor may be able to transparently redirect all the job's I/O requests back to the submit machine. As a result, Condor can be used to seamlessly combine all of an organization's computational power into one resource.

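The idle-workstation behavior described above can be pictured with a small Python sketch. This is a toy model under stated assumptions, not Condor's scheduler: the names `Machine`, `schedule`, and `IDLE_THRESHOLD_S`, and the 15-minute threshold, are hypothetical, and checkpointing is reduced to putting the job back at the front of the queue.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical policy: a desktop counts as idle after 15 minutes without input.
IDLE_THRESHOLD_S = 15 * 60


@dataclass
class Machine:
    name: str
    keyboard_idle_s: int          # seconds since the last key press or mouse movement
    job: Optional[str] = None     # job currently running on this machine, if any


def is_available(machine: Machine) -> bool:
    """A machine may run jobs only while its keyboard and mouse are idle."""
    return machine.keyboard_idle_s >= IDLE_THRESHOLD_S


def schedule(queue: List[str], pool: List[Machine]) -> None:
    """Vacate machines whose owner returned, then fill idle machines from the queue."""
    for machine in pool:
        if machine.job is not None and not is_available(machine):
            # Stand-in for "checkpoint and migrate": the job returns to the
            # front of the queue so it is the next one to be placed.
            queue.insert(0, machine.job)
            machine.job = None
    for machine in pool:
        if machine.job is None and is_available(machine) and queue:
            machine.job = queue.pop(0)
```

Calling `schedule` repeatedly with updated `keyboard_idle_s` values mimics a job moving from a desktop whose owner has returned to another machine that is idle.
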
Condor monitors a pool of machines to find idle machines that are then used to complete submitted jobs, and also provides remote system call and checkpointing mechanisms that allow for transparent remote execution and job migration.

NOTE: Condor is licensed and distributed under the Apache License.

What's New in This Release:

  • This release cannot communicate with previous versions of Condor if CCB is enabled or if PRIVATE_NETWORK_NAME is configured.
  • Updated the DRMAA version. This new version is compliant with GFD.133, the DRMAA 1.0 grid recommendation standard. Three new functions were added to meet the specification's requirements, and several bugs were fixed.

New Features:

  • Added support for using any recognized script as an executable in a submit file on Windows. For more information, see section 6.2.6 of the Condor manual.
  • Improved support for private networks: added CCB, the Condor Connection Broker. It is similar in functionality to GCB, the Generic Connection Broker, but it has several advantages, including ease of use and working on Windows as well as Unix platforms. GCB continues to work, but we may remove it sometime in the 7.3 development series. The main missing feature that currently prevents CCB from replacing GCB is support for connectivity from one private network to another. CCB only works when connecting from a public network to a private one. For example, jobs may be sent from a condor_schedd on the public Internet to condor_startd daemons on a private network, if the condor_startd daemons are configured to use a CCB server that is accessible to the condor_schedd daemon. However, if the condor_schedd daemon is on one private network and the condor_startd daemons are on a different private network, CCB does not help. For more information on CCB, see section 3.7.3.
  • Added support for CPU affinity on Linux platforms.
  • Added support for the condor_q -better-analyze option on Windows.
  • Added WANT_HOLD. When PREEMPT becomes true, if WANT_HOLD is true, the job is put on hold for the reason (optionally) specified by WANT_HOLD_REASON and WANT_HOLD_SUBCODE. These policy expressions are evaluated by the execute machine. As usual, the job owner may specify periodic_release and/or periodic_remove expressions to react to specific hold states automatically.
  • Added the ClassAd function debug(). See section 4.1.1 for the details of this function.
  • The condor_schedd can now use MD5 checksums to avoid storing multiple copies of the same executable in its SPOOL directory. Note that this feature only affects executables sent to the condor_schedd via the copy_to_spool command within a submit description file.
  • Reduced the number of sleeps condor_dagman does to maintain log file consistency when a DAG uses multiple user logs for node jobs. DAGMan now does one sleep per submit cycle, instead of one sleep for each submit.
  • Added the -import_env command-line flag to condor_submit_dag. This explicitly puts the submitter's environment into the .condor.sub file.
  • Optimized the removal of large numbers of jobs. Previously, removal of tens of thousands of jobs caused the condor_schedd daemon to consume a lot of CPU time for several minutes.
  • Reduced memory usage by the condor_shadow daemon. Since there is one condor_shadow process per running job, this helps increase the number of running jobs that a submit machine can handle. Under Linux 2.6, we found that running 10,000 jobs from a single submit machine requires about 10 GB of system RAM. We also found in this case that running more than 10,000 simultaneous jobs requires a 64-bit submit machine. On a 32-bit Linux platform, kernel memory is exhausted, regardless of how much additional RAM the system has.
  • Reduced the memory usage of the condor_collector daemon when UPDATE_COLLECTOR_WITH_TCP = True.

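The WANT_HOLD rule described above amounts to a small decision procedure. The Python sketch below only illustrates that rule and is not Condor's implementation: the names `Decision` and `decide` and the action labels "run", "preempt", and "hold" are hypothetical, and in a real pool these values come from policy expressions evaluated by the execute machine.

```python
from typing import NamedTuple, Optional


class Decision(NamedTuple):
    action: str                    # hypothetical labels: "run", "preempt", or "hold"
    reason: Optional[str] = None   # stands in for the value of WANT_HOLD_REASON
    subcode: Optional[int] = None  # stands in for the value of WANT_HOLD_SUBCODE


def decide(preempt: bool,
           want_hold: bool,
           want_hold_reason: Optional[str] = None,
           want_hold_subcode: Optional[int] = None) -> Decision:
    """Map already-evaluated policy values to what happens to the job."""
    if not preempt:
        # PREEMPT is false, so WANT_HOLD is not consulted at all.
        return Decision("run")
    if want_hold:
        # PREEMPT and WANT_HOLD are both true: hold the job, carrying the
        # optional reason and subcode along.
        return Decision("hold", want_hold_reason, want_hold_subcode)
    # PREEMPT is true but WANT_HOLD is not: ordinary preemption.
    return Decision("preempt")
```

A job owner's periodic_release or periodic_remove expression would then react to the reason or subcode carried by a "hold" decision.
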
Configuration Variable Additions and Changes:

  • The new configuration variable OPEN_VERB_FOR_<EXT>_FILES allows the default interpreter for scripts with an extension EXT to be changed. For more information, see section 6.2.6 of the Condor manual.
  • The new configuration variable CCB_ADDRESS configures a daemon to use one or more CCB servers to allow communication with Condor components outside of the private network.
  • The new configuration variable MAX_FILE_DESCRIPTORS (on Unix platforms only) specifies the required file descriptor limit for a Condor daemon. File descriptors are a system resource used for open files and for network connections. Condor daemons that make many simultaneous network connections may require an increased number of file descriptors. CCB, for example, has its own file descriptor requirements.
  • The new configuration variables ENFORCE_CPU_AFFINITY and SLOTx_CPU_AFFINITY on Linux platforms allow Condor to lock slots to given CPUs.
  • The new configuration variable DEBUG_TIME_FORMAT allows a custom specification for the format of the time printed at the start of each line in a daemon's log file. See section 3.3.4 for the complete definition of this variable.
  • The new configuration variable SHARE_SPOOLED_EXECUTABLES is a boolean value that determines whether the condor_schedd daemon will use MD5 checksums to avoid storing multiple copies of the same executable in the SPOOL directory. The default setting is True.
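
The idea behind SHARE_SPOOLED_EXECUTABLES, using an MD5 checksum to recognize an executable that is already stored, can be sketched in Python. This is a minimal in-memory illustration, not how the condor_schedd manages its SPOOL directory: the class `Spool` and its methods are hypothetical names introduced only for this example.

```python
import hashlib
from typing import Dict


class Spool:
    """Toy stand-in for a spool directory that keeps one copy per distinct executable."""

    def __init__(self) -> None:
        self._by_digest: Dict[str, bytes] = {}  # MD5 hex digest -> stored contents
        self._users: Dict[str, int] = {}        # MD5 hex digest -> number of submissions

    def store(self, executable: bytes) -> str:
        """Store the executable unless identical contents are already present."""
        digest = hashlib.md5(executable).hexdigest()
        if digest not in self._by_digest:
            self._by_digest[digest] = executable
        self._users[digest] = self._users.get(digest, 0) + 1
        return digest

    def copies_stored(self) -> int:
        """Number of distinct executables actually kept."""
        return len(self._by_digest)
```

Submitting the same bytes twice returns the same digest and leaves `copies_stored()` unchanged, which is the saving the configuration variable is meant to provide.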

