====== Scheduler ======

This document covers the technical implementation of the scheduler. It is intended for anyone who needs to replace, modify, or debug the scheduler. User-oriented scheduler documentation can be found at [[foss-scheduler]].

This document is not intended for people who want to create an agent (although it certainly helps to know this information). See [[writing_an_agent|Writing an agent]] if you want to know more about that.

===== About the Scheduler =====

The scheduler is a //super-agent//, responsible for spawning and managing all other agents. The scheduler balances the needed tasks (found in the job queue) with the available resources. It tries to ensure that:
  - One task does not lock out other tasks.
  - Unused resources are used by other pending tasks.
  - Agents are spawned in an optimal fashion (fastest).
  - Agents do not exceed the allotted resources.

Although the scheduler is single-threaded, it manages child processes that run in parallel.
  * Scheduler is single-threaded.
  * Scheduler spawns agents (children) that run in parallel.

===== Front-End Communications =====

The scheduler communicates with the front-end UI through the database's jobqueue table. This table lists which agents need to run, the necessary parameters for each agent, and the current operation status.
  * Agents can either be general (running on any available server, or host) or host-specific. The jobqueue specifies a "runonpfile" field if the agent should be locked to a specific host. The parameter denoted by the runonpfile field is used to identify the host: run on the host that matches the pfile.
  * Jobs can contain a single parameter that is passed to the agent, or an SQL query can be provided for generating parameters. The latter, a multi-SQL-query (MSQ), is used in lieu of adding thousands of individual jobs to the jobqueue. The scheduler performs the MSQ request and passes each result individually to an agent.
The combination of host and query type leads to four combinations, but only two are implemented.

^               ^ Any host ^ Host-specific     ^
^ One parameter | OK       | N/A               |
^ MSQ           | N/A      | OK via runonpfile |

Some example agents:
  * **wget:** Any host, one parameter (the URL to get).
  * **license:** The bSAM agent uses a host-specific MSQ. This allows bSAM to run on the system containing the license files, rather than resorting to NFS accesses.
  * **filter_license:** Similar to the bSAM agent, the host-specific MSQ reduces NFS access.

Some agents could fit into other categories, but are limited by the two available choices:
  * **unpack:** Host-specific, MSQ. The SQL query returns ONE record that contains the pfile and ufile to unpack. This should be implemented as a host-specific one-parameter job, but the only host-specific option is an MSQ.
  * Any future agent that only performs database accesses would be a better fit as an any-host MSQ job. However, such agents will likely be implemented as host-specific MSQ jobs.

==== Adding to the Queue ====

Jobs are added to the jobqueue by the front-end UI. The UI knows the desired agent type, whether it is an any-host or MSQ job, and the proper parameters (or SQL). The scheduler has no control over what is added and does not validate whether the added job is correct.

==== Job Tracking ====

The scheduler tracks jobs based on the jobqueue start and end times.
  * No start time? OK to run.
  * Start without end? Job is currently being managed by the scheduler.
  * Start with end? Job is completed.

Some jobs need to be rescheduled. For example, an MSQ may have a "LIMIT 5000" (allowing the scheduler to manage only a few results at a time and permitting limited timeslice scheduling). This is done by removing the start time when the job completes, effectively putting the job back into the jobqueue. If the MSQ returns no results, then the end time is set, completing the job.
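The start/end-time rules and the MSQ rescheduling step can be sketched in C. This is only an illustration under stated assumptions, not the code in dbq.c: the enum, both function names, and the use of 0 to mean "timestamp not set" are hypothetical.

```c
#include <time.h>

/* Hypothetical job states derived from the jobqueue timestamps. */
enum job_state {
    JOB_PENDING,   /* no start time: OK to run */
    JOB_HELD,      /* start without end: managed by the scheduler */
    JOB_COMPLETE   /* start with end: completed */
};

/* Assumption: 0 stands for an unset (NULL) timestamp. */
enum job_state job_state_from_times(time_t start, time_t end)
{
    if (start == 0)
        return JOB_PENDING;
    if (end == 0)
        return JOB_HELD;
    return JOB_COMPLETE;
}

/* Called when one pass over a limited MSQ finishes.
 * rows_returned is the number of records the query produced. */
void msq_pass_finished(time_t *start, time_t *end, int rows_returned, time_t now)
{
    if (rows_returned > 0)
        *start = 0;   /* clear the start time: job goes back into the queue */
    else
        *end = now;   /* no results left: set the end time, job is complete */
}
```

In this sketch, a job whose query still returns rows becomes pending again after each pass, and only a pass that returns no rows marks it complete.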
The jobs run with the following priorities:
  - Anything currently running is allowed to run. The rationale: a job may take a long time and it is better not to cancel the job and try to restart it later.
  - Any jobs held by the scheduler come next.
  - Of the jobs held by the scheduler, jobs for available, active agents come first. This reduces kill/spawn times.
  - If there are no running agents of the correct type, then one is spawned (possibly after killing an incorrect ready/active agent type first).
  - The job queue has a column for urgent tasks. These come next.
  - Any available job, oldest first.
  - Jobs that do not match the available agents are ignored and remain in the jobqueue.

The jobqueue keeps a prioritized table in case one agent depends on the results from another agent. As a result, there are frequently jobs in the jobqueue that cannot run (temporarily blocked due to a dependency).

This tracking method has a few limitations:
  * If the scheduler dies, someone needs to remove the start times on incomplete tasks. (The scheduler tries to do this with signal handling, but sometimes dies before resetting the values.)
  * Since the start time can be reset due to rescheduling, there is no way for the front-end to tell how long a job really took.
  * The scheduler holds on to jobs. The front-end cannot distinguish between a "held" job and a "running" job.
  * The scheduler can only hold a few jobs at a time: one "any host" job per agent, and four MSQ jobs (MAXMSQ in dbq.c). It is very possible for the four pending MSQ commands to be held and waiting for an agent to become available, while other MSQ commands in the queue could run.

The main function for checking the queue is in dbq.c: DBProcessQueue(). This checks the jobqueue for new tasks and processes the held MSQ records.

==== Signals ====

The scheduler watches for a few signals. These are mainly used for debugging:
  * **SIGINT**. Finish all running jobs, but do not start new ones. When all jobs complete, exit.
    * The SIGINT behavior is denoted in the code by the "SLOWDEATH" flag and is used to provide a clean exit.
  * **SIGQUIT**. Kill all running children, try to reset the jobqueue start time, and exit.
  * **SIGTERM**. Handled the same as SIGQUIT.
  * **SIGUSR1**. Display a quick summary of running processes (how many are running, waiting, or dead).
  * **SIGUSR2**. Display details about every MSQ job held by the scheduler. This can generate a huge amount of output, but allows debugging MSQ jobs.
  * **SIGHUP**. Display the number of running jobs and the summary of each process (same as SIGUSR1).
  * **SIGSEGV**. If there is a crash, display all thread info (SIGHUP) before dying.

===== Back-end Children =====

All children are treated as finite-state machines. The states (defined in spawn.h) are:
  * **ST_FAIL = 0**. If an agent spawns and dies too rapidly, then it is marked as failed. Failed agents are not respawned for a few minutes. (This prevents infinite spawning/death loops.) The timeout is defined in spawn.c as RespawnInterval (5 minutes) and RespawnCount (5 respawns). If the agent spawns more often than 5 times in 5 minutes, then it is marked as a failure. It will remain a failure for RespawnInterval (5 minutes; the variable is reused). NOTE: only abnormal deaths are counted here. If the scheduler intentionally kills a process (using ST_FREEING), then the number of spawns is reset and it should never reach ST_FAIL (even if it is spawned and killed rapidly).
  * **ST_FREE**. The agent is not spawned yet and has no I/O allocated. All agents begin in this state.
  * **ST_FREEING**. The agent was spawned, but has been told to die by the scheduler. It is now shutting down and has no I/O allocated.
  * **ST_PREP**. The scheduler is preparing a child data structure. The structure has allocated memory but the child has not yet been spawned. This step prevents a well-timed SIGCHLD from freeing the data structure before the state becomes ST_SPAWNED.
    * As an aside: signals are boolean states and are not queued.
    * If three children die at once, then the parent only receives one SIGCHLD. Thus, the scheduler must scan every child when a SIGCHLD arrives, just in case there were multiple deaths. However, a newly dead child looks just like an old dead child; there is no distinction. The ST_PREP state prevents an old dead child from appearing as newly dead and having its memory freed in the signal handler while it is being allocated in the normal (non-handler) code.
  * **ST_SPAWNED**. The agent is spawned but not yet ready (I/O allocated). When the agent sends its first "OK", it transitions to ST_READY.
  * **ST_READY**. The agent is live and ready for data.
  * **ST_RUNNING**. The agent is actively processing data. When the agent sends an "OK", it transitions back to ST_READY.
  * **ST_DONE**. This is used by the MSQ table. Each SQL record has a status field and this indicates that the record is completed. When all MSQ records are completed, the MSQ job is done.
  * **ST_END**. This is an unused marker. Since states are numeric, this allows the code to loop over all possible states: for(i=0; i<ST_END; i++)

  jq_type = "command"
  jq_args = parameters for command

When the jobqueue's jq_type is the lowercase string "command", the parameters in "jq_args" are interpreted directly by the scheduler. The following jq_args are supported.
  * "shutdown". The scheduler will finish all running tasks, but not start anything new. When all tasks complete, the scheduler will exit.
  * "shutdown now". The scheduler kills all running processes and exits ASAP.
  * "killjob 1234". If the jobqueue item 1234 (jq_pk="1234") is currently being processed by the scheduler, then kill it and mark it as a failure. This is usually used when the user queues a job to be processed, then decides to delete the job while it is running. The front-end UI knows that the job is complete because it will be marked as processed in the jobqueue.
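As a rough illustration of how the three jq_args commands could be told apart, here is a minimal C sketch. It is not taken from the scheduler source: the enum, the function name parse_command(), and the choice to reject trailing characters after the job number are all assumptions made for the example.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical command codes for a jq_type="command" row. */
enum sched_cmd { CMD_UNKNOWN, CMD_SHUTDOWN, CMD_SHUTDOWN_NOW, CMD_KILLJOB };

/* Classify jq_args. For CMD_KILLJOB, *jq_pk receives the job id. */
enum sched_cmd parse_command(const char *jq_args, long *jq_pk)
{
    long id;
    char extra;

    if (strcmp(jq_args, "shutdown now") == 0)
        return CMD_SHUTDOWN_NOW;
    if (strcmp(jq_args, "shutdown") == 0)
        return CMD_SHUTDOWN;

    /* The trailing %c only converts if something follows the number,
     * so a result of exactly 1 means "killjob <number>" and nothing else. */
    if (sscanf(jq_args, "killjob %ld%c", &id, &extra) == 1) {
        *jq_pk = id;
        return CMD_KILLJOB;
    }
    return CMD_UNKNOWN;
}
```

A caller would then map CMD_SHUTDOWN to the same slow exit used for SIGINT and CMD_SHUTDOWN_NOW to the immediate exit used for SIGQUIT, if the two paths are indeed shared; the text above suggests the behaviors match but does not say the code is common.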
===== Building the Scheduler =====

The scheduler consists of 6 source files:
  * clients.c: Handles client communications.
  * dbq.c: Contains ALL DB accesses. If a function touches the DB, then it is here. MSQ results are managed here too.
  * hosts.c: Functions for managing host-based spawning. This keeps track of the number of spawns per host and whether new spawns are permitted.
  * sockets.c: The read() and select() functions for communicating with an agent over stdin/stdout.
  * spawn.c: Functions for spawning processes and handling signals.
  * scheduler.c: The main file. It handles configuration and the infinite control loop.

To build the scheduler, use the Makefile.

  make clean          # remove all compiled files (clean slate for a new build)
  make                # build the scheduler
  sudo make install   # install it to /usr/local/fossology/agents/

The make command should build without any errors or warnings (a clean make).

NOTE: If you make any changes to the state machine labels (the ST_* definitions in spawn.h) then you **must** use 'make clean' before 'make'. (Someday we might introduce a 'make depends' file so code is recompiled whenever any of its dependencies change.)