====== foss-scheduler ======

The fossology scheduler daemon. This daemon runs the fossology job input queue and agents.

The scheduler keeps track of three things: jobs, agents, and running tasks. It assigns each job to an agent and ensures that there are not too many jobs running at the same time. The logic works as follows:

  * Read in a job from the jobqueue (or from stdin if -I is used).
  * Check if there is an available agent already running. If so, use it to process the job.
  * If there is no available agent running, check if there is room to spawn a new agent. If so, spawn the new agent and give it the job.
  * If there is no room, see if some other kind of agent can be killed to make room. (If an agent is sitting unused, kill it.) Then spawn the right kind of agent and give it the job.
  * Otherwise, hold onto the job until it can be assigned to an agent.

For a detailed discussion of foss-scheduler see [[scheduler]] under Developer Docs on the fossology home page.

The scheduler must run as a member of group 'fossy'. If it is started as root, it immediately changes itself to run as user 'fossy' in group 'fossy'.

===== SYNOPSIS =====

  /usr/local/fossology/agents/foss-scheduler [options] [setup.conf] < 'type command'

Common usage:

  /usr/local/fossology/agents/foss-scheduler -d -L /tmp/foss-scheduler.log

===== DESCRIPTION =====

This is the fossology job scheduler. When an upload is analyzed by fossology, it is the scheduler that takes the upload and schedules a number of agents to operate on it. The most basic agents do things like unpack the archive and store it in the repository and the database. Other agents perform tasks like license analysis and metadata analysis.

The scheduler uses a configuration file to schedule the agents. This allows for flexible scheduling depending on the machine resources available. Usually the agent configuration file is created when fossology is installed.
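
As a rough illustration of the assignment steps listed at the top of this page, the following Python sketch places one job at a time on a single host. It is not the scheduler's actual code: the names ''Agent'', ''Host'', ''assign'' and ''max_agents'' are hypothetical, agents are modelled as plain objects rather than spawned processes, and the "urgent" slots are ignored.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # compare agents by identity, not by field values
class Agent:
    kind: str            # hypothetical: the agent type, e.g. "unpack"
    busy: bool = False   # True while the agent is processing a job


@dataclass
class Host:
    max_agents: int                                    # agents allowed at one time
    running: List[Agent] = field(default_factory=list)


def assign(job_kind: str, host: Host) -> Optional[Agent]:
    """Try to place one job. Return the agent used, or None to hold the job."""
    # Step 1: reuse an idle agent of the right kind that is already running.
    for agent in host.running:
        if agent.kind == job_kind and not agent.busy:
            agent.busy = True
            return agent

    # Step 2: if the host is full, try to make room by killing an idle agent.
    # Any idle agent left at this point is of some other kind.
    if len(host.running) >= host.max_agents:
        idle = next((a for a in host.running if not a.busy), None)
        if idle is None:
            return None  # Step 4: nothing can be freed, so hold the job
        host.running.remove(idle)

    # Step 3: there is room now, so "spawn" the right kind of agent.
    agent = Agent(job_kind, busy=True)
    host.running.append(agent)
    return agent
```

A caller would keep any job for which ''assign'' returns ''None'' and retry it once an agent finishes and is marked idle again.
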
If setup.conf is not specified then /usr/local/share/fossology/agents/foss-scheduler.conf is used. Advanced users or programmers adding agents to the system may need to regenerate this file or edit it directly. A custom file can also be used. The setup.conf file defines each kind of known agent and how to run it.

The list of jobs to run comes from the database's jobqueue table. Alternatively (for debugging), -I can be used to specify the jobs to run using stdin.

===== Configuring the Scheduler =====

The scheduler uses a configuration file to specify the number of processes per host and each agent. A configuration file creator script, [[mkconfig]], is available to aid in the creation of this file. For example:

  %Host localhost 2 1
  agent=wget host=localhost | /usr/local/fossology/agents/wget_agent
  agent=unpack host=localhost | /usr/local/fossology/agents/engine-shell unpack '/usr/local/fossology/agents/ununpack -d /home/repository//ununpack/%{U} -qRCQx'
  agent=filter_license host=localhost | /usr/local/fossology/agents/Filter_License
  agent=filter_license host=localhost | /usr/local/fossology/agents/Filter_License
  agent=license host=localhost | /usr/local/fossology/agents/bsam-engine -L 20 -A 0 -B 60 -G 10 -M 2 -E -T license -O n -- - /usr/local/share/fossology/agents/License.bsam
  agent=mimetype host=localhost | /usr/local/fossology/agents/mimetype
  agent=mimetype host=localhost | /usr/local/fossology/agents/mimetype
  agent=specagent host=localhost | /usr/local/fossology/agents/specagent
  agent=filter_clean host=localhost | /usr/local/fossology/agents/filter_clean -s
  agent=pkgmetagetta host=localhost | /usr/local/fossology/agents/pkgmetagetta
  agent=pkgmetagetta host=localhost | /usr/local/fossology/agents/pkgmetagetta

The format of the configuration file is as follows:

  * Lines beginning with a "#" are comments.
  * Lines beginning with a "%" are settings.
    * %Verbose specifies the verbose level (same as using "-v" on the command line). %Verbose 2 is like "-vv".
    * %Host lists a host name, the number of agents that can run at a time, and the number of urgent (additional) agents that can run. Currently "urgent" is implemented but not used and not tested.
  * All other lines define agents. These use two parts: attributes | command.
    * There is one line per agent. If you want to permit three unpack agents on the same host, then you will need three copies of the exact same line!
    * Agents are tracked by a unique ID in an array. Each line is assigned one position in the array (the first line is 0).
  * Attributes are strings used to match an agent. (They may look like "field=value" pairs, but they are really just strings.) There are some well-defined attributes:
    * agent=name. This comes from the jobqueue agent table and specifies the type of agent.
    * host=name. This comes from an MSQ request and specifies the hostname to run on. NOTE: The name is just a string! For usability, I named the string after the host's name, but this is not a requirement! It could just as easily use "%host foo 4 1" and "agent=wget host=foo | ssh bar". The string is only used to identify the correct agent line, not to specify the actual hostname!
  * A vertical bar (|) separates the attribute list from the command.
  * The command will be used by system() to run the agent.
    * Each command is also passed an environment variable "$THREAD_UNIQUE". This specifies the unique thread number for the process. NOTE: It is unique among the currently running processes, but if the child dies then the value will likely be reused. In some situations, this is better than $PID or $PPID for managing any temporary files.
    * Some commands may appear to contain macro expansion variables, like ${U} or ${*}. However, these are not processed by the scheduler. They are processed by the agent. (In this case, the agent is called "engine-shell" and is used to run ugly-hack agents from shell scripts.)
    * Commands can be shells around agent processes. For example, "engine-shell" is an agent-aware wrapper for shell scripts. Similarly, you can use "ssh" (specify the full path!) to run a command on a remote host. Remember: The command that is executed is independent of the attribute string "host=".
  * (For debugging) With the -I option, stdin lists the jobs to run.
  * stdout comes from threads. It is not interleaved and appears only when a thread ends.
  * stderr comes from threads. It is interleaved and appears immediately.

===== Options =====

  * **-i**. Initialize the database, then exit.
  * **-k**. Kill all running schedulers (on this system). -k kills the foss-scheduler itself. All other options are ignored.
  * **-d**. Run as a daemon. This is the standard way of starting the scheduler by hand. When invoked as a daemon, foss-scheduler still generates stdout and stderr.
  * **-H**. Ignore hosts for host-specific agent requests.
  * **-I**. Use stdin and queue. The default is to use the queue only. This option is useful when debugging foss-scheduler itself.
  * **-v**. Verbose (-v -v is more verbose). This produces a lot of output that can be useful when debugging or trying to see why an agent is dying. If the output is going to a log file, the file can grow large quickly. Plan for the appropriate amount of space.
  * **-L log**. Send stdout and stderr to log. It is always a good idea to log the scheduler output. Typically this is already done for you when fossology is started. When starting foss-scheduler by hand this option should be specified, or nothing is logged (output goes only to stderr).
  * **-q**. Run quietly. Normally the scheduler reports when an agent's status changes, such as from "FREE" to "RUNNING".
  * **-R**. Reset the job queue in case something was hung. Normally, partially processed jobqueue items are cleaned up automatically once the scheduler detects abandoned jobs in the queue. However, this detection may take 10-20 minutes. Use -R to reset the queue immediately.
  * **-t**. Test every agent to see if it runs, then quit. (Great for debugging the configuration file and system setup.)
  * **-T**. Test every agent to see if it runs, then continue if no problems are found.

===== Examples =====

Stop foss-scheduler the standard way, usually with sudo:

  sudo /etc/init.d/fossology stop

Check the sched.conf file for problems. With -t the scheduler tests every agent and then quits, which is useful when changes have been made to the sched.conf file and one wants to verify that all agents will run with the new file.

  sudo /usr/local/fossology/agents/foss-scheduler -t

Start the scheduler using a log and in verbose mode.

  sudo /usr/local/fossology/agents/foss-scheduler -v -d -L /tmp/foss-scheduler.log

===== Author =====

Neal Krawetz