FOSSology
Advancing open source analysis and development

How to Interpret the Job Queue

A common misconception about FOSSology is that an uploaded file (project, tarball, zip file, ISO, etc.) is immediately analyzed and available for viewing in real time. In fact, this is not the case.

Within FOSSology, Agents are responsible for processing and analyzing uploaded files. The time it takes for an agent to run can vary from a few milliseconds to many hours, depending on the type of agent and the size of the file being processed. The completion time also depends on what agents are already running within FOSSology, and the order in which new uploads were queued for processing.

Because of the wide range of processing times for agents, and the asynchronous nature of uploads and analysis, the operation of the FOSSology system is controlled by a Scheduler. The scheduler is responsible for queuing and running jobs.

Accessing the Job Queue

To access the job queue, click on the “Jobs” menu and select the “Queue” sub-menu. Choose the “Summary” menu-item to view a summary of the FOSSology job queue.

This presents a list of all active jobs. A job can be in one of 6 color-coded states:

Job State    Description
Queued       The job has been queued, but not yet scheduled/run
Scheduled    The job has been scheduled to run, or is currently running
Running      This state is not currently reported
Finished     The job has completed successfully
Blocked      The job has been blocked by a job upon which it depends
Failed       The job did not complete successfully

In this example, a series of jobs have been queued because a user has created an upload of Nagios version 3.0.2 from Sourceforge.net.

Job 4562 is the wget agent, which is responsible for retrieving the Nagios source code from Sourceforge.net via the world wide web. It has been queued, but not yet scheduled or run by FOSSology.

Job 4563, the unpack agent, is responsible for unpacking the Nagios source code. Clearly, it cannot run until the wget agent has completed and downloaded the Nagios source code from Sourceforge into the FOSSology repository. This relationship is indicated in the leftmost column, which lists the job number (4563), followed by a slash, followed by the number of the job it depends on (4562). Where a job has no dependency, only the job number itself is listed.
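The job/dependency notation described above can be read mechanically. The following is a minimal sketch in Python, assuming the label is available as a plain string such as "4563/4562"; the function name parse_job_label is hypothetical and is not part of FOSSology.

```python
from typing import Optional, Tuple


def parse_job_label(label: str) -> Tuple[int, Optional[int]]:
    """Split a job label into (job number, job number it depends on).

    A label such as "4563/4562" means job 4563 depends on job 4562.
    A label without a slash, such as "4562", has no dependency.
    This is an illustrative helper, not a FOSSology function.
    """
    job, _, dependency = label.strip().partition("/")
    return int(job), (int(dependency) if dependency else None)


print(parse_job_label("4563/4562"))  # (4563, 4562)
print(parse_job_label("4562"))       # (4562, None)
```

With the example jobs from this page, the unpack job reports a dependency on the wget job, while the wget job reports none.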

Clicking on the Refresh link in the upper-right of the page just below the “Logout” button will refresh the job status page.

Now a few moments have passed and some of the jobs originally seen in the Summary have run and completed. The wget and unpack jobs are no longer listed, because they have completed and are no longer active. (Note: They can still be viewed in the job history.)

In this case the spec agent (job #4573) is now scheduled and/or running. This agent is responsible for analyzing any RPM Specfile artifacts found in an upload. Also the pkgmetagetta agent (job #4571) is scheduled and/or running. This agent is responsible for mining any metadata known to the libextractor library.

Note that the license and filter_clean agents have not yet run, even though they appear “before” the spec and pkgmetagetta jobs in the Job Summary listing. This further illustrates the asynchronous nature of job scheduling in FOSSology.

Viewing detailed job information

By clicking on the “Detail” link in the Job Queue mini-menu, a more detailed view of active jobs can be seen:

In this case, the license agent is scheduled to run on Nagios 3.0.2

The detailed job listing shows additional information about scheduled jobs:

  • Number of items processed: Most agents, including the ununpack and license agents, report the number of files that have been processed as part of the job.
  • Elapsed scheduled: How long (in hours:minutes:seconds) the job has been held by the scheduler, that is, how much time has passed since the scheduler picked up the job. The scheduler may pick up a job (change its status from “queued” to “scheduled”) before an agent slot is available to run it. In this case the job will not actually begin running, even though it is scheduled. Currently FOSSology does not report whether a scheduled job is running or not. It can be assumed that a scheduled job will begin running imminently, but this depends on how many resources are available to actually run jobs.
  • Elapsed running: How much time agents have spent working on the job (in hours:minutes:seconds). This number is independent of parallelism. For example, if a job spent 12 minutes in the license agent, then it will report 12 minutes. It doesn’t matter if the job was split in parallel between 4 agents that took 3 minutes each. A good way to interpret this number: If you only had one agent, it would take this long.

If there is no delay in the queue, then “Elapsed running” and “Elapsed scheduled” are the non-parallelized and parallelized measures of the same processing time. E.g., if you have 4 agents in parallel, then “Elapsed running” should be 4 times larger than “Elapsed scheduled”.

However, these numbers are not always perfectly synchronized. For example, if TWO different jobs are using the same resource, then “Elapsed running” will be correct for each, but “Elapsed scheduled” will be noticeably larger (the job sat in the scheduler for a longer time because there were no free agents available for the task).
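The relationship between the two elapsed times can be worked through numerically. The sketch below is only an illustration of the idealized model described above (work split evenly across agents, no other overhead); the function names are hypothetical, and the inputs are assumed to be strings in hours:minutes:seconds form.

```python
def hms_to_seconds(text: str) -> int:
    """Convert an "hours:minutes:seconds" string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def seconds_to_hms(total: int) -> str:
    """Convert a number of seconds back to "hh:mm:ss"."""
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def estimated_wait(elapsed_running: str, elapsed_scheduled: str, agents: int) -> str:
    """Estimate how long a job sat in the scheduler without an agent.

    Assumes the work was split evenly across `agents` parallel agents,
    so the ideal "Elapsed scheduled" is "Elapsed running" / agents.
    Anything beyond that is treated as time spent waiting.
    """
    ideal = hms_to_seconds(elapsed_running) // agents
    waited = max(0, hms_to_seconds(elapsed_scheduled) - ideal)
    return seconds_to_hms(waited)


# 12 minutes of agent time across 4 agents is 3 minutes with no queue delay.
print(estimated_wait("00:12:00", "00:03:00", 4))  # 00:00:00
# The same job held by the scheduler for 5 minutes waited roughly 2 minutes.
print(estimated_wait("00:12:00", "00:05:00", 4))  # 00:02:00
```

The result is an estimate only: it says nothing about why the job waited, just that “Elapsed scheduled” exceeded what perfectly parallel processing would need.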

Finding the status of an Upload

It is often helpful to view the processing status of a specific recent upload. The scheduler can report this information.

To view the status of an upload, select the “Jobs” menu and the “Queue” sub-menu, then choose “By Upload”.

This will allow you to select which upload to view.

First, select the folder within the repository where the upload was placed. You may view all of the folders in the repository from the drop-down menu.

In this case we will select an upload that was placed in the folder called “Danger”.

Once you select a folder, FOSSology will automatically supply a list of all uploads located in that folder. You can then select the specific upload to view:

Click on the “View Jobs!” button to view the jobs associated with a specific upload.

You will see a list of all jobs (both historical and currently queued/scheduled/running) associated with the selected upload. In this case we can see that the Nagios 3.0.2 upload has completed the wget agent, the unpack agent, the Default Meta Agents agents, and the Meta Analysis agents. Some of the components of the License agents are still running – namely the license, filter_clean, and one of the sqlagents.

Manually scheduling an agent to run on an Upload

Occasionally it may be desirable to upload a project, but not immediately schedule any analysis. At other times, it may be necessary to re-run an analysis on an upload. You can easily use the scheduler to manually schedule an agent to run on an upload at any time.

To manually schedule an agent to run, select the “Jobs” menu, and the “Agents” item.

This will provide a page for selecting the upload and agent to run.

First, select the folder containing the upload that you wish to schedule an agent for. The drop-down menu provides a list of all the folders in the repository. Once you select a folder, FOSSology will present a list of all uploads in that folder.

In this case we’ve selected the ext-2.1.zip upload in the folder called “Danger”.

Once an upload is selected, FOSSology provides a list of all the agents that may be scheduled for that upload. With most web browsers, you may select multiple agents from the list by holding down the Control key and clicking on the item(s). Note that if an agent is already queued or scheduled for an upload, it will not be available to schedule manually.

Selecting Jobs –> Queue –> Summary shows that the agents have been scheduled for the upload:

You can re-run agents that have already been run. The results from existing analysis will be replaced by the new results, once the agent has completed. There is currently no way in FOSSology to store multiple versions of an analysis (as of version 0.8.0).

Viewing system-wide agent status

It is sometimes useful to view the status of all agents available to the system. This can be accomplished by selecting the “Admin” menu, and then choosing the “Scheduler” sub-menu. Then click on the “Status” item.

This will present a view of all instances of all agents either waiting on a job, or processing a job.

  • The first column, “Update Time”, shows the last heartbeat received from an agent.
  • The second column, “Status”, shows the status of the agent, which can be “RUNNING” (agent is currently processing a job), “READY” (agent is preparing to run a job that has been scheduled), or “FREE” (agent is idle and waiting to be assigned a job by the scheduler).
  • The third column shows the name of the agent. Note that the scheduler itself will be listed here as an agent.
  • The fourth column shows the hostname of the system where the agent is running. This can be useful when FOSSology is installed in a multi-host environment. For more information on multi-system setup, refer to Multi-system Setup and the FOSSology README file distributed with the software.
  • The fifth column shows the parameters supplied by the scheduler to scheduled or running agents. (Note, no parameters will be listed for “FREE” agents).
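The five columns described above map naturally onto a small record type. The following is a rough sketch, assuming the status rows have been copied out of the page by hand; the class name, field names, host names, timestamps, and parameter strings are all hypothetical placeholders, and nothing here queries FOSSology itself.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable


@dataclass
class AgentRow:
    """One row of the agent status view (field names are illustrative)."""
    update_time: str      # last heartbeat received from the agent
    status: str           # "RUNNING", "READY", or "FREE"
    agent: str            # agent name
    host: str             # host where the agent is running
    parameters: str = ""  # empty for "FREE" agents


def count_by_status(rows: Iterable[AgentRow]) -> Counter:
    """Count how many agent instances are in each status."""
    return Counter(row.status for row in rows)


# Made-up example rows; real values would come from the Status page.
rows = [
    AgentRow("2008-05-27 12:00:01", "RUNNING", "license", "host-a", "(example parameters)"),
    AgentRow("2008-05-27 12:00:02", "READY", "pkgmetagetta", "host-a", "(example parameters)"),
    AgentRow("2008-05-27 12:00:02", "FREE", "wget", "host-b"),
]

summary = count_by_status(rows)
print(summary["RUNNING"], summary["READY"], summary["FREE"])  # 1 1 1
```

A count of “FREE” agents near zero, combined with a growing “Elapsed scheduled” time on queued jobs, would suggest that jobs are waiting for agent slots.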

Viewing Job Queue History

You can view the complete history of the jobs that have been queued, scheduled and run. Simply click on the “History” link in the mini-menu above any Job Status page, such as described above in “Accessing the Job Queue”.

This shows all of the jobs already run, organized by upload, and then by job name. You can use this history to review jobs that have completed or failed. Within the job history view, you can switch between the “Summary” and “Detailed” views using the links in the mini-menu at the top.

 
how_to_interpret_the_job_queue.txt · Last modified: 2008/05/27 12:09 by danger

Copyright (C) 2007-2008 Hewlett-Packard Development Company, L.P.
FOSSology Project documentation is licensed under the GNU Free Documentation License Version 1.2