FOSSology: Advancing open source analysis and development
FOSSology: Multi System Setup

FOSSology: How To Install Multiple Hosts

Notes: Hosts.conf, Proxy.conf, and scheduler.conf must be created or modified after running make install and before running postinstall.

FOSSology: How To Configure Multiple Hosts

The scheduler and repository are designed so they can be distributed across multiple hosts. There are a few reasons for doing this:
The ideal configuration distributes the repository across hosts and runs agents on those hosts. This way, the data used by the agents is local rather than transferred over the network.

Part 1: The Repository

The repository (repo) is just a directory on the file system. The directory’s location is defined in the RepPath.conf file (default: /usr/local/share/fossology/repository/RepPath.conf). This document refers to the repository path as $Repo.

The layout of the repository is as follows:

$Repo/host/##/##/##/files

Where “host” is just a string (not required to be a hostname) and “##” is a hexadecimal number. For example:

$Repo/localhost/01/e4/2f/01e42f923c85.txt

The Hosts.conf file (default: /usr/local/share/fossology/repository/Hosts.conf) identifies the name of each host and the range of directories under it. For example:

==========
sirius * 00 7f
buckbeak * 80 ff
==========

This will create two directories:

$Repo/sirius/ The subdirectories are the range 00 to 7f.
$Repo/buckbeak/ The subdirectories are the range 80 to ff.

Now you can use $Repo/sirius/ and $Repo/buckbeak/ as mount points for remote file systems.

The separation of 00-7f and 80-ff should split the repository roughly in half. The subdirectories are named after the SHA1 checksum of the files, and checksum values are spread evenly, so the two halves should be close in size even if they are not exactly equal.

The repository must be writable by the group “fossy”. To ensure that all files are group accessible, the directories should be set with the permissions “g+rwxs”. Setting the SGID bit (g+s) on a directory makes new files and directories created in it inherit the directory’s group. The big catch here is that all mounted file systems must use the same group ID for “fossy”. Ideally, the top directories should be owned by user “fossy”, and that user should have the same user ID on all systems.

Part 2: The Scheduler

However, splitting the repository across mount points does not mean you are done.
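To make the link between Hosts.conf and the repository layout from Part 1 concrete, here is a minimal Python sketch that picks the host directory for a checksum. It is an illustration, not FOSSology code: the function names `parse_hosts` and `repo_path` are hypothetical, the second Hosts.conf field (shown as `*` above) is read but not interpreted, and the file name is assumed to be the checksum plus a suffix, as in the example path.

```python
# Example Hosts.conf content from this document:
# host string, second field (not interpreted here), first and last subdirectory.
HOSTS_CONF = """\
sirius * 00 7f
buckbeak * 80 ff
"""


def parse_hosts(text):
    """Return a list of (host, low, high) tuples from Hosts.conf-style text."""
    ranges = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank lines and anything that is not a 4-field entry
        host, _second, low, high = fields
        ranges.append((host, int(low, 16), int(high, 16)))
    return ranges


def repo_path(repo, checksum, ranges, suffix=""):
    """Build repo/host/##/##/##/file for a hexadecimal checksum string.

    Assumption: the host is chosen by the first two hex digits of the checksum.
    """
    first = int(checksum[0:2], 16)
    for host, low, high in ranges:
        if low <= first <= high:
            parts = [repo, host, checksum[0:2], checksum[2:4], checksum[4:6],
                     checksum + suffix]
            return "/".join(parts)
    raise ValueError("no host covers subdirectory " + checksum[0:2])


ranges = parse_hosts(HOSTS_CONF)
# prints $Repo/sirius/01/e4/2f/01e42f923c85.txt
print(repo_path("$Repo", "01e42f923c85", ranges, ".txt"))
```

With the two-host split above, any checksum starting with 00 through 7f lands under sirius and anything from 80 through ff lands under buckbeak.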
The scheduler.conf file (default: /usr/local/share/fossology/agents/scheduler.conf) lists the host strings where jobs should be run. For example:

agent=filter_clean host=localhost | /usr/local/fossology/agents/filter_clean -s

If you change Hosts.conf, then you will need to change the “host=” strings in scheduler.conf to match the names in the Hosts.conf file. There are three different scenarios:

Scenario 1: Distributed Repo, Local Agents

If you lack disk space for the Repo on the local system, you can distribute the repository and still use the local CPUs for running agents. This configuration is not ideal since all communication with the repository goes over the network, which has a significant speed impact. However, if you need the disk space then this is an option.

The simplest solution is to edit scheduler.conf and remove all of the “host=” tags. For example:

agent=filter_clean host=localhost | /usr/local/fossology/agents/filter_clean -s

will become:

agent=filter_clean | /usr/local/fossology/agents/filter_clean -s

This tells the scheduler to ignore host designations for the agent and just run it locally. The repository files will be used regardless of where they are hosted.

Scenario 2: Distributed Repo, Distributed Agents

This is the best, usual, and expected scenario since agents can run on the same systems as the repository data. In scheduler.conf, you will need to change the “host=” lines and add additional lines for additional agents. The easiest way to do this is by using the mkconfig program and SSH.
For example, if fawkes has 8 CPUs and buckbeak has 4, then you can use:

mkconfig -C 8 -R '/usr/bin/ssh fossy@fawkes "%s"' -H fawkes \
         -C 4 -R '/usr/bin/ssh fossy@buckbeak "%s"' -H buckbeak \
         > new_scheduler.conf
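One way to review a generated file is to see how its agent entries are spread across hosts. The following Python sketch counts agent lines per “host=” value. It assumes the `agent=NAME host=HOST | COMMAND` line shape shown in this document; the sample lines are hypothetical and are not actual mkconfig output, and `agents_per_host` is an illustrative name.

```python
from collections import Counter


def agents_per_host(conf_text):
    """Count agent lines per host= value; lines without host= count as '(local)'."""
    counts = Counter()
    for line in conf_text.splitlines():
        line = line.strip()
        if not line.startswith("agent="):
            continue  # ignore comments, %Host lines, and blank lines
        head = line.split("|", 1)[0]  # attributes appear before the pipe
        host = "(local)"
        for token in head.split():
            if token.startswith("host="):
                host = token[len("host="):]
        counts[host] += 1
    return counts


# Hypothetical scheduler.conf fragment, for illustration only.
sample = """\
agent=filter_clean host=fawkes | /usr/bin/ssh fossy@fawkes "/usr/local/fossology/agents/filter_clean -s"
agent=filter_clean host=buckbeak | /usr/bin/ssh fossy@buckbeak "/usr/local/fossology/agents/filter_clean -s"
agent=filter_clean | /usr/local/fossology/agents/filter_clean -s
"""

for host, count in sorted(agents_per_host(sample).items()):
    print(host, count)
```

To check a real file, read it with `open("new_scheduler.conf").read()` and pass the text to `agents_per_host`. A host string that appears here but not in Hosts.conf is worth a second look.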
Then, if new_scheduler.conf looks good, you can replace scheduler.conf and restart the scheduler.

A few caveats about mkconfig and scheduler.conf:
Scenario 3: Local Repo, Distributed Agents

If you have lots of CPUs but only one repository, then you can either pretend to distribute the repository or just replicate agents.

If you choose to pretend to distribute the repository, then it will look just like Scenario 2, except that you will only mount one directory rather than multiple directories. In this configuration, each CPU is assigned a certain range of files to process. This means that some CPUs may go unused.

Alternatively, you can replicate agents. In this case, you will need to manually edit the scheduler.conf file.
mkconfig -R "/usr/bin/ssh fossy@sirius '%s'" -B
%Host localhost 18 1

This tells the scheduler to run at most 18 jobs at once. The various agent lines say where to run the jobs for specific agents. The multiple agent lines indicate that multiple agents can run on the same hosts. For example, this scheduler.conf runs up to three copies of Filter_License on buckbeak and two on fawkes.

# 3 CPUs on buckbeak
agent=filter_license | /usr/bin/ssh fossy@buckbeak "/usr/local/fossology/agents/Filter_License"
agent=filter_license | /usr/bin/ssh fossy@buckbeak "/usr/local/fossology/agents/Filter_License"
agent=filter_license | /usr/bin/ssh fossy@buckbeak "/usr/local/fossology/agents/Filter_License"
# 2 CPUs on fawkes
agent=filter_license | /usr/bin/ssh fossy@fawkes "/usr/local/fossology/agents/Filter_License"
agent=filter_license | /usr/bin/ssh fossy@fawkes "/usr/local/fossology/agents/Filter_License"

A few caveats to remember:
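Replicated agent lines like the ones for Filter_License can be generated rather than typed by hand. The following Python sketch is one way to do that, assuming the `agent=NAME | /usr/bin/ssh USER@HOST "COMMAND"` line shape used in this document. The function name `replicated_agent_lines` and its parameters are hypothetical and are not part of FOSSology.

```python
def replicated_agent_lines(agent, command, cpus_by_host, user="fossy"):
    """Return scheduler.conf-style lines: one agent line per CPU on each host."""
    lines = []
    for host, cpus in cpus_by_host.items():
        lines.append("# %d CPUs on %s" % (cpus, host))
        for _ in range(cpus):
            lines.append('agent=%s | /usr/bin/ssh %s@%s "%s"'
                         % (agent, user, host, command))
    return lines


lines = replicated_agent_lines(
    "filter_license",
    "/usr/local/fossology/agents/Filter_License",
    {"buckbeak": 3, "fawkes": 2},
)
print("\n".join(lines))
```

The output can be pasted into scheduler.conf after review. The total number of generated agent lines should stay within the job limit set on the %Host line.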
Testing the Configuration

When you have finished configuring scheduler.conf, you can test it with the scheduler command. As the user “fossy”, run this command:

/usr/local/fossology/agents/scheduler -t

This will attempt to spawn every agent. If there are any errors, it will tell you which command failed. Some common failure causes:
FOSSology Project documentation is licensed under the GNU Free Documentation License Version 1.2