FOSSology: Advancing open source analysis and development
Repository (0.6.1)

The file repository is used to store the actual files loaded into the FOSSology system. While the Database stores meta information about files, the Repository holds the actual files.
Although SHA1 and MD5 hashes are nearly unique, a hash collision is still possible. The working belief is that a collision on the full triplet (SHA1, MD5, and file length, as in the sha1.md5.length filenames used throughout) is extremely unlikely. Some notes about filenames:
Directory

Since the repository can store hundreds of thousands of files, we want a quick way to organize the contents. The selected method is based on octets: each successive pair of characters at the start of the filename names one directory level. For example, the file ffe1cd8dd6b0b4c031262402ab0375ee876b17cb.732fe0681bc974f1075c4bee147c91f8.4232 is stored in the directory “/ff/e1/cd/”.

NOTE: If the filename is shorter than the number of characters needed for the path, then the path is padded with underscores. The filename “abcde” would be stored as “ab/cd/e_/abcde”.

Types

The repository must store many different types of files. The type of file determines the contents and the tool that uses it. Some example types:
The type of file is prepended to the directory tree. Thus, the example file could be found under “/files/ff/e1/cd/”. The different types are not static – new types can be created at any time. (The type is specified when using the tool – see below.) Some notes about types:
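The directory and type rules described above can be sketched in a few lines of C. This is only an illustration under stated assumptions, not the librep implementation: the function name `sketch_rep_path` and its `depth` parameter are hypothetical, and no hostname or repository root is prepended.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of librep): writes "type/xx/yy/.../filename"
 * into out, using `depth` two-character directory levels taken from the
 * start of the filename. Missing characters are padded with '_'.
 * Returns 0 on success, -1 if the output buffer is too small. */
int sketch_rep_path(const char *type, const char *filename, int depth,
                    char *out, size_t outlen)
{
    size_t namelen = strlen(filename);
    size_t pos;
    int n = snprintf(out, outlen, "%s/", type);

    if (n < 0 || (size_t)n >= outlen)
        return -1;
    pos = (size_t)n;

    for (int level = 0; level < depth; level++) {
        if (pos + 3 >= outlen)          /* two characters, '/', and room for NUL */
            return -1;
        for (int k = 0; k < 2; k++) {
            size_t idx = (size_t)level * 2 + (size_t)k;
            out[pos++] = (idx < namelen) ? filename[idx] : '_';
        }
        out[pos++] = '/';
    }

    if (pos + namelen >= outlen)        /* filename plus terminating NUL */
        return -1;
    memcpy(out + pos, filename, namelen + 1);
    return 0;
}
```

With type “files”, filename “abcde”, and a depth of 3, the sketch produces “files/ab/cd/e_/abcde”, matching the padded example in the text.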
Hosts

In order to load-balance storage and processing, the files in the Repository can be distributed across NFS-mounted hosts. A host configuration file specifies which host actually stores which files. The hostname is prepended to the path, so there only needs to be one mount point per host. For example, “/host1/files/ff/e1/cd/”.

Note: If there is no host configuration file entry, then no hostname is prepended to the path.

Configuration

The directory for storing the repository configuration files is /srv/fossology/repository/. If this does not exist, then “.” is used. This can be changed (for testing) by setting the environment variable “REPCONF” to the path of the configuration directory.

The repository configuration directory should contain 3 files:
For example:

    host1 test 00 7f
    host2 test 80 af
    host3 test b000 b080
    host1 gold 00 7f
    host2 gold 80 ff
    host4 * 00 ff

In this example, the ‘test’ file (file type = “test”) “b081cd8dd6b0b4c031262402ab0375ee876b17cb.732fe0681bc974f1075c4bee147c91f8.4232” would be stored on host4, but the same filename would be on host2’s ‘gold’ repository.

Some notes about using the Hosts.conf configuration file:
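The lookup implied by the example table can be sketched as follows. This is a guess at the matching rule, not the actual librep code. It assumes each entry is read as host, type, start, and end; that the start and end values are inclusive lowercase hex prefixes compared against the beginning of the filename; that “*” matches any type; and that the first matching entry wins. The names `host_rule`, `example_rules`, and `sketch_find_host` are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* One Hosts.conf entry, as assumed for this sketch. */
struct host_rule {
    const char *host;
    const char *type;   /* "*" is assumed to match any type */
    const char *start;  /* inclusive lower bound (hex prefix) */
    const char *end;    /* inclusive upper bound (hex prefix) */
};

/* The example table from the text. */
static const struct host_rule example_rules[] = {
    {"host1", "test", "00",   "7f"},
    {"host2", "test", "80",   "af"},
    {"host3", "test", "b000", "b080"},
    {"host1", "gold", "00",   "7f"},
    {"host2", "gold", "80",   "ff"},
    {"host4", "*",    "00",   "ff"},
};

/* Returns the host of the first entry whose type matches and whose
 * [start, end] prefix range contains the filename, or NULL if none does.
 * Lowercase hex digits sort correctly under plain byte comparison. */
const char *sketch_find_host(const struct host_rule *rules, size_t count,
                             const char *type, const char *filename)
{
    for (size_t i = 0; i < count; i++) {
        const struct host_rule *r = &rules[i];

        if (strcmp(r->type, "*") != 0 && strcmp(r->type, type) != 0)
            continue;
        if (strncmp(filename, r->start, strlen(r->start)) >= 0 &&
            strncmp(filename, r->end, strlen(r->end)) <= 0)
            return r->host;
    }
    return NULL;
}
```

Under these assumptions the filename beginning with “b081” misses host3’s b000 to b080 range for type “test” and falls through to the wildcard entry on host4, while for type “gold” it lands in host2’s 80 to ff range, which agrees with the example.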
Repository Tools

The following command-line tools exist for managing the repository:

rephost type sha1.md5.length

This displays the hostname where the file would be found or stored. This is used for optimizing processing by running a process on a local host rather than accessing files remotely. If no hostname is found, then localhost is returned. Note: This does not check if the file exists. It only says where the file could be found.

reppath type sha1.md5.length

This tool displays the path to the file (for reading, writing, or debugging). Note: This does not check if the file exists, or even if the directories are valid. It only says where the file could be found.

repexist type sha1.md5.length

Determines if the file exists in the repository. This is for use in shell scripts: it returns “0” for yes, “1” for no.

repcat type sha1.md5.length

If the file exists, cat the contents to stdout.

repwrite type sha1.md5.length < input

Creates a file in the repository.

repcopyin type source sha1.md5.length
echo 'source sha1.md5.length' | repcopyin type
cat 'XML from ununpack' | repcopyin type XML

Bulk-populates the repository. There are three use options.
All files are inserted into the repository, but a file that already exists is not copied in again (this is a speed improvement). The program displays the total number of files imported, duplicated (not imported), and errors (failed to import).

Repository Library

The repository is managed by a C library: librep.a and librep.h. This library contains the following common functions:

REPCONF environment variable

The environment variable REPCONF specifies the configuration directory for the repository. If this is not set, then /srv/fossology/repository/ is used. (And if that does not exist, then the current directory (“.”) is used.)

int RepOpen ();

Since the repository configuration files may be accessed by every function call, we don’t want to call fopen/fclose millions of times. This function opens the configuration and sets up global variables. You should call this first, but if you forget, it is called anyway by all of the other repository functions. Returns 1 if it is configured, 0 if configuration failed.

void RepClose ();

This closes everything set up by RepOpen() and clears the global variables. It is proper to call this when you are done, but if you forget, shared memory will not be lost. NOTE: If you want to refresh the configuration, then call: RepClose(); RepOpen();

char * RepMkPath (char *Type, char *Filename);

Allocate a string containing a path for the type and file. Returns a string, or NULL if the type/filename is invalid (or an allocation error occurs). The depth of the path is determined by the value in $REPCONF/Depth.conf. If this file does not exist, then the default is “2”. The caller is responsible for calling free().

char * RepGetRepPath ();

Allocate a string containing the path to the top of the repository. Returns NULL if an error occurred. The caller is responsible for calling free().

char * RepGetHost (char *Path, char *Type, char *Filename);

Allocate a string containing the hostname where the file is stored. The hostname is determined from the $REPCONF/Hosts.conf file.
Returns a string if the hostname was found. Returns NULL if there is no hostname OR if an error occurred. The caller is responsible for calling free().

int RepExist (char *Type, char *Filename);

Determines if the type+file exists in the repository. Returns 1 if it exists. Returns 0 if it does not exist. Returns -1 if an error occurred.

int RepHostExist (char *Type, char *Host);

Determines if the type+hostname exists in the repository. This is useful for determining if this particular host stores any files of the given type. Returns 1 if it exists. Returns 0 if it does not exist. Returns -1 if an error occurred.

int RepRemove (char *Type, char *Filename);

Remove a file from the repository. Returns the result from unlink(): 0 on success, or a non-zero value if there is an error.

FILE * RepFread (char *Type, char *Filename);

This is a replacement for fopen(filename,"rb"). It returns a FILE pointer to the type+filename, or NULL on error. The caller should run RepFclose() when they are finished.

FILE * RepFwrite (char *Type, char *Filename);

This is a replacement for fopen(filename,"wb"). This function will also create the repository’s directory if it is needed. It returns a FILE pointer to the type+filename, or NULL on error. The caller should run RepFclose() when they are finished.

int RepFclose (FILE *F);

This is a replacement for fclose(FilePointer). This returns the value from fclose().

int RepImport (char *Source, char *Type, char *Filename, int HardLink);

This is a really fast file copy. The contents from Source are copied into the repository. If HardLink is set (not zero), then it will try a hard link before falling back to a regular file copy (making it REALLY fast). This returns 0 on success, non-zero on failure.

RepMmapStruct * RepMmap (char *Type, char *Filename);

This is a replacement for mmap(). The file is opened for read-only access! Do not use this function to create a new file.
It allocates and returns a structure containing the mmap handle:

    struct RepMmapStruct
    {
        int FileHandle;        /* handle from open() */
        unsigned char *Mmap;   /* memory pointer from mmap */
        int MmapSize;          /* size of mmap */
    };
    typedef struct RepMmapStruct RepMmapStruct;
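To make the structure concrete, here is a minimal sketch of how a read-only mapping that fills it could be built from plain POSIX calls. It is not the librep source: `SketchMmapFile` and `SketchMunmap` are hypothetical stand-ins that take a full filename, and the choice to treat an empty file as an error is an assumption of this sketch.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Copied from the text above so the sketch compiles on its own. */
struct RepMmapStruct
{
    int FileHandle;        /* handle from open() */
    unsigned char *Mmap;   /* memory pointer from mmap */
    int MmapSize;          /* size of mmap */
};
typedef struct RepMmapStruct RepMmapStruct;

/* Hypothetical stand-in: map an existing file read-only.
 * Returns NULL on any error, including an empty file.
 * MmapSize is an int, so very large files would not fit this sketch. */
RepMmapStruct *SketchMmapFile(const char *Filename)
{
    struct stat St;
    void *Addr;
    RepMmapStruct *M = calloc(1, sizeof(*M));

    if (!M)
        return NULL;
    M->FileHandle = open(Filename, O_RDONLY);
    if (M->FileHandle < 0) {
        free(M);
        return NULL;
    }
    if (fstat(M->FileHandle, &St) != 0 || St.st_size <= 0) {
        close(M->FileHandle);
        free(M);
        return NULL;
    }
    M->MmapSize = (int)St.st_size;
    Addr = mmap(NULL, (size_t)M->MmapSize, PROT_READ, MAP_PRIVATE,
                M->FileHandle, 0);
    if (Addr == MAP_FAILED) {
        close(M->FileHandle);
        free(M);
        return NULL;
    }
    M->Mmap = Addr;
    return M;
}

/* Hypothetical stand-in: unmap, close, and free the structure. */
void SketchMunmap(RepMmapStruct *M)
{
    if (!M)
        return;
    if (M->Mmap)
        munmap(M->Mmap, (size_t)M->MmapSize);
    if (M->FileHandle >= 0)
        close(M->FileHandle);
    free(M);
}
```

A caller would read bytes through `M->Mmap[0]` up to `M->Mmap[M->MmapSize - 1]` and then release everything with the matching unmap call, which mirrors the ownership rule the library documents for its own structure.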
The caller must call RepMunmap() to free the structure.

RepMmapStruct * RepMmapFile (char *Filename);

Similar to RepMmap(), but takes a full filename as a parameter rather than a repository entry. (Technically, this is used by the RepMmap() function.)

void RepMunmap (RepMmapStruct *M);

This un-mmaps and deallocates the RepMmapStruct variable created by RepMmap() and RepMmapFile().

FOSSology Project documentation is licensed under the GNU Free Documentation License Version 1.2