FOSSology: Advancing open source analysis and development
ReadMe
How to configure and set up nightly Freshmeat updates.
Overview
The Freshmeat process consists of a number of somewhat independent programs, all tied together and automated by a shell/cron script. The process starts by loading the top 1000 projects. After the initial top 1000 projects have been obtained and uploaded, the cron job can be set up and activated; from then on, only the projects in the top 1000 that have had a version change will be uploaded. This is accomplished by the GetFM script, which should be scheduled with cron to get the daily Freshmeat RDF file and process it. GetFM will:
If you follow the examples below, setting up the nightly updates with the GetFM script will be easier.
Set Up
The Freshmeat process expects the following directory setup:
<path>/Freshmeat
<path>/Freshmeat/Rdfs
Input-Files
Run-logs
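If the directories have to be created by hand, a minimal sketch such as the following creates the part of the layout whose location is spelled out above. The helper name `fm_setup_dirs` is hypothetical and its argument stands in for <path>; Input-Files and Run-logs are deliberately left out because their parent directory is not stated here.

```bash
#!/usr/bin/env bash
# Sketch only: fm_setup_dirs is a hypothetical helper, not part of FOSSology.
# $1 stands in for <path>; the function creates <path>/Freshmeat/Rdfs.

fm_setup_dirs() {
    local parent="$1"
    if [ -z "$parent" ]; then
        echo "usage: fm_setup_dirs <path>" >&2
        return 1
    fi
    # -p creates <path>/Freshmeat and <path>/Freshmeat/Rdfs in one call
    # and succeeds quietly if they already exist.
    mkdir -p "$parent/Freshmeat/Rdfs"
}
```

For instance, `fm_setup_dirs /srv/fossology/repository/Curly` would create the /srv/fossology/repository/Curly/Freshmeat area used as $DIR in the example that follows.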
If you have changed the makefile (see above), the GetFM script will create the above directories. GetFM also creates a golden directory for each run: a directory called golden.yyyy-mm-dd (for example, golden.2007-12-09) is created in the Freshmeat area, and all data for that run is kept in it. In the example below, $DIR is /srv/fossology/repository/Curly/Freshmeat.
Get the initial project seed
wget -o /tmp/wget-log -O $DIR/Rdfs/fm-projects.rdf-2007-12-09.bz2 http://freshmeat.net/backend/fm-projects.rdf.bz2
bunzip2 $DIR/Rdfs/fm-projects.rdf-2007-12-09.bz2
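The download and decompression steps above can be wrapped so the file always gets a dated name and the two commands stay in sync. This is a sketch assuming GNU wget and bzip2 are installed; `fm_rdf_name` and `fm_fetch_rdf` are hypothetical names, and the URL is the one from the command above.

```bash
#!/usr/bin/env bash
# Sketch only: hypothetical helpers around the wget/bunzip2 step above.

# Print the dated download name, e.g. fm-projects.rdf-2007-12-09.bz2
fm_rdf_name() {
    printf 'fm-projects.rdf-%s.bz2\n' "$1"
}

# $1 = $DIR (the Freshmeat area); $2 = date as yyyy-mm-dd (defaults to today).
fm_fetch_rdf() {
    local dir="$1" day="${2:-$(date +%Y-%m-%d)}"
    local out
    out="$dir/Rdfs/$(fm_rdf_name "$day")"
    # -o writes wget's log to /tmp/wget-log; -O names the downloaded file.
    # On failure, remove any partial or empty file that -O left behind.
    wget -o /tmp/wget-log -O "$out" http://freshmeat.net/backend/fm-projects.rdf.bz2 \
        || { rm -f "$out"; return 1; }
    # bunzip2 strips the .bz2 suffix, leaving fm-projects.rdf-<date>;
    # -f overwrites an output file left over from an earlier attempt.
    bunzip2 -f "$out"
}
```

For example, `fm_fetch_rdf /srv/fossology/repository/Curly/Freshmeat 2007-12-09` would leave fm-projects.rdf-2007-12-09 in the Rdfs directory.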
The top 1000 projects file is named Top1k-yyyy-mm-dd, for example, Top1k-2007-12-09.
get-projects -f /srv/fossology/repository/sneezy/Freshmeat/Rdfs/Top1k-2007-12-09
The above command creates a directory called golden.2007-12-09, and all results are stored in this directory. There are three subdirectories in the golden directory:
cp2foss -f <path>/goldenxxx/Input-files/Freshmeat_to_Upload
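The naming rule above can be checked mechanically: get-projects turns Top1k-yyyy-mm-dd into golden.yyyy-mm-dd, and because yyyy-mm-dd names sort in date order, the newest golden directory is simply the last one in sorted order. The sketch below assumes the golden directories sit directly in the Freshmeat area and that the upload list is at Input-files/Freshmeat_to_Upload, as in the cp2foss command above; all three function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: hypothetical helpers, not part of FOSSology.

# Top1k-2007-12-09 -> golden.2007-12-09; rejects a malformed date such as 20007-12-09.
fm_golden_for() {
    local base day
    base=$(basename "$1")
    day=${base#Top1k-}
    if [[ ! $day =~ ^[0-9]{4}-[0-9]{2}-[0-9]{2}$ ]]; then
        echo "unexpected Top1k file name: $base" >&2
        return 1
    fi
    printf 'golden.%s\n' "$day"
}

# Print the newest golden.yyyy-mm-dd directory in the Freshmeat area $1.
fm_latest_golden() {
    local d latest=""
    for d in "$1"/golden.*; do
        [ -d "$d" ] || continue   # also skips the literal pattern when nothing matches
        latest=$d                 # glob results are sorted, so the last match is the newest
    done
    [ -n "$latest" ] || return 1
    printf '%s\n' "$latest"
}

# Upload the newest run's list with cp2foss -f, as in the command above.
fm_upload_latest() {
    local golden list
    golden=$(fm_latest_golden "$1") || return 1
    list="$golden/Input-files/Freshmeat_to_Upload"
    [ -f "$list" ] || { echo "missing $list" >&2; return 1; }
    cp2foss -f "$list"
}
```

The date check in `fm_golden_for` is there to catch a mistyped date before get-projects creates a misnamed golden directory.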
Once the initial 1000 projects have been loaded, the cron job should be set up so that changes are checked once a day and uploaded to the repository/db. There is already an existing crontab file, called fm.cron, stored with the Freshmeat sources (that is, the sources for the process that harvests Freshmeat.net). See below for a complete discussion of the steps needed.
Setting Up Daily Runs
The contents of the above files are the paths to the current and previous days’ top 1000 Freshmeat projects XML files, respectively. These files are also in the <path>/Freshmeat/Rdfs/ path. For example, suppose the initial projects (the top 1000) were obtained on 12/05 and the next update run was on 12/06. At this point the two files should contain:
Current-top1k: <path>/Freshmeat/Rdfs/Top1k-2007-12-06
Previous-top1k: <path>/Freshmeat/Rdfs/Top1k-2007-12-05
After that, the GetFM script keeps the two files updated on each run, so the above process should not need to be repeated. It’s always a good idea to check that everything worked: this is the hardest part of the setup to get right. The Current-top1k and Previous-top1k files may need to be adjusted if the cron job is stopped for more than 1 day. In principle the diffs should just get bigger, but in practice it doesn’t always work that way. Again, checking on the process is encouraged.
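The manual parts of the daily-run setup can be sketched as two small helpers, under stated assumptions. `fm_set_top1k` writes the two pointer files for a given pair of dates; it assumes each file holds the path as a single line, and it takes the directory holding Current-top1k and Previous-top1k as an argument because that location may differ between installations. `fm_merge_crontab` prints the current crontab followed by any fm.cron entries not already in it, so the result can be fed back to `crontab -` instead of replacing the whole crontab with `crontab fm.cron`. Both names are hypothetical and not part of FOSSology.

```bash
#!/usr/bin/env bash
# Sketch only: hypothetical helpers for the daily-run setup described above.

# Point Current-top1k and Previous-top1k at two Top1k files.
# $1 = directory holding the pointer files, $2 = the Rdfs directory,
# $3 = current date, $4 = previous date (both yyyy-mm-dd).
fm_set_top1k() {
    local ptrdir="$1" rdfs="$2" cur="$3" prev="$4" f
    for f in "$rdfs/Top1k-$cur" "$rdfs/Top1k-$prev"; do
        # Refuse to write a pointer to a Top1k file that does not exist.
        [ -f "$f" ] || { echo "missing $f" >&2; return 1; }
    done
    printf '%s\n' "$rdfs/Top1k-$cur"  > "$ptrdir/Current-top1k"
    printf '%s\n' "$rdfs/Top1k-$prev" > "$ptrdir/Previous-top1k"
}

# Print the existing crontab ($1) followed by fm.cron ($2), keeping only the
# first copy of any line that appears more than once (blank lines included).
fm_merge_crontab() {
    awk '!seen[$0]++' "$1" "$2"
}
```

For the example above, `fm_set_top1k "$PTRDIR" <path>/Freshmeat/Rdfs 2007-12-06 2007-12-05` (with `PTRDIR` a placeholder) would reproduce the two file contents shown, and in bash `fm_merge_crontab <(crontab -l 2>/dev/null) fm.cron | crontab -` would install fm.cron alongside any existing entries.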
That’s it!
Monitoring Activity
–OUTLINE–
FOSSology Project documentation is licensed under the GNU Free Documentation License Version 1.2