FOSSology Advancing open source analysis and development
FOSSology Backup and Restore Scope
Fully implement and document how to back up and restore a running FOSSology system, including the database, the repository, and any system configuration specific to FOSSology.
We will provide two solutions for backing up and restoring the repository.
Solution 1: Give the user instructions for backing up and restoring the entire repository.
Backup:
1. Stop the Scheduler before the backup (and verify that all the agents have stopped)
2. Back up the PostgreSQL database
3. Back up the entire repository to a backup server using rsync, including the gold, files, and license directories
4. Start the Scheduler after all backups have finished
Restore:
1. Restore the PostgreSQL database
2. Restore the entire repository
3. Restart the Scheduler
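The full-repository steps above could be scripted roughly as follows. This is a minimal sketch, not a tested FOSSology tool: the repository root, the backup host, the dump file location, and the scheduler init script are all assumed names, and the check that every agent has stopped is left out because it depends on the deployment. By default the script only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch of the full-repository backup and restore steps listed above.
# Every path, the init script, and the backup host are assumptions.
set -euo pipefail

REPO_DIR="${REPO_DIR:-/srv/fossology/repository}"          # assumed repository root
BACKUP_DEST="${BACKUP_DEST:-backuphost:/backup/fossology}" # assumed rsync target
DB_DUMP="${DB_DUMP:-/var/backups/fossology-db.sql.gz}"     # assumed dump file
SCHEDULER_CTL="${SCHEDULER_CTL:-/etc/init.d/fossology}"    # assumed init script
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands, 0 = execute them

# Print a command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Dump every database in the cluster to a gzip-compressed file.
dump_db() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ pg_dumpall | gzip > $DB_DUMP"
    else
        pg_dumpall | gzip > "$DB_DUMP"
    fi
}

# Replay a pg_dumpall dump; this must run as a PostgreSQL superuser.
restore_db() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ gunzip -c $DB_DUMP | psql -d postgres"
    else
        gunzip -c "$DB_DUMP" | psql -d postgres
    fi
}

backup_full() {
    run "$SCHEDULER_CTL" stop                  # 1. stop the Scheduler
    # Verifying that all agents have exited is deployment-specific; not shown.
    dump_db                                    # 2. back up the database
    run rsync -a "$REPO_DIR/" "$BACKUP_DEST/"  # 3. copy gold, files, license
    run "$SCHEDULER_CTL" start                 # 4. start the Scheduler again
}

restore_full() {
    restore_db                                 # 1. restore the database
    run rsync -a "$BACKUP_DEST/" "$REPO_DIR/"  # 2. restore the repository
    run "$SCHEDULER_CTL" restart               # 3. restart the Scheduler
}
```

Calling `backup_full` with `DRY_RUN=1` lists the four commands in order, which makes it easy to review the plan before setting `DRY_RUN=0` on a real system.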
Notes:
If the user has enough disk space, we recommend this backup solution. I suggest the instructions also list the approximate time a backup and a restore will take, so that the user can weigh the tradeoff.
Solution 2: We also provide a solution that backs up and restores only the gold files and the database. This is a good solution if users don’t have enough disk space or don’t want to back up the entire repository.
Backup:
1. Stop the Scheduler before the backup (and verify that all the agents have stopped)
2. Back up the PostgreSQL database
3. Back up only the repository gold and license directories
4. Start the Scheduler after all backups have finished
Restore:
1. Restore the PostgreSQL database
2. Restore only the repository gold and license directories; do not unpack during the restore
3. Restart the Scheduler
4. Provide a user interface in case the user wants to re-unpack the gold files
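The repository step of the gold-only solution might look like the sketch below. It assumes that gold and license are direct subdirectories of the repository root and that the files directory holds the unpacked data the plan intends to regenerate; the paths and the backup host are hypothetical. The Scheduler and database steps are the same as in the lists above and are omitted here.

```shell
#!/usr/bin/env bash
# Sketch of the repository step in the gold-only backup and restore.
# The repository root, backup host, and directory layout are assumptions.
set -eu

REPO_DIR="${REPO_DIR:-/srv/fossology/repository}"          # assumed repository root
BACKUP_DEST="${BACKUP_DEST:-backuphost:/backup/fossology}" # assumed rsync target
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands, 0 = execute them

# Print a command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Copy only gold and license; the "files" directory is deliberately skipped.
backup_gold_only() {
    local dir
    for dir in gold license; do
        run rsync -a "$REPO_DIR/$dir/" "$BACKUP_DEST/$dir/"
    done
}

# Copy gold and license back. Nothing is unpacked here; re-unpacking
# is a separate, user-triggered action.
restore_gold_only() {
    local dir
    for dir in gold license; do
        run rsync -a "$BACKUP_DEST/$dir/" "$REPO_DIR/$dir/"
    done
}
```

Looping over an explicit directory list, rather than using rsync exclude patterns, keeps the sketch easy to audit: anything not named in the loop is never transferred.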
Determine which unpacked files should be saved and which unpacked files are currently in the repository but should not be saved, and remove the latter from the repository. Implement this as part of the backup scope.
Open question: Which configuration files need to be backed up?
Backing up the system configuration files is out of scope; we will only add a note about them to the backup and restore procedures document. The FOSSology configuration files that need to be backed up are Scheduler.conf, Host.conf, Db.conf, and RepPath.conf.
**Question**: What are the detailed backup requirements for the configuration files, and when should they be backed up (at the same time as the database and repository backup)?
Answer bobg Aug-3-09: If these files are lost, the user should recover them from their normal system backup. If they are lost due to a system failure, they have bigger problems than restoring fossology.
Design, review, build, test, and document any tools, agents, or plugins that are necessary to enable the backup and restore process documented in #1
bobg Aug-3-09:
Does it need to be more complex than this?
Implement the proposed gold-files-only backup and restore strategy on the two running FOSSology production systems (the external and internal systems)
Question: How are the production systems configured and deployed? Do we need to understand the production system infrastructure better?
I drew a picture of the external FOSSology Production system deployment, please review the diagram and add comments.
Question: What’s the relationship between an agent and its storage? Is the storage on the agent's local disk or on a network file system?
When an agent is lost, how should the FOSSology cluster react?
Note: This brings up some very important questions:
Usual multi-system deployment method: distributed agents and a distributed repository
The following is an old conversation about backing up an internal machine.
Additional notes from 11/13 meeting:
Notes on documenting the procedure to back up the fossology metadata, from the 2008.05.6 IRC discussion
<danger> BTW, do we have an easy way to let users back up their FOSSology database?
<danger> or is that documented anywhere?
<taggart> danger: no, I raised this issue 6+ months ago
<bobg> danger: that's in the postgres docs
<taggart> danger: since we need it ourselves, we're not doing backups of fossology yet
<danger> bobg: I know Postgres has a way to let you back things up, but it would probably be a good idea to summarize the fossology specifics
<danger> bobg: and somewhere down the road (a long ways) create a “Back up my FOSSology data” menu item
<danger> taggart: ack.
<danger> taggart: we now have a way to capture this :)
<danger> sorry for the delay
<taggart> I had proposed having fossology automatically dump state to the filesystem, so that a normal filesystem backup could grab those snapshots
<taggart> by that I mean the fossology postgresql db
<taggart> I think the repo and golden area, etc should be fine with a normal filesystem backup
Update 7/17 danger
Postgres backups are now in force on our repos. This is accomplished with a simple
pg_dumpall | gzip > backup_filename
put into a cron job on the repository system.
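A cron-friendly wrapper around that pipeline might look like the sketch below. The dated file name, the backup directory, and the `--run` switch are assumptions added for illustration; the note above only specifies the `pg_dumpall | gzip` pipeline itself. The dump is written to a temporary name first so that a failed run never leaves a truncated file under the final name.

```shell
#!/usr/bin/env bash
# Sketch of a cron-driven database dump with a dated file name.
# BACKUP_DIR and the file-name pattern are assumptions.
set -euo pipefail   # pipefail makes a pg_dumpall failure fail the pipeline

BACKUP_DIR="${BACKUP_DIR:-/var/backups/fossology}"   # assumed dump directory

# Build the dump path for a given date (YYYY-MM-DD).
dump_path() {
    printf '%s/fossology-db-%s.sql.gz\n' "$BACKUP_DIR" "$1"
}

# Dump all databases, compress them, and move the result into place.
backup_db() {
    local target
    target="$(dump_path "$(date +%Y-%m-%d)")"
    pg_dumpall | gzip > "$target.tmp"
    mv "$target.tmp" "$target"
    echo "wrote $target"
}

# Only dump when asked to, so the file can also be sourced safely.
if [ "${1:-}" = "--run" ]; then
    backup_db
fi
```

If this were saved under a hypothetical name such as `/usr/local/bin/fossology-db-backup.sh`, a crontab line like `30 2 * * * /usr/local/bin/fossology-db-backup.sh --run`, installed for a user with PostgreSQL superuser access, would take one dump per night.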
Any large system has a lot of gold files. It would be advantageous to store only the source URL for every gold file that has one set, and to back up the physical gold files only for those that have no source URL.
Note that this has not been implemented yet. It would require querying the database to find out where the gold files came from.