
Untrustworthy Puppy Linux Distributed Package Repository

Request For Comments On Design


Authors
Will Davies - initial scoping (if you edit this document, please add your name below this line)

Introduction and Scope

This document describes the design of a distributed package repository for Puppy Linux software. It does not cover the design of end-user client software, although some of the interfaces provided by the repositories are clearly intended for use by that software. A collection of repository servers should be able to provide a stable mirror system that tolerates the failure of multiple nodes.

Requirements and Constraints

  1. The system should be capable of being implemented on a number of untrusted web servers using the HTTP protocol and server-side scripting technology such as PHP. Availability of an SQL database cannot be guaranteed, nor can access to a full shell or cron, though these technologies might be used in some implementations to achieve automation and to give faster or more efficient responses.
  2. The repository system should offer an accessible means for members of the community to distribute software that they have written themselves or compiled from publicly available sources.
  3. It should not offer guarantees about the quality of the software that is distributed. However, it should be capable of distributing digitally signed package checksums that, in conjunction with a user-provided PGP public key, can be used to confirm the original creator of a package.
  4. A node should offer a user interface for uploading new packages. This interface should be capable of collecting the full set of package metadata needed to publish a package. Each new package should be assigned a unique ID that is namespaced by the node ID to avoid collisions across the entire distributed repository.
  5. A node should be capable of existing in isolation from the other repositories.
  6. A node should be capable of publishing its current health/status.
  7. A node should be capable of checking the status of other nodes and publishing status information about itself and others. This must include maintaining a list of the packages stored on other repositories. Note that even if a node does not mirror packages (see the next clause), it should still provide details of the packages stored on other nodes.
  8. A repository may choose to download a copy of the packages available on another repository and make them available. Mirroring is optional so that the administrator of a repository can choose not to mirror nodes that provide software they do not wish to distribute. When mirroring an external package, a repository must not change the package's ID; this avoids the confusion of multiple copies of the same package.
  9. It is probably safer for a node to mirror only packages that it has downloaded from the original publishing node. However, this causes problems if the original node dies.
  10. The system should include a delete mechanism for packages. The original publishing node of a package should be able to propagate a deletion request through the entire system. This needs further detail: should nodes poll for deletion events, or should they provide an API to receive delete requests? The mechanism should not allow a compromised or malicious node to carry out mass deletions; perhaps requests should be queued for approval by the node administrator, together with a reason for the request. It is also unclear how delete requests should be initiated if the original publishing node is dead.
  11. It should be possible to migrate packages from the old system to the new one. In practice this means it must be possible to create a minimal metadata file for each dotpup, dotpet or tgz automatically. This process should include inferring categories from the current file-hierarchy-based system.
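Requirement 11 could be approached roughly as follows. This is a minimal Python sketch, assuming that each legacy file's path relative to the old archive root starts with a category directory (for example games/bogtrotter-0.5.pet); the CATEGORY_MAP entries and the function name minimal_metadata are hypothetical. It emits only the fields that can be derived from the path, in the fieldname:value format described under Package Metadata.

```python
from pathlib import PurePosixPath

# Installable legacy formats: dotpups, dotpets and tarballs.
LEGACY_EXTENSIONS = {".pup", ".pet", ".tgz"}

# Hypothetical mapping from old directory names to freedesktop.org
# category names; unknown directories are passed through unchanged.
CATEGORY_MAP = {"games": "Game", "network": "Network", "office": "Office"}


def minimal_metadata(relative_path: str) -> str:
    """Build a minimal metadata record for one legacy package.

    relative_path is relative to the old archive root; its first directory
    (if any) is treated as the category, e.g. games/bogtrotter-0.5.pet.
    """
    path = PurePosixPath(relative_path)
    if path.suffix not in LEGACY_EXTENSIONS:
        raise ValueError(f"not a legacy package: {path.name}")
    # name is the install filename minus its extension.
    lines = [f"name:{path.stem}", f"targetfile:{path.name}"]
    if len(path.parts) > 1:
        folder = path.parts[0]
        lines.append(f"categories:{CATEGORY_MAP.get(folder.lower(), folder)}")
    return "\n".join(lines) + "\n"
```

A file placed directly in the archive root gets no categories line, which the metadata rules allow because categories is optional.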

API

Server-Server

Many of these can be implemented as plain static files that are regenerated at regular intervals.
  1. server-health.txt (not sure of formatting)
  2. known-mirrors.txt (not sure of formatting; probably needs to be names and URLs)
  3. packages-ids.txt (I'm not sure what use this is without any other data, but it is cheap to build and might prove useful)
  4. your-packages-ids.txt
  5. your-new-package-ids-last-24-hours.txt (guaranteed to cover the last 24 hours' worth of updates, though it might cover more)
  6. your-new-package-ids-last-week.txt (guaranteed to cover the last week's worth of updates, though it might cover more)
  7. list-deleted-package-ids.txt
  8. list-deleted-package-ids-last-24-hours.txt
  9. list-deleted-package-ids-last-week.txt
  10. get-package-metadata-for-packages, taking a comma-separated list of package IDs (dynamic; might want to impose a limit on the maximum number of IDs allowed)
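One possible answer to the polling question in requirement 10 is for each node to fetch a peer's list-deleted-package-ids.txt and filter it before anything is removed. The Python sketch below assumes IDs of the homer0001 form (node ID followed by digits) and a plain one-ID-per-line file; MAX_DELETIONS_PER_POLL, DeletionQueue and queue_deletions are hypothetical names. It accepts a deletion only for packages the peer itself published, queues rather than deletes, and refuses unusually large batches. It does not address what happens when the publishing node is dead.

```python
import re
from dataclasses import dataclass, field
from typing import List, Set

# Hypothetical safety limit: if one poll asks for more deletions than this,
# nothing is queued and the batch is left for the administrator to inspect.
MAX_DELETIONS_PER_POLL = 20


@dataclass
class DeletionQueue:
    """Deletion requests awaiting approval by this node's administrator."""
    pending: List[str] = field(default_factory=list)


def owned_by(package_id: str, node_id: str) -> bool:
    """True if package_id is namespaced by node_id, e.g. homer0001 by homer."""
    return re.fullmatch(re.escape(node_id) + r"\d+", package_id) is not None


def queue_deletions(origin_node: str, deleted_list_text: str,
                    held_ids: Set[str], queue: DeletionQueue) -> List[str]:
    """Process the body of origin_node's list-deleted-package-ids.txt.

    Only IDs that were published by origin_node and are held locally are
    queued; nothing is deleted here. Returns the IDs that were not accepted.
    """
    requested = [line.strip() for line in deleted_list_text.splitlines()
                 if line.strip()]
    accepted = [pid for pid in requested
                if owned_by(pid, origin_node) and pid in held_ids]
    if len(accepted) > MAX_DELETIONS_PER_POLL:
        return requested  # suspiciously large batch: queue nothing
    for pid in accepted:
        if pid not in queue.pending:
            queue.pending.append(pid)
    return [pid for pid in requested if pid not in accepted]
```

Because a node can only ever ask for its own namespace to be deleted, a compromised node cannot remove other nodes' packages this way, although it can still withdraw its own.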

Server-Client

  1. get-package-packageid (should return a single pup/pet/tgz for installation. It may also be used by nodes wishing to download a package in order to mirror it. This could possibly be achieved with a directory full of symlinks to all the install files, though this runs the risk of running out of inodes. I suspect this needs to be dynamic.)
  2. known-mirrors-packageid (returns a list of the nodes where this package is held; should this be node names or URLs?)
  3. search TERM (dynamic. Ideally this should return all of the metadata for each package that matches the search, though this might be too big. Results could be paginated, or the search could return only IDs and require the client to call get-package-metadata-for-packages. TERM could be extended in the future to allow field-specific searches.)
  4. get-associated-files packageid (should return a list of URLs to the package's files, allowing a client to request more information about a package)
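A dynamic search endpoint along the lines of the "return only IDs, paginated" option might look like this minimal Python sketch. It assumes metadata has already been parsed into dictionaries keyed by package ID, and that a plain TERM is a case-insensitive substring matched against a hypothetical set of fields (SEARCH_FIELDS); field-specific searches are left out. The client would then fetch full records with get-package-metadata-for-packages.

```python
from typing import Dict, List

# Hypothetical choice of fields that a plain TERM is matched against.
SEARCH_FIELDS = ("name", "description", "categories")


def search(term: str, records: Dict[str, Dict[str, str]],
           page: int = 1, page_size: int = 20) -> dict:
    """Case-insensitive substring search over parsed package metadata.

    records maps package ID -> {field name: value}. Only matching IDs are
    returned, one page at a time, sorted so that pages are stable.
    """
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    needle = term.lower()
    matches: List[str] = sorted(
        pid for pid, meta in records.items()
        if any(needle in meta.get(name, "").lower() for name in SEARCH_FIELDS)
    )
    start = (page - 1) * page_size
    return {"ids": matches[start:start + page_size],
            "page": page,
            "total": len(matches)}
```

Returning the total lets a client work out how many pages exist without fetching them all.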

Package Format

The package format should allow more than one file to be included in a package. For example, firemonkey-0.5.pet, firemonkey-readme.txt and my-screenshot.png should all be associated with the same package ID. This could be achieved by storing each package in a separate directory named after its package ID.
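Given a per-package directory like this, a get-associated-files implementation has to tell the install file apart from the informational files. This Python sketch assumes the install file is recognised by its extension (pet, pup, deb, rpm or tgz) and that exactly one is present; base_url and classify_package_files are hypothetical, and the URL layout simply mirrors the package directory.

```python
from pathlib import PurePosixPath
from typing import List, Tuple
from urllib.parse import quote

# Installable formats named in this document.
INSTALL_EXTENSIONS = {".pet", ".pup", ".deb", ".rpm", ".tgz"}


def classify_package_files(package_id: str, filenames: List[str],
                           base_url: str) -> Tuple[str, List[str]]:
    """Split a package directory listing into its install file and the
    URLs of its associated (informational) files.

    base_url is the hypothetical public URL of the node's packages directory.
    """
    installable = [n for n in filenames
                   if PurePosixPath(n).suffix.lower() in INSTALL_EXTENSIONS]
    if len(installable) != 1:
        raise ValueError(f"{package_id}: expected exactly one install file, "
                         f"found {len(installable)}")
    root = base_url.rstrip("/")
    associated = sorted(n for n in filenames if n != installable[0])
    # quote() percent-encodes characters such as spaces in file names.
    return installable[0], [f"{root}/{package_id}/{quote(n)}" for n in associated]
```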

Possible Directory Structure

The package metadata should be stored in a separate file, packageid.txt. For example, for a node called homer:

node
  packages (will probably need to be split into subdirectories to avoid hitting filesystem limits on directory size or making it hard to browse)
    homer0001
      firemonkey-0.5.pet
      firemonkey-readme.txt
      firemonkey-links.htm
      my-screenshot.png
    homer0002
      seamonster-2.3.pet
  metadata (will probably need to be split into subdirectories for the same reasons)
    homer0001.txt (contains metadata for package homer0001)
    homer0002.txt (contains metadata for package homer0002)
    bart0001.txt (contains metadata for package bart0001, uploaded to node bart)
  nodes
    homer0001.txt (contains a list of all the nodes where this package is stored)

  1. It should be straightforward to parse this structure with PHP at regular intervals to generate the data needed by the API.
  2. Each package should contain only one file for installation: the pet, pup, deb, rpm or tgz. All the other files are provided as information that a client might use when browsing the repository.
  3. I'm not sure whether it is sensible to allow a package to contain optional dependencies; this sort of thing makes it very easy for a user to downgrade a library by mistake. It also complicates the md5sum and digital signature issue.
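The periodic parsing mentioned in point 1 could, for example, regenerate packages-ids.txt from the directory tree. This is a minimal Python sketch (rather than PHP), assuming that only package directories contain files and that each is named after its package ID; collect_package_ids and write_package_ids are hypothetical names.

```python
import os
from typing import List


def collect_package_ids(packages_root: str) -> List[str]:
    """Return the IDs of all package directories below packages_root.

    Assumes a directory holding files is a package directory named after
    its package ID, while intermediate subdirectories used only for
    splitting hold no files.
    """
    ids = []
    for dirpath, _dirnames, filenames in os.walk(packages_root):
        if filenames and dirpath != packages_root:
            ids.append(os.path.basename(dirpath))
    return sorted(ids)


def write_package_ids(packages_root: str, out_path: str) -> None:
    """Regenerate packages-ids.txt with one ID per line.

    The list is written to a temporary file and then renamed over the old
    one, so readers see either the old or the new file, never a partial one.
    """
    tmp_path = out_path + ".tmp"
    with open(tmp_path, "w") as fh:
        for pid in collect_package_ids(packages_root):
            fh.write(pid + "\n")
    os.replace(tmp_path, out_path)
```

The rename relies on os.replace being atomic when source and destination are on the same POSIX filesystem, which matters for static files that clients may download at any moment.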

Package ID

As already stated, package IDs should be assigned by the upload server and namespaced by the server ID. It might be sensible to include the subdirectory ID as part of this to make packages easy to find. For example, marge-0805-0005 would be the fifth package uploaded to marge in May 2008, and would probably be stored in packages/marge/0805.
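The marge-0805-0005 scheme can be expressed in a few lines. This Python sketch follows that example (node ID, two-digit year and month, four-digit sequence number); the helper names and the choice to derive the next sequence number from the existing IDs are assumptions.

```python
from datetime import date
from typing import Iterable


def make_package_id(node_id: str, upload_date: date, sequence: int) -> str:
    """Build an ID of the form <node>-<YYMM>-<NNNN>, e.g. marge-0805-0005."""
    if not 1 <= sequence <= 9999:
        raise ValueError("sequence must fit in four digits")
    return f"{node_id}-{upload_date:%y%m}-{sequence:04d}"


def next_sequence(existing_ids: Iterable[str], node_id: str,
                  upload_date: date) -> int:
    """Next free sequence number for this node and month (hypothetical helper)."""
    prefix = f"{node_id}-{upload_date:%y%m}-"
    used = [int(pid[len(prefix):]) for pid in existing_ids
            if pid.startswith(prefix)]
    return max(used, default=0) + 1


def package_dir(package_id: str) -> str:
    """Subdirectory that holds a package, e.g. packages/marge/0805."""
    # rsplit keeps node IDs that themselves contain hyphens intact.
    node_id, yymm, _sequence = package_id.rsplit("-", 2)
    return f"packages/{node_id}/{yymm}"
```

Concurrent uploads would need to be serialised (for example with a lock file) so that two uploads in the same month cannot receive the same number.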

Package Metadata

The package metadata is stored in a file named {packageid}.txt. It might be possible for the packager to parse some of this from a .desktop file. Most of these fields are optional to allow the inclusion of legacy packages; however, the upload form should strongly encourage uploaders to complete them.

name:bogtrotter-0.5 MANDATORY (could just be grabbed from the install filename minus extension)
md5sum:isurehgiouoius (MD5 checksum of the target file)
targetfile:the name of the file to be installed MANDATORY
targetversion:3.01 (should be the trunk Puppy version the package is intended for)
testedversions:3.01,2.14,2.15,2.16,2.17,3.00 (should use a comma-separated list of official Puppy version numbers)
categories:games (should probably use freedesktop.org standard category names)
dependencies:not sure how this list should be formatted; maybe repository IDs
description:might need a character limit and/or to be split into short and long versions
license:the license under which this software is distributed; should probably offer defaults such as GPL, MPL and proprietary
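A client or mirroring node can check a downloaded target file against the md5sum field. This is a minimal Python sketch; md5_of_file and target_file_ok are hypothetical names, and the metadata is assumed to be already parsed into a dictionary. MD5 only detects accidental corruption: an untrusted node could alter both the file and its metadata, so authenticity still depends on the PGP-signed checksums from requirement 3.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 65536) -> str:
    """Hex MD5 digest of a file, read in chunks to keep memory use flat."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def target_file_ok(path: str, metadata: dict) -> bool:
    """Compare a downloaded target file with the md5sum field.

    Returns False when the metadata has no md5sum, so callers must decide
    explicitly whether to accept unchecked (e.g. legacy) packages.
    """
    expected = metadata.get("md5sum", "").strip().lower()
    return bool(expected) and md5_of_file(path) == expected
```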

  1. The package metadata file should contain no blank lines.
  2. This allows the metadata of multiple packages to be sent in one file, with at least one blank line between each package's metadata (it would probably be safer to use two).
  3. Each line should start with a field name followed by a colon. Clients should drop lines whose fields they do not recognise. Fields may appear in any order.
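A parser for these rules might look like the Python sketch below. It assumes the field names listed above are the only recognised ones, treats name and targetfile as mandatory, and splits each line at the first colon so that values such as descriptions may themselves contain colons; parse_metadata is a hypothetical name.

```python
from typing import Dict, List

KNOWN_FIELDS = {"name", "md5sum", "targetfile", "targetversion",
                "testedversions", "categories", "dependencies",
                "description", "license"}
MANDATORY_FIELDS = ("name", "targetfile")


def parse_metadata(text: str) -> List[Dict[str, str]]:
    """Parse one or more metadata records separated by blank lines."""
    records: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            # A blank line ends the current record; repeated blanks are harmless.
            if current:
                records.append(current)
                current = {}
            continue
        field, sep, value = line.partition(":")
        if not sep or field not in KNOWN_FIELDS:
            continue  # drop unrecognised or malformed lines
        current[field] = value.strip()
    if current:
        records.append(current)
    for rec in records:
        missing = [f for f in MANDATORY_FIELDS if f not in rec]
        if missing:
            raise ValueError(f"record missing mandatory fields: {missing}")
    return records
```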
