Most package data is available via Sleepycat (Berkeley DB) databases. These
have been specifically designed for efficient lookup and reasonable generation
times, so that both the daily cron jobs and the actual page requests take only
a moderate amount of CPU and I/O.

The only pages that are actually generated statically are the pages listing
all packages in a given section.


This is a brief overview of the available databases:

*********************************************************
Generated by means of Packages.gz files:
*********************************************************

| packages_small.db:
|  key: packagename
|  value: \0 separated tuples of "archive suite arch component section priority version shortdescription"
|          (so you can split on spaces into 8 pieces, but must not split
|          further, because shortdescription can contain spaces)
  Notes: - maybe add did right before shortdescription?
		 - for each suite, the newest package is listed first, and each (suite,
		   architecture) pair is unique: only the newest one is chosen. Once
		   you find the right suite, you know you have the newest; once you
		   find your (suite,arch), you know you have found the only such entry
		 - The very first element is different (TODO: maybe should be
		   different DB then?), a \01 separated hash of suite -> provided-by,
		   like "suite1\01prov1 prov2\01suite2\01prov1"
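
  A minimal sketch of decoding one packages_small.db value, written in Python
  purely for illustration. The function name, the FIELDS tuple and the
  returned structures are hypothetical, and it assumes the value is already a
  str and that the provided-by list is space separated (as the example above
  suggests):

```python
FIELDS = ("archive", "suite", "arch", "component",
          "section", "priority", "version", "shortdescription")


def parse_packages_small(value):
    """Split a packages_small.db value into (provides, entries)."""
    first, *records = value.split("\0")

    # First element: \01-separated "suite, provided-by" pairs.
    parts = first.split("\x01") if first else []
    provides = {parts[i]: parts[i + 1].split()
                for i in range(0, len(parts) - 1, 2)}

    # Remaining elements: 8 space-separated fields. Limiting the split to
    # 7 cuts keeps any spaces inside shortdescription intact.
    entries = [dict(zip(FIELDS, rec.split(" ", 7)))
               for rec in records if rec]
    return provides, entries
```

  Because entries keep the stored order, a caller can stop at the first entry
  matching the wanted suite (or suite and arch), as described in the notes.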

| package_postfixes.db:
|  key: a postfix string of a package name
|  value: \0-separated list of prefixes that can precede this postfix, with '^'
|         instead of the empty string in case a postfix happens to (also?) be
|         a full package name
 Note: value can also be \01<decimal number>, meaning there were <decimal
 number> different packages with that postfix (always more than 100)
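
 A minimal sketch of interpreting such a value, in Python for illustration
 only. The function name and return shape are hypothetical, and it assumes
 that a full name is simply prefix + postfix:

```python
def decode_postfix_value(postfix, value):
    """Return (names, count) for one postfix entry.

    names is None when only a count was stored (the \\01<number> form).
    """
    if value.startswith("\x01"):
        # Too many matches to list: only the decimal count is stored.
        return None, int(value[1:])

    # '^' stands for the empty prefix: the postfix is itself a full name.
    names = [postfix if prefix == "^" else prefix + postfix
             for prefix in value.split("\0")]
    return names, len(names)
```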

| packages_descriptions.db:
|  key: "packagename version arch"
|  value: a unique description id, did
 
| descriptions.db:
|  key: did
|  value: description, first line being short, the rest being long [no
|            newline transformation]

| descriptions_packages.db:
|  key: did
|  value: one or more occurrences of: "packagename version arch", separated by \0
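
 The three description databases chain together: packages_descriptions.db
 maps a package to its did, and descriptions.db maps the did to the text. A
 minimal Python sketch of that lookup, using plain dicts as stand-ins for the
 real databases (the function name is hypothetical, and splitting short from
 long at the first newline is an assumption based on "first line being
 short"):

```python
def lookup_description(package, version, arch,
                       packages_descriptions, descriptions):
    """Return (did, short, long) for a package, or None if it is unknown."""
    key = f"{package} {version} {arch}"
    did = packages_descriptions.get(key)
    if did is None:
        return None

    text = descriptions[did]
    # First line is the short description, the rest is the long one.
    short, _, long_text = text.partition("\n")
    return did, short, long_text
```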

| packages_all_$suite.db:
|  key: "packagename arch version"
|  value: \0-separated pairs of key\0value items, with key being always
|         lowercase and having most normal Packages.gz entries, except:
|         - source: always available, contains straight source package name
|         - description: has did (description id) only
|         - archive: notes source archive
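
 A minimal sketch of turning such a value into a mapping, in Python for
 illustration; the function name is hypothetical and the value is assumed to
 be a str with strictly alternating keys and values:

```python
def parse_all_record(value):
    """Turn a \\0-separated key\\0value\\0key\\0value... string into a dict."""
    items = value.split("\0")
    # Even positions are keys, odd positions are the matching values.
    return dict(zip(items[0::2], items[1::2]))
```

 The "description" entry of the result holds only the did, so the full text
 still has to be fetched from descriptions.db.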

| sources_packages.db:
|  key: sourcepackagename
|  value: \0 separated tuples of "archive suite package version arch"

  Note: this also comes from the Packages.gz files, and not from Sources.gz
  files, for accuracy.

*********************************************************
Generated by means of Sources.gz files:
*********************************************************

| sources_small.db:
|  key: sourcename
|  value: \0 separated tuples of "archive suite component section priority version"

| source_postfixes.db:
|  key: a postfix string of a source name
|  value: \0-separated list of prefixes that can precede this postfix, with '^'
|         instead of the empty string in case a postfix happens to (also?) be
|         a full package name
 Note: value can also be \01<decimal number>, meaning there were <decimal
 number> different packages with that postfix (always more than 100)

| sources_all_$suite.db:
|  key: "archive suite sourcename"
|  value: \0-separated pairs of key\0value items, with key being always
|         lowercase and having most normal Sources.gz entries, except:
|         - files: \01 separated list of "md5 size filename"
  Note: different key from packages_all, is that needed?
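
  A minimal sketch of decoding the "files" entry, in Python for illustration.
  The function name is hypothetical, and it assumes the size is a decimal
  integer and that each item has exactly the three fields named above:

```python
def parse_files(files_value):
    """Split a \\01-separated "md5 size filename" list into tuples."""
    result = []
    for item in files_value.split("\x01"):
        # Limit the split so the filename is always taken as one piece.
        md5, size, filename = item.split(" ", 2)
        result.append((md5, int(size), filename))
    return result
```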

*********************************************************
Generated by means of Contents-$arch.gz files:
*********************************************************

This one is tricky, because it deals with about 1G of raw uncompressed data
per suite. Not all data is updated every day though, so dealing with that
efficiently pays off.

Each sourcefile will create a filelists_$suite_$arch.db, with prefix
compression. The last updated one will have a symlink from _all.db to it, to
help filelist queries for 'all' packages.

reverse_$suite_$arch.txt will be the reversed pathnames for that file,
lowercased, sorted, with packagename:arch following it.

For each suite, the suite-wide indices can then be updated by reading the 11
or so reverse_$suite_$arch.txt files in sorted order with sort -m. Identical
pathnames can be put together and stored in reverse_$suite.db; as a side
effect, filenames also come out grouped uniquely (but reverse sorted, not
normally sorted), and can be written out linearly to filenames_$suite.txt.
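
The merge step can be sketched in Python, with heapq.merge playing the role
of sort -m. This is only an illustration under several assumptions: the
function names are hypothetical, each per-architecture input is modelled as
an already sorted list of (reversed_path, "package:arch") tuples rather than
the real text format, "reversed" is taken to mean character-reversed, and the
filename is taken to be the part of the reversed path before the first '/':

```python
import heapq
from itertools import groupby


def merge_reverse_lists(per_arch_lists):
    """Merge sorted per-arch lists and group owners by reversed pathname."""
    merged = heapq.merge(*per_arch_lists)  # streaming merge, like sort -m
    for rpath, group in groupby(merged, key=lambda item: item[0]):
        yield rpath, [owner for _, owner in group]


def unique_filenames(reversed_paths):
    """Yield each filename once, in the order of its reversed form."""
    # Paths sharing the same reversed filename share a common prefix, so
    # they are adjacent in sorted order and groupby sees them together.
    for rname, _ in groupby(reversed_paths,
                            key=lambda p: p.split("/", 1)[0]):
        yield rname[::-1]
```

Both functions are generators, so the roughly 1G of data per suite is never
held in memory at once; only the entries for the current pathname are.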