......@@ -14,21 +14,19 @@ Nussbaum, and the co-mentors were Stefano Zacchiroli and Marc Brockschmidt.
## Data Sources
The imported data comes from different sources. Each source has a specific type
(e.g. popcon). For each such type, there is a program to import this data into
(e.g. `popcon`). For each such type, there is a program to import this data into
the database (a "gatherer"). Also, there is an optional way to update the data
(i.e. get it from the source) for each source.
Each source has its own documentation. See doc/sources/.
Each source has its own documentation. See `doc/sources/`.
## Installation
## Local Setup
How to create a local instance of the UDD.
### 1. Download
- Get the code from https://salsa.debian.org/qa/udd/
### Getting the code
Download from https://salsa.debian.org/qa/udd/
# clone over https:
git clone https://salsa.debian.org/qa/udd.git
......@@ -37,39 +35,50 @@ Download from https://salsa.debian.org/qa/udd/
git clone git@salsa.debian.org:qa/udd.git
## Setting up a Development Environment
### 2. Set up a Development Environment
### Option 1: Vagrant
#### Option 1 (recommended): Vagrant
Probably, the easiest way to hack on the UDD is using the Vagrant development
environment. Just run `vagrant up` in the UDD source code. See `Vagrantfile`
and the vagrant scripts for details.
and the scripts in `vagrant/` for details.
This development environment supports setting up tunnels to access the main
instance remotely, or dumping/importing data locally, depending on what you
want to work on.
### Option 2: Manually
#### Option 2: Manually
If you don't want to (learn how to) use Vagrant, your best bet is to look at the Vagrant
provision scripts in `vagrant/` to understand how to set up your own instance manually.
### 3. Configuration
- Edit `<your-config-file.yaml>` (see `doc/README.config` for details)
- Set up the DB: `psql udd < sql/udd-schema.sql`
- Initialize the DB:
for i in \
$(cat <your-config-file.yaml> | grep -v "^ " | grep ":" | grep -v general|sed 's/://')
do <path/to/update-and-run.sh> $i
done
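The `grep`/`sed` pipeline in the loop above boils down to "list every top-level key of the config file except `general`". A minimal sketch of that step in isolation, assuming a config laid out like the hypothetical sample below (the function name `list_sources` and the sample contents are illustrative, not part of UDD); it only prints the source names and does not call `update-and-run.sh`:

```shell
# list_sources CONFIG: print the top-level keys of a YAML config file,
# skipping the "general" section. Same filters as the loop above, minus the cat.
list_sources() {
    grep -v '^ ' "$1" | grep ':' | grep -v general | sed 's/://'
}

# Hypothetical sample config, for illustration only.
sample=$(mktemp)
cat > "$sample" <<'EOF'
general:
  dbname: udd
popcon:
  type: popcon
bugs:
  type: bugs
EOF

# Prints one source name per line (here: popcon, then bugs).
list_sources "$sample"
rm -f "$sample"
```

Note that `grep -v general` drops any line containing the string `general`, so a source whose name includes that word would be skipped as well.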
### 4. Execution
- Fetch external data: `./udd.py <configuration> update`
- Import the data into the DB: `./udd.py <your-config-file.yaml> run`
If you don't want to use Vagrant, your best bet is to look at the Vagrant
provision script to understand how to set up your own instance manually.
## Troubleshooting
In case a transaction is waiting in idle mode you should do the following:
## Usage
- edit config.yaml (see README.config for details)
- set up the DB: `psql udd < sql/setup.sql`
1. Kill all idle transactions:
### To initialize the DB, do something like:
for i in \
$(cat config-org.yaml | grep -v "^ " | grep ":" | grep -v general|sed 's/://')
do /org/udd.debian.org/udd/update-and-run.sh $i
done
psql udd -c "SELECT * FROM dsa_kill_all_idle_transactions();"
### Running the Database
- run ./udd.py <configuration> update [ fetches the external data ]
- run ./udd.py <configuration> run [ import the data into the DB ]
2. Kill running importers
## Licensing
......@@ -90,7 +99,7 @@ their related code, are both licensed under the GNU GPL3+.
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
See LICENSE.GPLv3 for a local copy of the license.
See LICENSE.GPLv3 for a local copy of the GNU General Public License, or see <https://www.gnu.org/licenses/>.
The database itself, and its related data are both licensed under the Open Data
......
......@@ -4,14 +4,30 @@
Vagrant.configure(2) do |config|
config.vm.box = "debian/stretch"
config.disksize.size = "50GB"
config.vm.provision "shell", path: "vagrant/disk/partition.sh"
config.vm.provision :reload
config.vm.provision "shell", path: "vagrant/disk/format.sh"
config.vm.network "forwarded_port", guest: 80, host: 8080
config.vm.provider "virtualbox" do |vb|
vb.memory = "4096"
vb.customize ["setextradata", :id, "VBoxInternal2/SharedFoldersEnableSymlinksCreate/v-root", "1"]
end
config.vm.synced_folder ".", "/vagrant", type: "rsync",
rsync__args: ["--verbose", "--archive", "--delete", "-z"]
config.vm.provision "shell", path: "vagrant/provision.sh"
config.vm.post_up_message = <<~HEREDOC
UDD set up at http://localhost:8080/
# Forward agent when doing 'vagrant ssh' (needed for tunnels to ullmann/alioth)
config.ssh.forward_agent = true
The database is, by default, empty. Either:
- use a tunnel to the real UDD: vagrant ssh -c /vagrant/vagrant/setup-tunnel.sh
this requires shell access to udd.debian.org and to enable SSH agent forwarding
- import (parts of) the real UDD:
vagrant ssh -c '/vagrant/vagrant/populate-db.sh all'
HEREDOC
end
Kill all idle transactions:
udd=> select * from dsa_kill_all_idle_transactions();
size of DB:
SELECT pg_size_pretty(pg_database_size('udd'));
size of relations:
select relname, pg_total_relation_size(relname::text) size from pg_class where relnamespace=2200 and relkind='r' order by size desc;
Check whether a vacuum is working:
SELECT relname, reltuples, relpages
FROM pg_class
ORDER BY relpages DESC;
;; -(thanks Q_)
DB size:
SELECT pg_size_pretty(pg_database_size('udd'));
Relations size:
SELECT relname, pg_total_relation_size(relname::text) size
FROM pg_class
WHERE relnamespace=2200
AND relkind='r'
ORDER BY size DESC;
To check if vacuum is working:
SELECT relname, reltuples, relpages FROM pg_class ORDER BY relpages DESC;
(thanks Q_)
In case a transaction is waiting in idle mode you should do the following:
1. psql udd -c "select * from dsa_kill_all_idle_transactions();"
2. kill running importers
#!/usr/bin/env bash
set -xeuo pipefail
sudo mkfs.ext4 -F /dev/sda2
sudo mkdir /var/lib/postgresql
sudo tee -a /etc/fstab <<EOF
/dev/sda2 /var/lib/postgresql auto nodev,nosuid 0 0
EOF
sudo mount /var/lib/postgresql
#!/usr/bin/env bash
set -xuo pipefail
sudo apt install -y kpartx
sudo swapoff /dev/sda5
sudo sed -i '/swap/d' /etc/fstab
sudo update-initramfs -u -k all
sudo fdisk /dev/sda <<EOF
p
d
5
d
2
n
p
2
p
w
EOF
exit 0
#!/bin/sh
#!/usr/bin/env bash
set -e
set -x
DUMP_URI="https://udd.debian.org/dumps"
sshtarget="lucas@udd.debian.org"
SCHEMA_URI="${DUMP_URI}/udd-schema.sql"
POPCON_URI="${DUMP_URI}/udd-popcon.sql.xz"
BUGS_URI="${DUMP_URI}/udd-bugs.sql.xz"
UDD_URI="${DUMP_URI}/udd.dump"
if [ "$1" = "" ]; then
echo "Specify target as parameter (schema, packages)"
export PGUSER=udd
export PGDATABASE=udd
load() {
while [ "$#" -gt 0 ]; do
case "$1" in
*.xz)
curl "$1" | unxz | psql
;;
*.gz)
curl "$1" | gunzip | psql
;;
*.sql)
curl "$1" | psql
;;
*.dump)
curl "$1" | pg_restore -v -d udd -x --disable-triggers
;;
*)
echo "Unable to process '$1'" >&2
exit 1
;;
esac
shift
done
}
# If running interactively, stop there
if [ -z "$BASH" ] || [[ "$-" == *i* ]]; then
return
fi
# Otherwise, parse arguments and load the corresponding files
set -eo pipefail
if [ "$#" -eq 0 ]; then
echo "Specify at least one target as parameter (schema, packages, all)"
exit 1
fi
while [ "$1" != "" ]; do
if [ "$1" = "all" ]; then
# everything (except DD-restricted)
dumptarget="-c --if-exists --exclude-table=ldap --exclude-table=pts"
elif [ "$1" = "schema" ]; then
# everything, without data (except DD-restricted)
dumptarget="--schema-only -c --if-exists --exclude-table=ldap --exclude-table=pts"
elif [ "$1" = "packages" ]; then
# only tables related to sources/packages
dumptarget="--data-only -n sources -n packages -n packages_summary"
elif [ "$1" = "table" ]; then
# only specified table
dumptarget="--data-only -t $2"
shift
else
echo "Unknown target: $1"
exit 1
fi
shift
fname="udd-dump-$(date +%s).$$.dump"
ssh -t $sshtarget pg_dump --no-owner -p 5452 -Fc -v -f /tmp/$fname $dumptarget service=udd
rsync -avP $sshtarget:/tmp/$fname /run/shm/$fname
ssh $sshtarget rm -f /tmp/$fname
pg_restore -U udd -j 8 -v -d udd /run/shm/$fname
bad_all() {
echo "Target 'all' is incompatible with other targets" >&2
exit 1
}
if [ "$1" = "all" ]; then
[ "$#" -eq 1 ] || bad_all
targets=( "${UDD_URI}" )
else
targets=()
while [ "$1" != "" ]; do
case "$1" in
"all")
bad_all
;;
"schema")
targets+=("${SCHEMA_URI}")
;;
# "packages")
# targets+=("${UDD_URI}")
# ;;
# "table")
*)
echo "Unknown target: '$1'" >&2
exit 1
esac
shift
done
fi
for target in "${targets[@]}"; do
load "${target}"
done
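The `load()` function in the new `populate-db.sh` picks a decoder from the file suffix of each URI. The sketch below isolates that dispatch so it can be tried without network or database access; `decoder_for` is a hypothetical helper that only prints the pipeline `load()` would use, and it is not part of the script:

```shell
# decoder_for URI: print the command chain that load() pipes the download
# through for this suffix; return non-zero for an unknown suffix.
decoder_for() {
    case "$1" in
        *.xz)   echo "unxz | psql" ;;
        *.gz)   echo "gunzip | psql" ;;
        *.sql)  echo "psql" ;;
        *.dump) echo "pg_restore -v -d udd -x --disable-triggers" ;;
        *)      echo "Unable to process '$1'" >&2; return 1 ;;
    esac
}

decoder_for "https://udd.debian.org/dumps/udd-schema.sql"
decoder_for "https://udd.debian.org/dumps/udd-popcon.sql.xz"
decoder_for "https://udd.debian.org/dumps/udd.dump"
```

Because `case` takes the first matching pattern, a name such as `udd-popcon.sql.xz` is handled by the `*.xz` branch rather than `*.sql`.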
......@@ -4,7 +4,7 @@ set -x
set -e
sudo sed -i s/httpredir.debian.org/deb.debian.org/ /etc/apt/sources.list
sudo apt-get update
sudo apt-get install -y apache2 postgresql postgresql-plperl-9.6 postgresql-9.6-debversion ruby-debian ruby-oj rsync python-yaml python-psycopg2 ruby-pg ruby-sequel-pg
sudo apt-get install -y apache2 postgresql postgresql-plperl-9.6 postgresql-9.6-debversion ruby-debian ruby-oj rsync python-yaml python-psycopg2 ruby-pg ruby-sequel-pg curl
# trust local connections
sudo sed -ri 's/(local\s+all\s+all\s+)peer/\1trust/' /etc/postgresql/9.6/main/pg_hba.conf
sudo sed -ri 's/(host\s+all\s+all\s+127.0.0.1\/32\s+)md5/\1trust/' /etc/postgresql/9.6/main/pg_hba.conf
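To see what the two `sed` substitutions above do before pointing them at a real `pg_hba.conf`, they can be run against a scratch file. This is a sketch only: it assumes GNU sed (for `-r` and `\s`), the helper name `trust_local` and the sample lines are made up for illustration, and the dots in `127.0.0.1` are escaped here, which the provision script does not do:

```shell
# trust_local FILE: print FILE with peer/md5 replaced by trust for
# "local all all" and for "host all all 127.0.0.1/32"; FILE is not modified.
trust_local() {
    sed -r \
        -e 's/(local\s+all\s+all\s+)peer/\1trust/' \
        -e 's/(host\s+all\s+all\s+127\.0\.0\.1\/32\s+)md5/\1trust/' \
        "$1"
}

# Hypothetical pg_hba.conf excerpt, for illustration only.
sample=$(mktemp)
cat > "$sample" <<'EOF'
local   all             postgres                                peer
local   all             all                                     peer
host    all             all             127.0.0.1/32            md5
host    all             all             ::1/128                 md5
EOF

# Only the second and third lines change; the postgres and IPv6 lines are kept.
trust_local "$sample"
rm -f "$sample"
```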
......@@ -27,7 +27,7 @@ sudo -u postgres createdb -T template0 -E SQL_ASCII udd
# create the database, named 'udd', forcing the encoding to SQL_ASCII, since that's the format of the export.
# We base it off 'template0' because 'template1' (the default) might be set to UTF8 which prevents creation
# of new SQL_ASCII databases.
sudo -upostgres psql udd -c 'CREATE EXTENSION debversion'
sudo -u postgres psql udd -c 'CREATE EXTENSION debversion'
# Also create a guest user (used by CGIs)
sudo -u postgres dropuser guest || true
sudo -u postgres createuser -lDRS guest
......@@ -41,19 +41,10 @@ sudo ln -sf /vagrant/vagrant/apache.conf /etc/apache2/sites-enabled/000-default.
sudo ln -sf /etc/apache2/mods-available/cgi.load /etc/apache2/mods-enabled/
sudo ln -sf /etc/apache2/mods-available/rewrite.load /etc/apache2/mods-enabled/
sudo rm -f /etc/apache2/conf-enabled/serve-cgi-bin.conf
#
# Run apache2 as the vagrant user. Yes, eek. But this avoids all permission problems.
sudo sed -i 's/APACHE_RUN_USER=www-data/APACHE_RUN_USER=vagrant/' /etc/apache2/envvars
sudo sed -i 's/APACHE_RUN_GROUP=www-data/APACHE_RUN_GROUP=vagrant/' /etc/apache2/envvars
sudo chown -R vagrant:vagrant /var/log/apache2
sudo chown -R vagrant:vagrant /var/lock/apache2
sudo service apache2 restart
echo "
UDD set up at http://localhost:8080/
The database is empty. Either:
- use a tunnel to the real UDD: vagrant ssh -c /vagrant/vagrant/setup-tunnel.sh
- import (parts of) the real UDD:
vagrant ssh -c '/vagrant/vagrant/populate-db.sh schema'
vagrant ssh -c '/vagrant/vagrant/populate-db.sh udd_logs'
"