Commits on Source (2)
Tag: field::biology:sequence
Description: Biological sequence analysis
Tag: field::biology:sequence:dna
Tag: field::biology:sequence:rna
Tag: field::biology:sequence:protein
Tag: # I would rather move the :sequence tags to the works-with:: facet -- Charles
Tag: field::biology:structural
Tag: field::biology:molecular
Tag: field::biology:evolution
Tag: field::biology:interaction
Tag: field::biology:genomics
Tag: field::biology:proteomics
Tag: field::biology:metabolomics
Tag: field::biology:transcriptomics
Tag: field::biology:systems
Tag: field::mathematics:graphs
Tag: field::medicine:imaging
Tag: field::medicine:practice
Tag: field::medicine:odontology
Tag: field::medicine:his
Tag: field::statistics:clustering
Tag: field::statistics:classifier
Tag: hardware::input:dmm
Tag: implemented-in::ACD # The AJAX Command Definition (ACD) language, used by EMBOSS, has nothing to do with web 2.0
Tag: suite::emboss # will also be used for the embassy packages
Tag: use::annotation
Tag: use::analysis # further depths would be redundant with the works-with:: facet
Tag: use::comparison
Tag: use::comparison:alignment
Tag: use::comparison:phylogeny
Tag: use::experiment
Tag: use::experiment:molecular-biology
Tag: use::measuring
Tag: use::database:index # Or just use::index since there is already a works-with::db tag ?
Tag: use::database:query # Or just use::query since there is already a works-with::db tag ?
Tag: works-with::device
Tag: works-with::sequence:desoxyribonucleic
Tag: works-with::sequence:ribonucleic # or just sequence:nucleic?
Tag: # I think that sequence:biological would be superfluous because of the field::biology tag
Tag: works-with::sequence:peptidic
Tag: works-with::structure # or can it be unified with works-with::3Dmodel?
Tag: works-with-format::abi # Applied Biosystems. This one is not plaintext, I think.
Tag: works-with-format::fitch # neither this one
Tag: works-with-format::biology:fasta
Tag: works-with-format::biology:affy:dat
Description: Scanner pixel-level raw data of Affymetrix microarray
Tag: works-with-format::biology:affy:cel
Description: Probe-level raw data of Affymetrix microarray
# I would rather move the above formats in the ::plaintext subfacet -- Charles
# This is fine with me. And what about affy:cdf? This could go to plaintext:cdf directly? -- Steffen
Tag: works-with-format::plaintext:acedb
Tag: works-with-format::plaintext:aln
Description: Nucleotide or Protein sequence alignment
Tag: works-with-format::plaintext:asn1
Description: Abstract Syntax Notation 1
Tag: works-with-format::plaintext:codata
Tag: works-with-format::plaintext:dbid
Tag: works-with-format::plaintext:embl
Description: EMBL nucleotide sequence database format
This tag should only be used for data that
stores nucleotide sequences. Other historical
derivatives (PRINTS, UniProt, ...) should have
their own respective entries.
Tag: works-with-format::plaintext:experiment
Tag: works-with-format::plaintext:fasta
Tag: works-with-format::plaintext:gcg
Tag: works-with-format::plaintext:gff
Tag: works-with-format::plaintext:hennig86
Tag: works-with-format::plaintext:ij
Tag: works-with-format::plaintext:intact
Description: an interaction database -- Steffen
Tag: works-with-format::plaintext:interpro
Description: protein domain meta database -- Steffen
Tag: works-with-format::plaintext:jackknifer
Tag: works-with-format::plaintext:jackknifernon
Tag: works-with-format::plaintext:mega
Tag: works-with-format::plaintext:meganon
Tag: works-with-format::plaintext:msf
Tag: works-with-format::plaintext:nbrf
Tag: works-with-format::plaintext:ncbi
Tag: works-with-format::plaintext:nexus
Tag: works-with-format::plaintext:nexusnon
Tag: works-with-format::plaintext:paup
Tag: works-with-format::plaintext:paupnon
Tag: works-with-format::plaintext:pdb
Description: Protein structure data
Tag: works-with-format::plaintext:pfam
Tag: works-with-format::plaintext:phylip
Tag: works-with-format::plaintext:phylipnon
Description: what is this? -- Steffen
Tag: works-with-format::plaintext:pir
Tag: works-with-format::plaintext:prints # protein domain/family database
Tag: works-with-format::plaintext:raw
Description: OK, that one advocates against the use of plaintext.
Tag: works-with-format::plaintext:selex
Tag: works-with-format::plaintext:staden
Tag: works-with-format::plaintext:strider
Tag: works-with-format::plaintext:stockholm
Tag: works-with-format::plaintext:swissprot
Tag: works-with-format::plaintext:treecon
Tag: works-with-format::plaintext:affy:dat
Tag: works-with-format::plaintext:affy:cel
Tag: works-with-format::medicine:practice:xDT
Tag: works-with-format::medicine:DICOM
Description: or works-with-format::dicom ? -- Charles
I would like to prepare for SNOMED, ICD10 and several others -- Steffen
# an earlier idea of mine to separate different
# kinds of prediction methods - appreciated?
# "use" rather than "made-of"?
# -- Steffen
Tag: made-of::algorithm:dynamic-programming
Tag: made-of::algorithm:hashing
Tag: made-of::algorithm:hidden-markov-model
Tag: made-of::algorithm:neural-network
Tag: made-of::algorithm:dimension-reduction
Description: Comprises ICA and PCA, spring-embedding -- Steffen
# some less arguable debtags:
Tag: made-of::data:r # For the statistics environment http://www.r-project.org
= Export of Debian annotation to the bio.tools repository =
This folder collects tools to automate the transformation of Debian
package annotation into the syntax of the ELIXIR registry 'bio.tools' [3].
The tools are tailored to packages curated by the Debian Med project.
A key technology in this process is the EDAM ontology [1], which addresses
the categorisation of tools and collections of tools that contribute to
computational biology in its broadest sense.
Some information for a bio.tools entry can be retrieved directly from
the existing package annotation, e.g. with dpkg-parsechangelog. The EDAM
annotation, however, is external to Debian. It is considered beneficial
enough to the Debian packages to maintain it alongside the regular
packaging. Since package annotation can be amended immediately via the
git repository of Debian Med [4], contributors from outside Debian are
invited as well.
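As an illustration of what can be derived from the existing annotation,
the upstream version can be read off the first line of debian/changelog.
The following is a minimal sketch in Python, not the code used by the
tools in this folder. The function name upstream_version is hypothetical;
the sample header lines are taken from the muscle test changelog.

```python
import re


def upstream_version(changelog_header):
    """Return the upstream version from the first line of debian/changelog."""
    # The header looks like: "muscle (1:3.8.31-2) UNRELEASED; urgency=low"
    full = re.split(r"[()]", changelog_header)[1]
    # Strip an optional epoch ("1:") and the Debian revision ("-2").
    m = re.match(r"^(?:[0-9]+:)?(.*)-[^-]+$", full)
    if m is None:
        raise ValueError("no Debian revision in version %r" % full)
    return m.group(1)


print(upstream_version("muscle (1:3.8.31-2) UNRELEASED; urgency=low"))
print(upstream_version("muscle (3.70+fix1-1) unstable; urgency=low"))
```

A version without a Debian revision (a native package) raises ValueError
in this sketch; a real tool would have to decide how to treat that case.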
== Tools ==
The following tools are available:
* packages.list.update.sh
* registry-tool.py
* registry-tool-iterator.sh
The packages.list.update.sh script retrieves a list of binary packages
(the ones with code executed by the user) from the Debian Med tasks pages
and determines the source packages for these (the ones with the source
code and, in particular, the package annotation). The resulting list is
written to the file 'packages.list.txt'.
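The mapping from a binary package to its source package can be sketched
as follows. The sketch assumes the stanza layout printed by apt-cache
show, in which the Source field is normally omitted when the source
package has the same name as the binary package. The function name
source_package and the sample stanzas are made up for illustration.

```python
def source_package(stanza):
    """Return the source package name for one 'apt-cache show' stanza."""
    fields = {}
    for line in stanza.splitlines():
        # Skip continuation lines (leading whitespace) and lines without a field name.
        if not line or line[0].isspace() or ":" not in line:
            continue
        key, value = line.split(":", 1)
        fields[key] = value.strip()
    # Fall back to the binary package name when no Source field is given.
    source = fields.get("Source") or fields.get("Package", "")
    # Drop an optional version in parentheses, e.g. "hmmer (3.1b2)".
    return source.split("(", 1)[0].strip()


# Hypothetical stanzas, shortened to the relevant fields.
with_source = "Package: libssw0\nSource: libssw\nVersion: 1.1-1\n"
without_source = "Package: muscle\nVersion: 1:3.8.31-2\n"
print(source_package(with_source))
print(source_package(without_source))
```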
The registry-tool.py script is not meant to be executed directly; it is
called by registry-tool-iterator.sh. It translates all information
gathered from a single package source tree into a single JSON file. The
latter is provided in a form that may be uploaded directly to the
bio.tools repository.
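The shape of such a record can be sketched as below. The field names
follow the dictionary built in registry-tool.py; whether they match the
schema currently accepted by bio.tools is an assumption that should be
checked before any upload. The helper name minimal_record is
hypothetical, and the values are taken from the muscle test package.

```python
import json


def minimal_record(name, homepage, version, description):
    """Build the package-level part of a record, without EDAM functions."""
    return {
        "name": name,
        "homepage": homepage,
        "version": version,
        "collection": "DebianMed",
        "description": description,
        "function": [],  # filled from debian/upstream/edam in the real tool
    }


record = minimal_record(
    "MUSCLE",
    "http://www.drive5.com/muscle/",
    "1:3.8.31-2",
    "Multiple alignment program of protein sequences",
)
print(json.dumps(record, indent=1, sort_keys=True))
```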
The registry-tool-iterator.sh script reads the packages.list.txt file and
checks out the master branch of each package referenced there. In a first
pass the iterator checks the format of each EDAM file; in a second pass
it creates the JSON files meant for export from Debian to the bio.tools
repository.
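Reading the package list can be sketched in Python as follows. Skipping
lines that start with '#' mirrors the grep -v ^# used in the second loop
of the shell script; the function name read_package_list is hypothetical.

```python
def read_package_list(lines):
    """Yield package names, skipping blank lines and '#' comments."""
    for raw in lines:
        name = raw.strip()
        if not name or name.startswith("#"):
            continue
        yield name


sample = ["abacas\n", "#smrtanalysis\n", "\n", "zalign\n"]
print(list(read_package_list(sample)))
```

In practice the argument would be an open file handle on
packages.list.txt rather than a list of strings.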
== Data flow ==
Uploading is straightforward for packages that are not yet known to the
bio.tools registry. For entries that already exist, the information has
to be merged manually. The bio.tools repository does not yet offer any
means to support that process (i.e. provenance management).
To the rescue comes a git repository [5] to which the files created by
the registry-tool are submitted. The information in bio.tools is placed
in an independent branch. A third branch merges the two to prepare
the submission.
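Before merging by hand it can help to see which fields of two records
actually disagree. The sketch below compares two records loaded from
JSON and lists the top-level keys whose values differ. It is an
illustration only and not part of the tools in this folder; the function
name differing_fields and the sample values are made up.

```python
def differing_fields(debian_record, biotools_record):
    """Return the sorted top-level keys whose values differ between two records."""
    keys = set(debian_record) | set(biotools_record)
    return sorted(
        key for key in keys
        if debian_record.get(key) != biotools_record.get(key)
    )


# Hypothetical records, reduced to a few fields.
from_debian = {"name": "MUSCLE", "version": "3.8.31"}
from_biotools = {"name": "MUSCLE", "version": "3.8", "homepage": "http://example.org/"}
print(differing_fields(from_debian, from_biotools))
```

A key that is missing on one side counts as differing, so fields known
to only one of the two sources show up in the result as well.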
Steffen Moeller, Matus Kalas
St. Malo/Bergen/Rostock/Lyngby/Trondheim/Luebeck/Bucharest 2015-2017
[1] http://edamontology.org/
[2] http://www.yaml.org/
[3] https://bio.tools
[4] https://anonscm.debian.org/cgit/debian-med
[5] https://github.com/bio-tools-community/json-buffer
#!/bin/bash -e
# A routine to facilitate the output to STDERR instead of the default STDOUT
function STDERR () {
cat - 1>&2
}
# echoindent outputs a series of blanks to STDOUT. An optional
# second argument is echoed after those blanks if present.
function echoindent () {
for i in $(seq 1 $1)
do
echo -n " "
done
if [ "" != "$2" ]; then
echo $2
fi
}
level=0
# helper to properly close an open parenthesis
function closeParenthesis () {
level=$(($level-1))
echoindent $level
echo -n "}"
if [ -n "$1" ]; then
echo "# $1"
else
echo
fi
}
function echoTerm(){
level=$(($level-1))
echoindent $level
echo "{\"uri\": \"$1\", \"term\": \"Pippi Langstrumpf\"}"
}
# Key argument indicating the debian directory from which to retrieve all the
# information
pathToDebian=$1
#verbose="yes"
verbose=""
# Variable keeping usage information
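# A bare "USAGE=<<EOUSAGE" leaves the variable empty, so the here-document
# is captured through cat in a command substitution instead.
USAGE=$(cat <<EOUSAGE
debian2edam [--upload] <path to 'debian' directory>
server=https://
Environment variables:
elixir_cat_username
elixir_cat_password
EOUSAGE
)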
filename=$(basename "$pathToDebian")
if [ "edam" = "$filename" ]; then
pathToDebian=$(dirname "$pathToDebian") # upstream
pathToDebian=$(dirname "$pathToDebian") # debian
fi
if [ -z "$pathToDebian" ]; then
echo "$USAGE" | STDERR
echo "E: Please specify debian directory in which to find EDAM annotation." | STDERR
exit -1
fi
if [ ! -d "$pathToDebian" ]; then
echo "$USAGE" | STDERR
echo "E: Could not find directory '$pathToDebian'" | STDERR
exit -1
fi
if [ ! -r "$pathToDebian/changelog" ]; then
echo "$USAGE" | STDERR
echo "E: Could not find a changelog file expected at '$pathToDebian/changelog'" | STDERR
exit -1
fi
cd $(dirname "$pathToDebian")
edamfile="debian/upstream/edam"
if [ ! -r "$edamfile" ]; then
echo "$USAGE" | STDERR
echo "E: Could not access file '$edamfile' from $(pwd)" | STDERR
exit -1
fi
sourcepackage=$(dpkg-parsechangelog | grep ^Source | sed -e 's/^Source: //' )
version=$(dpkg-parsechangelog |grep ^Version | cut -f2 -d\ | sed -e 's/-[^-][^-]*//' )
declare -a descriptions
declare -a packages
if [ -n "$debug" ]; then cat debian/control; fi
while read pack; do
p=$(echo "$pack"|sed -e 's/^[^:]*: *//')
echo Package: $p
packages[${#packages[*]}]="$p"
done < <(grep "^Package:" debian/control )
while read desc; do
d=$(echo "$desc"|sed -e 's/^[^:]*: *//')
echo Description: $d
descriptions[${#descriptions[*]}]="$d"
#descriptions[1]="$d"
#descriptions="$d"
done < <(grep "^Description:" debian/control )
#echo "DESCRIPTIONS: ${descriptions[*]}"
#echo "PACKAGES: ${packages[*]}"
#echo "DESCRIPTIONS: $descriptions}"
#echo "PACKAGES: $packages"
if [ ${#packages[*]} != ${#descriptions[*]} ]; then
echo "E: Internal error - expected same number of packages (${#packages[*]}) as for their descriptions (${#descriptions[*]})" | STDERR
exit -1
fi
(
if [ -n "$verbose" ]; then
for packageno in $(seq 0 $((${#descriptions[*]}-1)))
do
echo "# $packageno"
echo Packages[$packageno]: ${packages[$packageno]}
echo Descriptions[$packageno]: ${descriptions[$packageno]}
done
fi
) | STDERR
prevstate="start";
previndent=0
currentscope=""
currenttopic=""
opentopic=0
openfunction=0
openscope=0
indentlen=0
# Core part of the program
# It reads every line of the EDAM file (see end of loop for the redirection)
# and decides what to print to STDOUT.
while IFS='' read -r line
do
if [ -z "$line" ]; then
echo "I: Read empty line" | STDERR
continue
fi
if [ -n "$verbose" ]; then
echo "line: '$line'" | STDERR
fi
# retrieve different parts of the description
blanks=$(echo "$line"|sed -e 's/^\( *\)\([^ :]\+\): *\([^ ]\+\).*$/\1/')
type=$(echo "$line"|sed -e 's/^\( *\)\([^ :]\+\): *\([^ ]\+\).*$/\2/')
val=$(echo "$line"|sed -e 's/^\( *\)\([^ :]\+\): *\([^ ]\+\).*$/\3/')
if echo "$val" | grep -q : ; then
echo "W: found colon in ID of line '$line' - transcribing to underscore" | STDERR
val=$(echo "$val"|tr ":" "_")
fi
#echo "Indent='$blanks'"
#echo "Indentlength='$indentlen'"
#echo "Type='$type'"
#echo "Val='$val'"
if [ -n "$currentscope" -a "*" != "$currentscope" -a "summary" != "$currentscope" -a "scope" != "$type" ]; then
echo "I: Wrong scope ($currentscope) - ignored '$line'" | STDERR
continue
fi
indentlen=${#blanks}
if [ "scope" = "$type" ]; then
if [ $openfunction -gt 0 ]; then closeParenthesis "openfunction($openfunction) in scope"; fi
currentscope="$val"
resourcename=$sourcepackage
if [ "*" != "$val" -a "summary" != "$val" ]; then
resourcename=$val
fi
if [ "summary" != "$val" -a "*" != "$val" ]; then
echo "I: treatment of multiple scopes not yet implemented" | STDERR
else
echo "{"
# Some decent comparison of package names with scope is not implemented
level=$((level+1))
echoindent
echo "Package $resourcename"
echoindent
echo "\"version\": \"$version\","
echoindent
echo "\"description\": \"${descriptions[0]}\","
echoindent
echo "\"topic\": \"{$currenttopic}\""
openscope=1
fi
elif [ "topic" = "$type" ]; then
if [ $openfunction -gt 0 ]; then closeParenthesis "openfunction($openfunction) in topic"; openfunction=0; fi
if [ $openscope -gt 0 ]; then closeParenthesis "openscope($openscope) after loop"; openscope=0; fi
if [ "start" != "$prevstate" ]; then
closeParenthesis "topic with prior state - weird"
fi
currenttopic="$val"
# At some later point of the implementation, bits generated here would be
# cached and then distributed to various lower-level scopes
elif [ "function" = "$type" ]; then
if [ $openfunction -gt 0 ]; then
closeParenthesis "openfunction($openfunction) in function"
openfunction=0
fi
echoindent $level
echo "{function: [ { \"functionName\": ["
echoTerm $val
echo "] }],"
level=$((level+1))
openfunction=1
elif [ "input" = "$type" -o "output" = "$type" ]; then
if [ "$prevstate" = "$type" ]; then
echo "},{"
fi
if [ "$prevstate" = 'function' ]; then
echo "\"$type\": [{"
fi
echoindent $level
echo "($type $val)"
else
echo "W: unknown type '$type' - ignored" | STDERR
fi
prevstate=$type
#echo "indentlen='$indentlen'"
done < $edamfile
if [ $openfunction -gt 0 ]; then
closeParenthesis "openfunction($openfunction) after loop"
openfunction=0
fi
if [ $openscope -gt 0 ]; then
#echo "I: treatment of multiple scopes not yet implemented"|STDERR
closeParenthesis "openscope($openscope) after loop"
openscope=0
fi
#echo "indentlen='$indentlen'" | STDERR
if [ $opentopic -gt 0 ]; then
opentopic=0
fi
#for i in $(seq $(($indentlen-$openfunction-$openscope-$opentopic)) -1 1)
#do
# closeParenthesis "indent $i"
#done
abacas
abyss
acedb
adapterremoval
aevol
alien-hunter
alter-sequence-alignment
altree
amap-align
ampliconnoise
anfo
aragorn
arb
arden
art-nextgen-simulation-tools
artemis
artfastqgenerator
autodock-vina
autodocksuite
autodocktools
ball
bamtools
barrnap
bcftools
beast-mcmc
bedtools
blasr
blimps
bowtie
bowtie2
boxshade
brig
bwa
cassiopee
cd-hit
cdbfasta
cgview
circlator
circos
clearcut
clonalframe
clustalo
clustalw
clustalx
cluster3
coils
concavity
conservation-code
cufflinks
daligner
dascrubber
dazzdb
dialign
dialign-t
discosnp
disulfinder
dnaclust
dssp
ea-utils
edtsurf
eigensoft
embassy-domainatrix
embassy-domalign
embassy-domsearch
embassy-phylip
emboss
epcr
exonerate
falcon
fastaq
fastdnaml
fastlink
fastml
fastqc
fasttree
fastx-toolkit
ffindex
figtree
filo
fitgcp
flexbar
freecontact
fsm-lite
gamgi
garlic
gasic
gbrowse
gdpc
genometools
gentle
gff2aplot
gff2ps
giira
glam2
gmap
graphlan
grinder
gromacs
gubbins
harvest-tools
hhsuite
hilive
hmmer
hmmer2
htslib
hyphy
idba
igv
infernal
ipig
iqtree
jaligner
jalview
jellyfish
jmodeltest
jmol
kalign
kineticstools
kissplice
kmer
kraken
last-align
lefse
libfastahack
librg-utils-perl
libsmithwaterman
libssw
libvcflib
logol
loki
ltrsift
macs
maffilter
mafft
mapdamage
mapsembler2
maq
maqview
massxpert
mauve-aligner
melting
metaphlan2
metastudent
mgltools-cadd
mgltools-pmv
mgltools-vision
mhap
microbegps
microbiomeutil
minia
mipe
mira
mlv-smile
mothur
mrbayes
mummer
murasaki
muscle
mustang
nanopolish
ncbi-blast+
ncbi-seg
ncbi-tools6
neobio
njplot
norsnet
norsp
openms
paml
paraclu
parsinsert
parsnp
pbalign
pbbarcode
pbdagcon
pbgenomicconsensus
pbh5tools
pbsuite
pdb2pqr
perlprimer
perm
phipack
phylip
phyml
phyutility
picard-tools
placnet
plasmidomics
plast
plink
plink1.9
poa
populations
poretools
prank
predictnls
predictprotein
prime-phylo
primer3
proalign
probabel
probalign
probcons
proda
prodigal
profbval
profisis
profnet
profphd
profphd-utils
proftmb
progressivemauve
proteinortho
prottest
pycorrfit
pymol
pynast
pyscanfcs
python-cogent
python-cutadapt
python-dendropy
qiime
r-bioc-biostrings
r-bioc-cummerbund
r-bioc-edger
r-bioc-gviz
r-bioc-hilbertvis
r-bioc-limma
r-bioc-rtracklayer
r-cran-adegenet
r-cran-adephylo
r-cran-ape
r-cran-distory
r-cran-genabel
r-cran-phangorn
r-cran-qtl
r-cran-seqinr
r-cran-treescape
r-cran-vegan
r-other-mott-happy
raccoon
rasmol
raster3d
rate4site
raxml
ray
readseq
relion
repeatmasker-recon
reprof
rna-star
rnahybrid
roary
rtax
saint
samtools
seaview
seer
seq-gen
seqan
seqtk
sga
sibsim4
sickle
sift
sigma-align
sim4
smalt
#smrtanalysis
snap
sniffles
snp-sites
soapdenovo
soapdenovo2
spades
sprai
spread-phy
squizz
sra-sdk
ssake
stacks
staden
staden-io-lib
subread
surankco
t-coffee
tantan
theseus
tigr-glimmer
tm-align
tophat
transtermhp
tree-puzzle
treeview
treeviewx
trimmomatic
uc-echo
ugene
varscan
vcftools
velvet
velvetoptimiser
vsearch
wise
zalign
#!/bin/bash
set -e
dest="$(dirname "$0")/packages.list.txt"
echo "Writing packages list to '$dest'"
(
wget -O - https://anonscm.debian.org/cgit/blends/projects/med.git/plain/tasks/bio | grep Depends | cut -f2 -d: | tr ", " "\n" | tr -d "|" | sort -u | while
read packagename
do
a=$(apt-cache show $packagename | head -n 2)
p=""
q=""
if echo $a|grep -q "Source: "; then
q=$(echo $a | sed -e '/^.*Source:/s/^.*Source: *//' -e 's/ *Version:.*//' -e 's/ *(.*) *//')
echo $q
else
p=$(echo $a | sed -e '/Package/s/^Package: *//' -e 's/ *Version:.*//')
echo $p
fi
# for debugging
#echo "p='$p'"
#echo "q='$q'"
#echo
done
) | grep -v "^E:" | sort -u | grep -E -v "^ *$" > "$dest"
#!/bin/bash
if [ ! -x /usr/bin/realpath ]; then
echo "E: Please install realpath"
exit 1
fi
if [ ! -x /usr/bin/yamllint ]; then
echo "E: Please install yamllint"
exit 1
fi
DONTUPDATE=true
#DONTUPDATE=false
DONTOVERWRITE=true
# Set to true if no new repositories shall be downloaded
DONTCLONE=true
#DONTCLONE=false
TOOLDIR=$(realpath $(dirname $0))
set -e
#EDAMPACKAGESINGIT="mummer fastaq barrnap muscle fastqc uc-echo arden artemis sra-sdk bowtie2 rna-star trimmomatic fastx-toolkit mothur jalview snpomatic condetri picard-tools dindel"
# And also:
# filo
GITDIR=$HOME/git/debian-med
JSONBUFFERDIR=$HOME/git/json-buffer
JSONBUFFERSUBDIR=records
if [ ! -r EDAM.owl ]; then
echo "I: Retrieving current version of EDAM ontology"
wget http://www.edamontology.org/EDAM.owl
fi
edamversion=$(grep doap:Version EDAM.owl | cut -f2 -d\> | cut -f1 -d\<)
echo "I: Comparing terms against EDAM version '$edamversion'"
if [ ! -d "$GITDIR" ]; then
echo "E: Directory '$GITDIR' does not exist. Expected a whole range of git repositories from Debian Med here. Please check."
exit -1
fi
if [ ! -d "$JSONBUFFERDIR" ]; then
echo "E: The directory destined to hold the generated records does not exist."
echo " Please consider running "
echo " git clone https://github.com/bio-tools-community/json-buffer '$JSONBUFFERDIR'"
exit -1
fi
dest="$JSONBUFFERDIR"/"$JSONBUFFERSUBDIR"
if [ ! -d "$JSONBUFFERDIR"/"$JSONBUFFERSUBDIR" ]; then
echo "W: Creating directory '$dest'"
mkdir "$dest"
else
echo "I: Found destination directory '$dest'"
fi
unset dest
if [ ! -r "$TOOLDIR/packages.list.txt" ]; then
echo "E: Expected list of packages to work on in '$TOOLDIR/packages.list.txt'. File not found/readable."
exit 1
fi
echo
echo "I: *** Retrieving package source tree from Debian Med git repository ***"
echo
#for p in $EDAMPACKAGESINGIT
cat "$TOOLDIR/packages.list.txt" | grep -v ^# | while read p
do
cd "$GITDIR" # We may have moved into a subdir
echo -n "I: Preparing package '$p'"
origin="https://anonscm.debian.org/git/debian-med/$p.git"
#origin="ssh://anonscm.debian.org/git/debian-med/$p.git"
if [ -d "$GITDIR"/"$p" ]; then
if $DONTUPDATE; then
echo " exists, will not check for any later version"
else
echo " exists, will pull latest version from Debian Med git repository '$origin'"
cd "$GITDIR"/"$p"
if ! git pull; then
echo
echo "W: Could not pull latest revision for '$p' from $origin - skipped, git status shown below"
git status
continue
fi
if ! git gc; then
echo
echo "E: Could not garbage-collect package '$p' - fix this"
exit 1
fi
fi
else
echo -n " does not exist "
if $DONTCLONE; then
echo " [skipped]"
continue
else
echo -n ", will clone from Debian Med git repository '$origin'"
if ! git clone --quiet --branch=master --single-branch $origin; then
echo
echo "E: Could not clone package '$p' from $origin - skipped"
continue
fi
cd $p
if ! git gc; then
echo
echo "E: Could not garbage-collect freshly cloned package '$p' - fix this"
exit 1
fi
fi
fi
cd "$GITDIR"/"$p"
git checkout master
if [ ! -r debian/upstream/edam ]; then
echo "W: The package '$p' surprisingly does not feature an EDAM annotation file"
continue
fi
if ! yamllint debian/upstream/edam; then
echo
echo "E: The package '$p' has a syntactic problem with its EDAM annotation. Please fix."
exit 1
fi
done
echo
echo "I: *** Repository of Debian Med packages is in shape, now transcribing for bio.tools ***"
echo
#for p in $EDAMPACKAGESINGIT
cat "$TOOLDIR/packages.list.txt" | grep -v ^# | while read p
do
dest="$JSONBUFFERDIR"/"$JSONBUFFERSUBDIR"/"$p".json
if $DONTOVERWRITE && [ -r "$dest" ] ; then
echo " not overwriting existing '$dest'"
continue
fi
echo -n "I: Package '$p'"
if [ ! -d "$GITDIR"/"$p" ]; then
echo " does not exist in '$GITDIR/$p' - skipped"
continue
fi
if [ ! -r "$GITDIR"/"$p"/debian/control ]; then
echo " with incomplete local repository, searched for $GITDIR/$p/debian/control - skipped"
continue
fi
echo -n " writing to $dest"
cd "$GITDIR"/"$p"
#git checkout master
python "$TOOLDIR"/registry-tool.py "$GITDIR"/"$p" > "$dest"
echo " [OK]"
unset dest
done
#!/usr/bin/env python
import json
import yaml
import argparse
import requests
import os.path
import getpass
import re
import sys
from lxml import etree
from debian import deb822

# Parsing and declaring namespaces...
EDAM_NS = {'owl': 'http://www.w3.org/2002/07/owl#',
           'rdf': "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
           'rdfs': "http://www.w3.org/2000/01/rdf-schema#",
           'oboInOwl': "http://www.geneontology.org/formats/oboInOwl#"}
#EDAM_DOC = doc = etree.parse("/home/hmenager/edamontology/EDAM_1.13_dev.owl")
#EDAM_DOC = doc = etree.parse("EDAM.owl")
#doc = etree.parse("/home/moeller/debian-med/community/edam/EDAM.owl")
doc = etree.parse("/home/kalas/edam/EDAM.owl")
EDAM_DOC = doc.getroot()
def check_id(label, axis):
    """Return the EDAM URI of the term with this label on the given axis.

    Returns None (after a message on STDERR) if the label is unknown,
    ambiguous or deprecated.
    """
    #xpath_query = "//owl:Class[translate(rdfs:label/text(),'abcdefghijklmnopqrstuvwxyz','ABCDEFGHIJKLMNOPQRSTUVWXYZ')=translate('" + label\
    # + "','abcdefghijklmnopqrstuvwxyz','ABCDEFGHIJKLMNOPQRSTUVWXYZ') and starts-with(@rdf:about, 'http://edamontology.org/" + axis + "')]/@rdf:about"
    xpath_query = "//owl:Class[rdfs:label/text()='"+label+"' and starts-with(@rdf:about, 'http://edamontology.org/" + axis + "')]/@rdf:about"
    matching_terms = EDAM_DOC.xpath(xpath_query, namespaces=EDAM_NS)
    if len(matching_terms)==0:
        sys.stderr.write("\nE: No matching " + axis + " term for label " + label + "!"+"\n")
        # print(xpath_query)
    elif len(matching_terms)>1:
        sys.stderr.write("\nE: More than one " + axis + " term for label " + label + "!"+"\n")
    else:
        term_id = matching_terms[0]
        if len(EDAM_DOC.xpath("//owl:Class[@rdf:about='"+ term_id +"' and owl:deprecated='true']", namespaces=EDAM_NS))>0:
            sys.stderr.write("\nE: Term " + term_id + " for label " + label + " is deprecated!\n")
        else:
            return term_id
import re, sys
def getUpstreamName(copyrightFile):
    """Return the Upstream-Name field of debian/copyright, or "" if absent."""
    p = re.compile("Upstream-Name: *([A-Za-z0-9]+) *")
    with open(copyrightFile) as f:
        for l in f:
            m = p.findall(l)
            if m:
                return m[0]
    return ""
def doc_to_dict(pack_dir):
    """Collect the bio.tools record for the package source tree in pack_dir."""
    debian_path = os.path.join(pack_dir, 'debian')
    control_path = os.path.join(debian_path, 'control')
    changelog_path = os.path.join(debian_path, 'changelog')
    copyright_path = os.path.join(debian_path, 'copyright')
    edam_path = os.path.join(debian_path, 'upstream', 'edam')
    metadata_path = os.path.join(debian_path, 'upstream', 'metadata')
    control_iterator = deb822.Packages.iter_paragraphs(open(control_path))
    control_description = ""
    control_homepage = ""
    control_source = ""
    # Walk the paragraphs up to the first one that carries a description,
    # i.e. the first binary package.
    for p in control_iterator:
        if "Source" in p:
            control_source = p.get("Source")
        if "Homepage" in p:
            control_homepage = p.get("Homepage")
        if "Description" in p:
            control_description = p.get("Description")
            break
    version_line = open(changelog_path).readline()
    version_debian = re.split('[()]', version_line)[1]
    m = re.match('^([0-9]+:)?(.*)-[^-]+$', version_debian)
    if m is None:
        sys.stderr.write("E: Bad version in "+changelog_path+"\n")
        sys.exit(1)
    version_upstream = m.groups()[m.lastindex-1]
    resource_name = getUpstreamName(copyright_path)
    if "" == resource_name:
        resource_name = control_source
    resource = {'name': resource_name,
                'homepage': control_homepage,
                'version': version_debian,
                'collection': 'DebianMed',
                #'interface': {}, #TODO
                'description': control_description,
                'sourceRegistry': '',
                'function': []
                }
    resource['publications'] = {}
    try:
        metadata = yaml.safe_load(open(metadata_path))
        #print metadata
        # Note: the trailing commas below make each value a 1-tuple,
        # which json serialises as a list with one element.
        try:
            resource['publications']['publicationsPrimaryID'] = metadata['Reference']['DOI'],
        except KeyError:
            resource['publications']['publicationsPrimaryID'] = metadata['Reference']['doi'],
        except TypeError:
            # 'Reference' is a list of references rather than a single one
            #sys.stderr.write("W: " + resource_name + "shows TypeError for DOI - presumed harmless")
            try:
                resource['publications']['publicationsPrimaryID'] = metadata['Reference'][0]['DOI'],
            except KeyError:
                resource['publications']['publicationsPrimaryID'] = metadata['Reference'][0]['doi'],
            if len(metadata['Reference']) > 1:
                resource['publications']['publicationsOtherID'] = []
                for pos in range(1, len(metadata['Reference'])):
                    try:
                        resource['publications']['publicationsOtherID'].append(metadata['Reference'][pos]['DOI'])
                    except KeyError:
                        try:
                            resource['publications']['publicationsOtherID'].append(metadata['Reference'][pos]['doi'])
                        except KeyError:
                            sys.stderr.write("\nW: No DOI at pos %d in '%s'\n" % (pos, metadata_path))
    except KeyError:
        # already done - assignment of none to publication
        resource['publications']['publicationsPrimaryID'] = "None"
    except IOError:
        sys.stderr.write("\nW: No metadata file found (looked for "+metadata_path+")\n")
        resource['publications']['publicationsPrimaryID'] = "None"
    try:
        edam = yaml.safe_load(open(edam_path))
        #print(edam)
        # Accept both the plural and the singular spelling of the keys.
        topicORtopics = []
        try:
            topicORtopics = edam['topics']
        except KeyError:
            topicORtopics = edam['topic']
        resource['topic'] = [{'uri': check_id(topic_label, 'topic')} for topic_label in topicORtopics]
        scopeORscopes = []
        try:
            scopeORscopes = edam['scopes']
        except KeyError:
            scopeORscopes = edam['scope']
        for scope in scopeORscopes:
            function = {}
            function['functionHandle'] = scope['name']
            function['functionName'] = [{'uri': check_id(function_label, 'operation')} for function_label in scope.get('function')]
            function['input'] = []
            if not scope.get('inputs') is None:
                for el in scope.get('inputs'):
                    v = {}
                    v['dataType'] = {'uri': check_id(el['data'], 'data')}
                    if 'formats' in el:
                        v['dataFormat'] = [{'uri': check_id(format_el, 'format')} for format_el in el['formats']]
                    elif 'format' in el:
                        v['dataFormat'] = [{'uri': check_id(format_el, 'format')} for format_el in el['format']]
                    function['input'].append(v)
            function['output'] = []
            if not scope.get('outputs') is None:
                for el in scope.get('outputs'):
                    v = {}
                    v['dataType'] = {'uri': check_id(el['data'], 'data')}
                    if 'formats' in el:
                        v['dataFormat'] = [{'uri': check_id(format_el, 'format')} for format_el in el['formats']]
                    elif 'format' in el:
                        v['dataFormat'] = [{'uri': check_id(format_el, 'format')} for format_el in el['format']]
                    function['output'].append(v)
            resource['function'].append(function)
    except IOError:
        sys.stderr.write("\nW: No EDAM file found (looked for "+edam_path+")\n")
    return resource
if __name__ == '__main__':
    parser = argparse.ArgumentParser(
        description='ELIXIR registry tool for Debian Med packages')
    parser.add_argument('package_dirs', help="Debian package directory", nargs='+')
    args = parser.parse_args()
    if args.package_dirs:
        package_dirs = args.package_dirs
        for package_dir in package_dirs:
            #print "processing %s..." % package_dir
            res = doc_to_dict(package_dir)
            # print() with a single argument works with Python 2 and 3 alike
            print(json.dumps(res, indent=True))
            #print "done processing %s..." % package_dir
#../debian2edam ../../../packages/muscle/trunk/debian/upstream/edam
../debian2edam ./test_muscle/debian
This is not real - testing the EDAM ontology parsing only.
muscle (1:3.8.31-2) UNRELEASED; urgency=low
[ Andreas Tille ]
* debian/upstream: Added citations
* debian/control:
- Fixed Vcs-Svn
- Standards-Version: 3.9.3 (no changes needed)
[ Steffen Moeller ]
* debian/changelog: added upstream-name and -contact
-- Andreas Tille <tille@debian.org> Sat, 05 May 2012 08:09:12 +0200
muscle (1:3.8.31-1) unstable; urgency=low
[ Charles Plessy ]
* New upstream release (Closes: #643443).
* debian/control: Enhances: t-coffee.
* Changed the doc-base section according to the new policy.
* Updated my email address.
* Updated debian/watch to new version scheme and download location.
* Repack usptream archive and implemented a get-orig-source target
(debian/rules, debian/README.source, debian/copyright).
* Use Debhelper 8 (debian/control, debian/compat).
* Build directly from debian/rules targets.
* Corrected VCS URLs in debian/control.
* Conforms to Debian Policy 3.9.2 (debian/control, no changes needed).
[ David Paleino ]
* removed myself from Uploaders (debian/control).
-- Charles Plessy <plessy@debian.org> Sun, 13 Nov 2011 18:38:06 +0900
muscle (3.70+fix1-2) unstable; urgency=low
* debian/control Conflicts: and Replaces: muscle-doc (Closes: #465607)
-- Charles Plessy <charles-debian-nospam@plessy.org> Thu, 14 Feb 2008 10:44:17 +0900
muscle (3.70+fix1-1) unstable; urgency=low
[ Charles Plessy ]
* New upstream version, buildable with GCC 4.3 (Closes: #462707)
The version number was not increased upstream when the sources were
changed. We name this new version in Debian "3.70+fix1".
* Updated manual page.
* Converted the source package to CDBS, dropped Makefile patch.
* Fused muscle and muscle-doc.
[ Nelson A. de Oliveira ]
* Fixed watch file (Closes: #462827)
-- Charles Plessy <charles-debian-nospam@plessy.org> Wed, 06 Feb 2008 12:04:31 +0900
muscle (3.70-1) unstable; urgency=low
[ Charles Plessy ]
* New upstream release (bugfixes plus undocumented new features).
* debian/control:
- Add Subversion repository.
- Swiched to quilt.
- Enhaces: seaview because SeaView can call muscle to re-align sequences.
- Moved the Homepage: field out from the package's description.
- Using debhelper 5.
- Removed [Biology] from package description as there are Debtags now.
- Checked that muscle conforms to Policy 3.7.3.
- Updated Steffen's email address.
* Handling nostrip build option (policy 10.1) (Closes: #437599).
* Updated manpage.
* debian/copyright made machine-readable.
[ Nelson A. de Oliveira ]
* Added watch file.
[ David Paleino ]
* debian/manpage.xml moved to debian/muscle.1.xml
* debian/muscle.1 added - statically built
* debian/manpages removed - passing arguments to dh_installman
directly
* debian/control:
- B-D updated (see above)
- added myself to Uploaders
- moved XS-Vcs-* to Vcs-*
* debian/rules:
- manpages statically built
- minor changes
-- Charles Plessy <charles-debian-nospam@plessy.org> Sat, 12 Jan 2008 16:55:48 +0900
muscle (3.60-1) unstable; urgency=low
* New upstram release (Closes: Bug#361742).
* New maintainers email addresses.
-- Charles Plessy <charles-debian-nospam@plessy.org> Sat, 5 Aug 2006 09:57:27 +0900
muscle (3.52-2) unstable; urgency=low
* Added missing build dependencies (Closes: Bug#287684).
-- Steffen Moeller <moeller@pzr.uni-rostock.de> Wed, 29 Dec 2004 21:50:47 +0200
muscle (3.52-1) unstable; urgency=low
* New upstream version.
* Fix build on arch other than Pentium (Closes: Bug#285000).
-- Steffen Moeller <moeller@pzr.uni-rostock.de> Sun, 18 Dec 2004 00:06:00 +0200
muscle (3.51-1) unstable; urgency=low
* Initial Release (Closes: Bug#280411).
-- Steffen Moeller <moeller@pzr.uni-rostock.de> Sun, 19 Sep 2004 00:51:19 +0200
Source: muscle
Section: science
Priority: optional
Maintainer: Debian Med Packaging Team <debian-med-packaging@lists.alioth.debian.org>
DM-Upload-Allowed: yes
Uploaders: Steffen Moeller <moeller@debian.org>,
Charles Plessy <plessy@debian.org>
Build-Depends: debhelper (>= 8), cdbs
Standards-Version: 3.9.3
Vcs-Browser: http://svn.debian.org/wsvn/debian-med/trunk/packages/muscle
Vcs-Svn: svn://svn.debian.org/debian-med/trunk/packages/muscle/trunk/
Homepage: http://www.drive5.com/muscle/
Package: muscle
Architecture: any
Depends: ${shlibs:Depends}, ${misc:Depends}
Conflicts: muscle-doc
Replaces: muscle-doc
Provides: muscle-doc
Enhances: seaview, t-coffee
Description: Multiple alignment program of protein sequences
MUSCLE is a multiple alignment program for protein sequences. MUSCLE
stands for multiple sequence comparison by log-expectation. In the
author's tests, MUSCLE achieved the highest scores of all tested
programs on several alignment accuracy benchmarks, and is also one of
the fastest programs out there.
Format: http://dep.debian.net/deps/dep5/
Source: http://www.drive5.com/muscle/downloads3.8.31/muscle3.8.31_src.tar.gz
Comment: This release contains a potentially sourceless binary file, muscle21, that was removed.
Upstream-Name: MUSCLE
Upstream-Contact: Robert C. Edgar <robert@drive5.com>
Files: *
Copyright: © Robert C. Edgar "Bob" <muscle@drive5.com>
License: PD-dedication
MUSCLE is public domain software
The MUSCLE software, including object and source code, is hereby donated
to the public domain.
.
Disclaimer of warranty
THIS SOFTWARE IS PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER
EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
Files: debian/*
Copyright: © 2004 Steffen Moeller <steffen_moeller@gmx.de>
© 2007 Nelson A. de Oliveira <naoliv@debian.org>
© 2007 David Paleino <d.paleino@gmail.com>
© 2006-2008 Charles Plessy <charles-debian-nospam@plessy.org>
License: PD-dedication
Please treat this work as if it were in the public domain.
ontology: EDAM (1.12)
topic:
- Sequence analysis
scopes:
- name: summary
function:
- Multiple Sequence Alignment
inputs:
- data: Sequence
formats: [FASTA]
outputs:
- data: Alignment
formats:
- FASTA-aln
- Clustalw format
- Phylip format
Reference:
- Author: Robert C. Edgar
Title: "MUSCLE: multiple sequence alignment with high accuracy and high throughput"
Journal: Nucleic Acids Research
Year: 2004
Volume: 32
Number: 5
Pages: 1792-1797
DOI: 10.1093/nar/gkh340
PMID: 15034147
URL: http://nar.oxfordjournals.org/content/32/5/1792
eprint: http://nar.oxfordjournals.org/content/32/5/1792.full.pdf+html
- Author: Robert C. Edgar
Title: "MUSCLE: a multiple sequence alignment method with reduced time and space complexity"
Journal: BMC Bioinformatics
Year: 2004
Volume: 5
Pages: 113
DOI: 10.1186/1471-2105-5-113
PMID: 15318951
URL: http://www.biomedcentral.com/1471-2105/5/113
eprint: http://www.biomedcentral.com/content/pdf/1471-2105-5-113.pdf
The experts for preparing Live CDs are probably within the
live-helper group. They have prepared a Wiki page underneath
http://debian-live.alioth.debian.org/.
This folder is meant to collect scripts that
prepare live CDs for our needs, and to
share ideas, images and the like
for that purpose.
Anybody who comes in and says that this whole approach is rotten,
and that functionality already provided by the live-helper team
should be used instead, is very welcome.
Steffen <moeller@debian.org> 2008
# GPLed (C) Steffen Moeller <moeller@debian.org>
# Determine a two-letter country code for the Debian-mirror
# from which to retrieve the data.
country="de"
# Select an architecture to work with
architecture="i386"
#architecture="amd64" # has compatibility issues with VirtualBox
# Specify additional servers to download packages from, comma-separated
# if there are multiple ones
#unofficialServers="http://pc02.inb.uni-luebeck.de:8080/~moeller/debian/unstable"
# What Debian distribution should be taken?
#distribution="sarge"
#distribution="etch"
#distribution="lenny"
distribution="sid"
# which packages from main should be added?
# let selection begin with a blank if not empty
main="gnuplot med-bio med-bio-dev"
# which packages from contrib should be added?
# let selection begin with a blank if not empty
contrib=""
# which packages from non-free should be added?
# let selection begin with a blank if not empty
nonfree="clustalw"
# which packages from experimental should be added?
# let selection begin with a blank if not empty
experimental="infernal"
# what selection of extraneous packages should be considered?
extraneous="tomcat5.5"
# what set of scripts shall be executed after the preparation of the chroot?
hooks="01_remove_gnustep.sh 10_remove_considerably_exotic_tools.sh 9999_apt-get_autoremove.sh"
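The selections above could be merged into a single package list by a driver script, roughly as sketched here. This is only an illustration: the helper name `assemble_packages` is hypothetical, the variables are repeated inline so the sketch runs on its own, and the actual scripts in this folder may combine them differently.

```shell
#!/bin/sh
# Hypothetical helper: join the per-section package selections into one
# space-separated list, skipping selections that are empty.
assemble_packages() {
    result=""
    for selection in "$@"; do
        [ -n "$selection" ] || continue
        # Add a separating blank only when something is already collected.
        result="${result:+$result }$selection"
    done
    printf '%s\n' "$result"
}

# Values copied from the configuration above.
main="gnuplot med-bio med-bio-dev"
contrib=""
nonfree="clustalw"
experimental="infernal"
extraneous="tomcat5.5"

packages=$(assemble_packages "$main" "$contrib" "$nonfree" "$experimental" "$extraneous")
echo "$packages"
```

Because empty selections are skipped, the result never contains a leading or doubled blank, whichever sections are left empty.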
#!/bin/sh
echo Executing hooks $0
(echo Y; echo Y) | aptitude --full-resolver purge gnustep-base-runtime gnustep-common gnustep-gpbs gnustep-base-common adun.app
#!/bin/sh
echo Executing hooks $0
dpkg --purge garlic
#!/bin/sh
echo Executing hooks $0
apt-get -y autoremove
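Hook scripts like the ones above could be run in the order given by the `hooks` variable of the configuration. The following is a minimal sketch under stated assumptions: the function name `run_hooks` and the hook directory argument are hypothetical, and the real driver may invoke the hooks differently (for example inside the chroot).

```shell
#!/bin/sh
# Hypothetical helper: run each named hook from a directory, in the order
# given. Stops and returns non-zero at the first hook that is missing or fails.
run_hooks() {
    hook_dir=$1
    shift
    for hook in "$@"; do
        if [ ! -f "$hook_dir/$hook" ]; then
            echo "hook not found: $hook" >&2
            return 1
        fi
        sh "$hook_dir/$hook" || return 1
    done
}

# Example call (the directory name is an assumption):
#   hooks="01_remove_gnustep.sh 9999_apt-get_autoremove.sh"
#   run_hooks ./hooks $hooks
```

Leaving `$hooks` unquoted in the call is deliberate, so that the shell splits the list into one argument per hook name.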