Commits on Source (4)
......@@ -2,15 +2,20 @@ language: python
sudo: false
python:
- '2.7'
- '3.4'
- '3.5'
- '3.6'
- '3.7'
- '3.8'
services:
- mongodb
- elasticsearch
branches:
except:
- "/^feature.*$/"
before_install:
- sudo apt-get install -y libgnutls-dev
addons:
apt:
update: true
install:
- pip install flake8
- pip install -r requirements.txt
......
3.1.14:
Add repair option
3.1.13:
Add process name and status in logs
PR #116 update to use download 3.1.0
3.1.12:
In case of multiple matches for the release regexp, try to determine the most recent one
#115 Correctly use save_as for release file name
3.1.11:
Increase one log level
#110 Allow ftps and directftps protocols (needs biomaj-download 3.0.26 and biomaj-core 3.0.19)
#111 locked bank after bad update command
Ignore UTF-8 errors in release file
Add plugin support via biomaj-plugins repo (https://github.com/genouest/biomaj-plugins) to get release and list of files to download from a plugin script.
Add support for protocol options in global and bank properties (options.names=x,y options.x=val options.y=val). Options may be ignored or used differently depending on used protocol.
3.1.10:
Allow using hardlinks when reusing files from previous releases
3.1.9:
Fix remote.files recursion
3.1.8:
Fix uncompress when saved files contain a subdirectory
3.1.7:
......
......@@ -49,20 +49,19 @@ Application Features
====================
* Synchronisation:
* Multiple remote protocols (ftp, sftp, http, local copy, etc.)
* Multiple remote protocols (ftp, ftps, http, local copy, etc.)
* Data transfers integrity check
* Release versioning using an incremental approach
* Multi-threading
* Data extraction (gzip, tar, bzip)
* Data tree directory normalisation
* Plugins support for custom downloads
* Pre & post-processing:
* Advanced workflow description (D.A.G)
* Post-process indexation for various bioinformatics software (blast, srs, fastacmd, readseq, etc.)
* Easy integration of personal scripts for bank post-processing automation
* Supervision:
* Optional Administration web interface (biomaj-watcher)
* CLI management
......@@ -75,7 +74,6 @@ Application Features
* Monolithic (local install) or microservice architecture (remote access to a BioMAJ server)
* Microservice installation allows per-process scalability and supervision (number of processes in charge of download, execution, etc.)
* Remote access:
* Optional FTP server providing authenticated or anonymous data access
......@@ -83,6 +81,7 @@ Dependencies
============
Packages:
* Debian: libcurl-dev, gcc
* CentOS: libcurl-devel, openldap-devel, gcc
......@@ -115,7 +114,6 @@ From packages:
pip install biomaj biomaj-cli biomaj-daemon
You should consider using a Python virtual environment (virtualenv) to install BioMAJ.
In tools/examples, copy the global.properties and update it to match your local
......@@ -123,13 +121,11 @@ installation.
The tools/process contains example process files (python and shell).
Docker
======
You can use BioMAJ with Docker (genouest/biomaj)
docker pull genouest/biomaj
docker pull mongo
docker run --name biomaj-mongodb -d mongo
......@@ -147,6 +143,51 @@ No default bank property file or process are available in the container.
Examples are available at https://github.com/genouest/biomaj-data
Import bank templates
=====================
Once BioMAJ is installed, it is possible to import bank examples with the BioMAJ client:
# List available templates
biomaj-cli ... --data-list
# Import a bank template
biomaj-cli ... --data-import --bank alu
# then edit bank template in config directory if needed and launch bank update
biomaj-cli ... --update --bank alu
Plugins
=======
BioMAJ supports Python plugins to manage custom downloads where the supported protocols
are not enough (an HTTP page with an unformatted listing, access to protected pages, etc.).
Examples of plugins and how to configure them are available in the [biomaj-plugins](https://github.com/genouest/biomaj-plugins) repository.
Plugins can define a specific way to:
* retrieve the release
* list remote files to download
* download remote files
A plugin can define one or more of these features.
Basically, plugins are declared in the bank property file:
# Location of plugins
plugins_dir=/opt/biomaj-plugins
# Use plugin to fetch release
release.plugin=github
# List of arguments of plugin function with key=value format, comma separated
release.plugin_args=repo=osallou/goterra-cli
Plugins are invoked when the related workflow step runs (a minimal sketch follows the list below):
* release.plugin <= returns remote release
* remote.plugin <= returns list of files to download
* download.plugin <= download files from list of files
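For illustration, here is a minimal plugin sketch. The class layout relies on Yapsy (which setup.py pins as a dependency), but the method names below are assumptions mirroring the release.plugin and remote.plugin steps; the real interface is defined in the biomaj-plugins repository.

from yapsy.IPlugin import IPlugin

class GithubPlugin(IPlugin):
    # Hypothetical plugin exposing two of the optional features listed above.

    def release(self, repo=None, **kwargs):
        # Return the remote release string, e.g. the latest tag of repo.
        # (Assumed name, matching the release.plugin step.)
        return '1.0.0'

    def remote(self, repo=None, **kwargs):
        # Return the list of remote files to download for that release.
        # (Assumed name, matching the remote.plugin step.)
        return ['file1.tar.gz', 'file2.tar.gz']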
API documentation
=================
......@@ -172,7 +213,6 @@ Execute unit tests but disable ones needing network access
nosetests -a '!network'
Monitoring
==========
......@@ -195,14 +235,12 @@ A-GPL v3+
Remarks
=======
BioMAJ uses libcurl; for sftp, libcurl must be compiled with sftp support
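To check whether the installed curl/libcurl was built with sftp support, list the supported protocols:

curl -V

The "Protocols:" line of the output should include sftp.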
To delete elasticsearch index:
curl -XDELETE 'http://localhost:9200/biomaj_test/'
curl -XDELETE '<http://localhost:9200/biomaj_test/>'
Credits
======
=======
Special thanks to tuco at the Pasteur Institute for the intensive testing and new ideas.
Thanks to the old BioMAJ team for the work they have done.
......
......@@ -13,6 +13,7 @@ from biomaj.mongo_connector import MongoConnector
from biomaj.session import Session
from biomaj.workflow import UpdateWorkflow
from biomaj.workflow import RemoveWorkflow
from biomaj.workflow import RepairWorkflow
from biomaj.workflow import Workflow
from biomaj.workflow import ReleaseCheckWorkflow
from biomaj_core.config import BiomajConfig
......@@ -502,8 +503,10 @@ class Bank(object):
# Insert session
if self.session.get('action') == 'update':
action = 'last_update_session'
if self.session.get('action') == 'remove':
elif self.session.get('action') == 'remove':
action = 'last_remove_session'
else:
action = 'last_update_session'
cache_dir = self.config.get('cache.dir')
download_files = self.session.get('download_files')
......@@ -1112,6 +1115,74 @@ class Bank(object):
return res
def repair(self):
"""
Launch a bank repair
:return: bool
"""
logging.warning('Bank:' + self.name + ':Repair')
start_time = datetime.now()
start_time = time.mktime(start_time.timetuple())
if not self.is_owner():
logging.error('Not authorized, bank owned by ' + self.bank['properties']['owner'])
raise Exception('Not authorized, bank owned by ' + self.bank['properties']['owner'])
self.run_depends = False
self.controls()
if self.options.get_option('release'):
logging.info('Bank:' + self.name + ':Release:' + self.options.get_option('release'))
s = self.get_session_from_release(self.options.get_option('release'))
# No session in prod
if s is None:
logging.error('Release does not exist: ' + self.options.get_option('release'))
return False
self.load_session(UpdateWorkflow.FLOW, s)
else:
logging.info('Bank:' + self.name + ':Release:latest')
self.load_session(UpdateWorkflow.FLOW)
self.session.set('action', 'update')
res = self.start_repair()
self.session.set('workflow_status', res)
self.save_session()
try:
self.__stats()
except Exception:
logging.exception('Failed to send stats')
end_time = datetime.now()
end_time = time.mktime(end_time.timetuple())
self.history.insert({
'bank': self.name,
'error': not res,
'start': start_time,
'end': end_time,
'action': 'repair',
'updated': self.session.get('update')
})
return res
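Assuming the command-line client exposes a matching flag for this new method (the flag name below is an assumption based on the "Add repair option" changelog entry), a repair would be launched like the other bank commands:

biomaj-cli ... --repair --bank alu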
def start_repair(self):
"""
Start a repair workflow
"""
workflow = RepairWorkflow(self)
if self.options and self.options.get_option('redis_host'):
redis_client = redis.StrictRedis(
host=self.options.get_option('redis_host'),
port=self.options.get_option('redis_port'),
db=self.options.get_option('redis_db'),
decode_responses=True
)
workflow.redis_client = redis_client
workflow.redis_prefix = self.options.get_option('redis_prefix')
if redis_client.get(self.options.get_option('redis_prefix') + ':' + self.name + ':action:cancel'):
logging.warning('Cancel requested, stopping repair')
redis_client.delete(self.options.get_option('redis_prefix') + ':' + self.name + ':action:cancel')
return False
return workflow.start()
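As the check above shows, a running repair (like an update) honours a cancel flag stored under <redis_prefix>:<bank_name>:action:cancel. A minimal sketch for requesting cancellation, assuming the redis.prefix default of 'biomaj' from global.properties and a bank named 'alu':

import redis

# Ask the running workflow for bank 'alu' to stop; the workflow checks
# and deletes this key, as in start_repair() above.
client = redis.StrictRedis(host='127.0.0.1', port=6379, db=0, decode_responses=True)
client.set('biomaj:alu:action:cancel', '1')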
def update(self, depends=False):
"""
Launch a bank update
......@@ -1165,7 +1236,9 @@ class Bank(object):
if not reset:
logging.info("Process %s not found in %s" % (str(proc), task['name']))
return False
if not set_to_false:
logging.error('No task found named %s' % (self.options.get_option('from_task')))
return False
self.session.set('action', 'update')
res = self.start_update()
self.session.set('workflow_status', res)
......
......@@ -262,8 +262,11 @@ class MetaProcess(threading.Thread):
processes_status[bprocess] = res
self.set_progress(bmaj_process.name, res)
if not res:
logging.info("PROC:META:RUN:PROCESS:ERROR:" + bmaj_process.name)
self.global_status = False
break
else:
logging.info("PROC:META:RUN:PROCESS:OK:" + bmaj_process.name)
if not self.simulate:
if self._lock:
self._lock.acquire()
......
biomaj:
global_properties: '/pasteur/services/policy01/banques/biomaj3/global.properties'
global_properties: '/etc/biomaj/global.properties'
rabbitmq:
host: '127.0.0.1'
......
biomaj3 (3.1.14-1) unstable; urgency=medium
* New upstream release
-- Olivier Sallou <osallou@debian.org> Tue, 12 Nov 2019 10:29:09 +0000
biomaj3 (3.1.8-1) unstable; urgency=medium
* New upstream release
......
......@@ -29,9 +29,9 @@ Architecture: all
Depends: ${misc:Depends},
${python3:Depends},
unzip
Recommends: ${python3:Recommends},
python3-biomaj3-cli
Recommends: ${python3:Recommends}
Suggests: ${python3:Suggests},
python3-biomaj3-cli,
python3-gunicorn,
mongodb,
redis-server
......
......@@ -31,6 +31,17 @@ redis.port=6379
redis.db=0
redis.prefix=biomaj
# options to send to downloaders (protocol dependent)
# options.names=option1,option2
# options.option1=value1
# options.option2=value2
#
# Example supported options for ftp/http
# tcp_keepalive: number of seconds for curl keep alive (maps to curl option TCP_KEEPALIVE, TCP_KEEPINTVL and TCP_KEEPIDLE)
# ssl_verifyhost: boolean, check or skips SSL host (maps to curl option SSL_VERIFYHOST)
# ssl_verifypeer: boolean, SSL checks (maps to curl option SSL_VERIFYPEER)
# ssl_server_cert: path to certificate (maps to curl option CAINFO).
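# For example (illustrative values), to tune keep-alive and skip peer
# verification for a self-signed server:
# options.names=tcp_keepalive,ssl_verifypeer
# options.tcp_keepalive=60
# options.ssl_verifypeer=0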
# Influxdb configuration (optional)
# User and db must be manually created in influxdb before use
......@@ -102,6 +113,9 @@ bank.num.threads=4
#Number of threads to use for downloading and processing
files.num.threads=4
#Timeout for a download, in seconds (applies to each file, not to the download as a whole)
timeout.download=3600
#To keep more than one release, increase this value
keep.old.version=0
......@@ -136,6 +150,9 @@ ftp.active.mode=false
# Bank default access
visibility.default=public
# use hard links instead of copy
# use_hardlinks=0
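When use_hardlinks=1 and keep.old.version > 0, unchanged files copied from the previous release should share an inode with it. A quick check (paths are illustrative):

ls -i <old_release>/flat/test2.fasta <new_release>/flat/test2.fasta

Both lines should show the same inode number.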
[loggers]
keys = root, biomaj
......
......@@ -31,11 +31,12 @@ except UnicodeDecodeError:
config = {
'description': 'BioMAJ',
'long_description': README + '\n\n' + CHANGES,
'long_description_content_type': 'text/markdown',
'author': 'Olivier Sallou',
'url': 'http://biomaj.genouest.org',
'download_url': 'http://biomaj.genouest.org',
'author_email': 'olivier.sallou@irisa.fr',
'version': '3.1.8',
'version': '3.1.14',
'classifiers': [
# How mature is this project? Common values are
# 3 - Alpha
......@@ -72,7 +73,8 @@ config = {
'requests',
'redis',
'elasticsearch',
'influxdb'
'influxdb',
'Yapsy==1.12.2'
],
'tests_require': ['nose', 'mock'],
'test_suite': 'nose.collector',
......
......@@ -21,12 +21,6 @@ from biomaj.workflow import Workflow
from biomaj.workflow import UpdateWorkflow
from biomaj.workflow import ReleaseCheckWorkflow
from biomaj_core.utils import Utils
from biomaj_download.download.ftp import FTPDownload
from biomaj_download.download.direct import DirectFTPDownload
from biomaj_download.download.direct import DirectHttpDownload
from biomaj_download.download.http import HTTPDownload
from biomaj_download.download.localcopy import LocalDownload
from biomaj_download.download.downloadthreads import DownloadThread
from biomaj_core.config import BiomajConfig
from biomaj.process.processfactory import PostProcessFactory
from biomaj.process.processfactory import PreProcessFactory
......@@ -103,7 +97,11 @@ class UtilsForTest():
os.chmod(to_file, stat.S_IRWXU)
# Manage local bank test, use bank test subdir as remote
properties = ['multi.properties', 'computederror.properties', 'error.properties', 'local.properties', 'localprocess.properties', 'testhttp.properties', 'computed.properties', 'computed2.properties', 'sub1.properties', 'sub2.properties']
properties = ['multi.properties', 'computederror.properties',
'error.properties', 'local.properties',
'localprocess.properties', 'testhttp.properties',
'computed.properties', 'computed2.properties',
'sub1.properties', 'sub2.properties']
for prop in properties:
from_file = os.path.join(curdir, prop)
to_file = os.path.join(self.conf_dir, prop)
......@@ -421,6 +419,58 @@ class TestBiomajFunctional(unittest.TestCase):
b.update()
self.assertTrue(b.session.get('update'))
def test_update_hardlinks(self):
"""
Update a bank twice with hard links. Files copied from previous release
must be links.
"""
b = Bank('local')
b.config.set('keep.old.version', '3')
b.config.set('use_hardlinks', '1')
# Create a file in bank dir (which is the source dir) so we can manipulate
# it. The pattern is taken into account by the bank configuration.
# Note that this file is created in the source tree so we remove it after
# or if this test fails in between.
tmp_remote_file = b.config.get('remote.dir') + 'test.safe_to_del'
if os.path.exists(tmp_remote_file):
os.remove(tmp_remote_file)
open(tmp_remote_file, "w")
# First update
b.update()
self.assertTrue(b.session.get('update'))
old_release = b.session.get_full_release_directory()
# Touch tmp_remote_file to force update. We set the date to tomorrow so we
# are sure that a new release will be detected.
tomorrow = time.time() + 3660 * 24 # 3660s for safety (leap second, etc.)
os.utime(tmp_remote_file, (tomorrow, tomorrow))
# Second update
try:
b.update()
self.assertTrue(b.session.get('update'))
new_release = b.session.get_full_release_directory()
# Test that files in both releases are links to the same file.
# We can't use tmp_remote_file because it's the source of update and we
# can't use test.fasta.gz because it is uncompressed and then not the
# same file.
for f in ['test2.fasta', 'test_100.txt']:
file_old_release = os.path.join(old_release, 'flat', f)
file_new_release = os.path.join(new_release, 'flat', f)
try:
self.assertTrue(os.path.samefile(file_old_release, file_new_release))
except AssertionError:
msg = "In %s: copy worked but hardlinks were not used." % self.id()
logging.info(msg)
# Test that no links are done for tmp_remote_file
file_old_release = os.path.join(old_release, 'flat', 'test.safe_to_del')
file_new_release = os.path.join(new_release, 'flat', 'test.safe_to_del')
self.assertFalse(os.path.samefile(file_old_release, file_new_release))
finally:
# Remove file
if os.path.exists(tmp_remote_file):
os.remove(tmp_remote_file)
def test_fromscratch_update(self):
"""
Try updating twice; the second time, the bank should be updated (forced with fromscratch)
......@@ -736,6 +786,7 @@ class TestBiomajFunctional(unittest.TestCase):
def test_multi(self):
b = Bank('multi')
res = b.update()
self.assertTrue(res)
with open(os.path.join(b.session.get_full_release_directory(),'flat/test1.json'), 'r') as content_file:
content = content_file.read()
my_json = json.loads(content)
......