Commits on Source (11)
......@@ -2,9 +2,9 @@ language: python
sudo: false
python:
- '2.7'
- '3.4'
- '3.5'
- '3.6'
- '3.7'
- '3.8'
services:
- redis
branches:
......
3.1.2:
#18 Add a protocol option to set CURLOPT_FTP_FILEMETHOD
#19 Rename protocol options to options
Fix copy of production files instead of download when files are in subdirectories
3.1.1:
#17 Support MDTM command in directftp
3.1.0:
#16 Don't change name after download in DirectHTTPDownloader
PR #7 Refactor downloaders (*WARNING* breaks API)
......
......@@ -58,3 +58,47 @@ If you cloned the repository and installed it via python setup.py install, just
Web processes should be behind a proxy/load balancer, API base url /api/download
Prometheus endpoint metrics are exposed via /metrics on web server
# Download options
Since version 3.0.26, you can use the `set_options` method to pass a dictionary of downloader-specific options.
The following list shows some options and their effect (the option to set is the key and the parameter is the associated value):
* **skip_check_uncompress**:
* parameter: bool.
* downloader(s): all.
* effect: If true, don't test the archives after download.
* default: false (i.e. test the archives).
* **ssl_verifyhost**:
* parameter: bool.
* downloader(s): `CurlDownloader`, `DirectFTPDownload`, `DirectHTTPDownload`.
* effect: If false, don't check that the name of the remote server matches the name in the SSL certificate.
* default: true (i.e. check host name).
* note: It's generally a bad idea to disable this verification. However, some servers are badly configured. See [here](https://curl.haxx.se/libcurl/c/CURLOPT_SSL_VERIFYHOST.html) for the corresponding cURL option.
* **ssl_verifypeer**:
* parameter: bool.
* downloader(s): `CurlDownloader`, `DirectFTPDownload`, `DirectHTTPDownload`.
* effect: If false, don't check the authenticity of the peer's certificate.
* default: true (i.e. check authenticity).
* note: It's generally a bad idea to disable this verification. However, some servers are badly configured. See [here](https://curl.haxx.se/libcurl/c/CURLOPT_SSL_VERIFYPEER.html) for the corresponding cURL option.
* **ssl_server_cert**:
* parameter: filename of the certificate.
* downloader(s): `CurlDownloader`, `DirectFTPDownload`, `DirectHTTPDownload`.
* effect: Pass a file holding one or more certificates to verify the peer with.
* default: use OS certificates.
* note: See [here](https://curl.haxx.se/libcurl/c/CURLOPT_CAINFO.html) for the corresponding cURL option.
* **tcp_keepalive**:
* parameter: int.
* downloader(s): `CurlDownloader`, `DirectFTPDownload`, `DirectHTTPDownload`.
* effect: Sets the interval, in seconds, that the operating system will wait between sending keepalive probes.
* default: cURL default (60s at the time of this writing).
* note: See [here](https://curl.haxx.se/libcurl/c/CURLOPT_TCP_KEEPINTVL.html) for the corresponding cURL option.
* **ftp_method**:
* parameter: one of `default`, `multicwd`, `nocwd`, `singlecwd` (case insensitive).
* downloader(s): `CurlDownloader`, `DirectFTPDownload`, `DirectHTTPDownload`.
* effect: Sets the method used to reach a file on an FTP(S) server (`nocwd` and `singlecwd` are usually faster but not always supported).
* default: `default` (which is `multicwd` at the time of this writing).
* note: See [here](https://curl.haxx.se/libcurl/c/CURLOPT_FTP_FILEMETHOD.html) for the corresponding cURL option.
Those options can be set in bank properties.
See file `global.properties.example` in [biomaj module](https://github.com/genouest/biomaj).
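The option list above can be illustrated with a small, self-contained sketch. It assumes that option values arrive as strings (the `options` field is a `map<string, string>` in the download message) and must be converted to typed settings. `parse_bool` and `parse_options` are hypothetical stand-ins written for this example; the real code relies on `Utils.to_bool` and `Utils.to_int` from `biomaj_core`, whose exact behaviour is not shown in this diff.

```python
# Hypothetical stand-in for Utils.to_bool (biomaj_core); the accepted
# spellings below are an assumption made for this sketch only.
def parse_bool(value):
    """Interpret common textual spellings of a boolean."""
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() in ("1", "true", "yes", "on")


def parse_options(options):
    """Convert a dict of string options into typed settings.

    Defaults follow the documented ones: archives are checked, SSL
    verification is enabled, no custom certificate, keepalive left at 0.
    """
    return {
        "skip_check_uncompress": parse_bool(options.get("skip_check_uncompress", False)),
        "ssl_verifyhost": parse_bool(options.get("ssl_verifyhost", True)),
        "ssl_verifypeer": parse_bool(options.get("ssl_verifypeer", True)),
        "ssl_server_cert": options.get("ssl_server_cert"),
        "tcp_keepalive": int(options.get("tcp_keepalive", 0)),
    }


# Only the options that are explicitly set differ from the defaults.
settings = parse_options({"ssl_verifyhost": "false", "tcp_keepalive": "30"})
print(settings)
```

With the real library, the same dictionary would be handed to the downloader's `set_options` method, as the test added at the end of this diff does with `ftp_method`.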
......@@ -113,6 +113,14 @@ class CurlDownload(DownloadInterface):
ftputil.stat.MSParser(),
]
# Valid values for ftp_method options as string and int
VALID_FTP_FILEMETHOD = {
"default": pycurl.FTPMETHOD_DEFAULT,
"multicwd": pycurl.FTPMETHOD_MULTICWD,
"nocwd": pycurl.FTPMETHOD_NOCWD,
"singlecwd": pycurl.FTPMETHOD_SINGLECWD,
}
def __init__(self, curl_protocol, host, rootdir, http_parse=None):
"""
Initialize a CurlDownloader.
......@@ -162,7 +170,9 @@ class CurlDownload(DownloadInterface):
# This object is shared by all operations to use the cache.
# Before using it, call method:`_basic_curl_configuration`.
self.crl = pycurl.Curl()
#
# Initialize options
#
# Should we skip SSL verification (cURL -k/--insecure option)
self.ssl_verifyhost = True
self.ssl_verifypeer = True
......@@ -170,6 +180,8 @@ class CurlDownload(DownloadInterface):
self.ssl_server_cert = None
# Keep alive
self.tcp_keepalive = 0
# FTP method (cURL --ftp-method option)
self.ftp_method = pycurl.FTPMETHOD_DEFAULT # Use cURL default
def _basic_curl_configuration(self):
"""
......@@ -208,6 +220,9 @@ class CurlDownload(DownloadInterface):
# CURLOPT_CAPATH is for a directory of certificates.
self.crl.setopt(pycurl.CAINFO, self.ssl_server_cert)
# Configure ftp method
self.crl.setopt(pycurl.FTP_FILEMETHOD, self.ftp_method)
# Configure timeouts
self.crl.setopt(pycurl.CONNECTTIMEOUT, 300)
self.crl.setopt(pycurl.TIMEOUT, self.timeout)
......@@ -248,16 +263,23 @@ class CurlDownload(DownloadInterface):
super(CurlDownload, self).set_server(server)
self.url = self.curl_protocol + '://' + self.server
def set_options(self, protocol_options):
super(CurlDownload, self).set_options(protocol_options)
if "ssl_verifyhost" in protocol_options:
self.ssl_verifyhost = Utils.to_bool(protocol_options["ssl_verifyhost"])
if "ssl_verifypeer" in protocol_options:
self.ssl_verifypeer = Utils.to_bool(protocol_options["ssl_verifypeer"])
if "ssl_server_cert" in protocol_options:
self.ssl_server_cert = protocol_options["ssl_server_cert"]
if "tcp_keepalive" in protocol_options:
self.tcp_keepalive = Utils.to_int(protocol_options["tcp_keepalive"])
def set_options(self, options):
super(CurlDownload, self).set_options(options)
if "ssl_verifyhost" in options:
self.ssl_verifyhost = Utils.to_bool(options["ssl_verifyhost"])
if "ssl_verifypeer" in options:
self.ssl_verifypeer = Utils.to_bool(options["ssl_verifypeer"])
if "ssl_server_cert" in options:
self.ssl_server_cert = options["ssl_server_cert"]
if "tcp_keepalive" in options:
self.tcp_keepalive = Utils.to_int(options["tcp_keepalive"])
if "ftp_method" in options:
# raw_val is a string which contains the name of the option as in the CLI.
# We always convert raw_val to a valid integer
raw_val = options["ftp_method"].lower()
if raw_val not in self.VALID_FTP_FILEMETHOD:
raise ValueError("Invalid value for ftp_method")
self.ftp_method = self.VALID_FTP_FILEMETHOD[raw_val]
def _append_file_to_download(self, rfile):
# Add url and root to the file if needed (for safety)
......
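The `ftp_method` handling in `set_options` above amounts to a case-insensitive lookup that rejects unknown names. A minimal sketch of that pattern follows. It uses placeholder integers instead of the `pycurl.FTPMETHOD_*` constants so that it runs without pycurl, and `resolve_ftp_method` is a hypothetical helper name, not part of the library. The more detailed error message is also an addition of this sketch.

```python
# Placeholder integers standing in for pycurl.FTPMETHOD_DEFAULT,
# pycurl.FTPMETHOD_MULTICWD, pycurl.FTPMETHOD_NOCWD and
# pycurl.FTPMETHOD_SINGLECWD (assumption: only distinctness matters here).
FTPMETHOD_DEFAULT, FTPMETHOD_MULTICWD, FTPMETHOD_NOCWD, FTPMETHOD_SINGLECWD = range(4)

VALID_FTP_FILEMETHOD = {
    "default": FTPMETHOD_DEFAULT,
    "multicwd": FTPMETHOD_MULTICWD,
    "nocwd": FTPMETHOD_NOCWD,
    "singlecwd": FTPMETHOD_SINGLECWD,
}


def resolve_ftp_method(raw_val):
    """Map a user-supplied method name to its constant, ignoring case."""
    key = raw_val.lower()
    if key not in VALID_FTP_FILEMETHOD:
        raise ValueError(
            "Invalid value for ftp_method: %r (expected one of: %s)"
            % (raw_val, ", ".join(sorted(VALID_FTP_FILEMETHOD)))
        )
    return VALID_FTP_FILEMETHOD[key]


print(resolve_ftp_method("NoCwd"))
```

Validating at configuration time means a misspelled value fails early with a `ValueError` instead of surfacing later as a cURL error during the transfer.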
......@@ -22,6 +22,7 @@ import pycurl
import re
import hashlib
import sys
import os
from biomaj_download.download.curl import CurlDownload
from biomaj_core.utils import Utils
......@@ -76,11 +77,49 @@ class DirectFTPDownload(CurlDownload):
raise ValueError(msg)
return super(DirectFTPDownload, self).set_files_to_download(files_to_download)
def _file_url(self, rfile):
# rfile['root'] is set to self.rootdir if needed but may be different.
# We don't use os.path.join because rfile['name'] may start with /
return self.url + '/' + rfile['root'] + rfile['name']
def list(self, directory=''):
'''
FTP protocol does not give us the possibility to get file date from remote url
'''
# TODO: are we sure about this implementation ?
self._basic_curl_configuration()
for rfile in self.files_to_download:
if self.save_as is None:
self.save_as = os.path.basename(rfile['name'])
rfile['save_as'] = self.save_as
file_url = self._file_url(rfile)
try:
self.crl.setopt(pycurl.URL, file_url)
except Exception:
self.crl.setopt(pycurl.URL, file_url.encode('ascii', 'ignore'))
self.crl.setopt(pycurl.URL, file_url)
self.crl.setopt(pycurl.OPT_FILETIME, True)
self.crl.setopt(pycurl.NOBODY, True)
# Very old servers may not support the MDTM commands. Therefore,
# cURL will raise an error. In that case, we simply skip the rest
# of the function as it was done before. Download will work however.
# Note that if the file does not exist, it will be skipped too
# (that was the case before too). Of course, download will fail in
# this case.
try:
self.crl.perform()
except Exception:
continue
timestamp = self.crl.getinfo(pycurl.INFO_FILETIME)
dt = datetime.datetime.fromtimestamp(timestamp)
size_file = int(self.crl.getinfo(pycurl.CONTENT_LENGTH_DOWNLOAD))
rfile['year'] = dt.year
rfile['month'] = dt.month
rfile['day'] = dt.day
rfile['size'] = size_file
rfile['hash'] = hashlib.md5(str(timestamp).encode('utf-8')).hexdigest()
return (self.files_to_download, [])
def match(self, patterns, file_list, dir_list=None, prefix='', submatch=False):
......
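The metadata step at the end of the new `list` method can be sketched in isolation. The function below is a hypothetical helper, not part of the library, and it makes two assumptions that differ from the diff: it returns `None` when the timestamp is negative (libcurl reports `-1` when the remote time is unknown), and it converts the timestamp in UTC so that the result does not depend on the local time zone, whereas the diff uses local time. It requires Python 3.

```python
import datetime
import hashlib


def metadata_from_filetime(timestamp, size):
    """Build the date, size and hash fields from a remote modification time.

    Returns None when the timestamp is negative, i.e. when the server did
    not report a modification time (guard added for this sketch).
    """
    if timestamp < 0:
        return None
    # UTC is used here for reproducibility; the diff uses local time.
    dt = datetime.datetime.fromtimestamp(timestamp, tz=datetime.timezone.utc)
    return {
        "year": dt.year,
        "month": dt.month,
        "day": dt.day,
        "size": int(size),
        # As in the diff, the hash is derived from the timestamp only.
        "hash": hashlib.md5(str(timestamp).encode("utf-8")).hexdigest(),
    }


print(metadata_from_filetime(86400, 10.0))
```

Because the hash depends only on the modification time, two listings of an unchanged remote file yield the same hash without downloading the file.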
......@@ -68,7 +68,7 @@ class DownloadInterface(object):
self.server = None
self.offline_dir = None
# Options
self.protocol_options = {}
self.options = {} # This field is used to forge the download message
self.skip_check_uncompress = False
#
......@@ -128,16 +128,16 @@ class DownloadInterface(object):
'''
self.credentials = userpwd
def set_options(self, protocol_options):
def set_options(self, options):
"""
Set protocol specific options.
Set download options.
Subclasses that override this method must call the
parent implementation.
Subclasses that override this method must call this implementation.
"""
self.protocol_options = protocol_options
if "skip_check_uncompress" in protocol_options:
self.skip_check_uncompress = Utils.to_bool(protocol_options["skip_check_uncompress"])
# Store the option dict (a reference, not a copy)
self.options = options
if "skip_check_uncompress" in options:
self.skip_check_uncompress = Utils.to_bool(options["skip_check_uncompress"])
#
# File operations (match, list, download) and associated hook methods
......@@ -284,7 +284,10 @@ class DownloadInterface(object):
new_files_to_download.append(dfile)
index += 1
else:
if not check_exists or os.path.exists(os.path.join(root_dir, dfile['name'])):
fileName = dfile["name"]
if dfile["name"].startswith('/'):
fileName = dfile["name"][1:]
if not check_exists or os.path.exists(os.path.join(root_dir, fileName)):
dfile['root'] = root_dir
self.logger.debug('Copy file instead of downloading it: %s' % (os.path.join(root_dir, dfile['name'])))
self.files_to_copy.append(dfile)
......@@ -293,7 +296,10 @@ class DownloadInterface(object):
else:
# Copy everything
for dfile in self.files_to_download:
if not check_exists or os.path.exists(os.path.join(root_dir, dfile['name'])):
fileName = dfile["name"]
if dfile["name"].startswith('/'):
fileName = dfile["name"][1:]
if not check_exists or os.path.exists(os.path.join(root_dir, fileName)):
dfile['root'] = root_dir
self.files_to_copy.append(dfile)
else:
......
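The leading-slash handling above matters because of how path joining treats absolute components. The sketch below shows the behaviour with `posixpath` (the POSIX flavour of `os.path`, used so that the result is the same on every platform). `local_candidate` is a hypothetical name for the logic the diff adds inline, and the paths are made up for illustration.

```python
import posixpath


def local_candidate(root_dir, name):
    """Return the path to check under root_dir, as the fix in the diff does."""
    # Strip a single leading slash so the name is treated as relative.
    file_name = name[1:] if name.startswith("/") else name
    return posixpath.join(root_dir, file_name)


# Without the fix: an absolute second component discards root_dir entirely.
print(posixpath.join("/db/prod", "/sub/file.txt"))

# With the fix: the file is looked up inside root_dir.
print(local_candidate("/db/prod", "/sub/file.txt"))
```

This matches the changelog entry about production files in subdirectories being downloaded instead of copied: the existence check was looking at the wrong path.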
from biomaj_download.download.interface import DownloadInterface
from irods.session import iRODSSession
from irods.exception import iRODSException
from irods.models import DataObject, User
......@@ -73,17 +74,10 @@ class IRODSDownload(DownloadInterface):
file_to_get = rfile['root'] + "/" + rfile['name']
# Write the file to download in the wanted file_dir with the
# python-irods iget
obj = session.data_objects.get(file_to_get, file_dir)
except ExceptionIRODS as e:
self.logger.error(self.__class__.__name__ + ":Download:Error:Can't get irods object " + str(obj))
self.logger.error(self.__class__.__name__ + ":Download:Error:" + str(e))
session.data_objects.get(file_to_get, file_dir)
except iRODSException as e:
error = True
self.logger.error(self.__class__.__name__ + ":Download:Error:Can't get irods object " + file_to_get)
self.logger.error(self.__class__.__name__ + ":Download:Error:" + repr(e))
session.cleanup()
return(error)
class ExceptionIRODS(Exception):
def __init__(self, exception_reason):
self.exception_reason = exception_reason
def __str__(self):
return self.exception_reason
......@@ -130,7 +130,7 @@ class DownloadService(object):
credentials=None, http_parse=None, http_method=None, param=None,
proxy=None, proxy_auth='',
save_as=None, timeout_download=None, offline_dir=None,
protocol_options={}):
options={}):
protocol = downmessage_pb2.DownloadFile.Protocol.Value(protocol_name.upper())
downloader = None
if protocol in [0, 1]: # FTP, SFTP
......@@ -190,9 +190,9 @@ class DownloadService(object):
# Set the name of the BioMAJ protocol to which we respond.
downloader.set_protocol(protocol_name)
if protocol_options is not None:
self.logger.debug("Received protocol options: " + str(protocol_options))
downloader.set_options(protocol_options)
if options is not None:
self.logger.debug("Received options: " + str(options))
downloader.set_options(options)
downloader.logger = self.logger
downloader.set_files_to_download(remote_files)
......@@ -243,7 +243,7 @@ class DownloadService(object):
save_as=biomaj_file_info.remote_file.save_as,
timeout_download=biomaj_file_info.timeout_download,
offline_dir=biomaj_file_info.local_dir,
protocol_options=biomaj_file_info.protocol_options
options=biomaj_file_info.options
)
def clean(self, biomaj_file_info=None):
......
......@@ -123,6 +123,6 @@ message DownloadFile {
optional HTTP_METHOD http_method = 8 [ default = GET];
map<string, string> protocol_options = 9;
map<string, string> options = 9;
}
......@@ -19,7 +19,7 @@ DESCRIPTOR = _descriptor.FileDescriptor(
package='biomaj.download',
syntax='proto2',
serialized_options=None,
serialized_pb=_b('\n\x11\x64ownmessage.proto\x12\x0f\x62iomaj.download\"\x9d\x02\n\x04\x46ile\x12\x0c\n\x04name\x18\x01 \x02(\t\x12\x0c\n\x04root\x18\x02 \x01(\t\x12\x0f\n\x07save_as\x18\x03 \x01(\t\x12\x0b\n\x03url\x18\x04 \x01(\t\x12\x30\n\x08metadata\x18\x05 \x01(\x0b\x32\x1e.biomaj.download.File.MetaData\x1a\xa8\x01\n\x08MetaData\x12\x13\n\x0bpermissions\x18\x01 \x01(\t\x12\r\n\x05group\x18\x02 \x01(\t\x12\x0c\n\x04size\x18\x03 \x01(\x03\x12\x0c\n\x04hash\x18\x04 \x01(\t\x12\x0c\n\x04year\x18\x05 \x01(\x05\x12\r\n\x05month\x18\x06 \x01(\x05\x12\x0b\n\x03\x64\x61y\x18\x07 \x01(\x05\x12\x0e\n\x06\x66ormat\x18\x08 \x01(\t\x12\x0b\n\x03md5\x18\t \x01(\t\x12\x15\n\rdownload_time\x18\n \x01(\x03\"0\n\x08\x46ileList\x12$\n\x05\x66iles\x18\x01 \x03(\x0b\x32\x15.biomaj.download.File\"\xaa\x02\n\tOperation\x12\x32\n\x04type\x18\x01 \x02(\x0e\x32$.biomaj.download.Operation.OPERATION\x12/\n\x08\x64ownload\x18\x02 \x01(\x0b\x32\x1d.biomaj.download.DownloadFile\x12)\n\x07process\x18\x03 \x01(\x0b\x32\x18.biomaj.download.Process\x12/\n\x05trace\x18\x04 \x01(\x0b\x32 .biomaj.download.Operation.Trace\x1a*\n\x05Trace\x12\x10\n\x08trace_id\x18\x01 \x02(\t\x12\x0f\n\x07span_id\x18\x02 \x02(\t\"0\n\tOPERATION\x12\x08\n\x04LIST\x10\x00\x12\x0c\n\x08\x44OWNLOAD\x10\x01\x12\x0b\n\x07PROCESS\x10\x02\"\x17\n\x07Process\x12\x0c\n\x04\x65xec\x18\x01 \x02(\t\"\xad\x0b\n\x0c\x44ownloadFile\x12\x0c\n\x04\x62\x61nk\x18\x01 \x02(\t\x12\x0f\n\x07session\x18\x02 \x02(\t\x12\x11\n\tlocal_dir\x18\x03 \x02(\t\x12\x18\n\x10timeout_download\x18\x04 \x01(\x05\x12=\n\x0bremote_file\x18\x05 \x02(\x0b\x32(.biomaj.download.DownloadFile.RemoteFile\x12\x32\n\x05proxy\x18\x06 \x01(\x0b\x32#.biomaj.download.DownloadFile.Proxy\x12\x43\n\x0bhttp_method\x18\x08 \x01(\x0e\x32).biomaj.download.DownloadFile.HTTP_METHOD:\x03GET\x12L\n\x10protocol_options\x18\t \x03(\x0b\x32\x32.biomaj.download.DownloadFile.ProtocolOptionsEntry\x1a$\n\x05Param\x12\x0c\n\x04name\x18\x01 \x02(\t\x12\r\n\x05value\x18\x02 
\x02(\t\x1a\xcd\x03\n\tHttpParse\x12\x91\x01\n\x08\x64ir_line\x18\x01 \x02(\t:\x7f<img[\\s]+src=\"[\\S]+\"[\\s]+alt=\"\\[DIR\\]\"[\\s]*/?>[\\s]*<a[\\s]+href=\"([\\S]+)/\"[\\s]*>.*([\\d]{2}-[\\w\\d]{2,5}-[\\d]{4}\\s[\\d]{2}:[\\d]{2})\x12\xa5\x01\n\tfile_line\x18\x02 \x02(\t:\x91\x01<img[\\s]+src=\"[\\S]+\"[\\s]+alt=\"\\[[\\s]+\\]\"[\\s]*/?>[\\s]<a[\\s]+href=\"([\\S]+)\".*([\\d]{2}-[\\w\\d]{2,5}-[\\d]{4}\\s[\\d]{2}:[\\d]{2})[\\s]+([\\d\\.]+[MKG]{0,1})\x12\x13\n\x08\x64ir_name\x18\x03 \x02(\x05:\x01\x31\x12\x13\n\x08\x64ir_date\x18\x04 \x02(\x05:\x01\x32\x12\x14\n\tfile_name\x18\x05 \x02(\x05:\x01\x31\x12\x14\n\tfile_date\x18\x06 \x02(\x05:\x01\x32\x12\x18\n\x10\x66ile_date_format\x18\x07 \x01(\t\x12\x14\n\tfile_size\x18\x08 \x02(\x05:\x01\x33\x1a\xb8\x02\n\nRemoteFile\x12$\n\x05\x66iles\x18\x01 \x03(\x0b\x32\x15.biomaj.download.File\x12\x38\n\x08protocol\x18\x02 \x02(\x0e\x32&.biomaj.download.DownloadFile.Protocol\x12\x0e\n\x06server\x18\x03 \x02(\t\x12\x12\n\nremote_dir\x18\x04 \x02(\t\x12\x0f\n\x07save_as\x18\x05 \x01(\t\x12\x32\n\x05param\x18\x06 \x03(\x0b\x32#.biomaj.download.DownloadFile.Param\x12;\n\nhttp_parse\x18\x07 \x01(\x0b\x32\'.biomaj.download.DownloadFile.HttpParse\x12\x13\n\x0b\x63redentials\x18\x08 \x01(\t\x12\x0f\n\x07matches\x18\t \x03(\t\x1a*\n\x05Proxy\x12\r\n\x05proxy\x18\x01 \x02(\t\x12\x12\n\nproxy_auth\x18\x02 \x01(\t\x1a\x36\n\x14ProtocolOptionsEntry\x12\x0b\n\x03key\x18\x01 \x01(\t\x12\r\n\x05value\x18\x02 \x01(\t:\x02\x38\x01\"\x93\x01\n\x08Protocol\x12\x07\n\x03\x46TP\x10\x00\x12\x08\n\x04\x46TPS\x10\x01\x12\x08\n\x04HTTP\x10\x02\x12\t\n\x05HTTPS\x10\x03\x12\r\n\tDIRECTFTP\x10\x04\x12\x0e\n\nDIRECTHTTP\x10\x05\x12\x0f\n\x0b\x44IRECTHTTPS\x10\x06\x12\t\n\x05LOCAL\x10\x07\x12\t\n\x05RSYNC\x10\x08\x12\t\n\x05IRODS\x10\t\x12\x0e\n\nDIRECTFTPS\x10\n\" \n\x0bHTTP_METHOD\x12\x07\n\x03GET\x10\x00\x12\x08\n\x04POST\x10\x01')
serialized_pb=_b('\n\x11\x64ownmessage.proto\x12\x0f\x62iomaj.download\"\x9d\x02\n\x04\x46ile\x12\x0c\n\x04name\x18\x01 \x02(\t\x12\x0c\n\x04root\x18\x02 \x01(\t\x12\x0f\n\x07save_as\x18\x03 \x01(\t\x12\x0b\n\x03url\x18\x04 \x01(\t\x12\x30\n\x08metadata\x18\x05 \x01(\x0b\x32\x1e.biomaj.download.File.MetaData\x1a\xa8\x01\n\x08MetaData\x12\x13\n\x0bpermissions\x18\x01 \x01(\t\x12\r\n\x05group\x18\x02 \x01(\t\x12\x0c\n\x04size\x18\x03 \x01(\x03\x12\x0c\n\x04hash\x18\x04 \x01(\t\x12\x0c\n\x04year\x18\x05 \x01(\x05\x12\r\n\x05month\x18\x06 \x01(\x05\x12\x0b\n\x03\x64\x61y\x18\x07 \x01(\x05\x12\x0e\n\x06\x66ormat\x18\x08 \x01(\t\x12\x0b\n\x03md5\x18\t \x01(\t\x12\x15\n\rdownload_time\x18\n \x01(\x03\"0\n\x08\x46ileList\x12$\n\x05\x66iles\x18\x01 \x03(\x0b\x32\x15.biomaj.download.File\"\xaa\x02\n\tOperation\x12\x32\n\x04type\x18\x01 \x02(\x0e\x32$.biomaj.download.Operation.OPERATION\x12/\n\x08\x64ownload\x18\x02 \x01(\x0b\x32\x1d.biomaj.download.DownloadFile\x12)\n\x07process\x18\x03 \x01(\x0b\x32\x18.biomaj.download.Process\x12/\n\x05trace\x18\x04 \x01(\x0b\x32 .biomaj.download.Operation.Trace\x1a*\n\x05Trace\x12\x10\n\x08trace_id\x18\x01 \x02(\t\x12\x0f\n\x07span_id\x18\x02 \x02(\t\"0\n\tOPERATION\x12\x08\n\x04LIST\x10\x00\x12\x0c\n\x08\x44OWNLOAD\x10\x01\x12\x0b\n\x07PROCESS\x10\x02\"\x17\n\x07Process\x12\x0c\n\x04\x65xec\x18\x01 \x02(\t\"\x94\x0b\n\x0c\x44ownloadFile\x12\x0c\n\x04\x62\x61nk\x18\x01 \x02(\t\x12\x0f\n\x07session\x18\x02 \x02(\t\x12\x11\n\tlocal_dir\x18\x03 \x02(\t\x12\x18\n\x10timeout_download\x18\x04 \x01(\x05\x12=\n\x0bremote_file\x18\x05 \x02(\x0b\x32(.biomaj.download.DownloadFile.RemoteFile\x12\x32\n\x05proxy\x18\x06 \x01(\x0b\x32#.biomaj.download.DownloadFile.Proxy\x12\x43\n\x0bhttp_method\x18\x08 \x01(\x0e\x32).biomaj.download.DownloadFile.HTTP_METHOD:\x03GET\x12;\n\x07options\x18\t \x03(\x0b\x32*.biomaj.download.DownloadFile.OptionsEntry\x1a$\n\x05Param\x12\x0c\n\x04name\x18\x01 \x02(\t\x12\r\n\x05value\x18\x02 
\x02(\t\x1a\xcd\x03\n\tHttpParse\x12\x91\x01\n\x08\x64ir_line\x18\x01 \x02(\t:\x7f<img[\\s]+src=\"[\\S]+\"[\\s]+alt=\"\\[DIR\\]\"[\\s]*/?>[\\s]*<a[\\s]+href=\"([\\S]+)/\"[\\s]*>.*([\\d]{2}-[\\w\\d]{2,5}-[\\d]{4}\\s[\\d]{2}:[\\d]{2})\x12\xa5\x01\n\tfile_line\x18\x02 \x02(\t:\x91\x01<img[\\s]+src=\"[\\S]+\"[\\s]+alt=\"\\[[\\s]+\\]\"[\\s]*/?>[\\s]<a[\\s]+href=\"([\\S]+)\".*([\\d]{2}-[\\w\\d]{2,5}-[\\d]{4}\\s[\\d]{2}:[\\d]{2})[\\s]+([\\d\\.]+[MKG]{0,1})\x12\x13\n\x08\x64ir_name\x18\x03 \x02(\x05:\x01\x31\x12\x13\n\x08\x64ir_date\x18\x04 \x02(\x05:\x01\x32\x12\x14\n\tfile_name\x18\x05 \x02(\x05:\x01\x31\x12\x14\n\tfile_date\x18\x06 \x02(\x05:\x01\x32\x12\x18\n\x10\x66ile_date_format\x18\x07 \x01(\t\x12\x14\n\tfile_size\x18\x08 \x02(\x05:\x01\x33\x1a\xb8\x02\n\nRemoteFile\x12$\n\x05\x66iles\x18\x01 \x03(\x0b\x32\x15.biomaj.download.File\x12\x38\n\x08protocol\x18\x02 \x02(\x0e\x32&.biomaj.download.DownloadFile.Protocol\x12\x0e\n\x06server\x18\x03 \x02(\t\x12\x12\n\nremote_dir\x18\x04 \x02(\t\x12\x0f\n\x07save_as\x18\x05 \x01(\t\x12\x32\n\x05param\x18\x06 \x03(\x0b\x32#.biomaj.download.DownloadFile.Param\x12;\n\nhttp_parse\x18\x07 \x01(\x0b\x32\'.biomaj.download.DownloadFile.HttpParse\x12\x13\n\x0b\x63redentials\x18\x08 \x01(\t\x12\x0f\n\x07matches\x18\t \x03(\t\x1a*\n\x05Proxy\x12\r\n\x05proxy\x18\x01 \x02(\t\x12\x12\n\nproxy_auth\x18\x02 \x01(\t\x1a.\n\x0cOptionsEntry\x12\x0b\n\x03key\x18\x01 \x01(\t\x12\r\n\x05value\x18\x02 \x01(\t:\x02\x38\x01\"\x93\x01\n\x08Protocol\x12\x07\n\x03\x46TP\x10\x00\x12\x08\n\x04\x46TPS\x10\x01\x12\x08\n\x04HTTP\x10\x02\x12\t\n\x05HTTPS\x10\x03\x12\r\n\tDIRECTFTP\x10\x04\x12\x0e\n\nDIRECTHTTP\x10\x05\x12\x0f\n\x0b\x44IRECTHTTPS\x10\x06\x12\t\n\x05LOCAL\x10\x07\x12\t\n\x05RSYNC\x10\x08\x12\t\n\x05IRODS\x10\t\x12\x0e\n\nDIRECTFTPS\x10\n\" \n\x0bHTTP_METHOD\x12\x07\n\x03GET\x10\x00\x12\x08\n\x04POST\x10\x01')
)
......@@ -103,8 +103,8 @@ _DOWNLOADFILE_PROTOCOL = _descriptor.EnumDescriptor(
],
containing_type=None,
serialized_options=None,
serialized_start=1975,
serialized_end=2122,
serialized_start=1950,
serialized_end=2097,
)
_sym_db.RegisterEnumDescriptor(_DOWNLOADFILE_PROTOCOL)
......@@ -125,8 +125,8 @@ _DOWNLOADFILE_HTTP_METHOD = _descriptor.EnumDescriptor(
],
containing_type=None,
serialized_options=None,
serialized_start=2124,
serialized_end=2156,
serialized_start=2099,
serialized_end=2131,
)
_sym_db.RegisterEnumDescriptor(_DOWNLOADFILE_HTTP_METHOD)
......@@ -468,8 +468,8 @@ _DOWNLOADFILE_PARAM = _descriptor.Descriptor(
extension_ranges=[],
oneofs=[
],
serialized_start=1057,
serialized_end=1093,
serialized_start=1040,
serialized_end=1076,
)
_DOWNLOADFILE_HTTPPARSE = _descriptor.Descriptor(
......@@ -547,8 +547,8 @@ _DOWNLOADFILE_HTTPPARSE = _descriptor.Descriptor(
extension_ranges=[],
oneofs=[
],
serialized_start=1096,
serialized_end=1557,
serialized_start=1079,
serialized_end=1540,
)
_DOWNLOADFILE_REMOTEFILE = _descriptor.Descriptor(
......@@ -633,8 +633,8 @@ _DOWNLOADFILE_REMOTEFILE = _descriptor.Descriptor(
extension_ranges=[],
oneofs=[
],
serialized_start=1560,
serialized_end=1872,
serialized_start=1543,
serialized_end=1855,
)
_DOWNLOADFILE_PROXY = _descriptor.Descriptor(
......@@ -670,26 +670,26 @@ _DOWNLOADFILE_PROXY = _descriptor.Descriptor(
extension_ranges=[],
oneofs=[
],
serialized_start=1874,
serialized_end=1916,
serialized_start=1857,
serialized_end=1899,
)
_DOWNLOADFILE_PROTOCOLOPTIONSENTRY = _descriptor.Descriptor(
name='ProtocolOptionsEntry',
full_name='biomaj.download.DownloadFile.ProtocolOptionsEntry',
_DOWNLOADFILE_OPTIONSENTRY = _descriptor.Descriptor(
name='OptionsEntry',
full_name='biomaj.download.DownloadFile.OptionsEntry',
filename=None,
file=DESCRIPTOR,
containing_type=None,
fields=[
_descriptor.FieldDescriptor(
name='key', full_name='biomaj.download.DownloadFile.ProtocolOptionsEntry.key', index=0,
name='key', full_name='biomaj.download.DownloadFile.OptionsEntry.key', index=0,
number=1, type=9, cpp_type=9, label=1,
has_default_value=False, default_value=_b("").decode('utf-8'),
message_type=None, enum_type=None, containing_type=None,
is_extension=False, extension_scope=None,
serialized_options=None, file=DESCRIPTOR),
_descriptor.FieldDescriptor(
name='value', full_name='biomaj.download.DownloadFile.ProtocolOptionsEntry.value', index=1,
name='value', full_name='biomaj.download.DownloadFile.OptionsEntry.value', index=1,
number=2, type=9, cpp_type=9, label=1,
has_default_value=False, default_value=_b("").decode('utf-8'),
message_type=None, enum_type=None, containing_type=None,
......@@ -707,8 +707,8 @@ _DOWNLOADFILE_PROTOCOLOPTIONSENTRY = _descriptor.Descriptor(
extension_ranges=[],
oneofs=[
],
serialized_start=1918,
serialized_end=1972,
serialized_start=1901,
serialized_end=1947,
)
_DOWNLOADFILE = _descriptor.Descriptor(
......@@ -768,7 +768,7 @@ _DOWNLOADFILE = _descriptor.Descriptor(
is_extension=False, extension_scope=None,
serialized_options=None, file=DESCRIPTOR),
_descriptor.FieldDescriptor(
name='protocol_options', full_name='biomaj.download.DownloadFile.protocol_options', index=7,
name='options', full_name='biomaj.download.DownloadFile.options', index=7,
number=9, type=11, cpp_type=10, label=3,
has_default_value=False, default_value=[],
message_type=None, enum_type=None, containing_type=None,
......@@ -777,7 +777,7 @@ _DOWNLOADFILE = _descriptor.Descriptor(
],
extensions=[
],
nested_types=[_DOWNLOADFILE_PARAM, _DOWNLOADFILE_HTTPPARSE, _DOWNLOADFILE_REMOTEFILE, _DOWNLOADFILE_PROXY, _DOWNLOADFILE_PROTOCOLOPTIONSENTRY, ],
nested_types=[_DOWNLOADFILE_PARAM, _DOWNLOADFILE_HTTPPARSE, _DOWNLOADFILE_REMOTEFILE, _DOWNLOADFILE_PROXY, _DOWNLOADFILE_OPTIONSENTRY, ],
enum_types=[
_DOWNLOADFILE_PROTOCOL,
_DOWNLOADFILE_HTTP_METHOD,
......@@ -789,7 +789,7 @@ _DOWNLOADFILE = _descriptor.Descriptor(
oneofs=[
],
serialized_start=703,
serialized_end=2156,
serialized_end=2131,
)
_FILE_METADATA.containing_type = _FILE
......@@ -809,11 +809,11 @@ _DOWNLOADFILE_REMOTEFILE.fields_by_name['param'].message_type = _DOWNLOADFILE_PA
_DOWNLOADFILE_REMOTEFILE.fields_by_name['http_parse'].message_type = _DOWNLOADFILE_HTTPPARSE
_DOWNLOADFILE_REMOTEFILE.containing_type = _DOWNLOADFILE
_DOWNLOADFILE_PROXY.containing_type = _DOWNLOADFILE
_DOWNLOADFILE_PROTOCOLOPTIONSENTRY.containing_type = _DOWNLOADFILE
_DOWNLOADFILE_OPTIONSENTRY.containing_type = _DOWNLOADFILE
_DOWNLOADFILE.fields_by_name['remote_file'].message_type = _DOWNLOADFILE_REMOTEFILE
_DOWNLOADFILE.fields_by_name['proxy'].message_type = _DOWNLOADFILE_PROXY
_DOWNLOADFILE.fields_by_name['http_method'].enum_type = _DOWNLOADFILE_HTTP_METHOD
_DOWNLOADFILE.fields_by_name['protocol_options'].message_type = _DOWNLOADFILE_PROTOCOLOPTIONSENTRY
_DOWNLOADFILE.fields_by_name['options'].message_type = _DOWNLOADFILE_OPTIONSENTRY
_DOWNLOADFILE_PROTOCOL.containing_type = _DOWNLOADFILE
_DOWNLOADFILE_HTTP_METHOD.containing_type = _DOWNLOADFILE
DESCRIPTOR.message_types_by_name['File'] = _FILE
......@@ -897,10 +897,10 @@ DownloadFile = _reflection.GeneratedProtocolMessageType('DownloadFile', (_messag
))
,
ProtocolOptionsEntry = _reflection.GeneratedProtocolMessageType('ProtocolOptionsEntry', (_message.Message,), dict(
DESCRIPTOR = _DOWNLOADFILE_PROTOCOLOPTIONSENTRY,
OptionsEntry = _reflection.GeneratedProtocolMessageType('OptionsEntry', (_message.Message,), dict(
DESCRIPTOR = _DOWNLOADFILE_OPTIONSENTRY,
__module__ = 'downmessage_pb2'
# @@protoc_insertion_point(class_scope:biomaj.download.DownloadFile.ProtocolOptionsEntry)
# @@protoc_insertion_point(class_scope:biomaj.download.DownloadFile.OptionsEntry)
))
,
DESCRIPTOR = _DOWNLOADFILE,
......@@ -912,8 +912,8 @@ _sym_db.RegisterMessage(DownloadFile.Param)
_sym_db.RegisterMessage(DownloadFile.HttpParse)
_sym_db.RegisterMessage(DownloadFile.RemoteFile)
_sym_db.RegisterMessage(DownloadFile.Proxy)
_sym_db.RegisterMessage(DownloadFile.ProtocolOptionsEntry)
_sym_db.RegisterMessage(DownloadFile.OptionsEntry)
_DOWNLOADFILE_PROTOCOLOPTIONSENTRY._options = None
_DOWNLOADFILE_OPTIONSENTRY._options = None
# @@protoc_insertion_point(module_scope)
biomaj3-download (3.1.2-1) UNRELEASED; urgency=medium
* Team upload.
* New upstream version
* debhelper-compat 12
* Standards-Version: 4.4.1
* Respect DEB_BUILD_OPTIONS in override_dh_auto_test target
* Remove trailing whitespace in debian/changelog
* Remove empty debian/patches/series.
* Move the autodep8 autopkgtest to an explicit one, as
the module name (biomaj-download) doesn't match the package name
(python3-biomaj3-download).
-- Michael R. Crusoe <michael.crusoe@gmail.com> Wed, 15 Jan 2020 09:15:29 +0100
biomaj3-download (3.1.0-1) unstable; urgency=medium
[ Olivier Sallou ]
......
......@@ -2,9 +2,8 @@ Source: biomaj3-download
Maintainer: Debian Med Packaging Team <debian-med-packaging@lists.alioth.debian.org>
Uploaders: Olivier Sallou <osallou@debian.org>
Section: python
Testsuite: autopkgtest-pkg-python
Priority: optional
Build-Depends: debhelper (>= 12~),
Build-Depends: debhelper-compat (= 12),
dh-python,
protobuf-compiler,
python3-all,
......@@ -28,7 +27,7 @@ Build-Depends: debhelper (>= 12~),
python3-biomaj3-zipkin,
python3-ftputil,
rsync
Standards-Version: 4.3.0
Standards-Version: 4.4.1
Vcs-Browser: https://salsa.debian.org/med-team/biomaj3-download
Vcs-Git: https://salsa.debian.org/med-team/biomaj3-download.git
Homepage: https://github.com/genouest/biomaj-download
......@@ -37,9 +36,7 @@ Package: python3-biomaj3-download
Architecture: all
Depends: ${misc:Depends},
${python3:Depends}
Recommends: ${python3:Recommends}
Suggests: ${python3:Suggests},
python3-gunicorn,
Suggests: python3-gunicorn,
mongodb,
redis-server
Description: BioMAJ download management library
......@@ -53,4 +50,3 @@ Description: BioMAJ download management library
.
This package contains the library and microservice to manage downloads
in BioMAJ3
XB-Python-Egg-Name: biomaj-download
Subject: python irods not available, remove it from supported protocols
Description: biomaj supports irods as download protocol but irods is not
available in Debian. In the meanwhile remove support for this protocol
Author: Olivier Sallou <osallou@debian.org>
Last-Updated: 2019-03-09
Forwarded: no
--- a/requirements.txt
+++ b/requirements.txt
@@ -14,4 +14,3 @@
biomaj_zipkin
flake8
humanfriendly
-python-irodsclient
--- a/setup.py
+++ b/setup.py
@@ -54,8 +54,7 @@
'prometheus_client>=0.0.18',
'protobuf',
'requests',
- 'humanfriendly',
- 'python-irodsclient'
+ 'humanfriendly'
],
'tests_require': ['nose', 'mock'],
'test_suite': 'nose.collector',
--- a/biomaj_download/download/protocolirods.py
+++ b/biomaj_download/download/protocolirods.py
@@ -5,8 +5,6 @@
from biomaj_core.utils import Utils
from biomaj_download.download.interface import DownloadInterface
-from irods.session import iRODSSession
-from irods.models import Collection, DataObject, User
class IRODSDownload(DownloadInterface):
@@ -31,27 +29,9 @@
self.zone = str(param['zone'])
def list(self, directory=''):
- session = iRODSSession(host=self.server, port=self.port, user=self.user, password=self.password, zone=self.zone)
rfiles = []
rdirs = []
- rfile = {}
- date = None
- for result in session.query(Collection.name, DataObject.name, DataObject.size, DataObject.owner_name, DataObject.modify_time).filter(User.name == self.user).get_results():
- # if the user is biomaj : he will have access to all the irods data (biomaj ressource) : drwxr-xr-x
- # Avoid duplication
- if rfile != {} and rfile['name'] == str(result[DataObject.name]) and date == str(result[DataObject.modify_time]).split(" ")[0].split('-'):
- continue
- rfile = {}
- date = str(result[DataObject.modify_time]).split(" ")[0].split('-')
- rfile['permissions'] = "-rwxr-xr-x"
- rfile['size'] = int(result[DataObject.size])
- rfile['month'] = int(date[1])
- rfile['day'] = int(date[2])
- rfile['year'] = int(date[0])
- rfile['name'] = str(result[DataObject.name])
- rfile['download_path'] = str(result[Collection.name])
- rfiles.append(rfile)
- session.cleanup()
+ raise Exception("IRODS:NotSupported")
return (rfiles, rdirs)
def download(self, local_dir, keep_dirs=True):
@@ -65,67 +45,10 @@
:return: list of downloaded files
'''
logging.debug('IRODS:Download')
- try:
- os.chdir(local_dir)
- except TypeError:
- logging.error("IRODS:list:Could not find offline_dir")
- nb_files = len(self.files_to_download)
- cur_files = 1
- # give a working directory to copy the file from irods
- remote_dir = self.remote_dir
- for rfile in self.files_to_download:
- if self.kill_received:
- raise Exception('Kill request received, exiting')
- file_dir = local_dir
- if 'save_as' not in rfile or rfile['save_as'] is None:
- rfile['save_as'] = rfile['name']
- if keep_dirs:
- file_dir = local_dir + os.path.dirname(rfile['save_as'])
- file_path = file_dir + '/' + os.path.basename(rfile['save_as'])
- # For unit tests only, workflow will take in charge directory creation before to avoid thread multi access
- if not os.path.exists(file_dir):
- os.makedirs(file_dir)
-
- logging.debug('IRODS:Download:Progress:' + str(cur_files) + '/' + str(nb_files) + ' downloading file ' + rfile['name'])
- logging.debug('IRODS:Download:Progress:' + str(cur_files) + '/' + str(nb_files) + ' save as ' + rfile['save_as'])
- cur_files += 1
- start_time = datetime.now()
- start_time = time.mktime(start_time.timetuple())
- self.remote_dir = rfile['root']
- error = self.irods_download(file_dir, str(self.remote_dir), str(rfile['name']))
- if error:
- rfile['download_time'] = 0
- rfile['error'] = True
- raise Exception("IRODS:Download:Error:" + rfile['root'] + '/' + rfile['name'])
- else:
- archive_status = Utils.archive_check(file_path)
- if not archive_status:
- self.logger.error('Archive is invalid or corrupted, deleting file')
- rfile['error'] = True
- if os.path.exists(file_path):
- os.remove(file_path)
- raise Exception("IRODS:Download:Error:" + rfile['root'] + '/' + rfile['name'])
-
- end_time = datetime.now()
- end_time = time.mktime(end_time.timetuple())
- rfile['download_time'] = end_time - start_time
- self.set_permissions(file_path, rfile)
- self.remote_dir = remote_dir
- return(self.files_to_download)
+ raise Exception("IRODS:NotSupported")
def irods_download(self, file_dir, file_path, file_to_download):
- error = False
- logging.debug('IRODS:IRODS DOWNLOAD')
- session = iRODSSession(host=self.server, port=self.port, user=self.user, password=self.password, zone=self.zone)
- try:
- file_to_get = str(file_path) + str(file_to_download)
- # Write the file to download in the wanted file_dir : with the python-irods iget
- obj = session.data_objects.get(file_to_get, file_dir)
- except ExceptionIRODS as e:
- logging.error("RsyncError:" + str(e))
- logging.error("RsyncError: irods object" + str(obj))
- session.cleanup()
- return(error)
+ return("irods not supported")
class ExceptionIRODS(Exception):
py_bcrypt python3-bcrypt
......@@ -16,4 +16,6 @@ override_dh_install:
sed -i '1s;^;#!/usr/bin/python3\n;' debian/python3-biomaj3-download/usr/bin/biomaj_download_consumer.py
override_dh_auto_test:
ifeq (,$(filter nocheck,$(DEB_BUILD_OPTIONS)))
nosetests3 -a !network
endif
Test-Command: set -e ; for py in $(py3versions -r 2>/dev/null) ; do cd "$AUTOPKGTEST_TMP" ; echo "Testing with $py:" ; $py -c "import biomaj_download; print(biomaj_download)" ; done
Depends: python3-all, python3-biomaj3-download
Restrictions: allow-stderr, superficial
......@@ -22,7 +22,7 @@ config = {
'url': 'http://biomaj.genouest.org',
'download_url': 'http://biomaj.genouest.org',
'author_email': 'olivier.sallou@irisa.fr',
'version': '3.1.0',
'version': '3.1.2',
'classifiers': [
# How mature is this project? Common values are
# 3 - Alpha
......
......@@ -639,6 +639,20 @@ class TestBiomajFTPDownload(unittest.TestCase):
ftpd.close()
self.assertTrue(len(ftpd.files_to_download) == 1)
def test_download_ftp_method(self):
"""
Test setting ftp_method (it probably doesn't change anything here but we
test that there is no obvious mistake in the code).
"""
ftpd = CurlDownload("ftp", "test.rebex.net", "/")
ftpd.set_options(dict(ftp_method="nocwd"))
ftpd.set_credentials("demo:password")
(file_list, dir_list) = ftpd.list()
ftpd.match(["^readme.txt$"], file_list, dir_list)
ftpd.download(self.utils.data_dir)
ftpd.close()
self.assertTrue(len(ftpd.files_to_download) == 1)
@attr('ftps')
@attr('network')
......