*.pyc
/src/*.egg-info
/build
/dist
/docs/_build
__pycache__
.eggs/
.cache/
/test-report.xml
/test-report-*.xml
/venv/
/src/toil/test/cwl/spec
/cwltool_deps/
/docs/generated_rst/
/docker/Dockerfile
/docker/toil-*.tar.gz
/src/toil/version.py
language: python
python:
- "2.7"
install:
- make prepare
- make develop extras=[aws,google] # adding extras to avoid import errors
script:
- TOIL_TEST_QUICK=True make test_offline
# Contributor Covenant Code of Conduct
## Our Pledge
In the interest of fostering an open and welcoming environment, we as contributors and maintainers pledge to making participation in our project and our community a harassment-free experience for everyone, regardless of age, body size, disability, ethnicity, gender identity and expression, level of experience, nationality, personal appearance, race, religion, or sexual identity and orientation.
## Our Standards
Examples of behavior that contributes to creating a positive environment include:
* Using welcoming and inclusive language
* Being respectful of differing viewpoints and experiences
* Gracefully accepting constructive criticism
* Focusing on what is best for the community
* Showing empathy towards other community members
Examples of unacceptable behavior by participants include:
* The use of sexualized language or imagery and unwelcome sexual attention or advances
* Trolling, insulting/derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or electronic address, without explicit permission
* Other conduct which could reasonably be considered inappropriate in a professional setting
## Our Responsibilities
Project maintainers are responsible for clarifying the standards of acceptable behavior and are expected to take appropriate and fair corrective action in response to any instances of unacceptable behavior.
Project maintainers have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, or to ban temporarily or permanently any contributor for other behaviors that they deem inappropriate, threatening, offensive, or harmful.
## Scope
This Code of Conduct applies both within project spaces and in public spaces when an individual is representing the project or its community. Examples of representing a project or community include using an official project e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event. Representation of a project may be further defined and clarified by project maintainers.
## Enforcement
Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting the project team at bd2k-genomics@googlegroups.com. The project team will review and investigate all complaints, and will respond in a way that it deems appropriate to the circumstances. The project team is obligated to maintain confidentiality with regard to the reporter of an incident. Further details of specific enforcement policies may be posted separately.
Project maintainers who do not follow or enforce the Code of Conduct in good faith may face temporary or permanent repercussions as determined by other members of the project's leadership.
## Attribution
This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4, available at [http://contributor-covenant.org/version/1/4][version]
[homepage]: http://contributor-covenant.org
[version]: http://contributor-covenant.org/version/1/4/
Contribution Guidelines
=======================
Before proposing a pull request, please read our [Contributor's Guide][1].
[1]: https://toil.readthedocs.io/en/latest/contributing/contributing.html#contributing "Toil Contributor's Guide"
Copyright (C) 2011-15 by UCSC Computational Genomics Lab
Contributors: Benedict Paten, Hannes Schmidt, John Vivian,
Christopher Ketchum, Joel Armstrong and co-authors (benedictpaten@gmail.com)
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
# Copyright (C) 2015 UCSC Computational Genomics Lab
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
define help
Supported targets: prepare, develop, docs, sdist, clean, test, pypi, docker and push_docker.
Please note that all build targets require a virtualenv to be active.
The 'prepare' target installs Toil's build requirements into the current virtualenv.
The 'develop' target creates an editable install of Toil and its runtime requirements in the
current virtualenv. The install is called 'editable' because changes to the source code
immediately affect the virtualenv. Set the 'extras' variable to ensure that the 'develop' target
installs support for extras. Consult setup.py for the list of supported extras. To install Toil
in develop mode with all extras, run
make develop extras=[mesos,aws,google,azure,cwl,encryption]
The 'sdist' target creates a source distribution of Toil. It is used for some unit tests and for
installing the currently checked out version of Toil into the appliance image.
The 'clean' target cleans up the side effects of 'develop', 'sdist', 'docs', 'pypi' and 'docker'
on this machine. It does not undo externally visible effects like removing packages already
uploaded to PyPI.
The 'docs' target uses Sphinx to create HTML documentation in the docs/_build directory.
The 'test' target runs Toil's unit tests serially with pytest. It will run some Docker-dependent
tests and their setup. If you wish to avoid this, use the 'test_offline' target instead. Note: this
target does not capture output from the terminal. For any of the test targets, set the 'tests'
variable to run a particular test, e.g.
particular test, e.g.
make test tests=src/toil/test/sort/sortTest.py::SortTest::testSort
The 'test_offline' target is similar to 'test' but it skips the docker dependent tests and their
setup.
The 'integration_test_local' target runs Toil's integration tests. These are more thorough but also
more costly than the regular unit tests. For the AWS integration tests to run, the environment
variable 'TOIL_AWS_KEYNAME' must be set. That user will be charged for expenses accrued during the
test. This target does not capture terminal output.
The 'integration_test' target is the same as the previous except that it does capture output.
The 'test_parallel' target runs Toil's unit tests in parallel and generates an XML test report
from the results. It is designed to be used only in Jenkins.
The 'pypi' target publishes the current commit of Toil to PyPI after enforcing that the working
copy and the index are clean.
The 'docker' target builds the Docker images that make up the Toil appliance. You may set the
TOIL_DOCKER_REGISTRY variable to override the default registry that the 'push_docker' target pushes
the appliance images to, for example:
TOIL_DOCKER_REGISTRY=quay.io/USER make docker
If Docker is not installed, Docker-related targets, tasks, and tests will be skipped. The
same can be achieved by setting TOIL_DOCKER_REGISTRY to an empty string.
The 'push_docker' target pushes the Toil appliance images to a remote Docker registry. It
requires the TOIL_DOCKER_REGISTRY variable to be set to a value other than the default to avoid
accidentally pushing to the official Docker registry for Toil.
The TOIL_DOCKER_NAME environment variable can be set to customize the appliance image name that
is created by the 'docker' target and pushed by the 'push_docker' target. The Toil team's
continuous integration system overrides this variable to avoid conflicts between concurrently
executing builds for the same revision, e.g. toil-pr and toil-it.
endef
export help
help:
@printf "$$help"
# This Makefile uses bash features like printf and <()
SHELL=bash
python=python2.7
pip=pip2.7
tests=src
tests_local=src/toil/test
# Use slightly less than the Travis timeout of 10 min.
pytest_args_local=-vv --timeout=530
extras=
dist_version:=$(shell $(python) version_template.py distVersion)
sdist_name:=toil-$(dist_version).tar.gz
docker_tag:=$(shell $(python) version_template.py dockerTag)
default_docker_registry:=$(shell $(python) version_template.py dockerRegistry)
docker_path:=$(strip $(shell which docker))
ifdef docker_path
ifdef docker_registry
export TOIL_DOCKER_REGISTRY?=$(docker_registry)
else
export TOIL_DOCKER_REGISTRY?=$(default_docker_registry)
endif
else
$(warning Cannot find 'docker' executable. Docker-related targets will be skipped.)
export TOIL_DOCKER_REGISTRY:=
endif
export TOIL_DOCKER_NAME?=$(shell $(python) version_template.py dockerName)
# Note that setting TOIL_DOCKER_REGISTRY to an empty string yields an invalid TOIL_APPLIANCE_SELF
# which will coax the @needs_appliance decorator to skip the test.
export TOIL_APPLIANCE_SELF:=$(TOIL_DOCKER_REGISTRY)/$(TOIL_DOCKER_NAME):$(docker_tag)
ifndef BUILD_NUMBER
green=\033[0;32m
normal=\033[0m
red=\033[0;31m
cyan=\033[0;36m
endif
develop: check_venv
$(pip) install -e .$(extras)
clean_develop: check_venv
- $(pip) uninstall -y toil
- rm -rf src/*.egg-info
- rm src/toil/version.py
sdist: dist/$(sdist_name)
dist/$(sdist_name): check_venv
@test -f dist/$(sdist_name) && mv dist/$(sdist_name) dist/$(sdist_name).old || true
$(python) setup.py sdist
@test -f dist/$(sdist_name).old \
&& ( cmp -s <(tar -xOzf dist/$(sdist_name)) <(tar -xOzf dist/$(sdist_name).old) \
&& mv dist/$(sdist_name).old dist/$(sdist_name) \
&& printf "$(cyan)No significant changes to sdist, reinstating backup.$(normal)\n" \
|| rm dist/$(sdist_name).old ) \
|| true
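The `cmp -s <(tar -xOzf ...) <(tar -xOzf ...)` trick above decides whether the new sdist really differs from the backup by comparing the concatenated contents of both tarballs, ignoring metadata such as timestamps. A rough Python sketch of the same idea (the helper names here are hypothetical, not part of the Makefile):

```python
import tarfile

def tarball_payload(path):
    # Concatenate the contents of all regular members of a gzipped tarball,
    # mirroring 'tar -xOzf': member metadata (mtime, mode) is ignored.
    chunks = []
    with tarfile.open(path, "r:gz") as tar:
        for member in tar:
            if member.isfile():
                chunks.append(tar.extractfile(member).read())
    return b"".join(chunks)

def sdists_equivalent(old_path, new_path):
    # True if the two sdists contain identical file contents.
    return tarball_payload(old_path) == tarball_payload(new_path)
```

When the payloads match, the Makefile reinstates the backup so the sdist's mtime does not change and dependent targets are not rebuilt needlessly.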
clean_sdist:
- rm -rf dist
- rm src/toil/version.py
# This target will skip building docker and all docker based tests
test_offline: check_venv check_build_reqs
@printf "$(cyan)All docker related tests will be skipped.$(normal)\n"
TOIL_SKIP_DOCKER=True \
$(python) -m pytest $(pytest_args_local) $(tests_local)
# The hot deployment test needs the docker appliance
test: check_venv check_build_reqs docker
TOIL_APPLIANCE_SELF=$(TOIL_DOCKER_REGISTRY)/$(TOIL_DOCKER_NAME):$(docker_tag) \
$(python) -m pytest $(pytest_args_local) $(tests)
# For running integration tests locally in series (uses the -s argument for pyTest)
integration_test_local: check_venv check_build_reqs sdist push_docker
TOIL_TEST_INTEGRATIVE=True \
$(python) run_tests.py --local integration-test $(tests)
# These two targets are for backwards compatibility but will be removed shortly
# FIXME when they are removed add check_running_on_jenkins to the jenkins targets
test_parallel: jenkins_test_parallel
integration_test: jenkins_test_integration
# This target is designed only for use on Jenkins
jenkins_test_parallel: check_venv check_build_reqs docker
$(python) run_tests.py test $(tests)
# This target is designed only for use on Jenkins
jenkins_test_integration: check_venv check_build_reqs sdist push_docker
TOIL_TEST_INTEGRATIVE=True $(python) run_tests.py integration-test $(tests)
pypi: check_venv check_clean_working_copy check_running_on_jenkins
$(pip) install setuptools --upgrade
$(python) setup.py egg_info sdist bdist_egg upload
clean_pypi:
- rm -rf build/
ifdef TOIL_DOCKER_REGISTRY
docker_image:=$(TOIL_DOCKER_REGISTRY)/$(TOIL_DOCKER_NAME)
docker_short_tag:=$(shell $(python) version_template.py dockerShortTag)
docker_minimal_tag:=$(shell $(python) version_template.py dockerMinimalTag)
grafana_image:=$(TOIL_DOCKER_REGISTRY)/toil-grafana
prometheus_image:=$(TOIL_DOCKER_REGISTRY)/toil-prometheus
mtail_image:=$(TOIL_DOCKER_REGISTRY)/toil-mtail
define tag_docker
@printf "$(cyan)Removing old tag $2. This may fail but that's expected.$(normal)\n"
-docker rmi $2
docker tag $1 $2
@printf "$(green)Tagged appliance image $1 as $2.$(normal)\n"
endef
docker: docker/Dockerfile
@set -ex \
; cd docker \
; docker build --tag=$(docker_image):$(docker_tag) -f Dockerfile .
@set -ex \
; cd dashboard/prometheus \
; docker build --tag=$(prometheus_image):$(docker_tag) -f Dockerfile .
@set -ex \
; cd dashboard/grafana \
; docker build --tag=$(grafana_image):$(docker_tag) -f Dockerfile .
@set -ex \
; cd dashboard/mtail \
; docker build --tag=$(mtail_image):$(docker_tag) -f Dockerfile .
ifdef BUILD_NUMBER
$(call tag_docker,$(docker_image):$(docker_tag),$(docker_image):$(docker_short_tag))
$(call tag_docker,$(docker_image):$(docker_tag),$(docker_image):$(docker_minimal_tag))
endif
docker/$(sdist_name): dist/$(sdist_name)
cp $< $@
docker/Dockerfile: docker/Dockerfile.py docker/$(sdist_name)
_TOIL_SDIST_NAME=$(sdist_name) $(python) docker/Dockerfile.py > $@
clean_docker:
-rm docker/Dockerfile docker/$(sdist_name)
-docker rmi $(docker_image):$(docker_tag)
obliterate_docker: clean_docker
-@set -x \
; docker images $(docker_image) \
| tail -n +2 | awk '{print $$1 ":" $$2}' | uniq \
| xargs docker rmi
-docker images -qf dangling=true | xargs docker rmi
push_docker: docker check_docker_registry
for i in $$(seq 1 5); do docker push $(docker_image):$(docker_tag) && break || sleep 60; done
for i in $$(seq 1 5); do docker push $(grafana_image):$(docker_tag) && break || sleep 60; done
for i in $$(seq 1 5); do docker push $(prometheus_image):$(docker_tag) && break || sleep 60; done
for i in $$(seq 1 5); do docker push $(mtail_image):$(docker_tag) && break || sleep 60; done
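Each push loop above retries `docker push` up to five times, sleeping 60 seconds between attempts to ride out transient registry failures. The same pattern sketched in Python (`run_with_retries` is a hypothetical helper, not part of the codebase):

```python
import time

def run_with_retries(action, attempts=5, delay=60):
    # Mirror the 'for i in $(seq 1 5); do ... && break || sleep 60; done'
    # loops: call the action until it succeeds or attempts run out.
    for attempt in range(1, attempts + 1):
        if action():
            return True
        if attempt < attempts:
            time.sleep(delay)
    return False
```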
else
docker push_docker clean_docker:
@printf "$(cyan)Skipping '$@' target as TOIL_DOCKER_REGISTRY is empty or Docker is not installed.$(normal)\n"
endif
docs: check_venv check_build_reqs
# Strange, but seemingly benign Sphinx warning floods stderr if not filtered:
cd docs && make html
clean_docs: check_venv
- cd docs && make clean
clean: clean_develop clean_sdist clean_pypi clean_docs
check_build_reqs:
@$(python) -c 'import mock; import pytest' \
|| ( printf "$(red)Build requirements are missing. Run 'make prepare' to install them.$(normal)\n" ; false )
prepare: check_venv
$(pip) install sphinx==1.5.5 mock==1.0.1 pytest==2.8.3 stubserver==1.0.1 \
pytest-timeout==1.2.0
check_venv:
@$(python) -c 'import sys; sys.exit( int( not (hasattr(sys, "real_prefix") or ( hasattr(sys, "base_prefix") and sys.base_prefix != sys.prefix ) ) ) )' \
|| ( printf "$(red)A virtualenv must be active.$(normal)\n" ; false )
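The one-liner in 'check_venv' exits non-zero unless a virtualenv is active. Unpacked into a readable function (a sketch for illustration, not code from the repository):

```python
import sys

def in_virtualenv():
    # Classic virtualenv replaces sys.prefix and records the original
    # interpreter's prefix in sys.real_prefix; PEP 405 venvs instead set
    # sys.base_prefix to the base interpreter's prefix.
    return hasattr(sys, "real_prefix") or (
        hasattr(sys, "base_prefix") and sys.base_prefix != sys.prefix
    )
```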
check_clean_working_copy:
@printf "$(green)Checking if your working copy is clean ...$(normal)\n"
@git diff --exit-code > /dev/null \
|| ( printf "$(red)Your working copy looks dirty.$(normal)\n" ; false )
@git diff --cached --exit-code > /dev/null \
|| ( printf "$(red)Your index looks dirty.$(normal)\n" ; false )
@test -z "$$(git ls-files --other --exclude-standard --directory)" \
|| ( printf "$(red)You have untracked files:$(normal)\n" \
; git ls-files --other --exclude-standard --directory \
; false )
check_running_on_jenkins:
@printf "$(green)Checking if running on Jenkins ...$(normal)\n"
@test -n "$$BUILD_NUMBER" \
|| ( printf "$(red)This target should only be invoked on Jenkins.$(normal)\n" ; false )
check_docker_registry:
@test "$(default_docker_registry)" != "$(TOIL_DOCKER_REGISTRY)" || test -n "$$BUILD_NUMBER" \
|| ( printf '$(red)Please set TOIL_DOCKER_REGISTRY to a value other than \
$(default_docker_registry) and ensure that you have permissions to push \
to that registry. Only CI builds should push to $(default_docker_registry).$(normal)\n' ; false )
check_cpickle:
# fail if cPickle.dump(s) called without HIGHEST_PROTOCOL
# https://github.com/BD2KGenomics/toil/issues/1503
! find . -iname '*.py' | xargs grep 'cPickle.dump' | grep --invert-match HIGHEST_PROTOCOL
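The pipeline above fails the build when any `cPickle.dump`/`cPickle.dumps` call appears on a line that does not also mention `HIGHEST_PROTOCOL` (see issue #1503). A Python mirror of that grep logic (hypothetical helper, for illustration only):

```python
def find_unsafe_cpickle_calls(source):
    # Mirror the grep pipeline in 'check_cpickle': flag lines that call
    # cPickle.dump or cPickle.dumps without mentioning HIGHEST_PROTOCOL.
    return [line for line in source.splitlines()
            if 'cPickle.dump' in line and 'HIGHEST_PROTOCOL' not in line]
```

Like the grep version, this is a line-based heuristic: a call split across lines would evade it.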
.PHONY: help \
prepare \
check_cpickle \
develop clean_develop \
sdist clean_sdist \
test test_offline test_parallel integration_test \
jenkins_test_parallel jenkins_test_integration \
pypi clean_pypi \
docs clean_docs \
clean \
check_venv \
check_clean_working_copy \
check_running_on_jenkins \
check_build_reqs \
docker clean_docker push_docker
Metadata-Version: 1.0
Name: toil
Version: 3.5.0a1.dev321
Summary: Pipeline management software for clusters.
Home-page: https://github.com/BD2KGenomics/toil
Author: Benedict Paten
Author-email: benedict@soe.usc.edu
License: UNKNOWN
Description: UNKNOWN
Platform: UNKNOWN
Toil is a scalable, efficient, cross-platform pipeline management system,
written entirely in Python, and designed around the principles of functional
programming.
* Check the `website`_ for a description of Toil and its features.
* Full documentation for the latest stable release can be found at
`Read the Docs`_.
* See our occasional `blog`_ for tutorials.
* Google Groups discussion `forum`_ and videochat `invite list`_.
.. _website: http://toil.ucsc-cgl.org/
.. _Read the Docs: http://toil.readthedocs.org/
.. _forum: https://groups.google.com/forum/#!forum/toil-community
.. _invite list: https://groups.google.com/forum/#!forum/toil-community-videochats
.. _blog: https://toilpipelines.wordpress.com/
.. image:: https://badges.gitter.im/bd2k-genomics-toil/Lobby.svg
:alt: Join the chat at https://gitter.im/bd2k-genomics-toil/Lobby
# Near Term (In Progress, Estimated Completion Date?)
* Ansible provisioning.
- [ ] Azure
- [ ] AWS
- [ ] Google
* Libcloud provisioning
- [ ] Google
- [ ] AWS
- [ ] Azure
- [ ] Fix flaky tests
- [ ] Run massive workflows
- [ ] Better feedback (error messages, logging).
# Medium Term (~ 6-month goals, by ~June 2018?)
* Batch systems
- [ ] Google Pipelines
- [ ] Azure Batch
- [ ] AWS Batch
- [ ] Containerize leader (work with Consonance)
- [ ] Change the thread pool model to improve single machine usage.
* Improve the development process.
- [ ] Add a linter
- [ ] Add a code coverage tool.
- [ ] Organize tests.
- [ ] Better access to tests for external developers.
- [ ] TES support
- [ ] WES Support (if Consonance does not work well)
# Longer Term
- [ ] Better track versions of specifications (e.g. CWL, WDL) and dependencies.
- [ ] Add other provisioners: OpenStack
- [ ] Singularity support.
- [ ] Uniform configuration (i.e. not just environment variables).
- [ ] Add management and monitoring UIs.
- [ ] Python 3 support.
- [ ] Add URL wrapping for streaming instead of copying.
# Completed
- [x] Basic WDL support.
- [x] Travis CI for commits.
- [x] Run Toil within Popper (https://cross.ucsc.edu/tag/popper/).
- [x] Grafana for workflow monitoring
- [x] Update the Azure jobStore.
- [x] Finish Google jobStore (GCP)
from __future__ import absolute_import
import os
import sys
import ast
import tempfile
import shutil
def enable_absolute_imports(script, script_name='<string>'):
"""
Empty modules
>>> enable_absolute_imports('')
'from __future__ import absolute_import\\n'
Ignore empty lines
>>> enable_absolute_imports('\\n')
'from __future__ import absolute_import\\n'
Append after initial comments, like shebangs
>>> enable_absolute_imports('#foo\\n')
'#foo\\nfrom __future__ import absolute_import\\n'
Insert before regular comments
>>> enable_absolute_imports('#foo\\nimport bar\\n')
'#foo\\nfrom __future__ import absolute_import\\nimport bar\\n'
Insert before non-import statements
>>> enable_absolute_imports('if False:\\n pass\\n')
'from __future__ import absolute_import\\nif False:\\n pass\\n'
Idempotence
>>> enable_absolute_imports('from __future__ import absolute_import\\n') is None
True
Other __future__ imports
>>> enable_absolute_imports('from __future__ import print_function\\n')
'from __future__ import absolute_import\\nfrom __future__ import print_function\\n'
Insert before from ... import statements
>>> enable_absolute_imports('from blah import fasel\\n')
'from __future__ import absolute_import\\nfrom blah import fasel\\n'
Insert before multiple future imports
>>> enable_absolute_imports('from __future__ import print_function\\nfrom __future__ import nested_scopes\\n')
'from __future__ import absolute_import\\nfrom __future__ import print_function\\nfrom __future__ import nested_scopes\\n'
Insert before wrapped multi-name future import
>>> enable_absolute_imports('from __future__ import (print_function,\\n nested_scopes)\\n')
'from __future__ import absolute_import\\nfrom __future__ import (print_function,\\n nested_scopes)\\n'
Usually docstrings show up as attributes of other nodes, but unassociated docstrings become
Expr nodes in the AST.
>>> enable_absolute_imports("#foo\\n\\n'''bar'''\\n\\npass")
"#foo\\n\\nfrom __future__ import absolute_import\\n'''bar'''\\n\\npass\\n"
Unassociated multiline docstring
>>> enable_absolute_imports("#foo\\n\\n'''bar\\n'''\\n\\npass")
"#foo\\n\\nfrom __future__ import absolute_import\\n'''bar\\n'''\\n\\npass\\n"
"""
tree = ast.parse(script, filename=script_name)
lines = script.split('\n')
while lines and lines[-1] == "":
lines.pop()
node = None
for child in ast.iter_child_nodes(tree):
if isinstance(child, ast.Import):
node = child
break
elif isinstance(child, ast.ImportFrom):
assert child.level == 0 # don't know what this means
if child.module == '__future__':
if any(alias.name == 'absolute_import' for alias in child.names):
return None
else:
if node is None: node = child
else:
node = child
break
if node is None:
if len(tree.body) == 0:
node = ast.stmt()
node.lineno = len(lines) + 1
else:
node = tree.body[0]
# This crazy heuristic tries to handle top-level docstrings with newlines in them
# for which lineno is the line where the docstring ends
if isinstance(node, ast.Expr) and isinstance(node.value, ast.Str):
node.lineno -= node.value.s.count('\n')
line = 'from __future__ import absolute_import'
lines.insert(node.lineno - 1, line)
lines.append("")
return '\n'.join(lines)
def main(root_path):
for dir_path, dir_names, file_names in os.walk(root_path):
for file_name in file_names:
if file_name.endswith('.py') and file_name != 'setup.py':
file_path = os.path.join(dir_path, file_name)
with open(file_path) as file:
script = file.read()
new_script = enable_absolute_imports(script, file_name)
if new_script is not None:
temp_handle, temp_file_path = tempfile.mkstemp(prefix=file_name, dir=dir_path)
try:
with os.fdopen(temp_handle, 'w') as temp_file:
temp_file.write(new_script)
except:
os.unlink(temp_file_path)
raise
else:
shutil.copymode(file_path, temp_file_path)
os.rename(temp_file_path, file_path)
if __name__ == '__main__':
main(sys.argv[1])
from __future__ import absolute_import
from argparse import ArgumentParser
import os
import logging
import random
from toil.common import Toil
from toil.job import Job
def setup(job, input_file_id, n, down_checkpoints):
"""Sets up the sort.
Returns the FileID of the sorted file
"""
# Write the input file to the file store
job.fileStore.logToMaster("Starting the merge sort")
return job.addChildJobFn(down,
input_file_id, n,
down_checkpoints=down_checkpoints,
memory='1000M').rv()
def down(job, input_file_id, n, down_checkpoints):
"""Input is a file and a range into that file to sort and an output location in which
to write the sorted file.
If the range is larger than a threshold N the range is divided recursively and
a follow on job is then created which merges back the results. Otherwise,
the file is sorted and placed in the output.
"""
# Read the file
input_file = job.fileStore.readGlobalFile(input_file_id, cache=False)
length = os.path.getsize(input_file)
if length > n:
# We will subdivide the file
job.fileStore.logToMaster("Splitting file: %s of size: %s"
% (input_file_id, length), level=logging.CRITICAL)
# Split the file into two copies
mid_point = get_midpoint(input_file, 0, length)
t1 = job.fileStore.getLocalTempFile()
with open(t1, 'w') as fH:
copy_subrange_of_file(input_file, 0, mid_point + 1, fH)
t2 = job.fileStore.getLocalTempFile()
with open(t2, 'w') as fH:
copy_subrange_of_file(input_file, mid_point + 1, length, fH)
# Call the down function recursively
return job.addFollowOnJobFn(up, job.addChildJobFn(down, job.fileStore.writeGlobalFile(t1), n,
down_checkpoints=down_checkpoints, memory='1000M').rv(),
job.addChildJobFn(down, job.fileStore.writeGlobalFile(t2), n,
down_checkpoints=down_checkpoints,
memory='1000M').rv()).rv()
else:
# We can sort this bit of the file
job.fileStore.logToMaster("Sorting file: %s of size: %s"
% (input_file_id, length), level=logging.CRITICAL)
# Sort the copy and write back to the fileStore
output_file = job.fileStore.getLocalTempFile()
sort(input_file, output_file)
return job.fileStore.writeGlobalFile(output_file)
def up(job, input_file_id_1, input_file_id_2):
"""Merges the two files and places them in the output.
"""
with job.fileStore.writeGlobalFileStream() as (fileHandle, output_id):
with job.fileStore.readGlobalFileStream(input_file_id_1) as inputFileHandle1:
with job.fileStore.readGlobalFileStream(input_file_id_2) as inputFileHandle2:
job.fileStore.logToMaster("Merging %s and %s to %s"
% (input_file_id_1, input_file_id_2, output_id))
merge(inputFileHandle1, inputFileHandle2, fileHandle)
# Clean up the input files - these deletes will occur after the merge completes successfully.
job.fileStore.deleteGlobalFile(input_file_id_1)
job.fileStore.deleteGlobalFile(input_file_id_2)
return output_id
# convenience functions
def sort(in_file, out_file):
"""Sorts the given file.
"""
filehandle = open(in_file, 'r')
lines = filehandle.readlines()
filehandle.close()
lines.sort()
filehandle = open(out_file, 'w')
for line in lines:
filehandle.write(line)
filehandle.close()
def merge(filehandle_1, filehandle_2, output_filehandle):
"""Merges together two files maintaining sorted order.
"""
line2 = filehandle_2.readline()
for line1 in filehandle_1.readlines():
while line2 != '' and line2 <= line1:
output_filehandle.write(line2)
line2 = filehandle_2.readline()
output_filehandle.write(line1)
while line2 != '':
output_filehandle.write(line2)
line2 = filehandle_2.readline()
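The merge() above interleaves two already-sorted line streams, always emitting the smaller head line first. A minimal in-memory demonstration of the same logic, using `io.StringIO` stand-ins for the file handles:

```python
import io

def merge(filehandle_1, filehandle_2, output_filehandle):
    # Same logic as merge() above: interleave two sorted line streams.
    line2 = filehandle_2.readline()
    for line1 in filehandle_1.readlines():
        while line2 != '' and line2 <= line1:
            output_filehandle.write(line2)
            line2 = filehandle_2.readline()
        output_filehandle.write(line1)
    while line2 != '':
        output_filehandle.write(line2)
        line2 = filehandle_2.readline()

out = io.StringIO()
merge(io.StringIO("a\nc\n"), io.StringIO("b\nd\n"), out)
# out now holds the fully merged, sorted lines: "a\nb\nc\nd\n"
```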
def copy_subrange_of_file(input_file, file_start, file_end, output_filehandle):
"""Copies the range (in bytes) between file_start and file_end to the given
output file handle.
"""
with open(input_file, 'r') as fileHandle:
fileHandle.seek(file_start)
data = fileHandle.read(file_end - file_start)
assert len(data) == file_end - file_start
output_filehandle.write(data)
def get_midpoint(file, file_start, file_end):
"""Finds the point in the file to split.
Returns an int i such that file_start <= i < file_end
"""
filehandle = open(file, 'r')
mid_point = (file_start + file_end) / 2
assert mid_point >= file_start
filehandle.seek(mid_point)
line = filehandle.readline()
assert len(line) >= 1
if len(line) + mid_point < file_end:
return mid_point + len(line) - 1
filehandle.seek(file_start)
line = filehandle.readline()
assert len(line) >= 1
assert len(line) + file_start <= file_end
return len(line) + file_start - 1
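get_midpoint() rounds the byte midpoint forward to the next newline so that each half of the split contains only whole lines. A quick check on a small temporary file (a Python 3 sketch; `//` replaces the Python 2 integer `/` used above):

```python
import os
import tempfile

def get_midpoint(path, file_start, file_end):
    # Same algorithm as get_midpoint() above, with // for Python 3.
    with open(path, 'r') as fh:
        mid_point = (file_start + file_end) // 2
        fh.seek(mid_point)
        line = fh.readline()
        if len(line) + mid_point < file_end:
            # Index of the newline ending the line read from the midpoint.
            return mid_point + len(line) - 1
        # Otherwise fall back to splitting after the very first line.
        fh.seek(file_start)
        line = fh.readline()
        return len(line) + file_start - 1

# Four 4-byte lines: the byte midpoint of [0, 16) is 8, the start of
# "ccc\n", so the split lands on that line's trailing newline at index 11.
fd, path = tempfile.mkstemp()
os.write(fd, b"aaa\nbbb\nccc\nddd\n")
os.close(fd)
split = get_midpoint(path, 0, 16)  # returns 11
os.unlink(path)
```

down() then copies bytes [0, split + 1) into the first half and [split + 1, length) into the second, so no line is ever cut in two.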
def make_file_to_sort(file_name, lines, line_length):
with open(file_name, 'w') as fileHandle:
for _ in xrange(lines):
line = "".join(random.choice('actgACTGNXYZ') for _ in xrange(line_length - 1)) + '\n'
fileHandle.write(line)
def main():
parser = ArgumentParser()
Job.Runner.addToilOptions(parser)
parser.add_argument('--num-lines', default=1000, help='Number of lines in file to sort.', type=int)
parser.add_argument('--line-length', default=50, help='Length of lines in file to sort.', type=int)
parser.add_argument("--N",
help="The threshold below which a serial sort function is used to sort the file. "
"All lines must be of length less than or equal to N or the program will fail",
default=10000)
options = parser.parse_args()
if int(options.N) <= 0:
raise RuntimeError("Invalid value of N: %s" % options.N)
file_name = 'file_to_sort.txt'
make_file_to_sort(file_name=file_name, lines=options.num_lines, line_length=options.line_length)
with Toil(options) as toil:
sort_file_url = 'file://' + os.path.abspath('file_to_sort.txt')
if not toil.options.restart:
sort_file_id = toil.importFile(sort_file_url)
sorted_file_id = toil.start(Job.wrapJobFn(setup, sort_file_id, int(options.N), False, memory='1000M'))
else:
sorted_file_id = toil.restart()
toil.exportFile(sorted_file_id, sort_file_url)
if __name__ == '__main__':
main()
The MIT License (MIT)
Copyright (c) 2015 Microsoft Azure
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
# Mesos cluster with Toil
This Microsoft Azure template creates an Apache Mesos cluster with Toil on a configurable number of machines.
<a href="https://portal.azure.com/#create/Microsoft.Template/uri/https%3A%2F%2Fraw.githubusercontent.com%2FBD2KGenomics%2Ftoil%2Fmaster%2Fcontrib%2Fazure%2Fazuredeploy.json" target="_blank">
<img src="http://azuredeploy.net/deploybutton.png"/>
</a>
Once your cluster has been created you will have a resource group containing 3 parts:
1. A set of 1 (default), 3, or 5 masters in a master-specific availability set. Each master's SSH endpoint can be reached via the public DNS address at ports 2211..2215
2. A set of agents in an agent-specific availability set. The agent VMs must be accessed through the master or jumpbox
3. (Optional) A Windows or Linux jumpbox
The following image is an example of a cluster with 1 jumpbox, 3 masters, and 3 agents:
![Image of Mesos cluster on azure](images/mesos.png)
You can see the following parts:
1. **Mesos on port 5050** - Mesos is the distributed systems kernel that abstracts CPU, memory and other resources, and offers these to services named "frameworks" for scheduling of workloads. **Note that Mesos masters only listen on the 10.0.0.0/18 subnet**. In particular, **note that Mesos does not listen on localhost**. If you run a Toil job on the master, you will want to pass `--batchSystem mesos --mesosMaster 10.0.0.5:5050`.
2. **Docker on port 2375** - The Docker engine runs containerized workloads; each master and agent runs the Docker engine. Mesos runs Docker workloads, and examples of how to do this are provided in the Marathon and Chronos walkthrough sections of this readme.
3. **(Optional) Marathon on port 8080** - Marathon is a scheduler for Mesos that is equivalent to init on a single Linux machine: it schedules long-running tasks for the whole cluster.
4. **(Optional) Chronos on port 4400** - Chronos is a scheduler for Mesos that is equivalent to cron on a single Linux machine: it schedules periodic tasks for the whole cluster.
5. **(Optional) Swarm on port 2376** - Swarm is an experimental framework from Docker used for scheduling Docker-style workloads. The Swarm framework is disabled by default because it has [a showstopper bug where it grabs all the resources](https://github.com/docker/swarm/issues/1183). As a workaround, as the walkthrough below shows, you can run your Docker workloads in Marathon and Chronos instead.
All VMs are on the same private subnet, 10.0.0.0/18, and fully accessible to each other. The masters start at 10.0.0.5, and the agents at 10.0.0.50.
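For example, a Toil workflow launched from a master would target the Mesos master's private address rather than localhost. This sketch only assembles and prints the launch command; `helloWorld.py` and the `azure:myaccount:mystore` job store locator are hypothetical placeholders:

```shell
# Mesos masters listen on the private subnet (first master at 10.0.0.5),
# not on localhost, so the address must be passed explicitly.
MESOS_MASTER="10.0.0.5:5050"

# 'helloWorld.py' and 'azure:myaccount:mystore' are placeholder names.
CMD="python helloWorld.py --batchSystem mesos --mesosMaster $MESOS_MASTER azure:myaccount:mystore"
echo "$CMD"
```

On a real cluster you would run the assembled command directly from a master once Toil is installed.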
# Installation Notes
Here are some notes for troubleshooting:
* The installation log for the Linux jumpbox, masters, and agents is in /var/log/azure/cluster-bootstrap.log
* Even though the VMs finish provisioning quickly, Mesos can take 5-15 minutes to install; check /var/log/azure/cluster-bootstrap.log for the completion status.
* The Linux jumpbox is based on https://github.com/Azure/azure-quickstart-templates/tree/master/ubuntu-desktop and will take about 1 hour to configure. Visit that page to learn how to tell when setup is complete, and then how to access the desktop via VNC and an SSH tunnel.
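A minimal way to check bootstrap progress on any master or agent (the log path is the one named above; the file may not exist until installation has started):

```shell
# Path of the cluster bootstrap log written by the setup scripts.
LOG=/var/log/azure/cluster-bootstrap.log

if [ -f "$LOG" ]; then
    # Show the most recent installation output.
    tail -n 20 "$LOG"
else
    echo "log not written yet: $LOG"
fi
```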
# Template Parameters
When you launch the installation of the cluster, you need to specify the following parameters:
* `newStorageAccountNamePrefix`: make sure this is a unique identifier. Azure Storage account names are global, so use a prefix that is unique to your account; otherwise there is a good chance it will clash with names already in use.
* `adminUsername`: self-explanatory. This is the account used on all VMs in the cluster, including the jumpbox
* `adminPassword`: self-explanatory
* `dnsNameForMastersPublicIP`: this is the public DNS name for the public IP that the masters sit behind. If you don't set up a jumpbox, you will probably SSH to port 2211 at this hostname. You just need to specify a unique name; the FQDN will be created by adding the necessary subdomains based on where the cluster is going to be created. For example, you might specify <userID>ToilCluster, and Azure will add something like .westus.cloudapp.azure.com to create the FQDN for the cluster.
* `dnsNameForJumpboxPublicIP`: this is the public DNS name for the jumpbox, a more full-featured host on the cluster network that can be used for debugging or a GUI. It is only consulted if a jumpbox is to be used, so if you aren't making one just fill it in with a dummy value.
* `agentCount`: the number of Mesos Agents that you want to create in the cluster
* `masterCount`: the number of masters. Currently the template supports three configurations: 1, 3, or 5 masters.
* `jumpboxConfiguration`: You can choose if you want a jumpbox, and if so, whether the jumpbox should be Windows or Linux.
* `masterConfiguration`: You can specify if you want Masters to be Agents as well. This is a Mesos supported configuration. By default, Masters will not be used to run workloads.
* `agentVMSize`: The type of VM that you want to use for each node in the cluster. The default size is A5 (2 cores, 14GB RAM) but you can change that (perhaps to one of the G-type instances) if you expect to run workloads that require more RAM or CPU resources.
* `masterVMSize`: size of the master machines; the default is A5 (2 cores, 14GB RAM)
* `jumpboxVMSize`: size of the jumpbox machine, if used; the default is A5 (2 cores, 14GB RAM)
* `clusterPrefix`: this is the prefix that will be used to create all VM names. You can use the prefix to easily identify the machines that belong to a specific cluster. If, for instance, the prefix is 'c1', machines will be created as c1master1, c1master2, ...c1agent1, c1agent5, ...
* `swarmEnabled`: true if you want to enable Swarm as a framework on the cluster
* `marathonEnabled`: true if you want to enable the Marathon framework on the cluster
* `chronosEnabled`: true if you want to enable the Chronos framework on the cluster
* `toilEnabled`: false if you want to disable the Toil framework on the cluster
# Questions
**Q.** Does this cluster have a shared filesystem? Can I use the `fileJobStore`?
**A.** No. You should probably try out the `azureJobStore`.
**Q.** My tasks on the agents can't connect to the `azureJobStore`!
**A.** That's not a question. But, make sure either that you have distributed a `.toilAzureCredentials` file to each agent manually, or that you set the `AZURE_ACCOUNT_KEY` environment variable for your Toil master and are running a version of Toil that makes the master environment available at worker startup.
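Concretely, the environment-variable route might look like this on the master before launching a workflow. This is a sketch: the key value, account name, and `myWorkflow.py` script are hypothetical placeholders, and the command is only assembled and printed here:

```shell
# Placeholder: replace with your storage account's real primary key.
export AZURE_ACCOUNT_KEY="your-storage-account-key-here"

# With the key in the master's environment, a Toil version that forwards
# the master environment to workers lets agents open the azureJobStore too.
# 'myWorkflow.py' and 'azure:myaccount:myjobstore' are placeholders.
LAUNCH="python myWorkflow.py --batchSystem mesos --mesosMaster 10.0.0.5:5050 azure:myaccount:myjobstore"
echo "$LAUNCH"
```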
**Q.** How do I get to the Mesos web UI?
**A.** If you set up a jumpbox, run your browser on the jumpbox and browse to http://10.0.0.5:5050/. If you did not set up a jumpbox, the easiest way is to use SSH port forwarding as a SOCKS proxy. From your local machine, run `ssh <youruser>@<yourcluster>.<yourzone>.cloudapp.azure.com -p 2211 -D8080`, and then set your browser to use the SOCKS proxy this creates at `localhost:8080` (perhaps with a proxy switcher extension). Then, browse to http://10.0.0.5:5050/. Make sure to turn off the proxy when you close your SSH session, or your browser won't be able to load any pages. Also note that this routes all your web traffic through Azure.
**Q.** How do I get to the slave pages in the Mesos web UI?
**A.** You're probably using the proxy method above. Make sure your browser is doing DNS through the proxy. The Mesos web UI expects to be able to resolve names like `yourclusteragent1`, which you probably can't do on your machine. In Firefox, you need to set `network.proxy.socks_remote_dns` to `true` in `about:config`.
**Q.** My cluster just completed but Mesos is not up/has no slaves!
**A.** That's not a question either. But after your template finishes deploying, your cluster is still running its installation. That installation can fail if bugs remain in the setup code or if needed Internet resources disappear. You can run "tail -f /var/log/azure/cluster-bootstrap.log" on each master or agent to see how the installation is going and what might have gone wrong.
{
"$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json# ",
"contentVersion": "1.0.0.0",
"parameters": {
"newStorageAccountNamePrefix": {
"type": "string",
"metadata": {
"description": "Unique DNS Name Prefix for the Storage Account where the Virtual Machine's disks will be placed. StorageAccounts may contain at most variables('vmsPerStorageAccount')"
}
},
"adminUsername": {
"type": "string",
"metadata": {
"description": "Username for the Virtual Machine."
}
},
"adminPassword": {
"type": "securestring",
"metadata": {
"description": "Password for the Virtual Machine."
}
},
"agentVMSize": {
"type": "string",
"metadata": {
"description": "The VM role size of the agent node(s)"
}
},
"agentCount": {
"type": "int",
"metadata": {
"description": "The count of agent nodes"
}
},
"masterCount": {
"type": "int",
"metadata": {
"description": "The count of master nodes"
}
},
"subnetPrefix": {
"type": "string",
"metadata": {
"description": "The network subnet"
}
},
"subnetRef": {
"type": "string",
"metadata": {
"description": "The network subnet reference"
}
},
"agentFirstAddr": {
"type": "int",
"metadata": {
"description": "The value of the 4th IPv4 octet of the first agent within the subnet"
}
},
"masterVMNamePrefix": {
"type": "string",
"metadata": {
"description": "The vm name prefix of the master"
}
},
"agentVMNamePrefix": {
"type": "string",
"metadata": {
"description": "The vm name prefix of the agent"
}
},
"osImagePublisher": {
"type": "string",
"metadata": {
"description": "The publisher name to identify the OS image."
}
},
"osImageOffer": {
"type": "string",
"metadata": {
"description": "The offer name to identify the OS image."
}
},
"osImageSKU": {
"type": "string",
"metadata": {
"description": "The sku to identify the OS image."
}
},
"osImageVersion": {
"type": "string",
"metadata": {
"description": "The version to identify the OS image."
}
},
"customScriptLocation": {
"type": "string",
"metadata": {
"description": "The github location for the shell scripts."
}
},
"swarmEnabled": {
"type": "string",
"metadata": {
"description": "Flag for enabling the Docker Swarm framework."
}
},
"marathonEnabled": {
"type": "string",
"metadata": {
"description": "Flag for enabling the Marathon framework."
}
},
"chronosEnabled": {
"type": "string",
"metadata": {
"description": "Flag for enabling the Chronos framework."
}
},
"toilEnabled": {
"type": "string",
"metadata": {
"description": "Flag for enabling the Toil framework."
}
},
"sshRSAPublicKey": {
"type": "string",
"metadata": {
"description": "Configure all linux machines with the SSH rsa public key string. Use 'disabled' to not configure access with SSH rsa public key."
}
},
"githubSource": {
"type": "string",
"metadata": {
"description": "User and repo name on Github to pull cluster setup scripts and Toil from."
}
},
"githubBranch": {
"type": "string",
"metadata": {
"description": "Branch on Github to pull cluster setup scripts and Toil from."
}
},
"pythonPackages": {
"type": "string",
"metadata": {
"description": "Extra Python package specifiers to install, like 'pysam>=1.0'. Space separated."
}
},
"omsStorageAccount": {
"type": "string",
"metadata": {
"description": "The storage account for OMS log data."
}
},
"omsStorageAccountKey": {
"type": "securestring",
"metadata": {
"description": "The storage account primary or secondary key for OMS log data."
}
}
},
"variables": {},
"resources": []
}
{
"$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json# ",
"contentVersion": "1.0.0.0",
"parameters": {
"newStorageAccountNamePrefix": {
"type": "string",
"metadata": {
"description": "Unique DNS Name Prefix for the Storage Account where the Virtual Machine's disks will be placed. StorageAccounts may contain at most variables('vmsPerStorageAccount')"
}
},
"adminUsername": {
"type": "string",
"metadata": {
"description": "Username for the Virtual Machine."
}
},
"adminPassword": {
"type": "securestring",
"metadata": {
"description": "Password for the Virtual Machine."
}
},
"agentVMSize": {
"type": "string",
"metadata": {
"description": "The VM role size of the agent node(s)"
}
},
"agentCount": {
"type": "int",
"metadata": {
"description": "The count of agent nodes"
}
},
"masterCount": {
"type": "int",
"metadata": {
"description": "The count of master nodes"
}
},
"subnetPrefix": {
"type": "string",
"metadata": {
"description": "The network subnet"
}
},
"subnetRef": {
"type": "string",
"metadata": {
"description": "The network subnet reference"
}
},
"agentFirstAddr": {
"type": "int",
"metadata": {
"description": "The value of the 4th IPv4 octet of the first agent within the subnet"
}
},
"masterVMNamePrefix": {
"type": "string",
"metadata": {
"description": "The vm name prefix of the master"
}
},
"agentVMNamePrefix": {
"type": "string",
"metadata": {
"description": "The vm name prefix of the agent"
}
},
"osImagePublisher": {
"type": "string",
"metadata": {
"description": "The publisher name to identify the OS image."
}
},
"osImageOffer": {
"type": "string",
"metadata": {
"description": "The offer name to identify the OS image."
}
},
"osImageSKU": {
"type": "string",
"metadata": {
"description": "The sku to identify the OS image."
}
},
"osImageVersion": {
"type": "string",
"metadata": {
"description": "The version to identify the OS image."
}
},
"customScriptLocation": {
"type": "string",
"metadata": {
"description": "The github location for the shell scripts."
}
},
"swarmEnabled": {
"type": "string",
"metadata": {
"description": "Flag for enabling the Docker Swarm framework."
}
},
"marathonEnabled": {
"type": "string",
"metadata": {
"description": "Flag for enabling the Marathon framework."
}
},
"chronosEnabled": {
"type": "string",
"metadata": {
"description": "Flag for enabling the Chronos framework."
}
},
"toilEnabled": {
"type": "string",
"metadata": {
"description": "Flag for enabling the Toil framework."
}
},
"sshRSAPublicKey": {
"type": "string",
"metadata": {
"description": "Configure all linux machines with the SSH rsa public key string. Use 'disabled' to not configure access with SSH rsa public key."
}
},
"githubSource": {
"type": "string",
"metadata": {
"description": "User and repo name on Github to pull cluster setup scripts and Toil from."
}
},
"githubBranch": {
"type": "string",
"metadata": {
"description": "Branch on Github to pull cluster setup scripts and Toil from."
}
},
"pythonPackages": {
"type": "string",
"metadata": {
"description": "Extra Python package specifiers to install, like 'pysam>=1.0'. Space separated."
}
},
"omsStorageAccount": {
"type": "string",
"metadata": {
"description": "The storage account for OMS log data."
}
},
"omsStorageAccountKey": {
"type": "securestring",
"metadata": {
"description": "The storage account primary or secondary key for OMS log data."
}
}
},
"variables": {
"availabilitySet": "agentAvailabilitySet",
"vmsPerStorageAccount": 2,
"storageAccountsCount": "[add(div(parameters('agentCount'), variables('vmsPerStorageAccount')), mod(add(mod(parameters('agentCount'), variables('vmsPerStorageAccount')),2), add(mod(parameters('agentCount'), variables('vmsPerStorageAccount')),1)))]",
"storageAccountPrefix": [
"0","6","c","i","o","u","1","7","d","j","p","v",
"2","8","e","k","q","w","3","9","f","l","r","x",
"4","a","g","m","s","y","5","b","h","n","t","z"
],
"storageAccountPrefixCount": "[length(variables('storageAccountPrefix'))]",
"storageAccountType": "Standard_GRS",
"agentsPerIPv4Octet": 200,
"storageAccountPrefixCount": "[length(variables('storageAccountPrefix'))]",
"wgetCommandPrefix": "[concat('wget --tries 20 --retry-connrefused --waitretry=15 -qO- ', parameters('customScriptLocation'), 'configure-mesos-cluster.sh | nohup /bin/bash -s ')]",
"wgetCommandPostfix": " >> /var/log/azure/cluster-bootstrap.log 2>&1 &'",
"commandPrefix": "/bin/bash -c '"
},
"resources": [
{
"type": "Microsoft.Storage/storageAccounts",
"name": "[concat(variables('storageAccountPrefix')[mod(copyIndex(),variables('storageAccountPrefixCount'))],variables('storageAccountPrefix')[div(copyIndex(),variables('storageAccountPrefixCount'))],parameters('newStorageAccountNamePrefix'),copyIndex(1))]",
"apiVersion": "2015-05-01-preview",
"location": "[resourceGroup().location]",
"copy": {
"name": "vmLoopNode",
"count": "[variables('storageAccountsCount')]"
},
"properties": {
"accountType": "[variables('storageAccountType')]"
}
},
{
"apiVersion": "2015-06-15",
"type": "Microsoft.Compute/availabilitySets",
"name": "[variables('availabilitySet')]",
"location": "[resourceGroup().location]",
"properties": {}
},
{
"apiVersion": "2015-06-15",
"type": "Microsoft.Network/networkInterfaces",
"name": "[concat(parameters('agentVMNamePrefix'), copyIndex(1), '-nic')]",
"location": "[resourceGroup().location]",
"copy": {
"name": "nicLoopNode",
"count": "[parameters('agentCount')]"
},
"properties": {
"ipConfigurations": [
{
"name": "ipConfigNode",
"properties": {
"privateIPAllocationMethod": "Static",
"privateIPAddress": "[concat(split(parameters('subnetPrefix'),'0.0/18')[0], div(copyIndex(),variables('agentsPerIPv4Octet')), '.', add(mod(copyIndex(),variables('agentsPerIPv4Octet')), parameters('agentFirstAddr')))]",
"subnet": {
"id": "[parameters('subnetRef')]"
}
}
}
]
}
},
{
"apiVersion": "2015-06-15",
"type": "Microsoft.Compute/virtualMachines",
"name": "[concat(parameters('agentVMNamePrefix'), copyIndex(1))]",
"location": "[resourceGroup().location]",
"copy": {
"name": "vmLoopNode",
"count": "[parameters('agentCount')]"
},
"dependsOn": [
"[concat('Microsoft.Storage/storageAccounts/', variables('storageAccountPrefix')[mod(div(copyIndex(),variables('vmsPerStorageAccount')),variables('storageAccountPrefixCount'))],variables('storageAccountPrefix')[div(div(copyIndex(),variables('vmsPerStorageAccount')),variables('storageAccountPrefixCount'))],parameters('newStorageAccountNamePrefix'),add(1,div(copyIndex(),variables('vmsPerStorageAccount'))))]",
"[concat('Microsoft.Network/networkInterfaces/', parameters('agentVMNamePrefix'), copyIndex(1), '-nic')]",
"[concat('Microsoft.Compute/availabilitySets/', variables('availabilitySet'))]"
],
"properties": {
"availabilitySet": {
"id": "[resourceId('Microsoft.Compute/availabilitySets',variables('availabilitySet'))]"
},
"hardwareProfile": {
"vmSize": "[parameters('agentVMSize')]"
},
"osProfile": {
"computername": "[concat(parameters('agentVMNamePrefix'), copyIndex(1))]",
"adminUsername": "[parameters('adminUsername')]",
"adminPassword": "[parameters('adminPassword')]"
},
"storageProfile": {
"imageReference": {
"publisher": "[parameters('osImagePublisher')]",
"offer": "[parameters('osImageOffer')]",
"sku": "[parameters('osImageSKU')]",
"version": "[parameters('osImageVersion')]"
},
"osDisk": {
"name": "[concat(parameters('agentVMNamePrefix'), copyIndex(1),'-osdisk')]",
"vhd": {
"uri": "[concat('http://',variables('storageAccountPrefix')[mod(div(copyIndex(),variables('vmsPerStorageAccount')),variables('storageAccountPrefixCount'))],variables('storageAccountPrefix')[div(div(copyIndex(),variables('vmsPerStorageAccount')),variables('storageAccountPrefixCount'))],parameters('newStorageAccountNamePrefix'),add(1,div(copyIndex(),variables('vmsPerStorageAccount'))), '.blob.core.windows.net/vhds/', parameters('agentVMNamePrefix'), copyIndex(1), '-osdisk.vhd')]"
},
"caching": "ReadWrite",
"createOption": "FromImage"
}
},
"networkProfile": {
"networkInterfaces": [
{
"id": "[resourceId('Microsoft.Network/networkInterfaces',concat(parameters('agentVMNamePrefix'), copyIndex(1), '-nic'))]"
}
]
}
}
},
{
"type": "Microsoft.Compute/virtualMachines/extensions",
"name": "[concat(parameters('agentVMNamePrefix'), copyIndex(1), '/configureagent')]",
"apiVersion": "2015-06-15",
"location": "[resourceGroup().location]",
"copy": {
"name": "vmLoopNode",
"count": "[parameters('agentCount')]"
},
"dependsOn": [
"[concat('Microsoft.Compute/virtualMachines/', parameters('agentVMNamePrefix'), copyIndex(1))]"
],
"properties": {
"publisher": "Microsoft.OSTCExtensions",
"type": "CustomScriptForLinux",
"typeHandlerVersion": "1.3",
"settings": {
"fileUris": [],
"commandToExecute": "[concat(variables('commandPrefix'), variables('wgetCommandPrefix'), parameters('masterCount'), ' slaveconfiguration ', parameters('masterVMNamePrefix'), ' ', parameters('swarmEnabled'), ' ', parameters('marathonEnabled'), ' ', parameters('chronosEnabled'), ' ', parameters('toilEnabled'), ' ', parameters('omsStorageAccount'), ' ', parameters('omsStorageAccountKey'), ' ', parameters('adminUsername'), ' \"', parameters('sshRSAPublicKey'), '\" ', parameters('githubSource'), ' ', parameters('githubBranch'), ' \"', parameters('pythonPackages'), '\" ', variables('wgetCommandPostfix'))]"
}
}
}
]
}
{
"$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
"contentVersion": "1.0.0.0",
"parameters": {
"adminUsername": {
"type": "string",
"defaultValue": "azureuser",
"metadata": {
"description": "User name for the Virtual Machines."
}
},
"adminPassword": {
"type": "securestring",
"metadata": {
"description": "Password for the Virtual Machines."
}
},
"dnsNameForMastersPublicIP": {
"type": "string",
"metadata": {
"description": "Sets the Domain name label for the masters. The concatenation of the domain name label and the regionalized DNS zone make up the fully qualified domain name (like mycluster.centralus.cloudapp.azure.com) associated with the public IP address."
}
},
"jumpboxConfiguration": {
"type": "string",
"defaultValue": "none",
"allowedValues": [
"none",
"linux",
"windows"
],
"metadata": {
"description": "Choose to have a jumpbox for debugging on the private subnet."
}
},
"dnsNameForJumpboxPublicIP": {
"type": "string",
"metadata": {
"description": "Sets the Domain name label for the jumpbox, if used. The concatenation of the domain name label and the regionalized DNS zone make up the fully qualified domain name (like myjumpbox.centralus.cloudapp.azure.com) associated with the public IP address."
}
},
"newStorageAccountNamePrefix": {
"type": "string",
"metadata": {
"description": "Globally Unique DNS Name Prefix for the Storage Account where the Virtual Machine's disks will be placed. An integer will be appended to the end. StorageAccounts may contain at most variables('vmsPerStorageAccount')"
}
},
"agentCount": {
"type": "int",
"defaultValue": 1,
"metadata": {
"description": "The number of Mesos agents for the cluster."
}
},
"agentVMSize": {
"type": "string",
"defaultValue": "Standard_A5",
"allowedValues": [
"Standard_A1",
"Standard_A5",
"Standard_A6",
"Standard_A7",
"Standard_G1",
"Standard_G2",
"Standard_G3",
"Standard_G4",
"Standard_G5"
],
"metadata": {
"description": "The size of the Virtual Machine."
}
},
"masterCount": {
"type": "int",
"defaultValue": 1,
"allowedValues": [
1,
3,
5
],
"metadata": {
"description": "The number of Mesos masters for the cluster."
}
},
"masterVMSize": {
"type": "string",
"defaultValue": "Standard_A5",
"allowedValues": [
"Standard_A1",
"Standard_A5",
"Standard_A6",
"Standard_A7",
"Standard_G1",
"Standard_G2",
"Standard_G3",
"Standard_G4",
"Standard_G5"
],
"metadata": {
"description": "The size of the Virtual Machine for the master."
}
},
"masterConfiguration": {
"type": "string",
"defaultValue": "masters-are-not-agents",
"allowedValues": [
"masters-are-agents",
"masters-are-not-agents"
],
"metadata": {
"description": "Specify whether masters should act as agents or not."
}
},
"jumpboxVMSize": {
"type": "string",
"defaultValue": "Standard_A5",
"allowedValues": [
"Standard_A1",
"Standard_A5",
"Standard_A6",
"Standard_A7",
"Standard_G1",
"Standard_G2",
"Standard_G3",
"Standard_G4",
"Standard_G5"
],
"metadata": {
"description": "The size of the Virtual Machine for the jumpbox."
}
},
"clusterPrefix": {
"type": "string",
"defaultValue": "c1",
"metadata": {
"description": "The prefix to identify the cluster."
}
},
"swarmEnabled": {
"type": "string",
"defaultValue": "false",
"allowedValues": [
"true",
"false"
],
"metadata": {
"description": "Flag for enabling the Docker Swarm framework."
}
},
"marathonEnabled": {
"type": "string",
"defaultValue": "false",
"allowedValues": [
"true",
"false"
],
"metadata": {
"description": "Flag for enabling the Marathon framework."
}
},
"chronosEnabled": {
"type": "string",
"defaultValue": "false",
"allowedValues": [
"true",
"false"
],
"metadata": {
"description": "Flag for enabling the Chronos framework."
}
},
"toilEnabled": {
"type": "string",
"defaultValue": "true",
"allowedValues": [
"true",
"false"
],
"metadata": {
"description": "Flag for enabling the Toil framework."
}
},
"sshRSAPublicKey": {
"type": "string",
"defaultValue": "disabled",
"metadata": {
"description": "Configure all linux machines with the SSH rsa public key string. Use 'disabled' to not configure access with SSH rsa public key."
}
},
"githubSource": {
"type": "string",
"defaultValue": "BD2KGenomics/toil",
"metadata": {
"description": "User and repo name on Github to pull cluster setup scripts and Toil from."
}
},
"githubBranch": {
"type": "string",
"defaultValue": "master",
"metadata": {
"description": "Branch on Github to pull cluster setup scripts and Toil from."
}
},
"pythonPackages": {
"type": "string",
"defaultValue": "",
"metadata": {
"description": "Extra Python package specifiers to install, like 'pysam>=1.0'. Space separated."
}
}
},
"variables": {
"masterVMNamePrefix": "[concat(parameters('clusterPrefix'),'master')]",
"agentVMNamePrefix": "[concat(parameters('clusterPrefix'),'agent')]",
"jumpboxVMNamePrefix": "[concat(parameters('clusterPrefix'),'jumpbox')]",
"osImagePublisher": "Canonical",
"osImageOffer": "UbuntuServer",
"osImageSKU": "14.04.3-LTS",
"osImageVersion": "latest",
"virtualNetworkName": "VNET",
"vnetID": "[resourceId('Microsoft.Network/virtualNetworks',variables('virtualNetworkName'))]",
"subnetName": "Subnet",
"subnetRef": "[concat(variables('vnetID'),'/subnets/',variables('subnetName'))]",
"addressPrefix": "10.0.0.0/16",
"subnetPrefix": "10.0.0.0/18",
"jumpboxAddr": 4,
"masterFirstAddr": 5,
"agentFirstAddr": 50,
"nsgName": "node-nsg",
"nsgID": "[resourceId('Microsoft.Network/networkSecurityGroups',variables('nsgName'))]",
"storageAccountType": "Standard_GRS",
"customScriptLocation": "[concat('https://raw.githubusercontent.com/', parameters('githubSource'), '/', parameters('githubBranch'), '/contrib/azure/')]",
"agentFiles": [
"agent-0.json",
"agent-gt0.json"
],
"agentFile": "[variables('agentFiles')[mod(add(parameters('agentCount'),2),add(parameters('agentCount'),1))]]",
"omsStorageAccount": "none",
"omsStorageAccountKey": "none"
},
"resources": [
{
"type": "Microsoft.Storage/storageAccounts",
"name": "[concat(parameters('newStorageAccountNamePrefix'),'0')]",
"apiVersion": "2015-05-01-preview",
"location": "[resourceGroup().location]",
"properties": {
"accountType": "[variables('storageAccountType')]"
}
},
{
"apiVersion": "2015-06-15",
"type": "Microsoft.Network/networkSecurityGroups",
"name": "[variables('nsgName')]",
"location": "[resourceGroup().location]",
"properties": {
"securityRules": [
{
"name": "ssh",
"properties": {
"description": "SSH",
"protocol": "Tcp",
"sourcePortRange": "*",
"destinationPortRange": "22",
"sourceAddressPrefix": "*",
"destinationAddressPrefix": "*",
"access": "Allow",
"priority": 200,
"direction": "Inbound"
}
},
{
"name": "rdp",
"properties": {
"description": "RDP",
"protocol": "Tcp",
"sourcePortRange": "*",
"destinationPortRange": "3389",
"sourceAddressPrefix": "*",
"destinationAddressPrefix": "*",
"access": "Allow",
"priority": 300,
"direction": "Inbound"
}
}
]
}
},
{
"apiVersion": "2015-06-15",
"type": "Microsoft.Network/virtualNetworks",
"name": "[variables('virtualNetworkName')]",
"location": "[resourceGroup().location]",
"dependsOn": [
"[variables('nsgID')]"
],
"properties": {
"addressSpace": {
"addressPrefixes": [
"[variables('addressPrefix')]"
]
},
"subnets": [
{
"name": "[variables('subnetName')]",
"properties": {
"addressPrefix": "[variables('subnetPrefix')]",
"networkSecurityGroup": {
"id": "[variables('nsgID')]"
}
}
}
]
}
},
{
"apiVersion": "2015-01-01",
"type": "Microsoft.Resources/deployments",
"name": "createMasterNodes",
"dependsOn": [
"[concat('Microsoft.Storage/storageAccounts/', parameters('newStorageAccountNamePrefix'), '0')]",
"[variables('vnetID')]"
],
"properties": {
"mode": "Incremental",
"templateLink": {
"uri": "[concat(variables('customScriptLocation'), 'master.json')]",
"contentVersion": "1.0.0.0"
},
"parameters": {
"newStorageAccountName": {
"value": "[concat(parameters('newStorageAccountNamePrefix'), '0')]"
},
"adminUsername": {
"value": "[parameters('adminUsername')]"
},
"adminPassword": {
"value": "[parameters('adminPassword')]"
},
"dnsNameForMastersPublicIP": {
"value": "[parameters('dnsNameForMastersPublicIP')]"
},
"masterVMSize": {
"value": "[parameters('masterVMSize')]"
},
"masterCount": {
"value": "[parameters('masterCount')]"
},
"masterConfiguration": {
"value": "[parameters('masterConfiguration')]"
},
"subnetPrefix": {
"value": "[variables('subnetPrefix')]"
},
"subnetRef": {
"value": "[variables('subnetRef')]"
},
"masterFirstAddr": {
"value": "[variables('masterFirstAddr')]"
},
"masterVMNamePrefix": {
"value": "[variables('masterVMNamePrefix')]"
},
"osImagePublisher": {
"value": "[variables('osImagePublisher')]"
},
"osImageOffer": {
"value": "[variables('osImageOffer')]"
},
"osImageSKU": {
"value": "[variables('osImageSKU')]"
},
"osImageVersion": {
"value": "[variables('osImageVersion')]"
},
"customScriptLocation": {
"value": "[variables('customScriptLocation')]"
},
"swarmEnabled": {
"value": "[parameters('swarmEnabled')]"
},
"marathonEnabled": {
"value": "[parameters('marathonEnabled')]"
},
"chronosEnabled": {
"value": "[parameters('chronosEnabled')]"
},
"toilEnabled": {
"value": "[parameters('toilEnabled')]"
},
"sshRSAPublicKey": {
"value": "[parameters('sshRSAPublicKey')]"
},
"githubSource": {
"value": "[parameters('githubSource')]"
},
"githubBranch": {
"value": "[parameters('githubBranch')]"
},
"pythonPackages": {
"value": "[parameters('pythonPackages')]"
},
"omsStorageAccount": {
"value": "[variables('omsStorageAccount')]"
},
"omsStorageAccountKey": {
"value": "[variables('omsStorageAccountKey')]"
}
}
}
},
{
"apiVersion": "2015-01-01",
"type": "Microsoft.Resources/deployments",
"name": "createAgents",
"dependsOn": [
"[variables('vnetID')]"
],
"properties": {
"mode": "Incremental",
"templateLink": {
"uri": "[concat(variables('customScriptLocation'), variables('agentFile'))]",
"contentVersion": "1.0.0.0"
},
"parameters": {
"newStorageAccountNamePrefix": {
"value": "[concat(parameters('newStorageAccountNamePrefix'))]"
},
"adminUsername": {
"value": "[parameters('adminUsername')]"
},
"adminPassword": {
"value": "[parameters('adminPassword')]"
},
"agentVMSize": {
"value": "[parameters('agentVMSize')]"
},
"agentCount": {
"value": "[parameters('agentCount')]"
},
"masterCount": {
"value": "[parameters('masterCount')]"
},
"subnetPrefix": {
"value": "[variables('subnetPrefix')]"
},
"subnetRef": {
"value": "[variables('subnetRef')]"
},
"agentFirstAddr": {
"value": "[variables('agentFirstAddr')]"
},
"masterVMNamePrefix": {
"value": "[variables('masterVMNamePrefix')]"
},
"agentVMNamePrefix": {
"value": "[variables('agentVMNamePrefix')]"
},
"osImagePublisher": {
"value": "[variables('osImagePublisher')]"
},
"osImageOffer": {
"value": "[variables('osImageOffer')]"
},
"osImageSKU" : {
"value": "[variables('osImageSKU')]"
},
"osImageVersion" : {
"value": "[variables('osImageVersion')]"
},
"customScriptLocation": {
"value": "[variables('customScriptLocation')]"
},
"swarmEnabled": {
"value": "[parameters('swarmEnabled')]"
},
"marathonEnabled": {
"value": "[parameters('marathonEnabled')]"
},
"chronosEnabled": {
"value": "[parameters('chronosEnabled')]"
},
"toilEnabled": {
"value": "[parameters('toilEnabled')]"
},
"sshRSAPublicKey": {
"value": "[parameters('sshRSAPublicKey')]"
},
"githubSource": {
"value": "[parameters('githubSource')]"
},
"githubBranch": {
"value": "[parameters('githubBranch')]"
},
"pythonPackages": {
"value": "[parameters('pythonPackages')]"
},
"omsStorageAccount": {
"value": "[variables('omsStorageAccount')]"
},
"omsStorageAccountKey": {
"value": "[variables('omsStorageAccountKey')]"
}
}
}
},
{
"apiVersion": "2015-01-01",
"type": "Microsoft.Resources/deployments",
"name": "createJumpbox",
"dependsOn": [
"[concat('Microsoft.Storage/storageAccounts/', parameters('newStorageAccountNamePrefix'), '0')]",
"[variables('vnetID')]"
],
"properties": {
"mode": "Incremental",
"templateLink": {
"uri": "[concat(variables('customScriptLocation'), 'jumpbox-', parameters('jumpboxConfiguration'), '.json')]",
"contentVersion": "1.0.0.0"
},
"parameters": {
"newStorageAccountName": {
"value": "[concat(parameters('newStorageAccountNamePrefix'), '0')]"
},
"adminUsername": {
"value": "[parameters('adminUsername')]"
},
"adminPassword": {
"value": "[parameters('adminPassword')]"
},
"dnsNameForJumpboxPublicIP": {
"value": "[parameters('dnsNameForJumpboxPublicIP')]"
},
"jumpboxVMSize": {
"value": "[parameters('jumpboxVMSize')]"
},
"subnetPrefix": {
"value": "[variables('subnetPrefix')]"
},
"subnetRef": {
"value": "[variables('subnetRef')]"
},
"jumpboxAddr": {
"value": "[variables('jumpboxAddr')]"
},
"jumpboxVMNamePrefix": {
"value": "[variables('jumpboxVMNamePrefix')]"
},
"customScriptLocation": {
"value": "[variables('customScriptLocation')]"
},
"masterVMNamePrefix": {
"value": "[variables('masterVMNamePrefix')]"
},
"sshRSAPublicKey": {
"value": "[parameters('sshRSAPublicKey')]"
},
"githubSource": {
"value": "[parameters('githubSource')]"
},
"githubBranch": {
"value": "[parameters('githubBranch')]"
}
}
}
}
],
"outputs": {
"master1SSH" : {
"type" : "string",
"value": "[concat('ssh ', parameters('adminUsername'), '@', reference('createMasterNodes').outputs.masterHostname.value, ' -p 2211')]"
}
}
}
{
"newStorageAccountNamePrefix": {
"value": "mesosscalable0921g"
},
"adminUsername": {
"value": "azureuser"
},
"adminPassword": {
"value": "password1234$"
},
"dnsNameForMastersPublicIP": {
"value": "mesosscalable0921g"
},
"dnsNameForJumpboxPublicIP": {
"value": "mesosscalablejb0921g"
},
"agentCount": {
"value": 3
},
"masterCount": {
"value": 3
},
"jumpboxConfiguration": {
"value": "none"
},
"masterConfiguration": {
"value": "masters-are-not-agents"
},
"agentVMSize" : {
"value": "Standard_A5"
},
"masterVMSize" : {
"value": "Standard_A5"
},
"jumpboxVMSize": {
"value": "Standard_A5"
},
"clusterPrefix": {
"value": "c2"
},
"swarmEnabled": {
"value": "false"
},
"marathonEnabled": {
"value": "false"
},
"chronosEnabled": {
"value": "false"
},
"toilEnabled": {
"value": "true"
},
"sshRSAPublicKey": {
"value": "disabled"
},
"githubSource": {
"value": "BD2KGenomics/toil"
},
"githubBranch": {
"value": "master"
},
"pythonPackages": {
"value": ""
}
}
#!/bin/bash
###########################################################
# Configure Mesos One Box
#
# This installs the following components
# - zookeeper
# - mesos master
# - marathon
# - mesos agent
# - swarm
# - chronos
# - toil
###########################################################
set -x
echo "starting mesos cluster configuration"
date
ps ax
#############
# Parameters
#############
MASTERCOUNT=$1
MASTERMODE=$2
MASTERPREFIX=$3
SWARMENABLED=$4
MARATHONENABLED=$5
CHRONOSENABLED=$6
TOILENABLED=$7
ACCOUNTNAME=$8
set +x
ACCOUNTKEY=$9
set -x
AZUREUSER=${10}
SSHKEY=${11}
GITHUB_SOURCE=${12}
GITHUB_BRANCH=${13}
PYTHON_PACKAGES="${14}"
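A quick self-contained illustration of why the tail of the argument list above is written `${10}`..`${14}`: positional parameters past `$9` must be braced, because `"$10"` parses as `"$1"` followed by a literal `0`.

```shell
# Demonstrate $10 vs ${10} with ten sample arguments.
set -- a b c d e f g h i j
unbraced="$10"    # "$1" then a literal "0"
braced="${10}"    # the tenth argument
echo "$unbraced $braced"
```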
HOMEDIR="/home/$AZUREUSER"
VMNAME=`hostname`
VMNUMBER=`echo $VMNAME | sed 's/.*[^0-9]\([0-9]\+\)*$/\1/'`
VMPREFIX=`echo $VMNAME | sed 's/\(.*[^0-9]\)*[0-9]\+$/\1/'`
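A worked example of the two sed expressions above, on a made-up hostname: the first keeps only the trailing digits (the VM's index), the second keeps everything before them (the scale-set prefix).

```shell
# Split a sample hostname like "mesosmaster3" into prefix and number,
# using the same sed patterns as the script.
SAMPLE="mesosmaster3"
NUM=$(echo "$SAMPLE" | sed 's/.*[^0-9]\([0-9]\+\)*$/\1/')
PREFIX=$(echo "$SAMPLE" | sed 's/\(.*[^0-9]\)*[0-9]\+$/\1/')
echo "prefix=$PREFIX number=$NUM"
```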
# TODO: make this configurable?
TARGET_MESOS_VERSION="0.23.0"
BINDINGS_MESOS_VERSION="0.22.0"
echo "Master Count: $MASTERCOUNT"
echo "Master Mode: $MASTERMODE"
echo "Master Prefix: $MASTERPREFIX"
echo "vmname: $VMNAME"
echo "VMNUMBER: $VMNUMBER, VMPREFIX: $VMPREFIX"
echo "SWARMENABLED: $SWARMENABLED, MARATHONENABLED: $MARATHONENABLED, CHRONOSENABLED: $CHRONOSENABLED, TOILENABLED: $TOILENABLED"
echo "ACCOUNTNAME: $ACCOUNTNAME"
echo "TARGET_MESOS_VERSION: $TARGET_MESOS_VERSION"
###################
# setup ssh access
###################
SSHDIR=$HOMEDIR/.ssh
AUTHFILE=$SSHDIR/authorized_keys
if [ `echo $SSHKEY | sed 's/^\(ssh-rsa \).*/\1/'` == "ssh-rsa" ] ; then
if [ ! -d $SSHDIR ] ; then
sudo -i -u $AZUREUSER mkdir $SSHDIR
sudo -i -u $AZUREUSER chmod 700 $SSHDIR
fi
if [ ! -e $AUTHFILE ] ; then
sudo -i -u $AZUREUSER touch $AUTHFILE
sudo -i -u $AZUREUSER chmod 600 $AUTHFILE
fi
echo $SSHKEY | sudo -i -u $AZUREUSER tee -a $AUTHFILE
else
echo "no valid key data"
fi
###################
# Common Functions
###################
ismaster ()
{
if [ "$MASTERPREFIX" == "$VMPREFIX" ]
then
return 0
else
return 1
fi
}
if ismaster ; then
echo "this node is a master"
fi
isagent()
{
if ismaster ; then
if [ "$MASTERMODE" == "masters-are-agents" ]
then
return 0
else
return 1
fi
else
return 0
fi
}
if isagent ; then
echo "this node is an agent"
fi
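Note that `ismaster` and `isagent` act as booleans purely through their exit status, so they must be called bare in `if`; wrapping one in `[ ]` tests the literal string "ismaster", which is non-empty and therefore always true. A minimal sketch (with a stand-in function, not the real check):

```shell
# Exit status is the shell's boolean: return 0 means "true".
ismaster() { return 1; }                     # pretend: this node is not a master
if ismaster; then r1=master; else r1=not-master; fi
# Wrong idiom: [ ismaster ] is a non-empty-string test, always true.
if [ ismaster ]; then r2=always-true; fi
echo "$r1 $r2"
```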
zkhosts()
{
zkhosts=""
for i in `seq 1 $MASTERCOUNT` ;
do
if [ "$i" -gt "1" ]
then
zkhosts="${zkhosts},"
fi
IPADDR=`getent hosts ${MASTERPREFIX}${i} | awk '{ print $1 }'`
zkhosts="${zkhosts}${IPADDR}:2181"
# per the Mesos team's experience, IP addresses are preferred over DNS names
#zkhosts="${zkhosts}${MASTERPREFIX}${i}:2181"
done
echo $zkhosts
}
zkconfig()
{
postfix="$1"
zkhosts=$(zkhosts)
zkconfigstr="zk://${zkhosts}/${postfix}"
echo $zkconfigstr
}
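A self-contained sketch of the connection string `zkhosts`/`zkconfig` build, with `getent` stubbed out to return made-up addresses for a hypothetical three-master cluster:

```shell
# Build the ZooKeeper host list the way zkhosts() does, but with a fake
# name lookup so the sketch runs anywhere.
MASTERCOUNT=3
MASTERPREFIX=master
getent() { echo "10.0.0.1${2#master} $2"; }   # fake: masterN -> 10.0.0.1N
hosts=""
for i in $(seq 1 "$MASTERCOUNT"); do
  [ "$i" -gt 1 ] && hosts="${hosts},"
  ip=$(getent hosts "${MASTERPREFIX}${i}" | awk '{ print $1 }')
  hosts="${hosts}${ip}:2181"
done
echo "zk://${hosts}/mesos"
```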
################
# Install Docker
################
echo "Installing and configuring docker and swarm"
for DOCKER_TRY in {1..10}
do
# Remove the config file that will mess up package installation (by
# prompting for overwrite in a weird subshell)
sudo rm -f /etc/default/docker
# Try installing docker
time wget -qO- https://get.docker.com | sh
# Start Docker and listen on :2375 (no auth, but in vnet)
echo 'DOCKER_OPTS="-H unix:///var/run/docker.sock -H 0.0.0.0:2375"' | sudo tee /etc/default/docker
# the following insecure registry is for OMS
echo 'DOCKER_OPTS="$DOCKER_OPTS --insecure-registry 137.135.93.9"' | sudo tee -a /etc/default/docker
sudo service docker restart
ensureDocker()
{
# ensure that docker is healthy
dockerHealthy=1
for i in {1..3}; do
sudo docker info
if [ $? -eq 0 ]
then
# docker responded; mark healthy and continue
dockerHealthy=0
echo "Docker is healthy"
sudo docker ps -a
break
fi
sleep 10
done
if [ $dockerHealthy -ne 0 ]
then
echo "Docker is not healthy"
fi
}
ensureDocker
if [ "$dockerHealthy" == "0" ]
then
# Contrary to what you might expect, a 0 here means docker is working
# properly. Break out of the loop.
echo "Installed docker successfully."
break
fi
echo "Retrying docker install after a bit."
sleep 120
done
if [ "$dockerHealthy" == "1" ]
then
echo "WARNING: Docker could not be installed! Continuing anyway!"
fi
# Authorize the normal user to use Docker
sudo usermod -aG docker $AZUREUSER
############
# setup OMS
############
if [ "$ACCOUNTNAME" != "none" ]
then
set +x
EPSTRING="DefaultEndpointsProtocol=https;AccountName=${ACCOUNTNAME};AccountKey=${ACCOUNTKEY}"
docker run --restart=always -d 137.135.93.9/msdockeragentv3 http://${VMNAME}:2375 "${EPSTRING}"
set -x
fi
##################
# Install Mesos
##################
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv E56151BF
DISTRO=$(lsb_release -is | tr '[:upper:]' '[:lower:]')
CODENAME=$(lsb_release -cs)
echo "deb http://repos.mesosphere.io/${DISTRO} ${CODENAME} main" | sudo tee /etc/apt/sources.list.d/mesosphere.list
# Marathon asks for Oracle Java 8; we satisfy the dependency with OpenJDK instead
time sudo add-apt-repository -y ppa:openjdk-r/ppa
time sudo apt-get -y update
# Actually install that Java
time sudo apt-get -y install openjdk-8-jre-headless
# Fix Mesos version to one that actually works with Toil
# We need to know the -ubuntuWhatever on the end of the package version we want.
FULL_MESOS_VERSION=`apt-cache policy mesos | grep "${TARGET_MESOS_VERSION}" | cut -d" " -f6`
sudo apt-get -y --force-yes install mesos=${FULL_MESOS_VERSION}
# Don't update it
sudo apt-mark hold mesos
if ismaster ; then
# Masters also need some version of Mesosphere
time sudo apt-get -y --force-yes install mesosphere
fi
#########################
# Configure ZooKeeper
#########################
zkmesosconfig=$(zkconfig "mesos")
echo $zkmesosconfig | sudo tee /etc/mesos/zk
if ismaster ; then
echo $VMNUMBER | sudo tee /etc/zookeeper/conf/myid
for i in `seq 1 $MASTERCOUNT` ;
do
IPADDR=`getent hosts ${MASTERPREFIX}${i} | awk '{ print $1 }'`
echo "server.${i}=${IPADDR}:2888:3888" | sudo tee -a /etc/zookeeper/conf/zoo.cfg
# per the Mesos team's experience, IP addresses are preferred over DNS names
#echo "server.${i}=${MASTERPREFIX}${i}:2888:3888" | sudo tee -a /etc/zookeeper/conf/zoo.cfg
done
fi
#########################################
# Configure Mesos Master and Frameworks
#########################################
if ismaster ; then
quorum=`expr $MASTERCOUNT / 2 + 1`
echo $quorum | sudo tee /etc/mesos-master/quorum
hostname -I | sed 's/ /\n/g' | grep "^10." | sudo tee /etc/mesos-master/ip
hostname | sudo tee /etc/mesos-master/hostname
echo 'Mesos Cluster on Microsoft Azure' | sudo tee /etc/mesos-master/cluster
fi
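The quorum written above is a strict majority of the masters, which is why odd master counts are the useful ones. The arithmetic can be checked directly:

```shell
# Majority quorum for a few master counts, using the same expr as the script.
for n in 1 3 5; do
  echo "$n masters -> quorum $(expr $n / 2 + 1)"
done
```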
if ismaster && [ "$MARATHONENABLED" == "true" ] ; then
# setup marathon
sudo mkdir -p /etc/marathon/conf
sudo cp /etc/mesos-master/hostname /etc/marathon/conf
sudo cp /etc/mesos/zk /etc/marathon/conf/master
zkmarathonconfig=$(zkconfig "marathon")
echo $zkmarathonconfig | sudo tee /etc/marathon/conf/zk
fi
#########################
# Configure Mesos-DNS
#########################
if ismaster ; then
# Download and install mesos-dns
sudo mkdir -p /usr/local/mesos-dns
sudo wget https://github.com/mesosphere/mesos-dns/releases/download/v0.2.0/mesos-dns-v0.2.0-linux-amd64.tgz
sudo tar zxvf mesos-dns-v0.2.0-linux-amd64.tgz
sudo mv mesos-dns-v0.2.0-linux-amd64 /usr/local/mesos-dns/mesos-dns
echo "
{
\"zk\": \"zk://127.0.0.1:2181/mesos\",
\"refreshSeconds\": 1,
\"ttl\": 0,
\"domain\": \"mesos\",
\"port\": 53,
\"timeout\": 1,
\"listener\": \"0.0.0.0\",
\"email\": \"root.mesos-dns.mesos\",
\"externalon\": false
}
" > mesos-dns.json
sudo mv mesos-dns.json /usr/local/mesos-dns/mesos-dns.json
echo "
description \"mesos dns\"
# Start just after the System-V jobs (rc) to ensure networking and zookeeper
# are started. This is as simple as possible to ensure compatibility with
# Ubuntu, Debian, CentOS, and RHEL distros. See:
# http://upstart.ubuntu.com/cookbook/#standard-idioms
start on stopped rc RUNLEVEL=[2345]
respawn
exec /usr/local/mesos-dns/mesos-dns -config /usr/local/mesos-dns/mesos-dns.json" > mesos-dns.conf
sudo mv mesos-dns.conf /etc/init
sudo service mesos-dns start
fi
#########################
# Configure Mesos Agent
#########################
if isagent ; then
# Add docker containerizer
echo "docker,mesos" | sudo tee /etc/mesos-slave/containerizers
# Add resources configuration
if ismaster ; then
echo "ports:[1-21,23-4399,4401-5049,5052-8079,8081-32000]" | sudo tee /etc/mesos-slave/resources
else
echo "ports:[1-21,23-5050,5052-32000]" | sudo tee /etc/mesos-slave/resources
fi
# Our hostname may not resolve yet, so we look at our IPs and find the 10.
# address instead
hostname -I | sed 's/ /\n/g' | grep "^10." | sudo tee /etc/mesos-slave/ip
hostname | sudo tee /etc/mesos-slave/hostname
# Mark the node as non-preemptable so Toil won't complain that it doesn't know
# whether the node is preemptable or not
echo "preemptable:False" | sudo tee /etc/mesos-slave/attributes
# Set up the Mesos slave work directory in the ephemeral /mnt
echo "/mnt" | sudo tee /etc/mesos-slave/work_dir
# Set the root reserved fraction of that device to 0 to work around
# <https://github.com/BD2KGenomics/toil/issues/1650> and
# <https://issues.apache.org/jira/browse/MESOS-7420>
sudo tune2fs -m 0 `findmnt --target /mnt -n -o SOURCE`
# Add mesos-dns IP addresses at the top of resolv.conf
RESOLV_TMP=resolv.conf.temp
rm -f $RESOLV_TMP
for i in `seq $MASTERCOUNT` ; do
echo nameserver `getent hosts ${MASTERPREFIX}${i} | awk '{ print $1 }'` >> $RESOLV_TMP
done
cat /etc/resolv.conf >> $RESOLV_TMP
mv $RESOLV_TMP /etc/resolv.conf
fi
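The resolv.conf edit above prepends the masters so mesos-dns is consulted first while the existing resolvers remain as fallbacks. A sketch against a stand-in file, so nothing system-wide is touched (all addresses are made up):

```shell
# Prepend new nameservers to a copy of a resolver file via a temp file.
FAKE_RESOLV=$(mktemp)
echo 'nameserver 8.8.8.8' > "$FAKE_RESOLV"             # stand-in for /etc/resolv.conf
TMP=$(mktemp)
printf 'nameserver %s\n' 10.0.0.11 10.0.0.12 > "$TMP"  # mesos-dns masters go first
cat "$FAKE_RESOLV" >> "$TMP"                           # old entries keep lower priority
mv "$TMP" "$FAKE_RESOLV"
head -n 1 "$FAKE_RESOLV"
```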
##############################################
# configure init rules and restart all processes
##############################################
echo "(re)starting mesos and framework processes"
if ismaster ; then
sudo service zookeeper restart
sudo service mesos-master start
if [ "$MARATHONENABLED" == "true" ] ; then
sudo service marathon start
fi
if [ "$CHRONOSENABLED" == "true" ] ; then
sudo service chronos start
fi
else
echo manual | sudo tee /etc/init/zookeeper.override
sudo service zookeeper stop
echo manual | sudo tee /etc/init/mesos-master.override
sudo service mesos-master stop
fi
if isagent ; then
echo "starting mesos-slave"
sudo service mesos-slave start
echo "completed starting mesos-slave with code $?"
else
echo manual | sudo tee /etc/init/mesos-slave.override
sudo service mesos-slave stop
fi
echo "processes after restarting mesos"
ps ax
# Run swarm manager container on port 2376 (no auth)
if ismaster && [ "$SWARMENABLED" == "true" ] ; then
echo "starting docker swarm"
echo "sleep to give master time to come up"
sleep 10
# Log the exact command before running it
echo sudo docker run -d -e SWARM_MESOS_USER=root \
--restart=always \
-p 2376:2375 -p 3375:3375 swarm manage \
-c mesos-experimental \
--cluster-opt mesos.address=0.0.0.0 \
--cluster-opt mesos.port=3375 $zkmesosconfig
sudo docker run -d -e SWARM_MESOS_USER=root \
--restart=always \
-p 2376:2375 -p 3375:3375 swarm manage \
-c mesos-experimental \
--cluster-opt mesos.address=0.0.0.0 \
--cluster-opt mesos.port=3375 $zkmesosconfig
sudo docker ps
echo "completed starting docker swarm"
fi
echo "processes at end of script"
ps ax
echo "Finished installing and configuring docker and swarm"
###############################################
# Install Toil
###############################################
if [ "$TOILENABLED" == "true" ] ; then
# Upgrade Python to 2.7.latest
sudo apt-add-repository -y ppa:fkrull/deadsnakes-python2.7
sudo add-apt-repository -y ppa:ubuntu-toolchain-r/test
time sudo apt-get -y update
# Install Toil dependencies (and setuptools for easy_install)
time sudo apt-get -y --force-yes install python2.7 python2.7-dev python2.7-dbg python-setuptools build-essential git gcc-4.9 gdb
# Get a reasonably new pip
time sudo easy_install pip
# Upgrade setuptools
time sudo pip install setuptools --upgrade
# Install Toil from Git, retrieving the correct version. If you want a release
# you might be able to use a tag here instead.
echo "Installing branch ${GITHUB_BRANCH} of ${GITHUB_SOURCE} for Toil."
time sudo pip install --pre "git+https://github.com/${GITHUB_SOURCE}@${GITHUB_BRANCH}#egg=toil[mesos,azure]"
# Toil no longer attempts to actually install Mesos's Python bindings itself,
# so we have to do it. First we need the Mesos dependencies.
time sudo pip install protobuf==2.6.1
# Install the right bindings for the Mesos we installed
UBUNTU_VERSION=`lsb_release -rs`
sudo easy_install https://pypi.python.org/packages/source/m/mesos.interface/mesos.interface-${BINDINGS_MESOS_VERSION}.tar.gz
# Easy-install doesn't like this server's ssl for some reason.
sudo wget https://downloads.mesosphere.io/master/ubuntu/${UBUNTU_VERSION}/mesos-${BINDINGS_MESOS_VERSION}-py2.7-linux-x86_64.egg
sudo easy_install mesos-${BINDINGS_MESOS_VERSION}-py2.7-linux-x86_64.egg
sudo rm mesos-${BINDINGS_MESOS_VERSION}-py2.7-linux-x86_64.egg
fi
if [ "x${PYTHON_PACKAGES}" != "x" ] ; then
# Install additional Python packages
time sudo pip install --upgrade ${PYTHON_PACKAGES}
fi
date
echo "completed mesos cluster configuration"