[run]
source = snakemake
parallel = True
[report]
omit = tests/*
......@@ -55,16 +55,19 @@ jobs:
source activate snakemake
# run tests
export AWS_DEFAULT_REGION=us-east-1
export AWS_ACCESS_KEY_ID=${{ secrets.AWS_ACCESS_KEY_ID }}
export AWS_SECRET_ACCESS_KEY=${{ secrets.AWS_SECRET_ACCESS_KEY }}
coverage run -m pytest tests/test*.py -v -x
# collect coverage report
coverage combine
coverage xml
#coverage combine
#coverage xml
- name: Upload coverage report
uses: codecov/codecov-action@v1.0.3
with:
token: ${{secrets.CODECOV_TOKEN}}
#- name: Upload coverage report
#uses: codecov/codecov-action@v1.0.3
#with:
#token: ${{secrets.CODECOV_TOKEN}}
- name: Build container image
run: docker build .
[5.8.1] - 2019-11-15
====================
Changed
-------
- Fixed a bug by adding the missing ``snakemake.caching`` module to the package list in ``setup.py``.
[5.8.0] - 2019-11-15
====================
Added
-----
- Blockchain based caching between workflows (in collaboration with Sven Nahnsen from QBiC), see `the docs <https://snakemake.readthedocs.io/en/v5.8.0/executing/caching.html>`_.
- New flag ``--skip-script-cleanup`` that keeps temporary scripts (coming from the script or wrapper directive) instead of deleting them (by Vanessa Sochat).
Changed
-------
- Various bug fixes.
[5.7.4] - 2019-10-23
====================
Changed
......
Copyright (c) 2016 Johannes Köster <johannes.koester@tu-dortmund.de>
Copyright (c) 2012-2019 Johannes Köster <johannes.koester@tu-dortmund.de>
Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
......
[![CircleCI](https://circleci.com/gh/snakemake/snakemake/tree/master.svg?style=shield)](https://circleci.com/gh/snakemake/snakemake/tree/master)
[![GitHub actions status](https://github.com/snakemake/snakemake/workflows/CI/badge.svg?branch=master)](https://github.com/snakemake/snakemake/actions?query=branch%3Amaster+workflow%3ACI)
[![Sonarcloud Status](https://sonarcloud.io/api/project_badges/measure?project=snakemake_snakemake&metric=alert_status)](https://sonarcloud.io/dashboard?id=snakemake_snakemake)
[![Bioconda](https://img.shields.io/conda/dn/bioconda/snakemake.svg?label=Bioconda)](https://bioconda.github.io/recipes/snakemake/README.html)
[![Pypi](https://img.shields.io/pypi/pyversions/snakemake.svg)](https://pypi.org/project/snakemake)
......@@ -11,7 +11,7 @@
The Snakemake workflow management system is a tool to create **reproducible and scalable** data analyses.
Workflows are described via a human readable, Python based language.
They can be seamlessly scaled to server, cluster, grid and cloud environments, without the need to modify the workflow definition.
They can be seamlessly scaled to server, cluster, grid and cloud environments without the need to modify the workflow definition.
Finally, Snakemake workflows can entail a description of required software, which will be automatically deployed to any execution environment.
**Homepage: https://snakemake.readthedocs.io**
......
==========================================================
Caching and reusing intermediate results between workflows
==========================================================
In certain data analysis fields, some intermediate results reoccur in exactly the same way across many analyses.
For example, in bioinformatics, reference genomes or annotations are downloaded, and read mapping indexes are built.
Since such steps are independent of the actual data or measurements that are analyzed, but are nevertheless computationally expensive or time-consuming to conduct, it has been common practice to externalize their computation and assume the presence of the resulting files before execution of a workflow.
From version 5.8.0 on, Snakemake offers a way to keep those steps inside the actual analysis without requiring redundant computations.
By hashing all steps, parameters, software stacks (in terms of conda environments or containers), and raw input required up to a certain step in a `blockchain <https://en.wikipedia.org/wiki/Blockchain>`_, Snakemake is able to recognize **before** the computation whether a certain result is already available in a central cache on the same system.
**Note that this is explicitly intended for caching results between workflows! There is no need to use this feature to avoid redundant computations within a workflow. Snakemake does this already out of the box.**
Such caching has to be explicitly activated per rule, which can be done via the command line interface.
For example,
.. code-block:: console
$ export SNAKEMAKE_OUTPUT_CACHE=/mnt/snakemake-cache/
$ snakemake --cache download_reference create_index
would instruct Snakemake to cache and reuse the results of the rules ``download_reference`` and ``create_index``.
In practice, the environment variable definition in the first line (which defines the location of the cache) should of course be done only once and system-wide.
When Snakemake is executed without a shared filesystem (e.g., in the cloud, see :ref:`cloud`), the environment variable has to point to a location compatible with the given remote provider (e.g. an S3 or Google Storage bucket).
In any case, the provided location should be shared between all workflows of your group, institute or computing environment, in order to benefit from the reuse of previously obtained intermediate results.
Note that only rules with just a single output file are eligible for caching.
Also note that the rules need to retrieve all their parameters via the ``params`` directive (except input files).
It is not allowed to directly use ``wildcards``, ``config`` or any global variable in the shell command or script, because these are not captured in the hash (otherwise, reuse would be unnecessarily limited).
Also note that Snakemake will store everything in the cache as readable and writeable for **all users** on the system (except in the remote case, where permissions are not enforced and depend on your storage configuration).
Hence, caching is not intended for private data, just for steps that deal with publicly available resources.
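To make the eligibility requirements above concrete, a cacheable rule could look roughly like the following sketch (the rule name, paths, and conda environment file are made up for illustration):
.. code-block:: python

    rule download_reference:
        output:
            "resources/genome.fa"
        params:
            # pass everything the command depends on via params,
            # not via wildcards, config or global variables
            species=config["species"]
        conda:
            "envs/curl.yaml"
        shell:
            "curl -L https://example.org/{params.species}.fa > {output}"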
Finally, be aware that the implementation has to be considered **experimental** until this note is removed.
\ No newline at end of file
.. _executable:
======================
Command line interface
======================
This part of the documentation describes the ``snakemake`` executable. Snakemake
is primarily a command-line tool, so the ``snakemake`` executable is the primary way
to execute, debug, and visualize workflows.
.. _user_manual-snakemake_options:
-----------------------------
Useful Command Line Arguments
-----------------------------
If called without parameters, i.e.
.. code-block:: console
$ snakemake
Snakemake tries to execute the workflow specified in a file called ``Snakefile`` in the same directory (alternatively, the Snakefile can be given via the parameter ``-s``).
By issuing
.. code-block:: console
$ snakemake -n
a dry-run can be performed.
This is useful to test if the workflow is defined properly and to estimate the amount of needed computation.
Further, the reason for each rule execution can be printed via
.. code-block:: console
$ snakemake -n -r
Importantly, Snakemake can automatically determine which parts of the workflow can be run in parallel.
By specifying the number of available cores, i.e.
.. code-block:: console
$ snakemake --cores 4
one can tell Snakemake to use up to 4 cores and solve a binary knapsack problem to optimize the scheduling of jobs.
If the number is omitted (i.e., only ``--cores`` is given), the number of used cores is determined as the number of available CPU cores in the machine.
Dealing with very large workflows
---------------------------------
If your workflow has a lot of jobs, Snakemake might need some time to infer the dependencies (the job DAG) and which jobs are actually required to run.
The major bottleneck involved is the filesystem, which has to be queried for existence and modification dates of files.
To overcome this issue, Snakemake allows you to run large workflows in batches.
This way, fewer files have to be evaluated at once, and therefore the job DAG can be inferred faster.
By running
.. code-block:: console
$ snakemake --cores 4 --batch myrule=1/3
you instruct Snakemake to compute only the first of three batches of the inputs of the rule ``myrule``.
To generate the second batch, run
.. code-block:: console
$ snakemake --cores 4 --batch myrule=2/3
Finally, when running
.. code-block:: console
$ snakemake --cores 4 --batch myrule=3/3
Snakemake will process the last batch and, since all input files of the rule ``myrule`` have then been generated, continue beyond ``myrule`` and complete the workflow.
Obviously, a good choice for batching is a rule that has a lot of input files and upstream jobs, for example a central aggregation step within your workflow.
We advise all workflow developers to inform potential users about the rule best suited for batching.
.. _profiles:
--------
Profiles
--------
Adapting Snakemake to a particular environment can entail many flags and options.
Therefore, since Snakemake 4.1, it is possible to specify a configuration profile
to be used to obtain default options:
.. code-block:: console
$ snakemake --profile myprofile
Here, a folder ``myprofile`` is searched for in the per-user and global configuration directories (on Linux, these are ``$HOME/.config/snakemake`` and ``/etc/xdg/snakemake``; you can find the locations for your system via ``snakemake --help``).
Alternatively, an absolute or relative path to the folder can be given.
The profile folder is expected to contain a file ``config.yaml`` that defines default values for the Snakemake command line arguments.
For example, the file
.. code-block:: yaml
cluster: qsub
jobs: 100
would set up Snakemake to always submit to the cluster via the ``qsub`` command, and to never use more than 100 parallel jobs in total.
Under https://github.com/snakemake-profiles/doc, you can find publicly available profiles.
Feel free to contribute your own.
The profile folder can additionally contain auxiliary files, e.g., jobscripts, or any kind of wrappers.
See https://github.com/snakemake-profiles/doc for examples.
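For orientation, a minimal profile folder could be laid out as follows (the jobscript file name is just an illustrative placeholder):
.. code-block:: text

    myprofile/
    ├── config.yaml     # default command line options, as shown above
    └── jobscript.sh    # optional auxiliary file, e.g. a custom jobscript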
.. _all_options:
-----------
All Options
-----------
.. argparse::
:module: snakemake
:func: get_argument_parser
:prog: snakemake
All command line options can be printed by calling ``snakemake -h``.
.. _getting_started-bash_completion:
---------------
Bash Completion
---------------
Snakemake supports bash completion for filenames, rulenames and arguments.
To enable it globally, just append
.. code-block:: bash
`snakemake --bash-completion`
including the backticks, to your ``.bashrc``.
This only works if the ``snakemake`` command is in your path.
\ No newline at end of file
.. _executable:
===================
Executing Snakemake
===================
This part of the documentation describes the ``snakemake`` executable. Snakemake
is primarily a command-line tool, so the ``snakemake`` executable is the primary way
to execute, debug, and visualize workflows.
.. _user_manual-snakemake_options:
-----------------------------
Useful Command Line Arguments
-----------------------------
If called without parameters, i.e.
.. code-block:: console
$ snakemake
Snakemake tries to execute the workflow specified in a file called ``Snakefile`` in the same directory (alternatively, the Snakefile can be given via the parameter ``-s``).
By issuing
.. code-block:: console
$ snakemake -n
a dry-run can be performed.
This is useful to test if the workflow is defined properly and to estimate the amount of needed computation.
Further, the reason for each rule execution can be printed via
.. code-block:: console
$ snakemake -n -r
Importantly, Snakemake can automatically determine which parts of the workflow can be run in parallel.
By specifying the number of available cores, i.e.
.. code-block:: console
$ snakemake -j 4
one can tell Snakemake to use up to 4 cores and solve a binary knapsack problem to optimize the scheduling of jobs.
If the number is omitted (i.e., only ``-j`` is given), the number of used cores is determined as the number of available CPU cores in the machine.
===========================
Cluster and cloud execution
===========================
.. _cloud:
-------------
Cloud Support
......@@ -236,6 +192,8 @@ a job intends to use, such that Tibanna can allocate it to the most cost-effecti
cloud compute instance available.
.. _cluster:
-----------------
Cluster Execution
-----------------
......@@ -316,38 +274,6 @@ When executing a workflow on a cluster using the ``--cluster`` parameter (see be
os.system("qsub -t {threads} {script}".format(threads=threads, script=jobscript))
.. _profiles:
--------
Profiles
--------
Adapting Snakemake to a particular environment can entail many flags and options.
Therefore, since Snakemake 4.1, it is possible to specify a configuration profile
to be used to obtain default options:
.. code-block:: console
$ snakemake --profile myprofile
Here, a folder ``myprofile`` is searched for in the per-user and global configuration directories (on Linux, these are ``$HOME/.config/snakemake`` and ``/etc/xdg/snakemake``; you can find the locations for your system via ``snakemake --help``).
Alternatively, an absolute or relative path to the folder can be given.
The profile folder is expected to contain a file ``config.yaml`` that defines default values for the Snakemake command line arguments.
For example, the file
.. code-block:: yaml
cluster: qsub
jobs: 100
would set up Snakemake to always submit to the cluster via the ``qsub`` command, and to never use more than 100 parallel jobs in total.
Under https://github.com/snakemake-profiles/doc, you can find publicly available profiles.
Feel free to contribute your own.
The profile folder can additionally contain auxiliary files, e.g., jobscripts, or any kind of wrappers.
See https://github.com/snakemake-profiles/doc for examples.
.. _getting_started-visualization:
-------------
......@@ -375,60 +301,3 @@ To visualize the whole DAG regardless of the eventual presence of files, the ``f
$ snakemake --forceall --dag | dot -Tpdf > dag.pdf
Of course the visual appearance can be modified by providing further command line arguments to ``dot``.
.. _cwl_export:
----------
CWL export
----------
Snakemake workflows can be exported to `CWL <http://www.commonwl.org/>`_, such that they can be executed in any `CWL-enabled workflow engine <https://www.commonwl.org/#Implementations>`_.
Since CWL is less powerful for expressing workflows than Snakemake (most importantly, Snakemake offers more flexible scatter-gather patterns, because full Python can be used), export works such that every Snakemake job is encoded into a single step in the CWL workflow.
Moreover, every step of that workflow calls Snakemake again to execute the job. The latter enables advanced Snakemake features like scripts, benchmarks and remote files to work inside CWL.
So, when exporting keep in mind that the resulting CWL file can become huge, depending on the number of jobs in your workflow.
To export a Snakemake workflow to CWL, simply run
.. code-block:: console
$ snakemake --export-cwl workflow.cwl
The resulting workflow will by default use the `Snakemake docker image <https://hub.docker.com/r/snakemake/snakemake>`_ for every step, but this behavior can be overwritten via the CWL execution environment.
Then, the workflow can be executed in the same working directory with, e.g.,
.. code-block:: console
$ cwltool workflow.cwl
Note that, due to limitations in CWL, it currently seems impossible to prevent all target files (output files of target jobs) from being written directly to the working directory, regardless of their relative paths in the Snakefile.
Note that export is impossible in case the workflow contains :ref:`dynamic output files <snakefiles-dynamic_files>` or output files with absolute paths.
.. _all_options:
-----------
All Options
-----------
.. argparse::
:module: snakemake
:func: get_argument_parser
:prog: snakemake
All command line options can be printed by calling ``snakemake -h``.
.. _getting_started-bash_completion:
---------------
Bash Completion
---------------
Snakemake supports bash completion for filenames, rulenames and arguments.
To enable it globally, just append
.. code-block:: bash
`snakemake --bash-completion`
including the backticks, to your ``.bashrc``.
This only works if the ``snakemake`` command is in your path.
================
Interoperability
================
.. _cwl_export:
----------
CWL export
----------
Snakemake workflows can be exported to `CWL <http://www.commonwl.org/>`_, such that they can be executed in any `CWL-enabled workflow engine <https://www.commonwl.org/#Implementations>`_.
Since CWL is less powerful for expressing workflows than Snakemake (most importantly, Snakemake offers more flexible scatter-gather patterns, because full Python can be used), export works such that every Snakemake job is encoded into a single step in the CWL workflow.
Moreover, every step of that workflow calls Snakemake again to execute the job. The latter enables advanced Snakemake features like scripts, benchmarks and remote files to work inside CWL.
So, when exporting keep in mind that the resulting CWL file can become huge, depending on the number of jobs in your workflow.
To export a Snakemake workflow to CWL, simply run
.. code-block:: console
$ snakemake --export-cwl workflow.cwl
The resulting workflow will by default use the `Snakemake docker image <https://hub.docker.com/r/snakemake/snakemake>`_ for every step, but this behavior can be overwritten via the CWL execution environment.
Then, the workflow can be executed in the same working directory with, e.g.,
.. code-block:: console
$ cwltool workflow.cwl
Note that, due to limitations in CWL, it currently seems impossible to prevent all target files (output files of target jobs) from being written directly to the working directory, regardless of their relative paths in the Snakefile.
Note that export is impossible in case the workflow contains :ref:`dynamic output files <snakefiles-dynamic_files>` or output files with absolute paths.
\ No newline at end of file
......@@ -253,7 +253,7 @@ Assuming that the above file is saved as ``tex.rules``, the actual documents are
FIGURES = ['fig1.pdf']
include:
'tex.smrules'
'tex.rules'
rule all:
input:
......
......@@ -16,8 +16,8 @@ Snakemake
.. image:: https://img.shields.io/docker/cloud/build/snakemake/snakemake
:target: https://hub.docker.com/r/snakemake/snakemake
.. image:: https://circleci.com/gh/snakemake/snakemake/tree/master.svg?style=shield
:target: https://circleci.com/gh/snakemake/snakemake/tree/master
.. image:: https://github.com/snakemake/snakemake/workflows/CI/badge.svg?branch=master
:target: https://github.com/snakemake/snakemake/actions?query=branch%3Amaster+workflow%3ACI
.. image:: https://img.shields.io/badge/stack-overflow-orange.svg
:target: https://stackoverflow.com/questions/tagged/snakemake
......@@ -37,7 +37,7 @@ Workflows are described via a human readable, Python based language.
They can be seamlessly scaled to server, cluster, grid and cloud environments, without the need to modify the workflow definition.
Finally, Snakemake workflows can entail a description of required software, which will be automatically deployed to any execution environment.
Snakemake is **highly popular** with, on average, `a new citation every few days <https://badge.dimensions.ai/details/id/pub.1018944052>`_.
Snakemake is **highly popular**, with `~3 new citations per week <https://badge.dimensions.ai/details/id/pub.1018944052>`_.
.. _manual-quick_example:
......@@ -53,17 +53,28 @@ Rules describe how to create **output files** from **input files**.
rule targets:
input:
"plots/dataset1.pdf",
"plots/dataset2.pdf"
"plots/myplot.pdf"
rule plot:
rule transform:
input:
"raw/{dataset}.csv"
output:
"plots/{dataset}.pdf"
"transformed/{dataset}.csv"
singularity:
"docker://somecontainer:v1.0"
shell:
"somecommand {input} {output}"
rule aggregate_and_plot:
input:
expand("transformed/{dataset}.csv", dataset=[1, 2])
output:
"plots/myplot.pdf"
conda:
"envs/matplotlib.yaml"
script:
"scripts/plot.py"
* Similar to GNU Make, you specify targets in terms of a pseudo-rule at the top.
* For each target and intermediate file, you create rules that define how they are created from input files.
......@@ -197,7 +208,10 @@ Please consider to add your own.
:hidden:
:maxdepth: 1
executable
executing/cli
executing/cluster-cloud
executing/caching
executing/interoperability
.. toctree::
:caption: Defining workflows
......
......@@ -44,7 +44,7 @@ Contributing a new cluster or cloud execution backend
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Execution backends are added by implementing a so-called ``Executor``.
All executors are located in `snakemake/executors.py <https://github.com/snakemake/snakemake/src/master/snakemake/executors.py>`_.
All executors are located in `snakemake/executors.py <https://github.com/snakemake/snakemake/tree/master/snakemake/executors.py>`_.
In order to implement a new executor, you have to inherit from the class ``ClusterExecutor``.
Below you find a skeleton
......
......@@ -5,7 +5,7 @@ Reports
-------
From Snakemake 5.1 on, it is possible to automatically generate detailed self-contained HTML reports that encompass runtime statistics, provenance information, workflow topology and results.
**A realistic example report from a real workflow can be found `here <https://koesterlab.github.io/resources/report.html>`_.**
**A realistic example report from a real workflow can be found** `here <https://koesterlab.github.io/resources/report.html>`_.
For including results into the report, the Snakefile has to be annotated with additional information.
Each output file that shall be part of the report has to be marked with the ``report`` flag, which optionally points to a caption in `restructured text format <http://docutils.sourceforge.net/rst.html>`_ and allows to define a ``category`` for grouping purposes.
......
......@@ -985,7 +985,7 @@ Defining groups for execution
From Snakemake 5.0 on, it is possible to assign rules to groups.
Such groups will be executed together in **cluster** or **cloud mode**, as a so-called **group job**, i.e., all jobs of a particular group will be submitted at once, to the same computing node.
By this, queueing and execution time can be safed, in particular if one or several short-running rules are involved.
By this, queueing and execution time can be saved, in particular if one or several short-running rules are involved.
When executing locally, group definitions are ignored.
Groups can be defined via the ``group`` keyword, e.g.,
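As a minimal illustration (rule and file names are made up), assigning a rule to a group looks roughly like this:
.. code-block:: python

    rule sort:
        input:
            "data/{sample}.txt"
        output:
            "sorted/{sample}.txt"
        group: "preprocess"  # jobs of rules sharing this group id are submitted together
        shell:
            "sort {input} > {output}"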
......@@ -1140,7 +1140,10 @@ To illustrate the possibilities of this mechanism, consider the following comple
# input function for the rule aggregate
def aggregate_input(wildcards):
# decision based on content of output file
with open(checkpoints.somestep.get(sample=wildcards.sample).output[0]) as f:
# Important: use the method open() of the returned file!
# This way, Snakemake is able to automatically download the file if it is generated in
# a cloud environment without a shared filesystem.
with checkpoints.somestep.get(sample=wildcards.sample).output[0].open() as f:
if f.read().strip() == "a":
return "post/{sample}.txt"
else:
......
#!/bin/sh
black snakemake
black tests/*.py
......@@ -49,7 +49,7 @@ source $VIMRUNTIME/syntax/python.vim
syn keyword pythonStatement include workdir onsuccess onerror
syn keyword pythonStatement ruleorder localrules configfile group
syn keyword pythonStatement touch protected temp wrapper conda shadow
syn keyword pythonStatement input output params message threads resources singularity
syn keyword pythonStatement input output params message threads resources singularity wildcard_constraints
syn keyword pythonStatement version run shell benchmark snakefile log script
syn keyword pythonStatement rule subworkflow nextgroup=pythonFunction skipwhite
......
......@@ -37,7 +37,7 @@ setup(
zip_safe=False,
license="MIT",
url="https://snakemake.readthedocs.io",
packages=["snakemake", "snakemake.remote", "snakemake.report"],
packages=["snakemake", "snakemake.remote", "snakemake.report", "snakemake.caching"],
entry_points={
"console_scripts": [
"snakemake = snakemake:main",
......
__author__ = "Johannes Köster"
__contributors__ = ["Soohyun Lee"]
__copyright__ = "Copyright 2015, Johannes Köster"
__copyright__ = "Copyright 2015-2019, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"
......@@ -40,6 +39,7 @@ SNAKEFILE_CHOICES = [
def snakemake(
snakefile,
batch=None,
cache=None,
report=None,
listrules=False,
list_target_rules=False,
......@@ -87,6 +87,7 @@ def snakemake(
cleanup_metadata=None,
cleanup_conda=False,
cleanup_shadow=False,
cleanup_scripts=True,
force_incomplete=False,
ignore_incomplete=False,
list_version_changes=False,
......@@ -194,6 +195,7 @@ def snakemake(
cleanup_metadata (list): just cleanup metadata of given list of output files (default None)
cleanup_conda (bool): just cleanup unused conda environments (default False)
cleanup_shadow (bool): just cleanup old shadow directories (default False)
cleanup_scripts (bool): delete wrapper scripts used for execution (default True)
force_incomplete (bool): force the re-creation of incomplete files (default False)
ignore_incomplete (bool): ignore incomplete files (default False)
list_version_changes (bool): list output files with changed rule version (default False)
......@@ -457,6 +459,7 @@ def snakemake(
default_remote_prefix=default_remote_prefix,
run_local=run_local,
default_resources=default_resources,
cache=cache,
)
success = True
workflow.include(
......@@ -479,6 +482,7 @@ def snakemake(
cores=cores,
nodes=nodes,
local_cores=local_cores,
cache=cache,
resources=resources,
default_resources=default_resources,
dryrun=dryrun,
......@@ -504,6 +508,7 @@ def snakemake(
cleanup_metadata=cleanup_metadata,
cleanup_conda=cleanup_conda,
cleanup_shadow=cleanup_shadow,
cleanup_scripts=cleanup_scripts,
force_incomplete=force_incomplete,
ignore_incomplete=ignore_incomplete,
latency_wait=latency_wait,
......@@ -604,6 +609,7 @@ def snakemake(
cleanup_metadata=cleanup_metadata,
cleanup_conda=cleanup_conda,
cleanup_shadow=cleanup_shadow,
cleanup_scripts=cleanup_scripts,
subsnakemake=subsnakemake,
updated_files=updated_files,
allowed_rules=allowed_rules,
......@@ -804,6 +810,17 @@ def get_argument_parser(profile=None):
),
)
group_exec.add_argument(
"--cache",
nargs="+",
metavar="RULE",
help="Store output files of given rules in a central cache given by the environment "
"variable $SNAKEMAKE_OUTPUT_CACHE. Likewise, retrieve output files of the given rules "
"from this cache if they have been created before (by anybody writing to the same cache), "
"instead of actually executing the rules. Output files are identified by hashing all "
"steps, parameters and software stack (conda envs or containers) needed to create them.",
)
group_exec.add_argument(
"--snakefile",
"-s",
......@@ -913,7 +930,12 @@ def get_argument_parser(profile=None):
"changing them) instead of running their commands. This is "
"used to pretend that the rules were executed, in order to "
"fool future invocations of snakemake. Fails if a file does "
"not yet exist."
"not yet exist. Note that this will only touch files that would "
"otherwise be recreated by Snakemake (e.g. because their input "
"files are newer). For enforcing a touch, combine this with "
"--force, --forceall, or --forcerun. Note however that you loose "
"the provenance information when the files have been created in "
"realitiy. Hence, this should be used only as a last resort."
),
)
group_exec.add_argument(
......@@ -1135,6 +1157,11 @@ def get_argument_parser(profile=None):
help="Cleanup old shadow directories which have not been deleted due "
"to failures or power loss.",
)
group_utils.add_argument(
"--skip-script-cleanup",
action="store_true",
help="Don't delete wrapper scripts used for execution",
)
group_utils.add_argument(
"--unlock", action="store_true", help="Remove a lock on the working directory."
)
......@@ -1889,6 +1916,7 @@ def main(argv=None):
success = snakemake(
args.snakefile,
batch=batch,
cache=args.cache,
report=args.report,
listrules=args.list,
list_target_rules=args.list_target_rules,
......@@ -1941,6 +1969,7 @@ def main(argv=None):
cleanup_metadata=args.cleanup_metadata,
cleanup_conda=args.cleanup_conda,
cleanup_shadow=args.cleanup_shadow,
cleanup_scripts=not args.skip_script_cleanup,
force_incomplete=args.rerun_incomplete,
ignore_incomplete=args.ignore_incomplete,
list_version_changes=args.list_version_changes,
......
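For orientation, the new ``cache`` and ``cleanup_scripts`` arguments introduced above can be used from the Python API roughly as follows (a sketch only; the rule name is a hypothetical example):

    from snakemake import snakemake

    # run the workflow, caching outputs of the given rule and keeping
    # temporary wrapper scripts (mirrors --cache and --skip-script-cleanup)
    success = snakemake(
        "Snakefile",
        cores=4,
        cache=["download_reference"],
        cleanup_scripts=False,
    )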
......@@ -22,9 +22,9 @@ def get_keywords():
# setup.py/versioneer.py will grep for the variable names, so they must
# each be defined on a line of their own. _version.py will just call
# get_keywords().
git_refnames = " (tag: v5.7.4)"
git_full = "86d6434cf40e334ef81ad9e548d754b0b0dd680f"
git_date = "2019-10-23 11:05:20 +0200"
git_refnames = " (HEAD -> master, tag: v5.8.1)"
git_full = "ee0d5b17311c9126d89b49eec70045a9fdd6bd9c"
git_date = "2019-11-15 15:33:42 +0100"
keywords = {"refnames": git_refnames, "full": git_full, "date": git_date}
return keywords
......
__authors__ = "Johannes Köster, Sven Nahnsen"
__copyright__ = "Copyright 2019, Johannes Köster, Sven Nahnsen"
__email__ = "johannes.koester@uni-due.de"
__license__ = "MIT"
from abc import ABCMeta, abstractmethod
import os
from snakemake.jobs import Job
from snakemake.exceptions import WorkflowError, CacheMissException
from snakemake.caching.hash import ProvenanceHashMap
LOCATION_ENVVAR = "SNAKEMAKE_OUTPUT_CACHE"
class AbstractOutputFileCache(metaclass=ABCMeta):
def __init__(self):
try:
self.cache_location = os.environ[LOCATION_ENVVAR]
except KeyError:
raise WorkflowError(
"Output file cache activated (--cache), but no cache "
"location specified. Please set the environment variable "
"${}.".format(LOCATION_ENVVAR)
)
self.provenance_hash_map = ProvenanceHashMap()
@abstractmethod
def store(self, job: Job):
pass
@abstractmethod
def fetch(self, job: Job):
pass
@abstractmethod
def exists(self, job: Job):
pass
def get_outputfile(self, job: Job, check_exists=True):
self.check_job(job)
outputfile = job.output[0]
if check_exists:
assert os.path.exists(
outputfile
), "Bug: Output file does not exist although it should be cached."
return outputfile
def check_job(self, job: Job):
assert (
not job.dynamic_output
), "Bug: Rules with dynamic output may not be cached."
assert len(job.output) == 1, "Bug: Only single output files are supported."
def raise_write_error(self, entry, exception=None):
raise WorkflowError(
"Given output cache entry {} ($SNAKEMAKE_OUTPUT_CACHE={}) is not writeable.".format(
entry, self.cache_location
),
*[exception],
)
def raise_read_error(self, entry, exception=None):
raise WorkflowError(
"Given output cache entry {} ($SNAKEMAKE_OUTPUT_CACHE={}) is not readable.".format(
entry, self.cache_location
),
*[exception],
)
def raise_cache_miss_exception(self, job):
raise CacheMissException("Job {} not yet cached.".format(job))
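For orientation, a concrete cache backend would subclass ``AbstractOutputFileCache``. The following is a heavily simplified, illustrative sketch only, not the actual Snakemake implementation; in particular, the cache key is a stand-in for the real provenance hash:

    import shutil

    class SimpleLocalCacheSketch(AbstractOutputFileCache):
        """Illustrative local-filesystem cache for single output files."""

        def _entry(self, job: Job):
            # Stand-in key: the real implementation derives a provenance hash
            # from all steps, parameters and the software stack. A plain file
            # name is not collision-safe and serves illustration only.
            return os.path.join(self.cache_location, os.path.basename(job.output[0]))

        def exists(self, job: Job):
            self.check_job(job)
            return os.path.exists(self._entry(job))

        def store(self, job: Job):
            # copy the freshly created output file into the cache
            outputfile = self.get_outputfile(job)
            try:
                shutil.copyfile(outputfile, self._entry(job))
            except OSError as e:
                self.raise_write_error(self._entry(job), exception=e)

        def fetch(self, job: Job):
            # copy a previously cached result to the job's output path
            outputfile = self.get_outputfile(job, check_exists=False)
            entry = self._entry(job)
            if not os.path.exists(entry):
                self.raise_cache_miss_exception(job)
            try:
                shutil.copyfile(entry, outputfile)
            except OSError as e:
                self.raise_read_error(entry, exception=e)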