Andreas Tille
--- a/.gitignore
+++ b/.gitignore
@@ -3,6 +3,5 @@ build.xml
 manifest.mf
 nbproject
 dist/README.TXT
-
-
+dist/javadoc
 install-packages.R
--- a/HDF5License.txt
+++ b/HDF5License.txt
-
-Copyright Notice and License Terms for 
-HDF5 (Hierarchical Data Format 5) Software Library and Utilities
-----------------------------------------------------------------------------
-
-HDF5 (Hierarchical Data Format 5) Software Library and Utilities
-Copyright 2006-2015 by The HDF Group.
-
-NCSA HDF5 (Hierarchical Data Format 5) Software Library and Utilities
-Copyright 1998-2006 by the Board of Trustees of the University of Illinois.
-
-All rights reserved.
-
-Redistribution and use in source and binary forms, with or without 
-modification, are permitted for any purpose (including commercial purposes) 
-provided that the following conditions are met:
-
-1. Redistributions of source code must retain the above copyright notice, 
-   this list of conditions, and the following disclaimer.
-
-2. Redistributions in binary form must reproduce the above copyright notice, 
-   this list of conditions, and the following disclaimer in the documentation 
-   and/or materials provided with the distribution.
-
-3. In addition, redistributions of modified forms of the source or binary 
-   code must carry prominent notices stating that the original code was 
-   changed and the date of the change.
-
-4. All publications or advertising materials mentioning features or use of 
-   this software are asked, but not required, to acknowledge that it was 
-   developed by The HDF Group and by the National Center for Supercomputing 
-   Applications at the University of Illinois at Urbana-Champaign and 
-   credit the contributors.
-
-5. Neither the name of The HDF Group, the name of the University, nor the 
-   name of any Contributor may be used to endorse or promote products derived 
-   from this software without specific prior written permission from 
-   The HDF Group, the University, or the Contributor, respectively.
-
-DISCLAIMER: 
-THIS SOFTWARE IS PROVIDED BY THE HDF GROUP AND THE CONTRIBUTORS 
-"AS IS" WITH NO WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED.  In no 
-event shall The HDF Group or the Contributors be liable for any damages 
-suffered by the users arising out of the use of this software, even if 
-advised of the possibility of such damage. 
-
-----------------------------------------------------------------------------
-----------------------------------------------------------------------------
-
-Contributors:   National Center for Supercomputing Applications (NCSA) at 
-the University of Illinois, Fortner Software, Unidata Program Center (netCDF), 
-The Independent JPEG Group (JPEG), Jean-loup Gailly and Mark Adler (gzip), 
-and Digital Equipment Corporation (DEC).
-
-----------------------------------------------------------------------------
-
-Portions of HDF5 were developed with support from the Lawrence Berkeley 
-National Laboratory (LBNL) and the United States Department of Energy 
-under Prime Contract No. DE-AC02-05CH11231.
-
-----------------------------------------------------------------------------
-
-Portions of HDF5 were developed with support from the University of 
-California, Lawrence Livermore National Laboratory (UC LLNL).  
-The following statement applies to those portions of the product and must 
-be retained in any redistribution of source code, binaries, documentation, 
-and/or accompanying materials:
-
-   This work was partially produced at the University of California, 
-   Lawrence Livermore National Laboratory (UC LLNL) under contract 
-   no. W-7405-ENG-48 (Contract 48) between the U.S. Department of Energy 
-   (DOE) and The Regents of the University of California (University) 
-   for the operation of UC LLNL.
-
-   DISCLAIMER: 
-   This work was prepared as an account of work sponsored by an agency of 
-   the United States Government. Neither the United States Government nor 
-   the University of California nor any of their employees, makes any 
-   warranty, express or implied, or assumes any liability or responsibility 
-   for the accuracy, completeness, or usefulness of any information, 
-   apparatus, product, or process disclosed, or represents that its use 
-   would not infringe privately- owned rights. Reference herein to any 
-   specific commercial products, process, or service by trade name, 
-   trademark, manufacturer, or otherwise, does not necessarily constitute 
-   or imply its endorsement, recommendation, or favoring by the United 
-   States Government or the University of California. The views and 
-   opinions of authors expressed herein do not necessarily state or reflect 
-   those of the United States Government or the University of California, 
-   and shall not be used for advertising or product endorsement purposes.
-----------------------------------------------------------------------------
-
-
--- a/README.md
+++ b/README.md
 ![NanoOK](https://documentation.tgac.ac.uk/download/thumbnails/7209095/nanook-01.jpg?version=1&modificationDate=1447675247000&api=v2)

-Full documentation can be found at https://documentation.tgac.ac.uk/display/NANOOK/NanoOK
+Full documentation can be found at http://nanook.readthedocs.io/

 Contact richard.leggett@earlham.ac.uk for more information or for comments/bug reports.
--- a/bin/nanook_split_fasta
+++ b/bin/nanook_split_fasta
-#!/usr/bin/perl
-#
-# Program: nanotools_split_fasta
-# Purpose: Split FASTA file into separate files for each read
-# Author:  Richard Leggett
-# Contact: richard.leggett@tgac.ac.uk
-
-use strict;
-use warnings;
-use Getopt::Long;
-
-my $input_file;
-my $output_dir;
-my $help_requested;
-my %ids;
-my $count = 0;
-
-&GetOptions(
-'i|input:s'       => \$input_file,
-'o|outputdir:s' => \$output_dir,
-'h|help'        => \$help_requested
-);
-
-if (defined $help_requested) {
-    print "\nnanotools_split_fasta\n\n";
-    print "Split a multi-read FASTA into separate files.\n\n";
-    print "Usage: nanotools_split_fasta.pl <-i input> [-o output_dir]\n\n";
-    print "Options:\n";
-    print "    -i | -input      Input FASTA file\n";
-    print "    -o | -outputdir  Output directory\n";
-    print "\n";
-    
-    exit;
-}
-
-die "You must specify an input file\n" if not defined $input_file;
-die "You must specify an output directory\n" if not defined $output_dir;
-
-my $fh;
-
-local $| = 1;
-
-open(INPUTFILE, $input_file) or die "Can't open input ".$input_file."\n";
-
-while(<INPUTFILE>) {
-    my $line = $_;
-    
-    if ($line =~ /^>(\S+)/) {
-        my $id = $1;
-        
-        if (not defined $ids{$id}) {
-            $ids{$id} = 1;
-            
-            if (defined $fh) {
-                close($fh);
-            }
-            
-            my $out_filename = $output_dir."/".$id.".fasta";
-            $count++;
-            #print "Writing $out_filename\n";
-            
-            if (($count % 10) == 0) {
-                print "\r$count";
-            }
-            
-            open($fh, ">".$out_filename) or die "Can't open output ".$out_filename."\n";
-        } else {
-            print "WARNING: Repeat ID $id\n";
-        }
-    }
-
-    if (defined $fh) {
-        print $fh $line;
-    } else {
-        print "Eeek\n";
-    }
-}
-
-if (defined $fh) {
-    close($fh);
-}
-
-close(INPUTFILE);
--- a/bin/nanook_split_reads.pl
+++ b/bin/nanook_split_reads.pl
+#!/usr/bin/perl
+#
+# Program: nanook_split_reads
+# Purpose: Split FASTA/Q file into separate files for each read
+# Author:  Richard Leggett
+# Contact: richard.leggett@earlham.ac.uk
+
+use strict;
+use warnings;
+use Getopt::Long;
+
+my $version="v0.02";
+my $input_file;
+my $output_dir;
+my $help_requested;
+my %ids;
+my $count = 0;
+my $input_format;
+my $output_format;
+my $requested_output_format;
+my $reads_per_chunk = 4000;
+
+&GetOptions(
+'f|outputfmt:s' => \$requested_output_format,
+'i|input:s'     => \$input_file,
+'o|outputdir:s' => \$output_dir,
+'h|help'        => \$help_requested
+);
+
+print "\nnanook_split_reads $version\n\n";
+
+if (defined $help_requested) {
+    print "Split a multi-read FASTA/Q file into separate files.\n\n";
+    print "Usage: nanook_split_reads.pl <-i input> <-o output_dir> [-f format]\n\n";
+    print "Options:\n";
+    print "    -i | -input      Input FASTA/Q file\n";
+    print "    -o | -outputdir  Output directory\n";
+    print "    -f | -outputfmt  Output format FASTA or FASTQ\n";
+    print "                     (defaults to same as input)\n";
+    print "\n";
+    
+    exit;
+}
+
+die "You must specify an input file\n" if not defined $input_file;
+die "You must specify an output directory\n" if not defined $output_dir;
+
+if ($input_file =~ /\.fq$/i) {
+    $input_format = "FASTQ";
+} elsif ($input_file =~ /\.fastq$/i) {
+    $input_format = "FASTQ";
+} elsif ($input_file =~ /\.fa$/i) {
+    $input_format = "FASTA";
+} elsif ($input_file =~ /\.fasta$/i) {
+    $input_format = "FASTA";
+} else {
+    die "Can't determine input file format from filename.\n";
+}
+
+if (defined $requested_output_format) {
+    $output_format = uc($requested_output_format);
+    
+    if ($input_format eq "FASTA") {
+        if ($output_format ne "FASTA") {
+            $output_format = "FASTA";
+            print "Defaulting to FASTA output for FASTA input\n";
+        }
+    } else {
+        if (($output_format ne "FASTA") && ($output_format ne "FASTQ")) {
+            $output_format = $input_format;
+            print "Unknown output format - defaulting to ".$output_format."\n";
+        }
+    }
+} else {
+    $output_format = $input_format;
+}
+
+print " Input format: $input_format\n";
+print "Output format: $output_format\n";
+
+local $| = 1;
+
+my $chunk = 0;
+
+open(INPUTFILE, $input_file) or die "Can't open input ".$input_file."\n";
+
+while(<INPUTFILE>) {
+    my $header_line = $_;
+    my $sequence;
+    my $qual_id;
+    my $qualities;
+    my $read_id;
+
+    if (($count % $reads_per_chunk) == 0) {
+        mkdir($output_dir."/".$chunk);
+    }
+    
+    if ($input_format eq "FASTA") {
+        if ($header_line =~ /^>(\S+)/) {
+            $read_id = $1;
+        } else {
+            die "Couldn't get read ID from $header_line\n";
+        }
+        $sequence = <INPUTFILE>;
+    } else {
+        if ($header_line =~ /^@(\S+)/) {
+            $read_id = $1;
+        } else {
+            die "Couldn't get read ID from $header_line\n";
+        }
+        $sequence = <INPUTFILE>;
+        $qual_id = <INPUTFILE>;
+        $qualities = <INPUTFILE>;
+    }
+
+#print $header_line."\n";    
+    
+    if (not defined $ids{$read_id}) {
+        $ids{$read_id} = 1;
+    } else {
+        print "\nWARNING: Repeat ID $read_id\n";
+        my $i=2;
+        my $newid;
+        do {
+            $newid = $read_id."_".$i;
+            $i++;
+        } while (defined $ids{$newid});
+        
+        print "         Changed to $newid\n";
+        $read_id = $newid;
+        $ids{$read_id} = 1;  # record the new ID so a later repeat is not given the same name
+    }
+
+    if ($output_format eq "FASTQ") {    
+        if ($input_format eq "FASTA") {
+            $header_line =~ s/^>/@/;
+        }
+
+        my $out_filename = $output_dir."/".$chunk."/".$read_id.".fastq";
+        open(OUTFILE, ">".$out_filename) or die "Can't open output ".$out_filename."\n";
+        print OUTFILE $header_line;
+        print OUTFILE $sequence;
+        print OUTFILE $qual_id;
+        print OUTFILE $qualities;
+        close(OUTFILE);
+    } else {
+        if ($input_format eq "FASTQ") {
+            $header_line =~ s/^@/>/;
+        }
+
+        my $out_filename = $output_dir."/".$chunk."/".$read_id.".fasta";
+        open(OUTFILE, ">".$out_filename) or die "Can't open output ".$out_filename."\n";
+        print OUTFILE $header_line;
+        print OUTFILE $sequence;
+        close(OUTFILE);
+    }
+
+    $count++;
+    if (($count % $reads_per_chunk) == 0) {
+        $chunk++;
+    }
+    
+    if (($count % 10) == 0) {
+        print "\r$count";
+    }
+}
+
+close(INPUTFILE);
+
+print "\nDone\n";
--- a/debian/changelog
+++ b/debian/changelog
+nanook (1.33+dfsg-1) unstable; urgency=medium
+
+  * New upstream version
+  * debhelper 11
+  * Point Vcs fields to salsa.debian.org
+  * Standards-Version: 4.2.1
+
+ -- Andreas Tille <tille@debian.org>  Wed, 19 Sep 2018 22:21:00 +0200
+
 nanook (1.26+dfsg-1) unstable; urgency=medium

  * Initial release (Closes: #873983)

--- a/debian/compat
+++ b/debian/compat
-10
+11
--- a/debian/control
+++ b/debian/control
@@ -3,13 +3,13 @@ Maintainer: Debian Med Packaging Team <debian-med-packaging@lists.alioth.debian.
 Uploaders: Andreas Tille <tille@debian.org>
 Section: science
 Priority: optional
-Build-Depends: debhelper (>= 10),
+Build-Depends: debhelper (>= 11~),
               default-jdk,
               javahelper,
               libcommons-io-java
-Standards-Version: 4.1.0
-Vcs-Browser: https://anonscm.debian.org/cgit/debian-med/nanook.git
-Vcs-Git: https://anonscm.debian.org/git/debian-med/nanook.git
+Standards-Version: 4.2.1
+Vcs-Browser: https://salsa.debian.org/med-team/nanook
+Vcs-Git: https://salsa.debian.org/med-team/nanook.git
 Homepage: https://documentation.tgac.ac.uk/display/NANOOK/NanoOK

 Package: nanook

--- a/debian/copyright
+++ b/debian/copyright
@@ -9,12 +9,6 @@ Copyright: 2015-2017 Richard M. Leggett,
                     The Earlham Institute (formerly The Genome Analysis Centre)
 License: GPL-3

-Files: HDF5License.txt
-Comment: According to the author this is an artefact
-  Date: Mon, 13 Nov 2017 09:50:47 +0000
-  From: "Richard Leggett (EI)" <Richard.Leggett@earlham.ac.uk>
-  Thet HDF5License is an artefact - I no longer include any of their code. So I’ve just removed the license from GitHub.
-
 Files: debian/*
 Copyright: 2017 Andreas Tille <tille@debian.org>
 License: GPL-3

--- a/debian/patches/set_jar_path_in_bin.patch
+++ b/debian/patches/set_jar_path_in_bin.patch
 Author: Andreas Tille <tille@debian.org>
 Last-Update: Fri, 01 Sep 2017 14:39:50 +0200
-Description: Set internal pathes to fit Debian package layout
+Description: Set internal paths to fit Debian package layout

 --- a/bin/nanook
 +++ b/bin/nanook

--- a/docs/Makefile
+++ b/docs/Makefile
+# Minimal makefile for Sphinx documentation
+#
+
+# You can set these variables from the command line.
+SPHINXOPTS    =
+SPHINXBUILD   = python -msphinx
+SPHINXPROJ    = NanoOK
+SOURCEDIR     = source
+BUILDDIR      = build
+
+# Put it first so that "make" without argument is like "make help".
+help:
+	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
+
+.PHONY: help Makefile
+
+# Catch-all target: route all unknown targets to Sphinx using the new
+# "make mode" option.  $(O) is meant as a shortcut for $(SPHINXOPTS).
+%: Makefile
+	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
--- a/docs/build.sh
+++ b/docs/build.sh
+sphinx-build -a -b html source build
+
+#pandoc --from html --to rst datafiles.html > source/datafiles.rst
--- a/docs/docker.html
+++ b/docs/docker.html
+<div>A docker image of NanoOK is provided on <a href="https://registry.hub.docker.com/u/richardmleggett/nanook/">Docker Hub</a> which includes all the dependencies needed to run. <span style="line-height: 1.4285715;">First, you need to have installed the Docker Engine. </span>
+  <span style="line-height: 1.4285715;">Then you can pull the NanoOK image:</span>
+</div>
+<div>
+  <p> </p>
+  <ac:structured-macro ac:macro-id="6efed1fc-7f05-4b27-a543-ab81c2a08c2f" ac:name="code" ac:schema-version="1">
+    <ac:plain-text-body><![CDATA[docker pull richardmleggett/nanook]]></ac:plain-text-body>
+  </ac:structured-macro>
+  <p>
+    <span style="line-height: 1.4285715;">To run NanoOK, the easiest way is to run a shell in the NanoOK image using:</span>
+  </p>
+  <ac:structured-macro ac:macro-id="4187711a-13fc-4b7d-a6d6-b4ac97d9069f" ac:name="code" ac:schema-version="1">
+    <ac:plain-text-body><![CDATA[docker run -i -t -v /path/to/your/data:/usr/nanopore richardmleggett/nanook bash]]></ac:plain-text-body>
+  </ac:structured-macro>
+  <p>
+    <span style="line-height: 1.4285715;">From here you will get a prompt from which you can run your NanoOK commands, for example:</span>
+  </p>
+  <ac:structured-macro ac:macro-id="0fdc8356-3222-486b-9856-196cf90c70d9" ac:name="code" ac:schema-version="1">
+    <ac:plain-text-body><![CDATA[root@e21d794315d9:/# nanook extract -s /usr/nanopore/YourSample]]></ac:plain-text-body>
+  </ac:structured-macro>
+  <p>When you have finished, type <code>exit</code> to end Docker.</p>
+  <p>Notes:</p>
+  <ul>
+    <li>In the docker run command, you need to map your data directory to the Docker image. This is done with the <code>-v</code> option. In the above example, the data on our local machine is in <code>/path/to/your/data</code> and this appears in the Docker image as <code>/usr/nanopore</code>, which is why we specify <code>/usr/nanopore/YourSample</code> as the sample directory to the nanook command.</li>
+    <li>If you get an error from the docker command, it may be because you haven't sudo'd it, or added your user to the docker group - see <a href="http://askubuntu.com/questions/477551/how-can-i-use-docker-without-sudo">How can I use docker without sudo?</a>
+    </li>
+  </ul>
+</div>
+
+
--- a/docs/source/commonerrors.rst
+++ b/docs/source/commonerrors.rst
+Common errors
+=============
+
+**Error: unable to find any alignments to process**
+
+-  Have you run the align stage? If not, run it.
+-  If you have run the align stage, have the alignments worked? Have a
+   look inside sample/last/pass/2D or appropriate. If there are files
+   there, but they are 0 bytes, see below.
+
+**All my LAST alignment files are empty**
+
+-  Have you indexed your reference file?
+
+**lastal: invalid option -- 'o' when aligning**
+
+-  This means your version of LAST is too old. Older versions do not
+   support the -o option to output to a file and only output to the
+   screen.
+-  Solution: install the latest version of LAST.
+
+**I don't get a PDF file in the latex directory**
+
+-  If there is a .tex file, then something went wrong converting the
+   LaTeX to PDF - probably missing LaTeX packages.
+-  Have a look at the .log file inside the latex directory. You will
+   likely see an error such as:
+   ``! LaTeX Error: File `multirow.sty' not found.``
+-  In this instance, you need to install the multirow package.
+
+**java.lang.OutOfMemoryError: Java heap space**
+
+-   Try editing this line in the nanook file in the bin directory: JAVA\_ARGS="-Xmx2048m"
+-   By default, this sets a maximum Java heap size of 2048 MB - try increasing it according to how much memory you have available.
--- a/docs/source/conf.py
+++ b/docs/source/conf.py
+# -*- coding: utf-8 -*-
+#
+# NanoOK documentation build configuration file, created by
+# sphinx-quickstart on Fri Sep  1 11:15:43 2017.
+#
+# This file is execfile()d with the current directory set to its
+# containing dir.
+#
+# Note that not all possible configuration values are present in this
+# autogenerated file.
+#
+# All configuration values have a default; values that are commented out
+# serve to show the default.
+
+# If extensions (or modules to document with autodoc) are in another directory,
+# add these directories to sys.path here. If the directory is relative to the
+# documentation root, use os.path.abspath to make it absolute, like shown here.
+#
+# import os
+# import sys
+# sys.path.insert(0, os.path.abspath('.'))
+
+
+# -- General configuration ------------------------------------------------
+
+# If your documentation needs a minimal Sphinx version, state it here.
+#
+# needs_sphinx = '1.0'
+
+# Add any Sphinx extension module names here, as strings. They can be
+# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
+# ones.
+extensions = ['sphinx.ext.todo',
+    'sphinx.ext.imgmath',
+    'sphinx.ext.ifconfig',
+    'sphinx.ext.githubpages']
+
+# Add any paths that contain templates here, relative to this directory.
+templates_path = ['_templates']
+
+# The suffix(es) of source filenames.
+# You can specify multiple suffixes as a list of strings:
+#
+# source_suffix = ['.rst', '.md']
+source_suffix = '.rst'
+
+# The master toctree document.
+master_doc = 'index'
+
+# General information about the project.
+project = u'NanoOK'
+copyright = u'2017, Richard Leggett'
+author = u'Richard Leggett'
+
+# The version info for the project you're documenting, acts as replacement for
+# |version| and |release|, also used in various other places throughout the
+# built documents.
+#
+# The short X.Y version.
+version = u'1.27'
+# The full version, including alpha/beta/rc tags.
+release = u'1.27'
+
+# The language for content autogenerated by Sphinx. Refer to documentation
+# for a list of supported languages.
+#
+# This is also used if you do content translation via gettext catalogs.
+# Usually you set "language" from the command line for these cases.
+language = None
+
+# List of patterns, relative to source directory, that match files and
+# directories to ignore when looking for source files.
+# These patterns also affect html_static_path and html_extra_path
+exclude_patterns = []
+
+# The name of the Pygments (syntax highlighting) style to use.
+pygments_style = 'sphinx'
+
+# If true, `todo` and `todoList` produce output, else they produce nothing.
+todo_include_todos = True
+
+
+# -- Options for HTML output ----------------------------------------------
+
+# The theme to use for HTML and HTML Help pages.  See the documentation for
+# a list of builtin themes.
+#
+#html_theme = 'alabaster'
+
+# Theme options are theme-specific and customize the look and feel of a theme
+# further.  For a list of options available for each theme, see the
+# documentation.
+#
+# html_theme_options = {}
+
+# Add any paths that contain custom static files (such as style sheets) here,
+# relative to this directory. They are copied after the builtin static files,
+# so a file named "default.css" will overwrite the builtin "default.css".
+html_static_path = ['_static']
+
+# Custom sidebar templates, must be a dictionary that maps document names
+# to template names.
+#
+# This is required for the alabaster theme
+# refs: http://alabaster.readthedocs.io/en/latest/installation.html#sidebars
+html_sidebars = {
+    '**': [
+        'about.html',
+        'navigation.html',
+        'relations.html',  # needs 'show_related': True theme option to display
+        'searchbox.html',
+        'donate.html',
+    ]
+}
+
+
+# -- Options for HTMLHelp output ------------------------------------------
+
+# Output file base name for HTML help builder.
+htmlhelp_basename = 'NanoOKdoc'
+
+
+# -- Options for LaTeX output ---------------------------------------------
+
+latex_elements = {
+    # The paper size ('letterpaper' or 'a4paper').
+    #
+    # 'papersize': 'letterpaper',
+
+    # The font size ('10pt', '11pt' or '12pt').
+    #
+    # 'pointsize': '10pt',
+
+    # Additional stuff for the LaTeX preamble.
+    #
+    # 'preamble': '',
+
+    # Latex figure (float) alignment
+    #
+    # 'figure_align': 'htbp',
+}
+
+# Grouping the document tree into LaTeX files. List of tuples
+# (source start file, target name, title,
+#  author, documentclass [howto, manual, or own class]).
+latex_documents = [
+    (master_doc, 'NanoOK.tex', u'NanoOK Documentation',
+     u'Richard Leggett', 'manual'),
+]
+
+
+# -- Options for manual page output ---------------------------------------
+
+# One entry per manual page. List of tuples
+# (source start file, name, description, authors, manual section).
+man_pages = [
+    (master_doc, 'nanook', u'NanoOK Documentation',
+     [author], 1)
+]
+
+
+# -- Options for Texinfo output -------------------------------------------
+
+# Grouping the document tree into Texinfo files. List of tuples
+# (source start file, target name, title, author,
+#  dir menu entry, description, category)
+texinfo_documents = [
+    (master_doc, 'NanoOK', u'NanoOK Documentation',
+     author, 'NanoOK', 'One line description of project.',
+     'Miscellaneous'),
+]
+
+
+
--- a/docs/source/datafiles.rst
+++ b/docs/source/datafiles.rst
+NanoOK data files
+=================
+
+While analysing alignments, NanoOK will write a number of tab delimited
+files to the 'analysis' subdirectory. These are used for graph plotting
+through R, but you may wish to use them in other applications or your
+own custom analyses. There are a number of global data files, plus
+subdirectories for each reference.
+
+In the analysis directory are the global data files:
+
+-  **length\_summary.txt**
+
+   -  Column 1: Read type - Template, Complement or 2D
+   -  Column 2: Number of reads
+   -  Column 3: Mean read length
+   -  Column 4: Longest read length
+   -  Column 5: Shortest read length
+   -  Column 6: N50 length
+   -  Column 7: Number of reads covered by N50
+   -  Column 8: N90 length
+   -  Column 9: Number of reads covered by N90
+
+-  **all\_summary.txt** - summary of number of reads and number of
+   alignments
+-  **all\_[2D\|Template\|Complement]\_alignment\_summary.txt** -
+   summary of number of reads aligning to each reference for Template,
+   Complement and 2D reads.
+-  **all\_[2D\|Template\|Complement]\_lengths.txt** - for each read of
+   each type:
+
+   -  Column 1: read ID
+   -  Column 2: length of read
+
+-  **all\_[2D\|Template\|Complement]\_kmers.txt** - for each
+   read of each type:
+
+   -  Column 1: read ID
+   -  Column 2: length of read
+   -  Column 3: number of perfect 15mers
+   -  Column 4: number of perfect 17mers
+   -  Column 5: number of perfect 19mers
+   -  Column 6: number of perfect 21mers
+   -  Column 7: number of perfect 23mers
+   -  Column 8: number of perfect 25mers
+
+-  **all\_[2D\|Template\|Complement]\_substitutions\_percent.txt** -
+   base substitution table.
+-  **all\_[2D\|Template\|Complement]\_[deletion\|insertion\|substitution]\_[n]mer\_motifs.txt** -
+   deletion/insertion/substitution kmer motifs:
+
+   -  Column 1: kmer
+   -  Column 2: percentage this kmer occurs before error
+
+Within 'analysis', there will be a subdirectory for each reference. In
+each reference subdirectory is:
+
+-  **reference\_[2D\|Template\|Complement]\_alignments.txt** -
+   multi-column files of read-by-read alignment data for each read type.
+   Includes IDs, start and end positions of alignment, bases covered,
+   longest perfect kmer, mean perfect kmer etc. Header line provides
+   details.
+-  **reference\_[2D\|Template\|Complement]\_all\_perfect\_kmers.txt**
+
+   -  Column 1: kmer size
+   -  Column 2: number of perfect kmers of size across all reads
+
+-  **reference\_[2D\|Template\|Complement]\_best\_perfect\_kmers.txt**
+
+   -  Column 1: kmer size
+   -  Column 2: number of reads with best perfect kmer of size
+   -  Column 3: percentage of reads with best perfect kmer of size
+
+-  **reference\_[2D\|Template\|Complement]\_cumulative\_perfect\_kmers.txt**
+
+   -  Column 1: kmer size
+   -  Column 2: number of reads with best perfect kmer of size or
+      greater
+   -  Column 3: percentage of reads with best perfect kmer of size or
+      greater
+
+-  **reference\_[2D\|Template\|Complement]\_coverage.txt**
+
+   -  Column 1: position on reference
+   -  Column 2: mean coverage in bin
+
+-  **reference\_[2D\|Template\|Complement]\_deletions.txt**
+
+   -  Column 1: deletion size
+   -  Column 2: percentage of deletions that are this size
+
+-  **reference\_[2D\|Template\|Complement]\_insertions.txt**
+
+   -  Column 1: insertion size
+   -  Column 2: percentage of insertions that are this size
+
+-  **reference\_gc.txt**
+
+   -  Column 1: position
+   -  Column 2: mean GC percentage for bin
+
+-  **reference\_[2D\|Template\|Complement]\_kmers.txt**
+
+   -  Column 1: kmer (5-mer)
+   -  Column 2: Number of times kmer occurs in reference
+   -  Column 3: Percentage of total kmers in reference represented by
+      the kmer
+   -  Column 4: Number of times kmer occurs in the reads
+   -  Column 5: Percentage of total kmers in reads represented by the
+      kmer
--- a/docs/source/docker.rst
+++ b/docs/source/docker.rst
+.. _docker:
+
+Using the Docker image
+======================
+
+A docker image of NanoOK is provided on `Docker
+Hub <https://registry.hub.docker.com/u/richardmleggett/nanook/>`__ which
+includes all the dependencies needed to run. First, you need to have
+installed the Docker Engine.  Then you can pull the NanoOK image::
+
+  docker pull richardmleggett/nanook
+
+To run NanoOK, the easiest way is to run a shell in the NanoOK image
+using::
+
+  docker run -i -t -v /path/to/your/data:/usr/nanopore richardmleggett/nanook bash
+
+From here you will get a prompt from which you can run your NanoOK
+commands, for example::
+
+  nanook extract -s /usr/nanopore/YourSample
+
+When you have finished, type ``exit`` to end Docker.
+
+Notes:
+
+-  In the docker run command, you need to map your data directory to the
+   Docker image. This is done with the ``-v`` option. In the above
+   example, the data on our local machine is in ``/path/to/your/data``
+   and this appears in the Docker image as ``/usr/nanopore``, which is
+   why we specify ``/usr/nanopore/YourSample`` as the sample directory
+   to the nanook command.
+-  If you get an error from the docker command, it may be because you
+   haven't sudo'd it, or added your user to the docker group -
+   see \ `How can I use docker without
+   sudo? <http://askubuntu.com/questions/477551/how-can-i-use-docker-without-sudo>`__
--- a/docs/source/history.rst
+++ b/docs/source/history.rst
+Change history
+==============
+
+**1.25 (7 June 2017)**
+
+-  Added GraphMap support.
+-  Fix for trailing / on -f option.
+-  Fix for barcoding bug.
+
+**1.22 (5 May 2017)**
+
+-  Fixes for comparison mode.
+-  Slurmit script added for NanoOK RT.
+
+**1.20 (13 Apr 2017)**
+
+-  Better Albacore support.
+-  New -minquality option for filtering pass/fail reads.
+
+**1.17 (30 Mar 2017)**
+
+-  Detection of Albacore directory structure.
+
+**1.15 (17 Mar 2017)**
+
+-  Auto-detection of directory structure - barcodes, batch\_ etc. 
+
+**1.14 (14 Mar 2017)**
+
+-  Updates to support MinKNOW 1.4.2 directory structure.
+-  Fixed bug in R graph plotting.
+-  Better error checking in R scripts.
+-  Option to merge reads into single file.
+
+**0.95 (2 Nov 2016)**
+
+-  Fixed issues with 1D report generation.
+-  Added warnings about .sizes file.
+-  New real-time watcher option for BAMBI project. Currently, this is
+   not general purpose, but will be enabled in future release.
+-  Created new Dockerfile and re-built Docker images.
+
+**0.79 (7 Oct 2016)**
+
+-  Added support for barcoded runs.
+
+**0.76 (7 Sep 2016)**
+
+-  Fixed issue with badly formatted reference files.
+-  Fixed issue with grid.edit in R.
+-  More descriptive error messages.
+-  Option for 1D only processing for new rapid kit.
+-  Template only and complement only options.
+-  Updated help text.
+
+**0.72 (13 May 2016)**
+
+-  New option to store original FAST5 path in FASTA output file (for
+   nanopolish).
+-  Fixed issue with alignerparams not being passed through.
+-  Enabled extraction of 2D only reads.
+-  Better detection of old/new style directory structure.
+-  Various bug fixes.
+
+**0.62 (3 Dec 2015)**
+
+-  Fixed extract bug that had been introduced with previous version.
+
+**0.61 (27 Nov 2015)**
+
+-  Had to roll back from using HDF5 library due to cross-platform JNI
+   issues.
+-  Otherwise, all functionality as of 0.60, including use of
+   NANOOK\_DIR.
+
+**0.60 (26 Nov 2015)**
+
+-  Added support for Metrichor changes to FAST5 output format.
+-  Added support for multiple analyses in 1 file
+   (i.e. /Analyses/Basecall\_2D\_XXX). New option -basecallindex to
+   support it, but default behaviour is latest (highest numbered
+   analysis).
+-  Moved from using HDF5 command line tool to using HDF5 Java library.
+-  Replaced the NANOOK\_SCRIPT\_DIR environment variable with a
+   NANOOK\_DIR one and slightly changed installation process.
--- a/docs/source/howitworks.rst
+++ b/docs/source/howitworks.rst
+How NanoOK works
+================
+
+How NanoOK deals with alignments
+--------------------------------
+
+If running with multiple reference sequences, a single query sequence
+may produce 1 or more alignments to 1 or more references. NanoOK adopts
+the following approach to assign reads to references:
+
+#. Sort alignments in order of score. The read belongs to the reference
+   with highest score.
+#. Then merge any other alignments that align to the same reference as
+   the highest scoring alignment, in order of score.
+#. Any sections of these subsequent alignments that overlap with already
+   merged alignments are discarded.
+
+Where the highest score is shared by two or more identically scoring
+alignments, NanoOK chooses one of them at random. **This can result in
+very slight changes in the reported alignment figures between runs**. If you want
+deterministic behaviour, specify the ``-deterministic`` parameter.
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
+.. NanoOK documentation master file, created by
+   sphinx-quickstart on Fri Sep  1 11:15:43 2017.
+   You can adapt this file completely to your liking, but it should at least
+   contain the root `toctree` directive.
+
+NanoOK
+======
+
+.. toctree::
+   :hidden:
+   
+   installation
+   docker
+   virtualbox
+   running
+   report
+   howitworks
+   parserapi
+   datafiles
+   commonerrors
+   tutorial
+   history
+   nanookrt
+   reporter
+
+NanoOK (pronounced na-nook) is a tool for extraction, alignment and analysis of Nanopore reads. NanoOK will extract reads as FASTA or FASTQ files, align them (with a choice of alignment tools), then generate a comprehensive multi-page PDF report containing yield, accuracy and quality analysis. Along the way, it generates plain text files which can be used for further analysis, as well as graphs suitable for inclusion in presentations and papers.
+
+NanoOK has a number of dependencies - Perl, LaTeX, R and an alignment tool - which means it works best on Linux and Mac OS platforms. 
+
+Further information:
+
+* To find out how to install NanoOK, see the :ref:`Download and installation page <installation>`.
+* For further information on NanoOK RT, see :ref:`these comments <nanookrt>`.
+* To find out how to run NanoOK, see the :ref:`Running NanoOK page <running>` or the :ref:`NanoOK tutorial page <tutorial>`.
+* Source code is on `GitHub <https://github.com/TGAC/NanoOK>`_.
+* Here's some information `about the other Nanook <http://en.wikipedia.org/wiki/Nanook>`_.
+
+Paper
+=====
+Leggett RM, Heavens D, Caccamo M, Clark MD, Davey RP (2016). `NanoOK: multi-reference alignment analysis of nanopore sequencing data, quality and error profiles <https://doi.org/10.1093/bioinformatics/btv540>`_. Bioinformatics 32(1):142–144.
+
+Talks and posters
+=================
+* Richard Leggett presented a poster at AGBT 2016 - `here it is <http://f1000research.com/posters/5-176>`_.
+* Richard Leggett spoke at Genome Science 2015 - `here are the slides <http://f1000research.com/slides/4-717>`_.
+* Richard Leggett spoke at the London Calling Nanopore conference - `here are his slides <http://documentation.tgac.ac.uk/download/attachments/7209095/RichardLeggett_LondonCalling2015.pdf?version=1&modificationDate=1431700116000&api=v2>`_.
+* Robert Davey presented a poster at AGBT 2015 - `here is the PDF <http://documentation.tgac.ac.uk/download/attachments/7209095/AGBT2015_NanoOK.pdf?version=1&modificationDate=1425471330000&api=v2>`_.
+
+Follow us
+=========
+You can follow NanoOK updates on twitter `@NanoOK_Software <https://twitter.com/nanook_software>`_.
+
+Or if you would like to be on a NanoOK mailing list to receive information about updates, please email richard.leggett@earlham.ac.uk.
+