Andreas Tille
--- a/README.md
+++ b/README.md

 ## The FASTA package - protein and DNA sequence similarity searching and alignment programs

-The **FASTA** (pronounced FAST-Aye, not FAST-Ah) programs are a
-comprehensive set of similarity searching and alignment programs for
-searching protein and DNA sequence databases.  Like the **BLAST** programs `blastp` and `blastn`, the `fasta` program itself uses a rapid heuristic strategy for finding similar regions in protein and DNA sequences.  But in
-addition to heuristic similarity searching, the FASTA package provides
-programs for rigorous local (`ssearch`) and global (`ggsearch`)
-similarity searching, as well as a program for finding non-overlapping
-sequence similarities (`lalign`).  Like BLAST, the FASTA package also
-includes programs for aligning translated DNA sequences against
-proteins (`fastx`, `fasty` are equivalent to `blastx`, `tfastx`,
-`tfasty` are similar to `tblastn`).
-
-####December, 2017
-The current FASTA version is fasta-36.3.8f, Dec. 2017
+The **FASTA** (pronounced FAST-Aye, not FAST-Ah) programs are a comprehensive set of similarity searching and alignment programs for searching protein and DNA sequence databases.  Like the **BLAST** programs `blastp` and `blastn`, the `fasta` program itself uses a rapid heuristic strategy for finding similar regions in protein and DNA sequences.  But in addition to heuristic similarity searching, the FASTA package provides
+programs for rigorous local (`ssearch`) and global (`ggsearch`) similarity searching, as well as a program for finding non-overlapping sequence similarities (`lalign`).  Like BLAST, the FASTA package also includes programs for aligning translated DNA sequences against proteins (`fastx`, `fasty` are equivalent to `blastx`, and `tfastx`, `tfasty` are similar to `tblastn`).
+
+#### March, 2019
+
+An updated release of the FASTA package (`fasta-36.3.8h`) is
+available.  In addition to minor bug fixes, the latest version can
+generate query and library sequences using program scripts.
+
+See doc/README_v36.3.8h.md and doc/readme.v36 for a more complete summary of changes.
+
+#### December, 2018
+
+The latest version of the FASTA package is `fasta-36.3.8h`, Dec. 2018.
+
+See doc/README_v36.3.8h.md for a more complete summary of changes.
+
+#### November, 2018
+
+The current released version of the FASTA package is `fasta-36.3.8h`, Nov. 2018
+
+See doc/README_v36.3.8h.md for a more complete summary of changes.
+
+#### October, 2018
+
+The current version of the FASTA package is fasta-36.3.8g, Oct. 2018
+
+See doc/README_v36.3.8h.md for a more complete summary of changes.
+
+#### April, 2018
+The current version of the FASTA package is fasta-36.3.8g, Apr. 2018
+
+#### December, 2017
+The current FASTA version is fasta-36.3.8g, Dec. 2017

 The statistics routines for normally distributed scores (ggsearch36,
 glsearch36) are more robust to very low E()-value thresholds.

-####Sept, 2017
+#### Sept, 2017
 The current FASTA version is fasta-36.3.8f, Sept. 2017

 If the -S option is used and a query sequence has no upper case
 letters, it is re-read with lower-case letters converted to upper-case.

-####May, 2017
+#### May, 2017
 The current FASTA version is fasta-36.3.8f, May. 2017

 Various bugs in sub-alignment scoring corrected and support for the
-EBI SP:GSTM1_HUMAN P09488 added.  The format for the $SRCH_URL and
-$SRCH_URL2 format strings has changed to enable pairwise alignment.
+EBI SP:GSTM1_HUMAN P09488 added.  The format for the `$SRCH_URL` and
+`$SRCH_URL2` format strings has changed to enable pairwise alignment.

-####September, 2016
+#### September, 2016

 The fasta-36.3.6e version includes a new directory, `psisearch2`, with
 scripts to run iterative PSSM (PSI-BLAST or SSEARCH36) searches using

--- a/bin/README
+++ b/bin/README
+Placeholder file to create destination for program binaries.
--- a/debian/changelog
+++ b/debian/changelog
+fasta3 (36.3.8h-1) UNRELEASED; urgency=medium
+
+  * Team upload.
+  * New upstream version
+  * debhelper-compat 12
+  * Standards-Version: 4.4.0
+  TODO: Do we really need to use non-free smith waterman code?
+        There is a free libssw.  Please contact upstream!
+
+ -- Andreas Tille <tille@debian.org>  Mon, 19 Aug 2019 21:45:02 +0200
+
 fasta3 (36.3.8g-1) unstable; urgency=low

  [ Andreas Tille ]

--- a/debian/compat
+++ b/debian/compat
-9
--- a/debian/control
+++ b/debian/control
@@ -4,9 +4,9 @@ Uploaders: Steffen Moeller <moeller@debian.org>
 Section: non-free/science
 XS-Autobuild: no
 Priority: optional
-Build-Depends: debhelper (>= 9),
+Build-Depends: debhelper-compat (= 12),
               zlib1g-dev
-Standards-Version: 4.1.3
+Standards-Version: 4.4.0
 Vcs-Browser: https://salsa.debian.org/med-team/fasta3
 Vcs-Git: https://salsa.debian.org/med-team/fasta3.git
 Homepage: http://fasta.bioch.virginia.edu

--- a/debian/patches/Makefile.patch
+++ b/debian/patches/Makefile.patch
 Description: Makefile
-Index: fasta3/make/Makefile
-===================================================================
--- fasta3.orig/make/Makefile
-+++ fasta3/make/Makefile
+--- a/make/Makefile
+++ b/make/Makefile
 @@ -34,6 +34,7 @@ THR_SUBS = pthr_subs2
 THR_LIBS = -lpthread
 THR_CC =
@@ -11,10 +9,8 @@ Index: fasta3/make/Makefile
 XDIR = /seqprg/bin
 
 DROPGSW_NA_O = dropgsw2.o wm_align.o calcons_sw.o
-Index: fasta3/make/Makefile.linux64_sse2
-===================================================================
--- fasta3.orig/make/Makefile.linux64_sse2
-+++ fasta3/make/Makefile.linux64_sse2
+--- a/make/Makefile.linux64_sse2
+++ b/make/Makefile.linux64_sse2
 @@ -12,7 +12,8 @@
 
 SHELL=/bin/bash
@@ -22,10 +18,10 @@ Index: fasta3/make/Makefile.linux64_sse2
 -CC = gcc -g -O -msse2
 +CC = gcc
 +CFLAGS = -g -O -msse2 $(CPPFLAGS)
+ LIB_DB=
+ 
 #CC= gcc -pg -g -O -msse2 -ffast-math
- #CC = gcc -g -DDEBUG -msse2
- #CC=gcc -Wall -pedantic -ansi -g -msse2 -DDEBUG
-@@ -24,7 +25,7 @@ CC = gcc -g -O -msse2
+@@ -26,7 +27,7 @@ LIB_DB=
 
 # standard options
 
@@ -34,16 +30,57 @@ Index: fasta3/make/Makefile.linux64_sse2
 # -I/usr/include/mysql -DMYSQL_DB
 # -DSUPERFAMNUM -DSFCHAR="'|'" 
 
-Index: fasta3/make/Makefile36m.common
-===================================================================
--- fasta3.orig/make/Makefile36m.common
-+++ fasta3/make/Makefile36m.common
-@@ -34,7 +34,7 @@ NGETLIB=nmgetlib
- # and "-L/usr/lib64/mysql -lmysqlclient -lz" in LIB_M
- # some systems may also require a LD_LIBRARY_PATH change
- 
-LIB_M= -lm -lz
-+LIB_M= $(LDFLAGS) -lm -lz
- #LIB_M= -L/usr/lib64/mysql -lmysqlclient -lz -lm
- NCBL_LIB=ncbl2_mlib.o
- #NCBL_LIB=ncbl2_mlib.o mysql_lib.o
+--- a/make/Makefile36m.common
+++ /dev/null
+@@ -1,51 +0,0 @@
+-#
+-# $Name:  $ - $Id: Makefile36m.common 1250 2014-01-24 21:33:39Z wrp $
+-#
+-# commands common to all architectures
+-# if your architecture does not support "include", append at the end.
+-#
+-
+-COMP_LIBO=comp_mlib9.o	# reads database into memory for multi-query without delay
+-COMP_THRO=comp_mthr9.o	# threaded version
+-
+-WORK_THRO=work_thr2.o
+-GETSEQO = 
+-
+-# standard nxgetaa, no memory mapping for 0 - 6
+-#LGETLIB=getseq.o lgetlib.o
+-#NGETLIB=nmgetlib
+-
+-# memory mapping for 0FASTA, 5PIRVMS, 6GCGBIN
+-LGETLIB= $(GETSEQO) lgetlib.o lgetaa_m.o
+-NGETLIB=nmgetlib
+-
+-# use ncbl_lib.c for BLAST1.4 support instead of ncbl2_mlib.c
+-#NCBL_LIB=ncbl_lib.o
+-
+-# this option should support both formats (BLAST1.4 not currently supported): 
+-#NCBL_LIB=ncbl_lib.o ncbl2_mlib.o
+-
+-# normally use ncbl2_mlib.c
+-#NCBL_LIB=ncbl2_mlib.o
+-#LIB_M= -lm
+-
+-# this option supports NCBI BLAST2 and mySQL
+-# it requires  "-I/usr/include/mysql -DMYSQL_DB" in CFLAGS
+-# and "-L/usr/lib64/mysql -lmysqlclient -lz" in LIB_M
+-# some systems may also require a LD_LIBRARY_PATH change
+-
+-LIB_M= -lm 
+-#LIB_M= -L/usr/lib64/mysql -lmysqlclient -lm # -lz 
+-NCBL_LIB=ncbl2_mlib.o
+-#NCBL_LIB=ncbl2_mlib.o mysql_lib.o
+-
+-# threaded as _t, serial
+-# include ../make/Makefile.pcom
+-
+-# threaded without _t
+-include ../make/Makefile.pcom_t
+-
+-# serial only 
+-# include ../make/Makefile.pcom_s
+-
+-include ../make/Makefile.fcom
--- a/debian/patches/OVERFLOW.patch
+++ b/debian/patches/OVERFLOW.patch
-Description: OVERFLOW
-Index: fasta3/src/dropnnw2.c
-===================================================================
--- fasta3.orig/src/dropnnw2.c
-+++ fasta3/src/dropnnw2.c
-@@ -575,7 +575,7 @@ void do_work (const unsigned char *aa0,
-      * be rerun with 16 bits. If it is more, and we have tried at least
-      * 500 sequences, we switch off the 8-bit mode.
-      */
-    if (score == OVERFLOW) {
-+    if (score == OVERFLOW_SCORE) {
-       f_str->done_16bit++;
-       if(f_str->done_8bit>500 && (3*f_str->done_16bit)>(f_str->done_8bit))
-         f_str->try_8bit = 0;
--- a/debian/patches/series
+++ b/debian/patches/series
 Makefile.patch
-OVERFLOW.patch
--- a/doc/README_v36.3.8g.md
+++ b/doc/README_v36.3.8g.md
-
-
-## The FASTA package - protein and DNA sequence similarity searching and alignment programs
-
-Changes in **fasta-36.3.8f** released 31-Dec-2017
-
-1. (December, 2017) -- Make statistical thresholds more robust for
-   small E()-values with normally distributed scores (ggsearch36,
-   glsearch36).
-
-2. (September, 2017) Treat all lower-case queries as uppercase with -S option.
-
-3. (May, 2017) Improvements/fixes to sub-alignment scoring strategies.
-
-4. Improvements/fixes to psisearch2 scripts.
-
-For more detailed information, see `doc/readme.v36`.
-
--- a/doc/README_v36.3.8h.md
+++ b/doc/README_v36.3.8h.md
+
+## The FASTA package - protein and DNA sequence similarity searching and alignment programs
+
+Changes in **fasta-36.3.8h** August, 2019
+
+1. Modifications to support makeblastdb format v5 databases. Currently, only simple database reads have been tested.
+
+
+Changes in **fasta-36.3.8h** March, 2019
+
+1. Translation table 1 (`-t 1`) now translates 'TGA'->'U' (selenocysteine).
+
+2. New script for extracting DNA sequences from genomes (`scripts/get_genome_seq.py`).  Currently works with human  (hg38), mouse (mm10), and rat (rn6).
+
+Changes in **fasta-36.3.8h** January, 2019
+
+1. Bug fixes: `fastx`/`tfastx` searches done with the `-t t` option  (which adds a `*` to protein sequences so that termination codons can  be matched), did not work properly with the `VT` series of matrices,  particularly `VT10`.  This has been fixed.
+
+2. New features: Both query and library/subject sequences can be generated by specifying a program script, either by putting a `!` at the start of the query/subject file name, or by specifying library type `9`. Thus, `fasta36 \!../scripts/get_protein.py+P09488+P30711 /seqlib/swissprot.fa` or `fasta36 "../scripts/get_protein.py+P09488+P30711 9" /seqlib/swissprot.fa` will compare two query sequences, `P09488` and `P30711`, to SwissProt, by downloading them from Uniprot using the `get_protein.py` script (which can download sequences using either Uniprot or RefSeq protein accessions). Often, the leading `!` must be escaped from shell interpretation with `\!`.
+
+New scripts that return FASTA sequences using accessions or genome coordinates are available in `scripts/`. `get_protein.py`, `get_uniprot.py`, `get_up_prot_iso_sql.py` and `get_refseq.py`. `get_refseq.py` can download either protein or mRNA RefSeq entries. `get_up_prot_iso_sql.py` retrieves a protein and its isoforms from a MySQL database.
+
+`get_genome_seq.py` extracts genome sequences using coordinates from local reference genomes (`hg38` and `mm10` included by default).
+
+Changes in **fasta-36.3.8h** December, 2018
+
+The `scripts/ann_exons_up_www.pl` and `ann_exons_up_sql.pl` now include the option `--gen_coord` which provides the associated genome coordinate (including chromosome) as a feature, indicated by `'<'` (start of exon) and `'>'` (end of exon).
+
+Changes in **fasta-36.3.8h** released November, 2018
+
+**fasta-36.3.8h** provides new scripts and modifications to the   `fasta` programs that normalize the process of merging sub-alignment   scores and region information into both FASTA and BLAST results.  To   move BLASTP towards FASTA with respect to alignment annotation and   sub-alignment scoring:
+
+1. The `blastp_annot_cmd.sh` runs a blast search, finds and scores   domain information for the alignments, and merges this information   back into the blast output `.html` file.  This script uses: 
+
+   1. `annot_blast_btab2.pl --query query.file --ann_script annot_script.pl --q_ann_script annot_script.pl blast.btab_file > blast.btab_file_ann` (a blast tabular file with one or two new fields: an annotation field and, optionally with `--dom_info`, a raw domain content field).
+   2. `merge_blast_btab.pl --btab blast.btab_file_ann blast.html > blast_ann.html` (merges the annotations and domain content information in the `blast.btab_file_ann` file with the standard blast output file to produce annotated alignments).
+   3. In addition, `rename_exons.py` is available to rename exons (later other domains) in the subject sequences to match the exon labeling in the aligned query sequence.
+   4. `relabel_domains.py` can be used to adjust color sets for homologous domains.
+
+2.  There is also an equivalent `fasta_annot_cmd.sh` script that provides similar functionality for the FASTA programs.  This script does not need to use `annot_blast_btab2.pl` to produce domain subalignment scores (that functionality is provided in FASTA), but it also can use `merge_fasta_btab.pl` and `rename_exons.py` to modify the names of the aligned exons/domains in the subject sequences.
+
+3. To support the independence of the `blastp`/`fasta` output from html annotation, the FASTA package includes some new options:
+
+   1. The `-m 8CBL` option includes query sequence length and subject sequence length in the blast tabular output.  In addition, if domain annotations are available, the raw domain coordinates are provided in an additional field after the annotation/subalignment scoring field.  `-m 8CBl` provides the sequence lengths, but does not add the raw domain coordinates.
+
+   2. The `-Xa` option prevents annotation information from being included in the html output -- it is only available in the `-m 8CB`  (or `-m 8CBL/l`) output
+
+   3. To reduce problems with spaces in script arguments, annotation scripts with spaces separating arguments can use '+' instead of ' '.
+
+   4. The `fasta_annot_cmd.sh` script produces both a conventional alignment on `stdout` and a `-m 8CBL` alignment, which is sent to a separate file.  The file name is separated from the `-m F8CBL` option with a `=`, thus `-m F8CBL=tmp_output.blast_tab`.
+
+Changes in **fasta-36.3.8g** released 23-Oct-2018
+
+1. (Oct. 2018) Improvements to scripts in the `psisearch2/` directory:
+
+   1. `psisearch2/m89_btop_msa2.pl`
+      1. the `--clustal` option produces a "CLUSTALW (1.8)", which is required for some downstream programs
+      2. the `--trunc_acc` option removes the database and accession from identifiers of the form: `sp|P09488|GSTM1_HUMAN` to produce `GSTM1_HUMAN`.
+      3. the `--min_align` option specifies the fraction of the query sequence that must be aligned `(q_end-q_start+1)/q_length`
+   Together, these changes make it possible for the output of `m89_btop_msa2.pl` to be used by the EMBOSS program `fprotdist`.
+
+   2. A more general implementation of `psisearch2_msa_iter.sh`, which does `psisearch2` one iteration at a time, and a new equivalent `psisearch2_msa_iter_bl.sh`, which uses `psiblast` to do the search.
+
+2. (Oct. 2018) A small restructuring of the `make/Makefiles` to remove the `-lz` dependence for non-debugging scripts (and add it back when -DDEBUG is used).
+
+Changes in **fasta-36.3.8g** released 5-Aug-2018
+
+1. (Apr 2018) incorporation of `-t t1` termination codes ("*") in `-m 8CB`, `-m 8CC`, and `-m9C` so that aligned termination codons are indicated as `**` (`-m8CB`) or `*1` (`-m8CC`, `-m9C`).
+
+2. (Mar 2018) Updates to scripts/annot_blast_btop2.pl to provide subalignment scoring for blastp searches (BLOSUM62 only).  (see doc/readme.v36)
+
+3. (Feb. 2018) a new extended option, `-XB`, which causes percent identity, percent similarity, and alignment length to be calculated using the BLAST model, which does not count gaps in the alignment length.
+
+see readme.v36 for other bug fixes.
+
+Changes in **fasta-36.3.8g** released 31-Dec-2017
+
+1. (December, 2017) -- Make statistical thresholds more robust for small E()-values with normally distributed scores (`ggsearch36`,`glsearch36`).
+
+2. (September, 2017) Treat lower-case queries with no upper-case residues as uppercase with `-S` option.
+
+3. (May, 2017) Improvements/fixes to sub-alignment scoring strategies.
+
+4. Improvements/fixes to psisearch2 scripts.
+
+For more detailed information, see `doc/readme.v36`.
+
--- a/doc/changes_v36.html
+++ b/doc/changes_v36.html
@@ -24,28 +24,44 @@ font-size: 12px; font-family: sans-serif; text-decoration:none; background-color
 </small>
 </pre>
 <hr>
-<h2>Latest Updates - FASTA version 36.3.8d (April, 2016)</h2>
+<h2>Latest Updates - FASTA version 36.3.8h (March, 2019)</h2>
 <ol>
+  <li>The FASTA programs have been released under the Apache2.0 Open
+  Source License.  The COPYRIGHT file, and copyright notices in
+  program files, have been updated to reflect this change.
+    <p>
+      <li>
+	fasta-36.3.8h includes bug fixes for translated alignments
+	with termination codons, the ability to use scripts as query
+	and library sequences, and new scripts for extracting genomic
+	DNA sequences given chromosome coordinates.
+      <li>
+	fasta-36.3.8g includes bug fixes for sub-alignment scoring and 
+	psisearch2 scripts, new annotation scripts for exons, and
+	fixes enabling very low statistical thresholds with ggsearch36
+	and glsearch36.
+      <li>
+	fasta-36.3.8e/scripts includes updated scripts for
+	capturing domain and feature annotations using the
+	EBI/proteins API (https://www.ebi.ac.uk/proteins/api/) to get
+	Uniprot annotations and exon locations.
+<p>
+      <li>
+	The <tt>fasta-36.3.8e/psisearch2/</tt> directory now
+	provides <tt>psisearch2_msa.pl</tt>
+	and <tt>psisearch2_msa.py</tt>, functionally identical scripts
+	for iterative searching with <tt>psiblast</tt>
+	or <tt>ssearch36</tt>.  <tt>psisearch2-msa.pl</tt> offers an
+	option, <tt>--query_seed</tt>, that can dramatically reduce
+	false-positives caused by alignment overextension, with very
+	little loss of search sensitivity.
+    <p>
 <li>
 The <tt>fasta-36.3.8d/scripts/</tt> directory now provides a
 script, <tt>annot_blast_btop2.pl</tt> that allows annotations and
 sub-alignment scoring on BLAST alignments that use the tabular format
 with BTOP alignment encoding.
 <p>
-  <li>
-    Bug fixes for overlapping domain domain scoring.  v36.3.7 was not thread-safe.
-  <li>
-    Annotation scripts accessing the Pfam domain database can now use
-    the <tt>--vdoms</tt> option to highlight missing parts of a Pfam
-    domain model. In addtion, domains from clans are labeled as clans
-    unless <tt>--no-clans</tt> is specified.
- </ol>
-<h2>Updates - FASTA version 36.3.7 (November, 2014)</h2>
-<ol>
-  <li>The FASTA programs have been released under the Apache2.0 Open
-  Source License.  The COPYRIGHT file, and copyright notices in
-  program files, have been updated to reflect this change.
-    <p>
 <li>Alignment sub-scoring scripts have been extended to allow
 overlapping domains.  This requires a modified annotation file format.
  The "classic" format placed the beginning and end of a domain on different lines:
@@ -70,7 +86,7 @@ which allows annotations of the form:
 </pre>
 <p>
  <li> New annotation scripts are available in
-  the <tt>fasta-36.3.7/scripts</tt> directory,
+  the <tt>fasta-36.3.8/scripts</tt> directory,
  e.g. <tt>ann_pfam_www_e.pl</tt> (Pfam) and <tt>ann_up_www2_e.pl</tt>
  (Uniprot) to support this new format.  If the domain annotations
  provided by Pfam or Uniprot overlap, then overlapping domains are

--- a/doc/fasta_guide.pdf
+++ b/doc/fasta_guide.pdf
--- a/doc/fasta_guide.tex
+++ b/doc/fasta_guide.tex
@@ -267,28 +267,41 @@ FASTA format files consist of a description line, beginning
 with a '$>$' character, followed by the sequence itself:
 \begin{quote}
 \begin{verbatim}
->sequence name and description 1
+>sequence_name1 and description
 A F A S Y T .... actual sequence.
 F S S       .... second line of sequence.
->sequence name and description 2
+>sequence_name2 and description 
 PMILTYV ... sequence 2
 \end{verbatim}
 \end{quote}
 All of the characters of the description line are read, and special
 characters can be used to indicate additional information about the
-sequence. In general, non-amino-acid/non-nucleotide sequences in the
-sequence lines are ignored.
+sequence. In particular, a \texttt{'@:C 12345'} at the end of the
+description line indicates that the first residue of the sequence has
+coordinate \texttt{'12345'}, instead of starting at \texttt{'1'}.
+Coordinates can be negative; a DNA sequence upstream from the start of
+transcription could be displayed with negative coordinates.
+
+In general, non-amino-acid/non-nucleotide sequences in the sequence
+lines are ignored, with the exception of \texttt{'*'}, which indicates
+a termination codon in a protein sequence, and can be used to indicate
+the match to a termination codon in protein:DNA alignments.

 FASTA format files from major sequence distributors, like the NCBI and
 EBI, have specially formatted description lines, e.g.:\\
 \indent
 \texttt{
->gi|54321|ref|np\_12345| example NCBI refseq sequence\\
+>np\_12345| example NCBI refseq sequence\\
 }
 or\\
 \indent
 \texttt{
->sw:gstm1\_human P01234 glutathione transferase GSTM1 - human\\
+>sp:gstm1\_human P01234 glutathione transferase GSTM1 - human\\
+}
+or\\
+\indent
+\texttt{
+>sp|P09488|GSTM1\_HUMAN glutathione transferase GSTM1 - human\\
 }

 Several sample test files are included with the FASTA distribution:
@@ -852,7 +865,11 @@ can use \texttt{-m 1 -m 6 -m 9}.
  comments, \texttt{-m 8XC} without comments) and, if available, an
  annotation encoding matching FASTA \texttt{-m 9C} output. All the
  \texttt{-m 9c/C/d/D} encodings are available with BLAST tabular
-  output using \texttt{-m 8C[c/C/d/D]}.
+  output using \texttt{-m 8C[c/C/d/D]}.  In the v36.3.8h release, a
+  new option has been added to \texttt{-m 8CB}, \texttt{-m 8CBL} (or
+  \texttt{-m 8CBl}). The \texttt{L/l} option adds the lengths of the
+  query and subject sequences after the \texttt{seqid}'s to BLAST
+  tabular output, e.g. \texttt{qseqid qlen sseqid slen percid ...}

 \item[\texttt{-m 9}] display alignment coordinates and scores with the
  best score information.  \texttt{-m 9i} provides alignment length,
@@ -926,7 +943,7 @@ while \texttt{-m 9D} would be:
 \texttt{1M1X2M4X2M1X2M7X3M9D1M2X1M4X2M1X1M1X2I1X1M1X1M3X1M2X1I3M1D1X1M2X1M}
 \end{footnotesize}
 \item[\texttt{-m 10}]
-a parseable format for use with other programs.
+a parseable format for use with other programs (this option is no longer reliably tested; \texttt{-m 8CBL} is easier to parse and tested more extensively).
 \item[\texttt{-m 11}]
 Provide \texttt{lav}-like output (used by \texttt{lalign}) for graphical output.
 \begin{quote}
@@ -1124,17 +1141,24 @@ not treated as low complexity by the translated alignment
 programs. (There is an option in the \texttt{Makefile},
 \texttt{-DDNALIB\_LC}, to enable preserving case in DNA sequences.)

-\item[\texttt{-t \#}]
-Translation table - fastx36, tfastx36, fasty36, and
-tfasty3 now support the BLAST translation tables.  See
-\url{http://www.ncbi.nih.gov/Taxonomy/Utils/wprintgc.cgi}.
-
-\texttt{-t t} or \texttt{-t t\#} enables the addition of
-an implicit termination codon to a protein:translated DNA match.  That
-is, each protein sequence implicitly ends with \texttt{*}, which
-matches the termination codes for the appropriate genetic code.
-\texttt{-t t\#} sets implicit termination and a different genetic
-code.
+\item[\texttt{-t \#}] Translation table - fastx36, tfastx36, fasty36,
+  and tfasty36 now support the BLAST translation tables.  See
+  \url{http://www.ncbi.nih.gov/Taxonomy/Utils/wprintgc.cgi}.  
+
+  \texttt{-t 1} also enables translation of \texttt{'TGA'} to
+  \texttt{'U'} (seleno-cysteine) (by default, \texttt{'TGA'} is
+  translated to \texttt{'*'}). Because of the ambiguity of the
+  \texttt{'TGA'} codon, translated alignments of \texttt{'TGA'} with
+  \texttt{-t 1} match \texttt{'U'} and \texttt{'*'} (termination)
+  equally well.
+
+\texttt{-t t} enables the addition of an implicit termination codon to
+a protein:translated DNA match.  That is, each protein sequence
+implicitly ends with \texttt{*}, which matches the termination codes
+for the appropriate genetic code.  To change the translation table and
+insert a termination character after each protein sequence, use
+\texttt{-t 1 -t t}.
+
 \item[\texttt{-T \#}]
 set number of threads/workers.  Normally on a multi-core machine, the maximum
 number of processors/cores is used.
@@ -1349,7 +1373,13 @@ A number of rarely used options are now only available as extended options:
 \item[\texttt{X1}] sort output by \texttt{init1} score (for
  compatibility with FASTP; obsolete).

-\item[\texttt{XB}] (Previously \texttt{-B}.)  Show the z-score, rather
+\item[\texttt{XB}] Calculate percent identity, percent similarity, and
+  alignment length using the BLAST model, which excludes gapped residues.
+  This allows very high identity alignments with large gaps to look
+  much closer, but causes the alignment length to drop by the length
+  of the gap.
+
+\item[\texttt{Xb}] (Previously \texttt{-B}.)  Show the z-score, rather
  than the bit-score in the list of best scores (rarely used, provided
  for backward compatibility).

@@ -1795,6 +1825,7 @@ read the libraries in the following formats:\\
 5 & NBRF/PIR VMS (\texttt{>P1;SEQID}/comment/sequence) (obsolete)\\
 6 & GCG (version 8.0) Unix Protein and DNA (compressed)\\
 7 & FASTQ (sequence only, quality ignored)\\
+9 & a script that is executed to produce FASTA format sequences \\ 
 10 & subset format (</slib2/swissprot.lseg 0:2 4|) \\
 11 & NCBI Blast1.3.2 format  (unix only) (obsolete)\\
 12 & NCBI Blast2.0 format\\
@@ -1870,11 +1901,15 @@ remember where the libraries are kept or how they are named.
 \section{Frequently Asked Questions (FAQs)}

 {\noindent}\textbf{Where can I get FASTA?} --
-\url{http://faculty.virginia.edu/wrpearson/fasta} has the latest
-versions of the FASTA programs.  This document describes
-\texttt{\CURRENT}, which is available from
-\url{http://faculty.virginia.edu/wrpearson/fasta/fasta3.tar.gz}.
-In addition, pre-compiled versions of the programs are available for
+
+The most current version of the FASTA source code is available from
+\url{http://github.com/wrpearson/fasta36}.  In addition, you can get
+the programs from \url{http://faculty.virginia.edu/wrpearson/fasta},
+but sometimes there is a lag between the latest release on GitHub and
+the compiled versions at \url{faculty.virginia.edu}.  This document
+describes \texttt{\CURRENT}, which is available from
+\url{http://faculty.virginia.edu/wrpearson/fasta/fasta3.tar.gz}.  In
+addition, pre-compiled versions of the programs are available for
 MacOSX and Windows.

 \needspace{4\baselineskip}
@@ -1887,7 +1922,7 @@ Query & Library & FASTA pgm. & BLAST pgm. & \\[1.2ex]
 Prot. & Prot. & \texttt{fasta36} & \texttt{blastp} & heuristic local similarity \\
 &  & \texttt{ssearch36} &  & optimal local sim.\\
 &  & \texttt{ggearch36} &  & global:global sim. \\
- &  & \texttt{ggearch36} &  & global:local sim.\\
+ &  & \texttt{glsearch36} &  & global:local sim.\\
 DNA & DNA & \texttt{fasta36}$^*$ & \texttt{blastn} & \\[1.2ex]
 \hline \\[-1.0ex]
 Prot. & Prot. & \texttt{lalign36} & & multiple non-intersecting \\
@@ -2029,7 +2064,7 @@ As always, please inform me of bugs as soon as possible.
 \begin{quote}
 William R. Pearson\\
 Department of Biochemistry\\
-Jordan Hall Box 800733\\
+Pinn Hall Box 800733\\
 U. of Virginia\\
 Charlottesville, VA\\
 wrp@virginia.EDU

--- a/doc/readme.md
+++ b/doc/readme.md
+README_v36.3.8h.md
\ No newline at end of file
--- a/doc/readme.v34t0
+++ b/doc/readme.v34t0
@@ -111,7 +111,7 @@ compilation with Sun compiler with Makefile.sun_x86.

 This release provides an extremely efficient SSE2 implementation of
 the Smith-Waterman algorithm for the SSE2 vector instructions written
-by Michael Farrar (farrar.michael@gmail.com).  The SSE code speeds up
+by Michael Farrar.  The SSE code speeds up
 Smith-Waterman 8 - 10-fold in my tests, making it comparable to Eric
 Lindahl's Altivec code for the Apple/IBM G4/G5 architecture.


--- a/doc/readme.v36
+++ b/doc/readme.v36
@@ -6,6 +6,205 @@ multiple high-scoring alignments to be shown, rather than just one.
 This is the main functional difference between FASTA and BLAST -
 BLAST could show multiple HSPs, FASTA did not.

+>>Aug. 9, 2019
+[src/ncbl2_mlib.c, ncbl2_head.h]
+
+Modest extensions made to support reading makeblastdb format v5
+databases. Changes have only been made to read the db.pin file, but
+things work in simple tests.
+
+>>July 16, 2019
+[src/comp_lib9.c]
+
+Fixed a memory leak problem when searching with large libraries that
+could be memory mapped (libraries with .xin index files).  If the
+library did not fit in memory, then the program kept allocating new memory.
+By default, the largest database that fits in memory must be less than
+16 GB.  Larger libraries will be re-read, which slows down multi-query
+searches considerably.  To increase the size of the library allowed in
+memory, use the option: "-X M32G" to fit 32 GB libraries.
+
+>>Mar. 8, 2019
+[src/initfa.c,faatran.c,dropfx2.c]
+Modify translation table 1 to allow selenocysteine translation
+(TGA->'U'), and modify scoring matrices to give positive scores to
+'*':'U'.  The translation modification ONLY works with "-t 1".  In
+addition, BLAST BTOP alignments (-m 8CB) convert a 'U' aligned with a
+'*' to a '*', so the end of the alignment is '**' rather than 'U*'
+(fastx36) or '*U' (tfastx36).
+
+dropfx2.c (fastx36/tfastx36), dropfz3.c(fasty36/tfasty36) did not
+properly switch protein and translated DNA codes with -m 8CB -- fixed.
+
+version date updated to Mar, 2019
+
+>>Feb. 26, 2019
+[scripts/get_genome_seq.py]
+added get_genome_seq.py as a replacement for get_hg38_bed.py, remove
+get_hg38_bed.py.  'get_genome_seq.py --genome mm10' also produces
+sequences from mouse mm10 (and can now do any genome that bedtools can
+read).
+
+>>Feb. 23, 2019
+[src/comp_lib9.c, mshowbest.c]
+Modify repeat_thresh so that poor alignment scores (E() >
+ppst->e_cut_r, typically -E-threshold/10.0) do not look for additional
+alignments.
+
+>>Feb. 21, 2019
+[src/nmgetaa.c, scaleswn.c, scripts/get_protein.py, get_hg38_bed.py]
+
+Modify nmgetaa.c to ignore ':'s (for sequence subsets) in scripts.
+The script can do the subsetting.  Modify scripts/get_protein.py to
+provide subsetting.  Add scripts/get_hg38_bed.py to extract fasta
+sequences using the format "chr2:123456-543210"
+
+Modify scaleswn.c to estimate Altschul-Gish parameters when gap and
+extension do not match exactly.
+
+>>Feb. 6, 2019
+[src/compacc2e.c, nmgetaa.c]
+modify build_link_data() to allow '+' for space in scripts.  Ensure
+that lib_type is properly initialized (open_lib.c()).
+
+>>Jan. 23, 2019
+[nmgetaa.c]
+Fix bug introduced when checking for lib_type.
+
+>>Jan. 15, 2019
+[src/upam.h, altlib.h, nmgetaa.c]
+[scripts/rename_exons.py, map_exons_coords.py, get_uniprot.py, get_refseq.py, get_proteins.py]
+
+Bug fixes: The VT10, VT20, etc scoring matrices did not have scores for '*:*'
+alignments, used with FASTX/TFASTX for extending alignments through
+the termination codon.  As a result, searches with '-t t' did not
+extend through the termination codon, even though they should have.
+This has been fixed.
+
+Enhancements: FASTA can now download both query and library sequences using a script, by specifying file type 9.  Thus:
+
+fasta36 "../scripts/get_uniprot.py+P09488 9" /seqlib/swissprot.fasta
+
+Will run the script "get_uniprot.py" with the argument "P09488" and
+use the output of the script as the query sequence.  In this example,
+the library type (9) is specified by the " 9" (this space cannot be
+replaced with a '+' character).
+
+Alternatively, library type '9' can be specified by putting a '!' before the script file name.
+
+fasta36 \!../scripts/get_uniprot.py+P09488 /seqlib/swissprot.fasta
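The script-specification rules described above ('+' stands for a space inside the script string, a literal space separates the script string from the library type, and a leading '!' selects library type 9) can be sketched as follows. This is an illustration only, not the C logic in build_link_data(); the function name `parse_script_spec` and its return shape are assumptions.

```python
def parse_script_spec(spec):
    """Return (command, lib_type) for a FASTA-style script specification.

    Illustrative only: mirrors the rules stated in the release notes,
    not the actual implementation.
    """
    lib_type = None

    # A leading '!' is an alternative way to request library type 9.
    if spec.startswith("!"):
        lib_type = 9
        spec = spec[1:]

    # A literal space separates the script string from the library type.
    script, sep, type_str = spec.partition(" ")
    if sep and type_str.strip():
        lib_type = int(type_str)

    # '+' characters in the script string stand for spaces between arguments.
    command = script.replace("+", " ")
    return command, lib_type


if __name__ == "__main__":
    print(parse_script_spec("../scripts/get_uniprot.py+P09488 9"))
    print(parse_script_spec("!../scripts/get_uniprot.py+P09488"))
```

Both example specifications from the notes resolve to the same command string and library type in this sketch.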
+
+Scripts can be used to produce query or library sequences, or both.
+Three scripts that download sequences from the NCBI and Uniprot have
+been added in the "scripts" directory: "get_uniprot.py" takes Uniprot
+accessions as arguments, "get_refseq.py" takes refseq accessions
+(protein or mRNA), and "get_protein.py" gets both Uniprot and RefSeq
+protein sequences.
+
+rename_exons.py and map_exons_coords.py can take annotated BTOP
+alignments with genome coordinates and map exons to the alternative
+genome.
+
+>>Jan. 2, 2019
+[src/mshowbest.c]
+Fix problems with site annotation when dom_info is provided with -m8CBL
+[scripts/ann_exons_up_sql.pl, ann_exons_up_www.pl]
+Make scripts more robust to missing chromosome information,
+reverse-strand coordinates.
+
+>>Dec. 11, 2018
+[scripts/ann_exons_up_www.pl, ann_exons_up_sql.pl]
+Add the option "--gen_coord" to report the genome coordinates of exon
+starts ('<') and ends ('>') as features.
+
+>>Nov. 14, 2018
+[scripts/rename_exons.py, relabel_domains.py, compacc2e.c]
+
+Two new scripts, rename_exons.py and relabel_domains.py, take a
+blast tabular output file with domain alignment annotations (and
+possibly raw domain information) and modify the names
+(rename_exons.py) or colors (relabel_domains.py).  rename_exons.py
+takes the exon numbering associated with the query sequence and maps
+it onto the subject alignments.  relabel_domains.py can be used to
+assign different color numbers to homologous and non-homologous domains.
+
+Both of these programs modify blast tabular output files, which can
+then be merged back into an alignment display using
+merge_blastp_annot.pl or merge_fasta_annot.pl.
+
+compacc2e.c:build_link_data() has been modified to convert '+' in the
+script string to ' ', to allow passing command line options.  A space
+in the script string is used to separate the script from the library
+type of the file returned by the script.
+
+>>Nov. 6-7, 2018
+[doinit.c, mshowbest.c, mshowalign2.c, defs.h, structs.h]
+
+(a) Add options to provide query and subject sequence lengths and raw
+domain coordinates in BLASTP tabular output with the options -m 8CBl
+and -m 8CBL.  If domain annotations are available, -m 8CBL also
+provides the raw domain coordinates (not just those included in the
+alignment) in the form |DX:1-100;C=PF12345|XD:1-100;C=PF12345 where
+|DX indicates a query annotation and |XD indicates a subject annotation.  -m
+8CBl (lower-case L) shows the sequence lengths, but not the raw domain
+info.
+
+(b) parse the annotation program strings so that '+' are converted to
+' '.  This greatly simplifies passing arguments to the annotation scripts.  Thus:
+
+-V \!ann_pfam_sql.pl --db=pfam31 --neg --vdoms  can be written as:
+-V \!ann_pfam_sql.pl+--db=pfam31+--neg+--vdoms  (likewise for -V q\!ann_pfam...)
+
+(c) provide an option to remove region/feature annotations from non-m8
+(blast-tabular) output.  This simplifies the process of using
+scripts/merge_fasta_btab.pl to use .bl_tab (-m 8CBL) files to inject
+sub-alignment scores and domain information.
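As a rough illustration of the raw domain coordinate form described in (a), the sketch below pulls query (|DX) and subject (|XD) entries out of a string such as |DX:1-100;C=PF12345|XD:1-100;C=PF12345. It is based only on the single example shown in these notes; real -m 8CBL output may carry additional fields, and the names `DOMAIN_RE` and `parse_raw_domains` are hypothetical.

```python
import re

# Hypothetical pattern for one raw domain entry: |DX:start-end;C=name or |XD:start-end;C=name
DOMAIN_RE = re.compile(r"\|(DX|XD):(\d+)-(\d+);C=([^|]+)")

# |DX marks a query annotation, |XD a subject annotation (per the notes above).
SIDE = {"DX": "query", "XD": "subject"}


def parse_raw_domains(field):
    """Return a list of (side, start, end, name) tuples found in the field."""
    return [
        (SIDE[tag], int(start), int(end), name)
        for tag, start, end, name in DOMAIN_RE.findall(field)
    ]


if __name__ == "__main__":
    for entry in parse_raw_domains("|DX:1-100;C=PF12345|XD:1-100;C=PF12345"):
        print(entry)
```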
+
+>>Nov. 1, 2018
+[doinit.c]
+Allow -m F#=file.name in addition to -m "F# file.name" to address
+problems I had with spaces in shell scripts.
+
+>>Oct. 23, 2018 [re-released as fasta-36.3.8g]  (see README_v36.3.8g.md)
+[make/Makefiles*,psisearch2/m89_btop_msa2.pl]
+
+Add options to psisearch2/m89_btop_msa2.pl to provide clustalw header
+(--clustal), require a minimum coverage of the query sequence
+(--min_align 0.8), and edit sequence identifiers to remove database
+and accession (--trunc_acc).
+
+Remove -lz dependency from non-debug Makefiles.
+
+>>Aug. 5, 2018  [re-released as fasta-36.3.8g]
+[lib_sel.c]
+Make lib_sel.c more robust to missing indirect name files.
+[scripts/ann*.pl]
+update various annotation scripts to use https:// instead of http://
+
+>>April 3, 2018
+[initfa.c, comp_lib.c, dropfx2.c]
+Changes to (a) ensure that the "-t t" option correctly inserts and
+aligns a termination codon '*', and (b) modify -m 8CB, -m8CC, and -m9C
+so that aligned termination codons are indicated as "**" (-m8CB) or
+"*1" (-m8CC, -m9C).
+
+>>Mar. 9, 2018
+[scripts/annot_blast_btop2.pl, merge_blast_btab.pl, blastp_annot_cmd.sh]
+Code is now in place to provide sub-alignment scoring using domain
+annotations with blastp searches (BLOSUM62 only).  blastp_annot_cmd.sh
+runs blast and produces both a standard HTML and a tabular output
+file.  It then runs annot_blast_btop2.pl to add sub-alignment scoring
+to the tabular output file, and then merge_blast_btab.pl merges the
+domain-annotated blast tabular file with the HTML output file.  When
+combined in this way, the FASTA web server (fasta.bioch.virginia.edu)
+can produce blastp searches with domain highlights/scoring.
+
+>>Feb. 6, 2018
+[initfa.c, doinit.c, mshowbest.c, mshowalign2.c]
+Add a new extended option, -XB, which causes percent identity, percent
+similarity, and alignment length to be presented using the BLAST
+model, which does not count gaps in the alignment length.
+
 >>Dec. 30, 2017  [released as fasta-36.3.8g]
 [scaleswn.c]
 Replace np_to_z() with np1_to_z(), which does not substract low

--- a/make/Makefile.linux
+++ b/make/Makefile.linux
-Makefile.linux64_sse2
\ No newline at end of file
--- a/make/Makefile.linux
+++ b/make/Makefile.linux
+# $ Id: $
+#
+# makefile for fasta3, fasta3_t.  Use Makefile.mpi for fasta36_mpi
+#
+# This file is designed for 64-bit Linux systems using an X86
+# architecture with SSE2 extensions.  -D_LARGEFILE64_SOURCE and
+# -DBIG_LIB64 require a 64-bit linux system.
+# SSE2 extensions are used for ssearch35(_t)
+#
+# Use Makefile.linux32_sse2 for 32-bit linux x86
+#
+
+SHELL=/bin/bash
+
+CC = gcc -g -O -msse2
+LIB_DB=
+
+#CC= gcc -pg -g -O -msse2 -ffast-math
+#CC = gcc -g -DDEBUG -msse2
+#CC=gcc -Wall -pedantic -ansi -g -msse2 -DDEBUG
+
+# EBI uses the following with pgcc, -O3 does not work:
+# CC= pgcc -O2 -pipe -mcpu=pentiumpro -march=pentiumpro -fomit-frame-pointer
+
+# this file works for x86 LINUX
+
+# standard options
+
+CFLAGS= -DSHOW_HELP -DSHOWSIM -DUNIX -DTIMES -DHZ=100 -DMAX_WORKERS=8 -DTHR_EXIT=pthread_exit  -DM10_CONS  -D_REENTRANT -DHAS_INTTYPES -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -DUSE_FSEEKO -DSAMP_STATS -DPGM_DOC -DUSE_MMAP  -D_LARGEFILE64_SOURCE  -DBIG_LIB64
+# -I/usr/include/mysql -DMYSQL_DB
+# -DSUPERFAMNUM -DSFCHAR="'|'" 
+
+#
+#(for mySQL databases)  (also requires change to Makefile36m.common or use of Makefile36m.common_mysql)
+# run 'mysql_config' to find locations of mySQL files
+
+LIB_M = -lm
+# for mySQL databases
+# LIB_M = -L/usr/lib64/mysql -lmysqlclient -lm
+
+HFLAGS= -o
+NFLAGS= -o
+
+# for Linux
+THR_SUBS = pthr_subs2
+THR_LIBS = -lpthread
+THR_CC =
+
+BIN = ../bin
+XDIR = /seqprg/bin
+#XDIR = ~/bin/LINUX
+
+# set up files for SSE2/Altivec acceleration
+#
+include ../make/Makefile.sse_alt
+
+# SSE2 acceleration
+#
+DROPGSW_O = $(DROPGSW_SSE_O)
+DROPLAL_O = $(DROPLAL_SSE_O)
+DROPGNW_O = $(DROPGNW_SSE_O)
+DROPLNW_O = $(DROPLNW_SSE_O)
+
+# renamed (fasta36)  programs
+include ../make/Makefile36m.common
+# conventional (fasta3) names
+# include ../make/Makefile.common
--- a/make/Makefile.linux32
+++ b/make/Makefile.linux32
@@ -13,9 +13,11 @@ SHELL=/bin/bash

 #CC= gcc -g -O
 #CC = gcc -g -DDEBUG
+#LIB_DB=

 #CC=gcc -Wall -pedantic -ansi -g -O
 CC= /usr/local/parasoft/bin/insure -g -DDEBUG
+LIB_DB=-lz

 # EBI uses the following with pgcc, -O3 does not work:
 # CC= pgcc -O2 -pipe -mcpu=pentiumpro -march=pentiumpro -fomit-frame-pointer

--- a/make/Makefile.linux32_sse2
+++ b/make/Makefile.linux32_sse2
@@ -13,9 +13,11 @@
 SHELL=/bin/bash

 CC= gcc -g  -O -msse2 -ffast-math
+LIB_DB=
 #CC = gcc -g -DDEBUG -msse2

 #CC= /usr/local/parasoft/bin/insure -g -DDEBUG
+#LIB_DB=-lz

 #CC=gcc -Wall -pedantic -ansi -g -O