Commits on Source (6)
ugene (1.31.1+dfsg1-1) UNRELEASED; urgency=medium
* Use UGENE_USE_BUNDLED_ZLIB=1
* Remove psipred from upstream source - it is excluded from build anyway
* d/watch: dversionmangle=auto (to accept +dfsg[0-9])
* Add external plugins to Recommends as far as these are packaged
-- Andreas Tille <tille@debian.org> Fri, 01 Feb 2019 09:14:17 +0100
ugene (1.31.1+dfsg-1) unstable; urgency=medium
[ Steffen Moeller ]
......
@@ -32,6 +32,34 @@ Architecture: any
Depends: ${shlibs:Depends},
ugene-data,
${misc:Depends}
Recommends: kraken,
hmmer2,
kalign,
muscle,
primer3,
r-base,
bedtools,
ncbi-blast+,
bowtie,
bowtie2,
bwa,
clustalo,
clustalw,
cutadapt,
fastqc,
hmmer,
default-jre-headless,
macs,
mafft,
mrbayes,
phyml,
samtools,
spades,
t-coffee,
tophat,
trimmomatic,
vcftools
Suggests: cufflinks
Description: integrated bioinformatics toolkit
Unipro UGENE is a cross-platform visual environment for DNA and protein
sequence analysis. UGENE integrates the most important bioinformatics
......
@@ -16,6 +16,10 @@ Files-Excluded: */src/libs_3rdparty/zlib
src/plugins_3rdparty/phylip/src/n*
src/plugins_3rdparty/phylip/src/p*
src/plugins_3rdparty/phylip/src/s*
src/plugins_3rdparty/psipred/src/s*
src/plugins_3rdparty/psipred/src/LICENSE
src/plugins_3rdparty/psipred/datafiles
src/plugins_3rdparty/psipred/psipred.license
Files: *
Copyright: © 2008-2015 UniPro <ugene@unipro.ru>
@@ -56,88 +60,6 @@ License: PD
The software is provided as PUBLIC DOMAIN
(see http://www.drive5.com/muscle/license.htm)
Files: src/plugins_3rdparty/psipred/src/seq2mtx.cpp
src/plugins_3rdparty/psipred/src/sspred_avpred.cpp
src/plugins_3rdparty/psipred/src/sspred_hmulti.cpp
Copyright: © 2000 D.T. Jones <dtj@cs.ucl.ac.uk>
License: non-free_in_clause_8
BY USING THE PROGRAM YOU ARE ACKNOWLEDGING THE FACT THAT YOU AGREE TO
THE TERMS OUTLINED IN THIS AGREEMENT. USERS WISHING TO USE THE SOFTWARE
FOR COMMERCIAL ACTIVITIES NOT COVERED BY THIS AGREEMENT SHOULD SEND
E-MAIL TO: dtj@cs.ucl.ac.uk
.
* NOTE RECENT CHANGES TO PARAGRAPH 8 *
.
In regard to the protein structure prediction program (PSIPRED2)
herewith (the Software) the copyright and other intellectual property
rights to which belong to the Author(s).
.
Any user (the User) of the program undertakes to the Copyright holder that he
or she shall be bound by the following terms and conditions:
.
1. The User will receive the Software and any related documentation in
confidence and will not use the same except for the purpose of their own
research. The Software will be used only by such of the User's officers or
employees to whom it must reasonably be communicated to enable them to
undertake their research and who agree to be bound by the same confidence.
The User shall procure and enforce such agreement from his or her staff for
the benefit of the Copyright holder.
.
2. The publication of research using the Software must include an
appropriate citation to the method:
.
Jones, D.T. (1999) Protein secondary structure prediction based on
position-specific scoring matrices. J. Mol. Biol. 292:195-202.
.
3. All forms of the Software will be kept in a reasonably secure place to
prevent unauthorised access.
.
4. Each copy of the Software or, if not practicable then, any package
associated therewith shall be suitably marked (and such marking maintained)
with the following copyright notice: "Copyright 2000 D.T.Jones. All Rights
Reserved.".
.
5. The Software may be modified, but any changes made shall be communicated
to the Author(s) and made freely available.
.
6. The Software may not be sold as a standalone package, or incorporated into
a commercial software package without the written permission of the Copyright
holder. The Software may be used freely for individual academic or commercial
research. The Software may also be made freely available for training or
teaching purposes.
.
7. The results produced by the Software may not be incorporated into any
data banks or databases which are subject to the payment of access or
license fees without the written permission of the Copyright holder.
.
8. The Software may be made available to users over a local network or
wide area network (including the Internet), but only if access is granted
free of charge to all authorised users. Incorporation of the Software into
a commercial Web site or other fee paying service is not allowed without
the written permission of the Copyright holder. If PSIPRED results are
returned to the user via such a network service, then a suitable
acknowledgement of the PSIPRED method must be returned somewhere in the
output text.
.
9. The confidentiality obligation in paragraph one shall not apply:
.
(i) to information and data known to the User at the time of
receipt hereunder (as evidenced by its written records);
.
(ii) to information and data which was at the time of receipt in the
public domain or thereafter becomes so through no wrongful act of
the User;
.
(iii) to information and data which the User receives from a third
party not in breach of any obligation of confidentiality owed to
the Author(s).
.
10. The User understands that the Software is supplied "as is". No warranty
as to its fitness or suitability for any purpose whatsoever is made or
implied. In no event shall the Author(s) or Copyright holder be held
responsible for any direct or indirect damages arising through the use
of the Software.
Files: src/plugins_3rdparty/primer3/src/primer3_core/*
Copyright: © 1996-2007 Whitehead Institute for Biomedical Research, Steve Rozen
(http://jura.wi.mit.edu/rozen), and Helen Skaletsky
......
@@ -13,7 +13,7 @@ export DEB_BUILD_MAINT_OPTIONS = hardening=+all
override_dh_auto_configure:
# exclude non-free plugins
- dh_auto_configure -- QMAKE_CFLAGS_ISYSTEM= QMAKE_CXXFLAGS_ISYSTEM= UGENE_WITHOUT_NON_FREE=1 UGENE_LRELEASE=lrelease-qt5 UGENE_LUPDATE=lupdate-qt5
+ dh_auto_configure -- QMAKE_CFLAGS_ISYSTEM= QMAKE_CXXFLAGS_ISYSTEM= UGENE_WITHOUT_NON_FREE=1 UGENE_LRELEASE=lrelease-qt5 UGENE_LUPDATE=lupdate-qt5 UGENE_USE_BUNDLED_ZLIB=1
find . -name Makefile.* | xargs -r sed -i '/STRIP/d'
......
version=4
- opts="repacksuffix=+dfsg,dversionmangle=s/\+(repack|dfsg)//,repack,compression=xz" \
+ opts="repacksuffix=+dfsg,dversionmangle=auto,repack,compression=xz" \
https://github.com/ugeneunipro/ugene/releases .*/archive/v?@ANY_VERSION@@ARCHIVE_EXT@
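To illustrate why the watch file moves away from the hand-written rule, here is a minimal sketch of the two mangling rules applied to the version string from the changelog above. It is not uscan code: both helper names are hypothetical, and the pattern used for the `auto` case is an assumption about the kind of repack suffix that `dversionmangle=auto` is meant to strip.

```cpp
#include <regex>
#include <string>

// Hypothetical helper, not part of uscan. The pattern is an assumption:
// it removes a trailing repack suffix such as "+dfsg" or "+dfsg1".
std::string stripRepackSuffix(const std::string& debianUpstreamVersion)
{
    static const std::regex suffix(R"([+~](debian|dfsg|ds|deb)(\.)?(\d+)?$)");
    return std::regex_replace(debianUpstreamVersion, suffix, "");
}

// Hypothetical helper mirroring the replaced rule s/\+(repack|dfsg)//,
// which deletes only the first literal "+repack" or "+dfsg".
std::string stripOldRule(const std::string& debianUpstreamVersion)
{
    static const std::regex suffix(R"(\+(repack|dfsg))");
    return std::regex_replace(debianUpstreamVersion, suffix, "",
                              std::regex_constants::format_first_only);
}
```

With the old rule, a numbered suffix leaves its digit behind, so `1.31.1+dfsg1` would be compared against upstream as `1.31.11`; the suffix-aware pattern yields `1.31.1`.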
PSIPRED3 - PROTEIN SECONDARY STRUCTURE PREDICTION PROGRAM BY D.T.JONES
GENERAL LICENSE & CONFIDENTIALITY AGREEMENT
In regard to the protein structure prediction program (PSIPRED3)
herewith (the Software) the copyright and other intellectual property
rights to which belong to the Author(s).
Any user (the User) of the program undertakes to the Copyright holder that he
or she shall be bound by the following terms and conditions:-
1. The User will receive the Software and any related documentation in
confidence and will not use the same except for the purpose of their own
research. The Software will be used only by such of the User's officers or
employees to whom it must reasonably be communicated to enable them to
undertake their research and who agree to be bound by the same confidence.
The User shall procure and enforce such agreement from his or her staff for
the benefit of the Copyright holder.
2. The publication of research using the Software must include an
appropriate citation to the method:
Jones, D.T. (1999) Protein secondary structure prediction based on
position-specific scoring matrices. J. Mol. Biol. 292:195-202.
3. All forms of the Software will be kept in a reasonably secure place to
prevent unauthorised access.
4. Each copy of the Software or, if not practicable then, any package
associated therewith shall be suitably marked (and such marking maintained)
with the following copyright notice: "Copyright 2000 D.T.Jones. All Rights
Reserved.".
5. The Software may be modified, but any changes made shall be communicated
to the Author(s) and made freely available.
6. The Software may not be sold as a standalone package, or incorporated into
a commercial software package without the written permission of the Copyright
holder. The Software may be used freely for individual academic or commercial
research. The Software may also be made freely available for training or
teaching purposes.
7. The results produced by the Software may not be incorporated into any
data banks or databases which are subject to the payment of access or
license fees without the written permission of the Copyright holder.
8. The Software may be made available to users over a local network or
wide area network (including the Internet), but only if access is granted
free of charge to all authorised users. Incorporation of the Software into
a commercial Web site or other fee paying service is not allowed without
the written permission of the Copyright holder. If PSIPRED results are
returned to the user via such a network service, then a suitable
acknowledgement of the PSIPRED method must be returned somewhere in the
output text.
9. The confidentiality obligation in paragraph one shall not apply:
(i) to information and data known to the User at the time of
receipt hereunder (as evidenced by its written records);
(ii) to information and data which was at the time of receipt in the
public domain or thereafter becomes so through no wrongful act of
the User;
(iii) to information and data which the User receives from a third
party not in breach of any obligation of confidentiality owed to
the Author(s).
10. The User understands that the Software is supplied "as is". No warranty
as to its fitness or suitability for any purpose whatsoever is made or
implied. In no event shall the Author(s) or Copyright holder be held
responsible for any direct or indirect damages arising through the use
of the Software.
PLEASE READ THE FOLLOWING LICENSE AGREEMENT. BY USING THE PROGRAM YOU ARE
ACKNOWLEDGING THE FACT THAT YOU AGREE TO THE TERMS OUTLINED IN THIS
AGREEMENT. USERS WISHING TO USE THE SOFTWARE FOR COMMERCIAL ACTIVITIES
NOT COVERED BY THIS AGREEMENT SHOULD SEND E-MAIL TO: dtj@cs.ucl.ac.uk
* NOTE RECENT CHANGES TO PARAGRAPH 8 *
PSIPRED2 - PROTEIN SECONDARY STRUCTURE PREDICTION PROGRAM BY D.T.JONES
----------------------------------------------------------------------
GENERAL LICENSE &
-----------------
CONFIDENTIALITY AGREEMENT
-------------------------
In regard to the protein structure prediction program (PSIPRED2)
herewith (the Software) the copyright and other intellectual property
rights to which belong to the Author(s).
Any user (the User) of the program undertakes to the Copyright holder that he
or she shall be bound by the following terms and conditions:-
1. The User will receive the Software and any related documentation in
confidence and will not use the same except for the purpose of their own
research. The Software will be used only by such of the User's officers or
employees to whom it must reasonably be communicated to enable them to
undertake their research and who agree to be bound by the same confidence.
The User shall procure and enforce such agreement from his or her staff for
the benefit of the Copyright holder.
2. The publication of research using the Software must include an
appropriate citation to the method:
Jones, D.T. (1999) Protein secondary structure prediction based on
position-specific scoring matrices. J. Mol. Biol. 292:195-202.
3. All forms of the Software will be kept in a reasonably secure place to
prevent unauthorised access.
4. Each copy of the Software or, if not practicable then, any package
associated therewith shall be suitably marked (and such marking maintained)
with the following copyright notice: "Copyright 2000 D.T.Jones. All Rights
Reserved.".
5. The Software may be modified, but any changes made shall be communicated
to the Author(s) and made freely available.
6. The Software may not be sold as a standalone package, or incorporated into
a commercial software package without the written permission of the Copyright
holder. The Software may be used freely for individual academic or commercial
research. The Software may also be made freely available for training or
teaching purposes.
7. The results produced by the Software may not be incorporated into any
data banks or databases which are subject to the payment of access or
license fees without the written permission of the Copyright holder.
8. The Software may be made available to users over a local network or
wide area network (including the Internet), but only if access is granted
free of charge to all authorised users. Incorporation of the Software into
a commercial Web site or other fee paying service is not allowed without
the written permission of the Copyright holder. If PSIPRED results are
returned to the user via such a network service, then a suitable
acknowledgement of the PSIPRED method must be returned somewhere in the
output text.
9. The confidentiality obligation in paragraph one shall not apply:
(i) to information and data known to the User at the time of
receipt hereunder (as evidenced by its written records);
(ii) to information and data which was at the time of receipt in the
public domain or thereafter becomes so through no wrongful act of
the User;
(iii) to information and data which the User receives from a third
party not in breach of any obligation of confidentiality owed to
the Author(s).
10. The User understands that the Software is supplied "as is". No warranty
as to its fitness or suitability for any purpose whatsoever is made or
implied. In no event shall the Author(s) or Copyright holder be held
responsible for any direct or indirect damages arising through the use
of the Software.
/* seq2mtx - convert single sequence to pseudo IMPALA mtx file */
/* Copyright (C) 2000 D.T. Jones */
#include <QTemporaryFile>
#include <QTextStream>
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
#include <math.h>
#include <string.h>
#ifdef Q_OS_WIN
#pragma warning(disable: 4996)
#endif
#include "sspred_utils.h"
#define MAXSEQLEN 65536
// #define FALSE 0
// #define TRUE 1
#define SQR(x) ((x)*(x))
#define MIN(x,y) (((x)<(y))?(x):(y))
#define MAX(x,y) (((x)>(y))?(x):(y))
const char *rescodes = "ARNDCQEGHILKMFPSTWYVBZX";
/* BLOSUM 62 */
const short aamat[23][23] =
{
{4, -1, -2, -2, 0, -1, -1, 0, -2, -1, -1, -1, -1, -2, -1, 1, 0, -3, -2, 0, -2, -1, 0},
{-1, 5, 0, -2, -3, 1, 0, -2, 0, -3, -2, 2, -1, -3, -2, -1, -1, -3, -2, -3, -1, 0, -1},
{-2, 0, 6, 1, -3, 0, 0, 0, 1, -3, -3, 0, -2, -3, -2, 1, 0, -4, -2, -3, 3, 0, -1},
{-2, -2, 1, 6, -3, 0, 2, -1, -1, -3, -4, -1, -3, -3, -1, 0, -1, -4,
-3, -3, 4, 1, -1},
{0, -3, -3, -3,10, -3, -4, -3, -3, -1, -1, -3, -1, -2, -3, -1, -1, -2,
-2, -1, -3, -3, -2},
{-1, 1, 0, 0, -3, 5, 2, -2, 0, -3, -2, 1, 0, -3, -1, 0, -1, -2,
-1, -2, 0, 3, -1},
{-1, 0, 0, 2, -4, 2, 5, -2, 0, -3, -3, 1, -2, -3, -1, 0, -1, -3,
-2, -2, 1, 4, -1},
{0, -2, 0, -1, -3, -2, -2, 6, -2, -4, -4, -2, -3, -3, -2, 0, -2, -2,
-3, -3, -1, -2, -1},
{-2, 0, 1, -1, -3, 0, 0, -2, 8, -3, -3, -1, -2, -1, -2, -1, -2, -2,
2, -3, 0, 0, -1},
{-1, -3, -3, -3, -1, -3, -3, -4, -3, 4, 2, -3, 1, 0, -3, -2, -1, -3,
-1, 3, -3, -3, -1},
{-1, -2, -3, -4, -1, -2, -3, -4, -3, 2, 4, -2, 2, 0, -3, -2, -1, -2,
-1, 1, -4, -3, -1},
{-1, 2, 0, -1, -3, 1, 1, -2, -1, -3, -2, 5, -1, -3, -1, 0, -1, -3,
-2, -2, 0, 1, -1},
{-1, -1, -2, -3, -1, 0, -2, -3, -2, 1, 2, -1, 5, 0, -2, -1, -1, -1,
-1, 1, -3, -1, -1},
{-2, -3, -3, -3, -2, -3, -3, -3, -1, 0, 0, -3, 0, 6, -4, -2, -2, 1,
3, -1, -3, -3, -1},
{-1, -2, -2, -1, -3, -1, -1, -2, -2, -3, -3, -1, -2, -4, 7, -1, -1, -4,
-3, -2, -2, -1, -2},
{1, -1, 1, 0, -1, 0, 0, 0, -1, -2, -2, 0, -1, -2, -1, 4, 1, -3,
-2, -2, 0, 0, 0},
{0, -1, 0, -1, -1, -1, -1, -2, -2, -1, -1, -1, -1, -2, -1, 1, 5, -2,
-2, 0, -1, -1, 0},
{-3, -3, -4, -4, -2, -2, -3, -2, -2, -3, -2, -3, -1, 1, -4, -3, -2, 11,
2, -3, -4, -3, -2},
{-2, -2, -2, -3, -2, -1, -2, -3, 2, -1, -1, -2, -1, 3, -3, -2, -2, 2,
7, -1, -3, -2, -1},
{0, -3, -3, -3, -1, -2, -2, -3, -3, 3, 1, -2, 1, -1, -2, -2, 0, -3,
-1, 4, -3, -2, -1},
{-2, -1, 3, 4, -3, 0, 1, -1, 0, -3, -4, 0, -3, -3, -2, 0, -1, -4,
-3, -3, 4, 1, -1},
{-1, 0, 0, 1, -3, 3, 4, -2, 0, -3, -3, 1, -1, -3, -1, 0, -1, -3,
-2, -2, 1, 4, -1},
{0, -1, -1, -1, -2, -1, -1, -1, -1, -1, -1, -1, -1, -1, -2, 0, 0, -2,
-1, -1, -1, -1, 4}
};
int seq2mtx(const char* seq, int seqlen, const char* outFileName)
{
int i, j;
const char *ncbicodes = "XAXCDEFGHIKLMNPQRSTVWXYXXX";
if (seqlen < 5 || seqlen >= MAXSEQLEN)
fail("Sequence length error!");
FILE* pFile = fopen( outFileName, "w" );
if (!pFile)
{
fail("open file for writing failed");
}
fprintf(pFile, "%d\n", seqlen);
for (i=0; i<seqlen; i++)
putc(seq[i], pFile);
fprintf(pFile, "\n0\n0\n0\n0\n0\n0\n0\n0\n0\n0\n0\n0\n");
for (i=0; i<seqlen; i++)
{
for (j=0; j<26; j++)
if (ncbicodes[j] != 'X')
fprintf(pFile, "%d ", aamat[aanum(seq[i])][aanum(ncbicodes[j])]*100);
else
fprintf(pFile, "-32768 ");
putc('\n', pFile);
}
fclose(pFile);
return 0;
}
int seq2mtx( const char* seq, int seqlen, QTemporaryFile* tmpFile )
{
int i, j;
const char *ncbicodes = "XAXCDEFGHIKLMNPQRSTVWXYXXX";
if (seqlen < 5 || seqlen >= MAXSEQLEN)
fail("Sequence length error!");
if (!tmpFile->open())
fail("open temporary file for writing failed");
QTextStream stream(tmpFile);
stream << seqlen << '\n';
for (i=0; i<seqlen; i++)
stream << seq[i];
stream << "\n0\n0\n0\n0\n0\n0\n0\n0\n0\n0\n0\n0\n";
for (i=0; i<seqlen; i++)
{
for (j=0; j<26; j++)
if (ncbicodes[j] != 'X')
stream << aamat[aanum(seq[i])][aanum(ncbicodes[j])]*100 << " ";
//fprintf(pFile, "%d ", aamat[aanum(seq[i])][aanum(ncbicodes[j])]*100);
else
stream << "-32768 ";
//fprintf(pFile, "-32768 ");
stream << '\n';
//putc('\n', pFile);
}
return 0;
}
#define MAXSEQLEN 10000
#define SQR(x) ((x)*(x))
#define MAX(x,y) ((x)>(y)?(x):(y))
#define MIN(x,y) ((x)<(y)?(x):(y))
#define REAL float
/* logistic 'squashing' function (output range 0.0 to 1.0) */
#define logistic(x) ((REAL)1.0 / ((REAL)1.0 + (REAL)exp(-(x))))
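As a quick check on the activation function, the sketch below restates the `logistic` macro as a plain function. The name `logisticValue` is hypothetical and used only for illustration; the formula is the one in the macro. It maps any real input into the open interval (0, 1), with 0.5 at the origin.

```cpp
#include <cmath>

// Hypothetical stand-alone equivalent of the logistic macro:
// 1 / (1 + e^(-x)), evaluated in single precision.
inline float logisticValue(float x)
{
    return 1.0f / (1.0f + std::exp(-x));
}
```

Large positive inputs approach 1 and large negative inputs approach 0, which is why the network outputs can be read as per-class scores.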
/* PSIPRED 2 - Neural Network Prediction of Secondary Structure */
/* Copyright (C) 2000 David T. Jones - Created : January 2000 */
/* Original Neural Network code Copyright (C) 1990 David T. Jones */
/* Average Prediction Module */
#include <QDir>
#include <QFile>
#include <QTemporaryFile>
#include <QTextStream>
#include <QString>
#include <U2Core/AppContext.h>
#include <U2Core/AppSettings.h>
#include <U2Core/UserApplicationsSettings.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
#include <ctype.h>
#include <time.h>
#ifdef Q_OS_WIN
#pragma warning(disable: 4996)
#endif
#include "sspred_avpred.h"
#include "sspred_utils.h"
#include "sspred_net.h"
//void *calloc(), *malloc();
// char *wtfnm;
//
// int nwtsum, fwt_to[TOTAL], lwt_to[TOTAL];
// float activation[TOTAL], bias[TOTAL], *weight[TOTAL];
//
// int profile[MAXSEQLEN][20];
//
// int seqlen;
//
// char seq[MAXSEQLEN];
enum aacodes
{
ALA, ARG, ASN, ASP, CYS,
GLN, GLU, GLY, HIS, ILE,
LEU, LYS, MET, PHE, PRO,
SER, THR, TRP, TYR, VAL,
UNK
};
PsiPassOne::PsiPassOne(QTemporaryFile* matFile, const QStringList& weightFiles) : matrixFile(matFile), weightFileNames(weightFiles) {
fwt_to = (int*) malloc(TOTAL*sizeof(int));
lwt_to = (int*) malloc(TOTAL*sizeof(int));
activation = (float*) malloc(TOTAL*sizeof(float));
bias = (float*) malloc(TOTAL*sizeof(float));
weight = (float**) malloc(TOTAL*sizeof(float*));
profile = (int **)malloc(MAXSEQLEN * sizeof(int *));
for (int i = 0; i < MAXSEQLEN; i++) {
profile[i] = (int *)malloc(20 * sizeof(int));
}
}
PsiPassOne::~PsiPassOne()
{
free(fwt_to);
free(lwt_to);
free(activation);
free(bias);
free(weight);
for (int i = 0; i < MAXSEQLEN; i++) {
free(profile[i]); /* allocated with malloc, so release with free, not delete */
}
free(profile);
}
void PsiPassOne::compute_output(void)
{
int i, j;
float netinp;
for (i = NUM_IN; i < TOTAL; i++)
{
netinp = bias[i];
for (j = fwt_to[i]; j < lwt_to[i]; j++)
netinp += activation[j] * weight[i][j];
/* Trigger neuron */
activation[i] = logistic(netinp);
}
}
/*
* load weights - load all link weights from a disk file
*/
void PsiPassOne::load_wts( const char *fname )
{
int i, j;
double t;
QFile weightFile(fname);
if (!weightFile.open(QIODevice::ReadOnly)) {
fail("cannot open weights file");
}
QTextStream stream(&weightFile);
/* Load input units to hidden layer weights */
for (i = NUM_IN; i < NUM_IN + NUM_HID; i++)
for (j = fwt_to[i]; j < lwt_to[i]; j++)
{
stream >> t;
weight[i][j] = t;
}
/* Load hidden layer to output units weights */
for (i = NUM_IN + NUM_HID; i < TOTAL; i++)
for (j = fwt_to[i]; j < lwt_to[i]; j++)
{
stream >> t;
weight[i][j] = t;
}
/* Load bias weights */
for (j = NUM_IN; j < TOTAL; j++)
{
stream >> t;
bias[j] = t;
}
}
/* Initialize network */
void PsiPassOne::init(void)
{
int i;
for (i = NUM_IN; i < TOTAL; i++)
if (!(weight[i] = (float*) calloc(TOTAL - NUM_OUT, sizeof(float))))
fail("init: Out of Memory!");
/* Connect input units to hidden layer */
for (i = NUM_IN; i < NUM_IN + NUM_HID; i++)
{
fwt_to[i] = 0;
lwt_to[i] = NUM_IN;
}
/* Connect hidden units to output layer */
for (i = NUM_IN + NUM_HID; i < TOTAL; i++)
{
fwt_to[i] = NUM_IN;
lwt_to[i] = NUM_IN + NUM_HID;
}
}
/* Make 1st level prediction averaged over specified weight sets */
void PsiPassOne::predict()
{
//int aa, i, j, k, n, winpos,ws;
int aa, j, winpos;
//char fname[80], predsst[MAXSEQLEN];
char *predsst;
//float avout[MAXSEQLEN][3], conf, confsum[MAXSEQLEN];
float **avout, conf, *confsum;
// Allocate buffers
// TODO: not good, memory is allocated in small chunks for avout
predsst = (char*) malloc(seqlen*sizeof(char));
avout = (float**) malloc(seqlen*sizeof(float*));
for (int i = 0; i < seqlen; ++i) {
avout[i] = (float*) malloc(3*sizeof(float));
}
confsum = (float*) malloc(seqlen*sizeof(float));
for (winpos = 0; winpos < seqlen; winpos++)
avout[winpos][0] = avout[winpos][1] = avout[winpos][2] = confsum[winpos] = 0.0F;
foreach (const QString& wfName, weightFileNames)
{
load_wts(qPrintable(wfName));
for (winpos = 0; winpos < seqlen; winpos++)
{
for (j = 0; j < NUM_IN; j++)
activation[j] = 0.0;
for (j = WINL; j <= WINR; j++)
{
if (j + winpos >= 0 && j + winpos < seqlen)
for (aa=0; aa<20; aa++)
activation[(j - WINL) * 21 + aa] = profile[j+winpos][aa]/1000.0;
else
activation[(j - WINL) * 21 + 20] = 1.0;
}
compute_output();
conf = (2*MAX(MAX(activation[TOTAL - NUM_OUT], activation[TOTAL - NUM_OUT+1]), activation[TOTAL - NUM_OUT+2])-(activation[TOTAL - NUM_OUT]+activation[TOTAL - NUM_OUT+1]+activation[TOTAL - NUM_OUT+2])+MIN(MIN(activation[TOTAL - NUM_OUT], activation[TOTAL - NUM_OUT+1]), activation[TOTAL - NUM_OUT+2]));
avout[winpos][0] += conf * activation[TOTAL - NUM_OUT];
avout[winpos][1] += conf * activation[TOTAL - NUM_OUT+1];
avout[winpos][2] += conf * activation[TOTAL - NUM_OUT+2];
confsum[winpos] += conf;
}
}
for (winpos = 0; winpos < seqlen; winpos++)
{
avout[winpos][0] /= confsum[winpos];
avout[winpos][1] /= confsum[winpos];
avout[winpos][2] /= confsum[winpos];
if (avout[winpos][0] >= MAX(avout[winpos][1], avout[winpos][2]))
predsst[winpos] = 'C';
else if (avout[winpos][2] >= MAX(avout[winpos][0], avout[winpos][1]))
predsst[winpos] = 'E';
else
predsst[winpos] = 'H';
}
QString pFilePath = U2::AppContext::getAppSettings()->getUserAppsSettings()->getUserTemporaryDirPath() + QDir::separator() + "output.ss";
FILE* pFile = fopen(pFilePath.toLatin1().constData(), "w");
if (!pFile) {
fail("failed opening file for writing");
}
for (winpos = 0; winpos < seqlen; winpos++)
fprintf(pFile, "%4d %c %c %6.3f %6.3f %6.3f\n", winpos + 1, seq.constData()[winpos], predsst[winpos], avout[winpos][0], avout[winpos][1], avout[winpos][2]);
fclose(pFile);
// Deallocate buffers
free(predsst);
for (int i = 0; i < seqlen; ++i) {
free(avout[i]);
}
free(avout);
free(confsum);
}
#define BUFSIZE 256
/* Read PSI AA frequency data */
int PsiPassOne::getmtx()
{
int j, naa;
QTextStream stream(matrixFile);
qDebug("%s", qPrintable(matrixFile->fileName()));
stream >> naa;
if (naa == 0) {
fail("Bad mtx file - no sequence length!");
}
if (naa > MAXSEQLEN)
fail("Input sequence too long!");
stream >> seq;
if (seq.size() == 0)
{
fail("Bad mtx file - no sequence!");
}
while (!stream.atEnd())
{
QByteArray line;
line = stream.readLine().toLatin1();
char* buf = line.data();
if (!strncmp(buf, "-32768 ", 7))
{
for (j=0; j<naa; j++)
{
if (sscanf(buf, "%*d%d%*d%d%d%d%d%d%d%d%d%d%d%d%d%d%d%d%d%d%d%*d%d", &profile[j][ALA], &profile[j][CYS], &profile[j][ASP], &profile[j][GLU], &profile[j][PHE], &profile[j][GLY], &profile[j][HIS], &profile[j][ILE], &profile[j][LYS], &profile[j][LEU], &profile[j][MET], &profile[j][ASN], &profile[j][PRO], &profile[j][GLN], &profile[j][ARG], &profile[j][SER], &profile[j][THR], &profile[j][VAL], &profile[j][TRP], &profile[j][TYR]) != 20)
fail("Bad mtx format!");
line = stream.readLine().toLatin1();
if (line.size() == 0)
break;
buf = line.data();
}
}
}
return naa;
}
int PsiPassOne::runPsiPass()
{
seqlen = getmtx();
init();
predict();
return 0;
}
#ifndef SSPRED_AVPRED_H
#define SSPRED_AVPRED_H
#include <QVector>
#include <QByteArray>
#include <QStringList>
#include "ssdefs.h"
class QByteArray;
class QTemporaryFile;
class PsiPassOne {
int *fwt_to, *lwt_to;
float *activation, *bias, **weight;
int **profile;
int seqlen;
QTemporaryFile* matrixFile;
QByteArray seq;
QStringList weightFileNames;
public:
PsiPassOne(QTemporaryFile* matFile, const QStringList& weightFiles);
~PsiPassOne();
void compute_output(void);
void load_wts(const char *fname);
void init(void);
int getmtx();
void predict();
int runPsiPass();
};
#endif // SSPRED_AVPRED_H
/* PSIPRED2 - Neural Network Prediction of Secondary Structure */
/* Copyright (C) 2000 David T. Jones - Created : January 2000 */
/* Original Neural Network code Copyright (C) 1990 David T. Jones */
/* 2nd Level Prediction Module */
#include <QByteArray>
#include <QFile>
#include <QTextStream>
#include <QString>
#include <QDir>
#include <U2Core/AppContext.h>
#include <U2Core/AppSettings.h>
#include <U2Core/UserApplicationsSettings.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
#include <ctype.h>
#include <time.h>
#ifdef Q_OS_WIN
#pragma warning(disable: 4996)
#endif
#include "sspred_hmulti.h"
#include "sspred_net2.h"
#include "sspred_utils.h"
/* logistic 'squashing' function (+/- 1.0) */
//#define logistic(x) ((REAL)1.0 / ((REAL)1.0 + (REAL)exp(-(x))))
const char *rnames[] =
{
"ALA", "ARG", "ASN", "ASP", "CYS",
"GLN", "GLU", "GLY", "HIS", "ILE",
"LEU", "LYS", "MET", "PHE", "PRO",
"SER", "THR", "TRP", "TYR", "VAL",
"???"
};
// static char *wtfnm;
//
// static int nwtsum, fwt_to[TOTAL], lwt_to[TOTAL];
// static REAL activation[TOTAL], bias[TOTAL], *weight[TOTAL];
//
// static float profile[MAXSEQLEN][3];
//
// static char seq[MAXSEQLEN];
//
// static int seqlen, nprof;
PsiPassTwo::PsiPassTwo() {
fwt_to = (int*) malloc(TOTAL*sizeof(int));
lwt_to = (int*) malloc(TOTAL*sizeof(int));
activation = (float*) malloc(TOTAL*sizeof(REAL));
bias = (float*) malloc(TOTAL*sizeof(REAL));
weight = (float**) malloc(TOTAL*sizeof(REAL*));
nprof = 0; //must be initialized (at least for Fedora)
}
PsiPassTwo::~PsiPassTwo()
{
free(fwt_to);
free(lwt_to);
free(activation);
free(bias);
free(weight);
}
void PsiPassTwo::compute_output(void)
{
int i, j;
REAL netinp;
for (i = NUM_IN; i < TOTAL; i++)
{
netinp = bias[i];
for (j = fwt_to[i]; j < lwt_to[i]; j++)
netinp += activation[j] * weight[i][j];
/* Trigger neuron */
activation[i] = logistic(netinp);
}
}
/*
* load weights - load all link weights from a disk file
*/
void PsiPassTwo::load_wts( const char *fname )
{
int i, j;
QFile weightFile(fname);
if (!weightFile.open(QIODevice::ReadOnly)) {
fail("cannot open weights file");
}
QTextStream stream(&weightFile);
/* Load input units to hidden layer weights */
for (i = NUM_IN; i < NUM_IN + NUM_HID; i++)
for (j = fwt_to[i]; j < lwt_to[i]; j++)
{
stream >> weight[i][j];
}
/* Load hidden layer to output units weights */
for (i = NUM_IN + NUM_HID; i < TOTAL; i++)
for (j = fwt_to[i]; j < lwt_to[i]; j++)
{
stream >> weight[i][j];
}
/* Load bias weights */
for (j = NUM_IN; j < TOTAL; j++)
{
stream >> bias[j];
}
}
void PsiPassTwo::init(void)
{
int i;
for (i = NUM_IN; i < TOTAL; i++)
if (!(weight[i] = (float*) calloc(TOTAL - NUM_OUT, sizeof(REAL))))
fail("init: Out of Memory!");
/* Connect input units to hidden layer */
for (i = NUM_IN; i < NUM_IN + NUM_HID; i++)
{
fwt_to[i] = 0;
lwt_to[i] = NUM_IN;
}
/* Connect hidden units to output layer */
for (i = NUM_IN + NUM_HID; i < TOTAL; i++)
{
fwt_to[i] = NUM_IN;
lwt_to[i] = NUM_IN + NUM_HID;
}
}
/* Main prediction routine */
QByteArray PsiPassTwo::predict( int niters, float dca, float dcb, const char *outname )
{
// char pred, predsst[MAXSEQLEN], lastpreds[MAXSEQLEN], *che = "CHE";
// float score_c[MAXSEQLEN], score_h[MAXSEQLEN], score_e[MAXSEQLEN], bestsc, score, conf[MAXSEQLEN], predq3, av_c, av_h, av_e;
// int aa, a, b, nb, i, j, k, n, winpos;
int aa,b, nb, i, j, winpos;
char pred, *predsst, *lastpreds;
float *score_c, *score_h, *score_e, *conf, av_c, av_h, av_e;
// Allocate buffers
/* Zero-initialise: predsst is copied into lastpreds before its first write */
predsst = (char*) calloc(seqlen, sizeof(char));
lastpreds = (char*) malloc(seqlen*sizeof(char));
score_c = (float*) malloc(seqlen*sizeof(float));
score_h = (float*) malloc(seqlen*sizeof(float));
score_e = (float*) malloc(seqlen*sizeof(float));
conf = (float*) malloc(seqlen*sizeof(float));
FILE *ofp;
ofp = fopen(outname, "w");
if (!ofp)
fail("Cannot open output file!");
fputs("# PSIPRED VFORMAT (PSIPRED V2.6 by David Jones)\n\n", ofp);
if (niters < 1)
niters = 1;
do {
memcpy(lastpreds, predsst, seqlen);
av_c = av_h = av_e = 0.0;
for (winpos = 0; winpos < seqlen; winpos++)
{
av_c += profile[winpos][0];
av_h += profile[winpos][1];
av_e += profile[winpos][2];
}
av_c /= seqlen;
av_h /= seqlen;
av_e /= seqlen;
for (winpos = 0; winpos < seqlen; winpos++)
{
for (j = 0; j < NUM_IN; j++)
activation[j] = 0.0;
activation[(WINR - WINL + 1) * IPERGRP] = av_c;
activation[(WINR - WINL + 1) * IPERGRP + 1] = av_h;
activation[(WINR - WINL + 1) * IPERGRP + 2] = av_e;
activation[(WINR - WINL + 1) * IPERGRP + 3] = logistic((seqlen-150)/100.0);
for (j = WINL; j <= WINR; j++)
{
if (j + winpos >= 0 && j + winpos < seqlen)
{
for (aa = 0; aa < 3; aa++)
activation[(j - WINL) * IPERGRP + aa] = profile[j + winpos][aa];
}
else
activation[(j - WINL) * IPERGRP + 3] = 1.0;
}
compute_output();
if (activation[TOTAL - NUM_OUT] > (dca * activation[TOTAL - NUM_OUT + 1]) && activation[TOTAL - NUM_OUT] > (dcb * activation[TOTAL - NUM_OUT + 2]))
pred = 'C';
else if (dca * activation[TOTAL - NUM_OUT + 1] > activation[TOTAL - NUM_OUT] && dca*activation[TOTAL - NUM_OUT + 1] > dcb * activation[TOTAL - NUM_OUT + 2])
pred = 'H';
else
pred = 'E';
predsst[winpos] = pred;
score_c[winpos] = activation[TOTAL - NUM_OUT];
score_h[winpos] = activation[TOTAL - NUM_OUT + 1];
score_e[winpos] = activation[TOTAL - NUM_OUT + 2];
}
for (winpos = 0; winpos < seqlen; winpos++)
{
profile[winpos][0] = score_c[winpos];
profile[winpos][1] = score_h[winpos];
profile[winpos][2] = score_e[winpos];
}
} while (memcmp(predsst, lastpreds, seqlen) && --niters);
for (winpos = 0; winpos < seqlen; winpos++)
conf[winpos] = (2*MAX(MAX(score_c[winpos], score_h[winpos]), score_e[winpos])-(score_c[winpos]+score_h[winpos]+score_e[winpos])+MIN(MIN(score_c[winpos], score_h[winpos]), score_e[winpos]));
for (winpos = 0; winpos < seqlen; winpos++)
if (winpos && winpos < seqlen - 1 && predsst[winpos - 1] == predsst[winpos + 1] && conf[winpos] < 0.5*(conf[winpos-1]+conf[winpos+1]))
predsst[winpos] = predsst[winpos - 1];
for (winpos = 0; winpos < seqlen; winpos++)
{
if (winpos && winpos < seqlen - 1 && predsst[winpos - 1] == 'C' && predsst[winpos] != predsst[winpos + 1])
predsst[winpos] = 'C';
if (winpos && winpos < seqlen - 1 && predsst[winpos + 1] == 'C' && predsst[winpos] != predsst[winpos - 1])
predsst[winpos] = 'C';
}
for (winpos=0; winpos<seqlen; winpos++)
fprintf(ofp, "%4d %c %c %6.3f %6.3f %6.3f\n", winpos + 1, seq[winpos], predsst[winpos], score_c[winpos], score_h[winpos], score_e[winpos]);
fclose(ofp);
// FILE* pFile = fopen( "header.out", "w" );
// if (!pFile)
// {
// fail("open file for writing failed");
// }
QByteArray result;
nb = seqlen / 60 + 1;
j = 1;
for (b = 0; b < nb; b++)
{
//fprintf(pFile, "\nConf: ");
for (i = 0; i < 60; i++)
{
if (b * 60 + i >= seqlen)
break;
j = b * 60 + i + 1;
// putc(MIN((char)(10.0*conf[j-1]+'0'), '9'), pFile);
}
//fprintf("\nPred: ");
for (i = 0; i < 60; i++)
{
if (b * 60 + i >= seqlen)
break;
j = b * 60 + i + 1;
// putc(predsst[j - 1], pFile);
result.append(predsst[j-1]);
}
//fprintf(pFile, "\n AA: ");
for (i = 0; i < 60; i++)
{
if (b * 60 + i >= seqlen)
break;
j = b * 60 + i + 1;
// putc(seq[j - 1], pFile);
}
// fprintf(pFile, "\n ");
for (i = 0; i < 58; i++)
{
if (b * 60 + i + 3 > seqlen)
break;
j = b * 60 + i + 3;
if (!(j % 10)) {
// fprintf(pFile, "%3d", j);
i += 2;
}
else {
// fprintf(pFile, " ");
}
}
// putc('\n', pFile);
// putc('\n', pFile);
}
//fclose(pFile);
//Deallocate buffers
free(predsst);
free(lastpreds);
free(score_c);
free(score_h);
free(score_e);
free(conf);
return result;
}
/* Read first-level predictions (residue plus coil/helix/strand scores) */
int PsiPassTwo::getss(FILE * lfil)
{
int naa;
float pv[3];
char buf[256];
naa = 0;
while (naa < MAXSEQLEN) /* seq and profile hold at most MAXSEQLEN entries */
{
if (!fgets(buf, sizeof(buf), lfil))
break;
seq[naa] = buf[5];
//char c = buf[5];
//seq.insert(naa, c);
if (sscanf(buf + 11, "%f%f%f", &pv[0], &pv[1], &pv[2]) != 3)
break;
if (!nprof)
{
profile[naa][0] = pv[0];
profile[naa][1] = pv[1];
profile[naa][2] = pv[2];
}
else
{
profile[naa][0] += pv[0];
profile[naa][1] += pv[1];
profile[naa][2] += pv[2];
}
naa++;
}
nprof++;
if (!naa)
fail("Bad format!");
return naa;
}
int PsiPassTwo::runPsiPass( int argc, const char *argv[], QByteArray& result )
{
int i;
FILE *ifp;
/* malloc_debug(3); */
if (argc < 5)
fail("usage : psipass2 weight-file itercount DCA DCB outputfile ss-infile ...");
init();
load_wts(wtfnm = argv[1]);
QString outputFileName=U2::AppContext::getAppSettings()->getUserAppsSettings()->getUserTemporaryDirPath() + QDir::separator() + "output.ss"; //File created at sspred_avpred.cpp in method predict
ifp = fopen(outputFileName.toLatin1().constData(), "r");
if (!ifp) {
fail("failed opening file for reading");
// exit(1);
}
seqlen = getss(ifp);
fclose(ifp);
for (i=0; i<seqlen; i++)
{
profile[i][0] /= nprof;
profile[i][1] /= nprof;
profile[i][2] /= nprof;
}
//puts("# PSIPRED HFORMAT (PSIPRED V2.6 by David Jones)");
QString outputFileName2=U2::AppContext::getAppSettings()->getUserAppsSettings()->getUserTemporaryDirPath() + QDir::separator() + "output.ss2";
result = predict(atoi(argv[2]), (float)atof(argv[3]), (float)atof(argv[4]), outputFileName2.toLatin1().constData());
return 0;
}
#ifndef SSPRED_HMULTI_H
#define SSPRED_HMULTI_H
#include <QVector>
#include <QByteArray>
#include "ssdefs.h"
class QTemporaryFile;
class PsiPassTwo {
const char *wtfnm;
int *fwt_to, *lwt_to;
REAL *activation, *bias, **weight;
float profile[MAXSEQLEN][3];
char seq[MAXSEQLEN];
int seqlen, nprof;
public:
PsiPassTwo();
~PsiPassTwo();
void compute_output(void);
void load_wts(const char *fname);
void init(void);
QByteArray predict(int niters, float dca, float dcb, const char *outname);
int getss(FILE * lfil);
int runPsiPass(int argc, const char *argv[], QByteArray& result);
};
#endif // SSPRED_HMULTI_H