Commit 6ae6890d authored by Julian Taylor

Imported Upstream version 3.3.1

parent 0fa465a7

Authors of FFTW (reachable at fftw@fftw.org):
Matteo Frigo <athena@fftw.org>
Steven G. Johnson <stevenj@alum.mit.edu>
Stefan Kral <skral@fftw.org> wrote genfft-k7/*.ml*, which was
added in fftw-3.0 and removed in fftw-3.2.
Support for the Cell Broadband Engine, which was added in fftw-3.2 and
removed in fftw-3.3, was graciously donated by the IBM Austin Research
Lab.
Support for MIPS64 paired-single SIMD instructions was graciously
donated by CodeSourcery, Inc.
Code conventions used internally by fftw3 (not in API):
LEARN FROM THE MASTERS: read Ken Thompson's C compiler in Plan 9.
Avoid learning from C++/Java programs.
INDENTATION: K&R, 5 spaces/tab. In case of doubt, indent -kr -i5.
NAMES: keep them short. Shorter than you think. The Bible was written
without vowels. Don't outsmart the Bible.
Common names:
R : real type, aka fftw_real
E : real type for local variables (possibly extra precision)
C : complex type
sz : size
vecsz : vector size
is, os : input/output stride
ri, ii : real/imag input (complex data)
ro, io : real/imag output (complex data)
I, O : real input/output (real data)
A : assert
CK : check
S : solver, defined internally to each solver file
P : plan, defined internally to each solver file
k : codelet
X(...) : used for mangling of external names (see below)
K(...) : floating-point constant, in E precision
If a name is used often and must have the form fftw_foo to avoid
namespace pollution, #define FOO fftw_foo and use the short name.
Leave that hungarian crap to MS. foo_t counts as hungarian: use
foo instead. foo is lowercase so that it does not look like a DOS
program. Exception: typedef struct foo_s {...} foo; instead of
typedef struct foo {...} foo; for C++ compatibility.
NAME MANGLING: use X(foo) for external names instead of fftw_foo.
X(foo) expands to fftwf_foo or fftw_foo, depending on the
precision. (Unfortunately, this is an ugly form of hungarian
notation. Grrr...) Names that are not exported do not need to be
mangled.
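The mangling rule above can be sketched as follows. This is a minimal illustration, not the library's actual header: DEMO_SINGLE is a hypothetical switch standing in for the real precision configuration, and demo_scale and twice are made-up names that are not part of the FFTW API.

```c
#include <assert.h>

/* Token pasting: CONCAT(fftw_, foo) becomes fftw_foo. */
#define CONCAT(prefix, name) prefix##name

/* DEMO_SINGLE is a hypothetical switch for this sketch only. */
#ifdef DEMO_SINGLE
typedef float R;
#define X(name) CONCAT(fftwf_, name)
#else
typedef double R;
#define X(name) CONCAT(fftw_, name)
#endif

/* Exported, hence mangled: the linker sees fftw_demo_scale
   (or fftwf_demo_scale when DEMO_SINGLE is defined). */
void X(demo_scale)(R *x, int n, R factor)
{
     int i;
     assert(n >= 0);
     for (i = 0; i < n; ++i)
          x[i] *= factor;
}

/* Not exported (static), so a short unmangled name is enough. */
static void twice(R *x, int n)
{
     X(demo_scale)(x, n, (R) 2);
}
```

Without DEMO_SINGLE, a caller in another file would link against fftw_demo_scale, while twice stays private to its file and needs no prefix.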
REPEATED CODE: favor a table. E.g., do not write
foo("xxx", 1);
foo("yyy", 2);
foo("zzz", -1);
Instead write
struct { const char *nam; int arg; } footab[] = {
{ "xxx", 1 },
{ "yyy", 2 },
{ "zzz", -1 }
};
and loop over footab. Rationale: it saves code space.
Similarly, replace a switch statement with a table whenever
possible.
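A minimal sketch of the table-driven form described above. The body of foo is invented for illustration (the text does not say what foo does), and foo_all and total are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static int total = 0;

/* Hypothetical stand-in for the repeated call in the text. */
static void foo(const char *nam, int arg)
{
     assert(nam != NULL);
     total += (int) strlen(nam) * arg;
}

/* One row per call that would otherwise be written out by hand. */
static const struct {
     const char *nam;
     int arg;
} footab[] = {
     { "xxx", 1 },
     { "yyy", 2 },
     { "zzz", -1 }
};

/* Loop over the table; adding a case means adding a row, not a call. */
void foo_all(void)
{
     size_t i;
     for (i = 0; i < sizeof(footab) / sizeof(footab[0]); ++i)
          foo(footab[i].nam, footab[i].arg);
}
```

The element count is derived from sizeof, so the loop bound never has to be updated when the table grows.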
C++: The code should compile as a C++ program. Run the code through
gcc -xc++. The extra C++ restrictions are unnecessary, of
course, but this will save us from a flood of complaints when
we release the code.
/*
* Copyright (c) 2003, 2007-11 Matteo Frigo
* Copyright (c) 2003, 2007-11 Massachusetts Institute of Technology
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
*
*/
OPTIONS_AUTOMAKE=gnu
lib_LTLIBRARIES = libfftw3@PREC_SUFFIX@.la
# pkgincludedir = $(includedir)/fftw3@PREC_SUFFIX@
# nodist_pkginclude_HEADERS = config.h
# recompile genfft if maintainer mode is true
if MAINTAINER_MODE
GENFFT = genfft
else
GENFFT =
endif
ACLOCAL_AMFLAGS=-I m4
SUBDIRS=support $(GENFFT) kernel simd-support dft rdft reodft api \
threads libbench2 . tests mpi doc tools m4
EXTRA_DIST=COPYRIGHT bootstrap.sh CONVENTIONS fftw.pc.in
SIMD_LIBS = \
simd-support/libsimd_support.la \
simd-support/libsimd_sse2_nonportable.la
if HAVE_SSE2
SSE2_LIBS = dft/simd/sse2/libdft_sse2_codelets.la \
rdft/simd/sse2/librdft_sse2_codelets.la
endif
if HAVE_AVX
AVX_LIBS = dft/simd/avx/libdft_avx_codelets.la \
rdft/simd/avx/librdft_avx_codelets.la
endif
if HAVE_ALTIVEC
ALTIVEC_LIBS = dft/simd/altivec/libdft_altivec_codelets.la \
rdft/simd/altivec/librdft_altivec_codelets.la
endif
if HAVE_NEON
NEON_LIBS = dft/simd/neon/libdft_neon_codelets.la \
rdft/simd/neon/librdft_neon_codelets.la
endif
if THREADS
if COMBINED_THREADS
COMBINED_THREADLIBS=threads/libfftw3@PREC_SUFFIX@_threads.la
endif
endif
libfftw3@PREC_SUFFIX@_la_SOURCES =
libfftw3@PREC_SUFFIX@_la_LIBADD = \
kernel/libkernel.la \
dft/libdft.la \
dft/scalar/libdft_scalar.la \
dft/scalar/codelets/libdft_scalar_codelets.la \
rdft/librdft.la \
rdft/scalar/librdft_scalar.la \
rdft/scalar/r2cf/librdft_scalar_r2cf.la \
rdft/scalar/r2cb/librdft_scalar_r2cb.la \
rdft/scalar/r2r/librdft_scalar_r2r.la \
reodft/libreodft.la \
api/libapi.la \
$(SIMD_LIBS) $(SSE2_LIBS) $(AVX_LIBS) $(ALTIVEC_LIBS) $(NEON_LIBS) \
$(COMBINED_THREADLIBS)
if QUAD
# cannot use -no-undefined since dependent on libquadmath
libfftw3@PREC_SUFFIX@_la_LDFLAGS = -version-info @SHARED_VERSION_INFO@
else
libfftw3@PREC_SUFFIX@_la_LDFLAGS = -no-undefined -version-info \
@SHARED_VERSION_INFO@
endif
fftw3@PREC_SUFFIX@.pc: fftw.pc
	cp -f fftw.pc fftw3@PREC_SUFFIX@.pc
pkgconfigdir = $(libdir)/pkgconfig
pkgconfig_DATA = fftw3@PREC_SUFFIX@.pc
WISDOM_DIR = /etc/fftw
WISDOM = wisdom@PREC_SUFFIX@
WISDOM_TIME=12 # default to 12-hour limit, i.e. overnight
WISDOM_FLAGS=--verbose --canonical --time-limit=$(WISDOM_TIME)
wisdom:
	tools/fftw@PREC_SUFFIX@-wisdom -o $@ $(WISDOM_FLAGS)
install-wisdom: wisdom
	$(mkinstalldirs) $(WISDOM_DIR)
	$(INSTALL_DATA) wisdom $(WISDOM_DIR)/$(WISDOM)
FFTW is a free collection of fast C routines for computing the
Discrete Fourier Transform in one or more dimensions. It includes
complex, real, symmetric, and parallel transforms, and can handle
arbitrary array sizes efficiently. FFTW is typically faster than
other publicly available FFT implementations, and is even
competitive with vendor-tuned libraries. (See our web page for
extensive benchmarks.) To achieve this performance, FFTW uses novel
code-generation and runtime self-optimization techniques (along with
many other tricks).
The doc/ directory contains the manual in texinfo, PDF, info, and HTML
formats. Frequently asked questions and answers can be found in the
doc/FAQ/ directory in ASCII and HTML.
For a quick introduction to calling FFTW, see the "Tutorial" section
of the manual.
Installation instructions are provided in the manual (don't worry, it
is straightforward).
CONTACTS
--------
FFTW was written by Matteo Frigo and Steven G. Johnson. You can
contact them at fftw@fftw.org. The latest version of FFTW,
benchmarks, links, and other information can be found at the FFTW home
page (http://www.fftw.org). You can also sign up to the fftw-announce
mailing list to receive (infrequent) updates and information about new
releases; to do so, go to:
http://www.fftw.org/mailman/listinfo/fftw-announce
TODO before FFTW-$2\pi$:
* Wisdom: make it clear that it is specific to the exact fftw version
and configuration. Report error codes when reading wisdom. Maybe
have multiple system wisdom files, one per version?
* DCT/DST codelets? which kinds?
* investigate the addition-chain trig computation
* I can't believe that there isn't a closed form for the omega
array in Rader.
* convolution problem type(s)
* Explore the idea of having n < 0 in tensors, possibly to mean
inverse DFT.
* better estimator: possibly, let "other" cost be coef * n, where
coef is a per-solver constant determined via some big numerical
optimization/fit.
* vector radix, multidimensional codelets
* it may be a good idea to unify all those little loops that do
copying, (X[i], X[n-i]) <- (X[i] + X[n-i], X[i] - X[n-i]),
and multiplication of vectors by twiddle factors.
* Pruned FFTs (basically, a vecloop that skips zeros).
* Try FFTPACK-style back-and-forth (Stockham) FFT. (We tried this a
few years ago and it was slower, but perhaps matters have changed.)
* Generate assembly directly for more processors, or maybe fork gcc. =)
* ensure that threaded solvers generate (block_size % 4 == 0)
to allow SIMD to be used.
* memoize triggen.
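For reference, the sum/difference loop mentioned in the list above, (X[i], X[n-i]) <- (X[i] + X[n-i], X[i] - X[n-i]), might be factored into a single helper along these lines. This is only a sketch: the name sumdiff, the in-place form, and the choice to leave X[0] and (for even n) X[n/2] untouched are assumptions, not taken from the FFTW sources.

```c
#include <assert.h>

typedef double R;   /* real type, as in the naming conventions */
typedef R E;        /* type for local variables */

/* In place, for every i with 0 < i < n - i:
     (X[i], X[n-i]) <- (X[i] + X[n-i], X[i] - X[n-i]).
   X[0] and the middle element of an even-length array are not
   paired with anything and are left as they are (assumption). */
void sumdiff(R *X, int n)
{
     int i;
     assert(n >= 0);
     for (i = 1; i < n - i; ++i) {
          E a = X[i], b = X[n - i];
          X[i] = a + b;
          X[n - i] = a - b;
     }
}
```

For example, applying sumdiff to {10, 1, 2, 3, 4} with n = 5 pairs index 1 with 4 and index 2 with 3, giving {10, 5, 5, -1, -3}.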
AM_CPPFLAGS = -I$(top_srcdir)/kernel -I$(top_srcdir)/dft \
-I$(top_srcdir)/rdft -I$(top_srcdir)/reodft
AM_CFLAGS = $(STACK_ALIGN_CFLAGS)
EXTRA_DIST = f03api.sh genf03.pl fftw3.f03.in
include_HEADERS = fftw3.h fftw3.f fftw3l.f03 fftw3q.f03
nodist_include_HEADERS = fftw3.f03
noinst_LTLIBRARIES = libapi.la
# pkgincludedir = $(includedir)/fftw3@PREC_SUFFIX@
# pkginclude_HEADERS = api.h x77.h guru.h guru64.h
libapi_la_SOURCES = apiplan.c configure.c execute-dft-c2r.c \
execute-dft-r2c.c execute-dft.c execute-r2r.c execute-split-dft-c2r.c \
execute-split-dft-r2c.c execute-split-dft.c execute.c \
export-wisdom-to-file.c export-wisdom-to-string.c export-wisdom.c \
f77api.c flops.c forget-wisdom.c import-system-wisdom.c \
import-wisdom-from-file.c import-wisdom-from-string.c import-wisdom.c \
malloc.c map-r2r-kind.c mapflags.c mkprinter-file.c mktensor-iodims.c \
mktensor-rowmajor.c plan-dft-1d.c plan-dft-2d.c plan-dft-3d.c \
plan-dft-c2r-1d.c plan-dft-c2r-2d.c plan-dft-c2r-3d.c plan-dft-c2r.c \
plan-dft-r2c-1d.c plan-dft-r2c-2d.c plan-dft-r2c-3d.c plan-dft-r2c.c \
plan-dft.c plan-guru-dft-c2r.c plan-guru-dft-r2c.c plan-guru-dft.c \
plan-guru-r2r.c plan-guru-split-dft-c2r.c plan-guru-split-dft-r2c.c \
plan-guru-split-dft.c plan-many-dft-c2r.c plan-many-dft-r2c.c \
plan-many-dft.c plan-many-r2r.c plan-r2r-1d.c plan-r2r-2d.c \
plan-r2r-3d.c plan-r2r.c print-plan.c rdft2-pad.c the-planner.c \
version.c api.h f77funcs.h fftw3.h x77.h guru.h guru64.h \
mktensor-iodims.h plan-guru-dft-c2r.h plan-guru-dft-r2c.h \
plan-guru-dft.h plan-guru-r2r.h plan-guru-split-dft-c2r.h \
plan-guru-split-dft-r2c.h plan-guru-split-dft.h plan-guru64-dft-c2r.c \
plan-guru64-dft-r2c.c plan-guru64-dft.c plan-guru64-r2r.c \