Commits on Source (5)
Changes
=======
v2.0 - TO BE RELEASED
v2.0 - April 11, 2019
---------------------
FIRST VERSION SUPPORTING PYTHON3-ONLY!
See closed issues in Milestone 2.0: https://github.com/geopython/stetl/milestone/10?closed=1
These are all related to the Py2 to Py3 migration. Other issues arevmoved to later Milestones/releases.
These are all related to the Py2 to Py3 migration. Other issues are moved to later Milestones/releases.
The main PR for the Py2 to Py3 migration is:
https://github.com/geopython/stetl/pull/81
......
......@@ -90,6 +90,8 @@ accurate comments, etc.) and any other requirements (such as test coverage).
You can run the `nose` and `flake8` tools to check your code with respect to
unit tests and coding style.
### Getting started
Follow this process if you'd like your work considered for inclusion in the
project:
......@@ -105,6 +107,10 @@ project:
git remote add upstream https://github.com/<upstream-owner>/<repo-name>
```
2. Make a `virtualenv`, e.g. for Python 3.4.2 (for instance with `pyenv` installed via Homebrew on Mac OSX).
2. Install the dependencies. GDAL may be tricky; usually `pip install gdal==<version>` works, with `<version>` taken from `gdalinfo --version`.
2. If you cloned a while ago, get the latest changes from upstream:
```bash
......@@ -112,6 +118,8 @@ project:
git pull upstream <dev-branch>
```
3. Run ``python setup.py install`` to install the ``stetl`` command and Stetl code in the ``site-packages`` for ``venv``.
3. Create a new topic branch (off the main project development branch) to
contain your feature, change, or fix:
......@@ -131,6 +139,12 @@ project:
git pull [--rebase] upstream <dev-branch>
```
5. Run ``flake8`` from the root directory to verify the syntax and coding standards.
5. Run ``nose2`` from the root directory to verify that all tests pass.
5. Check that the basic examples still work: ``cd examples/basics; ./runall.sh > runall.log 2>&1``. Inspect ``runall.log`` for errors or strange outputs.
6. Push your topic branch up to your fork:
```bash
......@@ -143,6 +157,14 @@ project:
**IMPORTANT**: By submitting a patch, you agree to allow the project owner to
license your work under the same license as that used by the project.
### Docker
To verify the Docker image build:
* build Docker Image: `docker build -t geopython/stetl:2.0 .`
* `cd examples/basics; ./runall-docker.sh > runall-docker.log 2>&1` - run all basic examples from Docker Image
* inspect `runall-docker.log` for errors or strange outputs
## Thanks
This doc copied and adapted from original at:
......
Metadata-Version: 2.1
Name: Stetl
Version: 1.3
Version: 2.0
Summary: Transformation and conversion framework (ETL) mainly for geospatial data
Home-page: http://github.com/geopython/stetl
Author: Just van den Broecke
......@@ -98,13 +98,13 @@ Description: # Stetl - Streaming ETL
Changes
=======
v2.0 - TO BE RELEASED
v2.0 - April 11, 2019
---------------------
FIRST VERSION SUPPORTING PYTHON3-ONLY!
See closed issues in Milestone 2.0: https://github.com/geopython/stetl/milestone/10?closed=1
These are all related to the Py2 to Py3 migration. Other issues arevmoved to later Milestones/releases.
These are all related to the Py2 to Py3 migration. Other issues are moved to later Milestones/releases.
The main PR for the Py2 to Py3 migration is:
https://github.com/geopython/stetl/pull/81
......
1.3
\ No newline at end of file
2.0
\ No newline at end of file
#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
# Main Stetl program.
#
# Author: Just van den Broecke
......@@ -8,6 +6,7 @@
from stetl.main import parse_args
from stetl.etl import ETL
from stetl.util import Util
from stetl.version import __version__
import sys
log = Util.get_log('main')
......@@ -20,19 +19,22 @@ def main():
-c --config <config_file> the Stetl config file.
-s --section <section_name> the section in the Stetl config (ini) file to execute (default is [etl]).
-a --args <arglist> zero or more substitutable args for symbolic, {arg}, values in Stetl config file, in format -a arg1=foo -a arg2=bar etc.
-v --version Show the current version of Stetl and exit
-h --help <subject> Get component documentation like its configuration parameters, e.g. stetl doc stetl.inputs.fileinput.FileInput
"""
args = parse_args(sys.argv[1:])
if args.version:
print('Stetl version: %s' % __version__)
sys.exit(0)
if args.config_file:
# Do the ETL
etl = ETL(vars(args), args.config_args)
etl.run()
# elif args.doc_args:
# print_doc(args.doc_args)
else:
print('Try stetl -h for help')
......
python-stetl (2.0+ds-1~exp1) experimental; urgency=medium
* New upstream release.
* Switch to Python 3.
-- Bas Couwenberg <sebastic@debian.org> Thu, 11 Apr 2019 18:36:47 +0200
python-stetl (1.3+ds-1~exp1) experimental; urgency=medium
* New upstream release.
......
......@@ -5,19 +5,19 @@ Section: science
Priority: optional
Build-Depends: debhelper (>= 9),
dh-python,
pylint,
python-all,
python-cov-core,
python-flake8,
python-gdal,
python-jinja2,
python-lxml,
python-mock,
python-nose,
python-nose2,
python-psycopg2,
python-setuptools,
python-sphinx,
pylint3,
python3-all,
python3-cov-core,
python3-deprecated,
python3-flake8,
python3-gdal,
python3-jinja2,
python3-lxml,
python3-mock,
python3-nose,
python3-nose2,
python3-psycopg2,
python3-setuptools,
python3-sphinx,
docbook2x,
docbook-xsl,
......@@ -28,15 +28,14 @@ Vcs-Browser: https://salsa.debian.org/debian-gis-team/python-stetl
Vcs-Git: https://salsa.debian.org/debian-gis-team/python-stetl.git
Homepage: http://stetl.org/
Package: python-stetl
Package: python3-stetl
Architecture: all
Section: python
Depends: libjs-jquery,
libjs-underscore,
${python:Depends},
${python3:Depends},
${misc:Depends}
Provides: ${python:Provides}
Description: Streaming ETL - Geospatial ETL framework for Python 2
Description: Streaming ETL - Geospatial ETL framework for Python 3
Stetl, streaming ETL, pronounced "staedl", is a lightweight ETL-framework
for the conversion of rich (such as GML) geospatial data.
.
......@@ -53,13 +52,13 @@ Description: Streaming ETL - Geospatial ETL framework for Python 2
modules (e.g. PostGIS), transformers (e.g. XSLT) and outputs (e.g. a GML
file or even WFS-T a geospatial protocol to publish GML to a server).
.
This package contains the module for Python 2.
This package contains the module for Python 3.
Package: stetl
Architecture: all
Section: utils
Depends: python-stetl (>= ${binary:Version}),
${python:Depends},
Depends: python3-stetl (>= ${binary:Version}),
${python3:Depends},
${misc:Depends}
Description: Streaming ETL - Commandline utility
Stetl, streaming ETL, pronounced "staedl", is a lightweight ETL-framework
......
......@@ -4,5 +4,5 @@ Author: Just van den Broecke
Section: Programming/Python
Format: HTML
Index: /usr/share/doc/python-stetl/html/index.html
Files: /usr/share/doc/python-stetl/html/*.html
Index: /usr/share/doc/python3-stetl/html/index.html
Files: /usr/share/doc/python3-stetl/html/*.html
usr/share/javascript/jquery/jquery.js usr/share/doc/python-stetl/html/_static/jquery.js
usr/share/javascript/underscore/underscore.js usr/share/doc/python-stetl/html/_static/underscore.js
usr/share/javascript/jquery/jquery.js usr/share/doc/python3-stetl/html/_static/jquery.js
usr/share/javascript/underscore/underscore.js usr/share/doc/python3-stetl/html/_static/underscore.js
Description: Fix typo in example config.
Author: Bas Couwenberg <sebastic@debian.org>
Forwarded: https://github.com/geopython/stetl/pull/88
Applied-Upstream: https://github.com/geopython/stetl/commit/6cffe3fca8a54cf0ad27e18e0eec324b44798daa
--- a/examples/top10nl/etl-top10nl.cfg
+++ b/examples/top10nl/etl-top10nl.cfg
@@ -29,7 +29,7 @@ chains = input_sql_pre|schema_name_filte
# Pre SQL file inputs to be executed
[input_sql_pre]
-class =stetl. inputs.fileinput.StringFileInput
+class = stetl.inputs.fileinput.StringFileInput
file_path = sql/drop-tables.sql,sql/create-schema.sql
# Post SQL file inputs to be executed
Description: Use python3 interpreter in test.
Author: Bas Couwenberg <sebastic@debian.org>
Forwarded: https://github.com/geopython/stetl/pull/90
Applied-Upstream: https://github.com/geopython/stetl/commit/30b093adc2bc95dd9406853efb643e07af1924a9
--- a/tests/data/commandexecfilter.txt
+++ b/tests/data/commandexecfilter.txt
@@ -1 +1 @@
-python -c "print('{0}/{1}'.format('foo','bar'))"
+python3 -c "print('{0}/{1}'.format('foo','bar'))"
python3.patch
example-typo.patch
......@@ -18,7 +18,7 @@ MANPAGES := $(wildcard debian/man/*.*.xml)
%:
dh $@ \
--buildsystem=pybuild \
--with python2 \
--with python3 \
--parallel
override_dh_clean:
......@@ -43,7 +43,7 @@ override_dh_auto_install:
# Move usr/bin/stetl to stetl package
mkdir -p debian/stetl/usr
mv debian/python-stetl/usr/bin debian/stetl/usr
mv debian/python3-stetl/usr/bin debian/stetl/usr
override_dh_installexamples:
dh_installexamples
......
# Python 3 is not supported yet: https://github.com/geopython/stetl/issues/23
python-foo-but-no-python3-foo python-stetl
# Not worth the effort
testsuite-autopkgtest-missing
# Python 3 is not supported yet: https://github.com/geopython/stetl/issues/23
dependency-on-python-version-marked-for-end-of-life (Depends: python)
......@@ -26,6 +26,7 @@ Contents:
intro.rst
install.rst
py3upgrade.rst
background.rst
using.rst
cases.rst
......
......@@ -5,6 +5,7 @@ Installation
Stetl up to and including version 1.3 only runs with Python 2 (2.7+).
Starting with Stetl v2.0 only Python 3 (3.4.2+) will be supported.
You may want to read :ref:`py3upgrade` when upgrading from a Stetl pre-v2 version.
Easiest is to first install the Stetl-dependencies (see below) and then
install and maintain Stetl on your system as a Python package (`pip` is preferred). ::
......@@ -50,6 +51,7 @@ Stetl depends on the following Python packages:
* psycopg2 (PostgreSQL client)
* lxml
* Jinja2 templating
* Deprecated
``GDAL`` Python binding requires the native GDAL/OGR libs and tools (version 2+) to be installed.
......@@ -60,7 +62,9 @@ Stetl depends on the following Python packages:
When using the ``Jinja2`` templating filter, ``Jinja2TemplatingFilter``, see http://jinja.pocoo.org:
* Python Jinja2 package
* Python ``Jinja2`` package
``Deprecated`` is used to indicate deprecated functions and classes.
Platform-specific guidelines for dependencies follow next.
......@@ -76,17 +80,17 @@ choose to install the same packages via `pip` to have more recent versions like
- Python dependencies: ::
apt-get install python-setuptools
apt-get install python-dev
apt-get install python-pip
apt-get install python3-setuptools
apt-get install python3-dev
apt-get install python3-pip
pip install --upgrade pip
- ``libxml2/libxslt`` libs are usually already installed. Together with Python ``lxml``, the total install for ``lxml`` is: ::
apt-get install python-libxml2
apt-get install python3-libxml2
apt-get install python-libxslt1
apt-get install libxml2-dev libxslt1-dev lib32z1-dev
apt-get install python-lxml
apt-get install python3-lxml
- ``GDAL`` (http://gdal.org) version 2+ with Python bindings: ::
......@@ -95,17 +99,17 @@ choose to install the same packages via `pip` to have more recent versions like
apt-get update
apt-get install gdal-bin
gdalinfo --version
# should show something like: GDAL 2.2.1, released 2017/06/23
# should show something like: GDAL 2.4.0, released 2019/03/04
apt-get install python-gdal
- the PostgreSQL client library for Python ``psycopg2``: ::
apt-get install python-psycopg2
apt-get install python3-psycopg2
- for ``Jinja2``: ::
apt-get install python-jinja2
apt-get install python3-jinja2
Mac OSX
~~~~~~~
......@@ -153,7 +157,7 @@ You should get meaningful output like ::
2013-09-16 18:25:12,122 main INFO Stetl version = 1.0.3
usage: stetl [-h] -c CONFIG_FILE [-s CONFIG_SECTION] [-a CONFIG_ARGS]
Especially check the Stetl version number.
Especially check the Stetl version number. You can also use the `-v` or `--version` option for stetl.
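As an illustration of how such a version option can be wired up, here is a minimal ``argparse`` sketch. It is not Stetl's actual argument parser: ``build_parser`` is a hypothetical helper and the version string is a placeholder, chosen only to mirror the ``-c`` and ``-v`` options described above.

```python
import argparse

# Placeholder value; in Stetl the version comes from stetl.version.__version__.
__version__ = '2.0'


def build_parser():
    """Build a small parser with a config option and a boolean version flag."""
    parser = argparse.ArgumentParser(prog='stetl')
    parser.add_argument('-c', '--config', dest='config_file',
                        help='the Stetl config file')
    parser.add_argument('-v', '--version', action='store_true',
                        help='show the current version and exit')
    return parser


# Simulate running "stetl --version".
args = build_parser().parse_args(['--version'])
if args.version:
    print('Stetl version: %s' % __version__)
```

Using ``action='store_true'`` keeps the flag a plain boolean, so the caller decides what to print and when to exit.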
Try running the examples when running with a downloaded distro. ::
......
.. _py3upgrade:
Upgrade to Python 3
===================
Stetl development started in Python 2. With `PEP 373
<https://legacy.python.org/dev/peps/pep-0373/>`_ the EOL of Python 2.7 was announced and Python 2
will not be officially supported after 2020. Stetl was therefore upgraded to Python 3.
Python 3
--------
Work started early 2019 to upgrade ``Stetl`` from Python 2 to Python 3. The last version of Stetl
that supports Python 2 is version 1.3. This version *might* receive quick fixes and updates, but
users are encouraged to upgrade to Stetl version 2 or higher and thus use Python 3.
For the full discussion on the Python 2 to Python 3 migration: see the `conversation in pull
request #81 <https://github.com/geopython/stetl/pull/81>`_ within the GitHub repository.
Important changes for developers
--------------------------------
Python 2 and 3 are very similar, but there are a couple of important changes that developers need
to keep in mind and are worth mentioning:
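- Stetl 2 supports Python 3.4.2 and higher (so unfortunately no `f strings <https://www.python.org/dev/peps/pep-0498/>`_, which require Python 3.6)
- Python 3 strings are Unicode and distinct from ``bytes``, so encoding/decoding has to be done explicitly
- the Python 2 modules ``StringIO`` and ``cStringIO`` were replaced by ``io.StringIO`` and ``io.BytesIO``
- iterators are advanced with the built-in ``next(it)`` instead of the ``it.next()`` method
- ``import`` statements had to be updated (Python 3 has no implicit relative imports)
- ``urllib`` was reorganised (e.g. into ``urllib.request`` and ``urllib.parse``), which affects HTTP calls (although `issue 80 <https://github.com/geopython/stetl/issues/80>`_ might switch these to the ``requests`` library).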
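The changes listed above can be illustrated with a few standard-library calls. This is a generic Python 3 sketch, not code taken from Stetl; the sample values and the ``example.com`` URL are placeholders, and no HTTP request is actually sent.

```python
import io
from urllib.parse import urlencode
from urllib.request import Request

# Text (str) and binary data (bytes) are distinct types: convert explicitly.
text = 'Zürich'
raw = text.encode('utf-8')          # str -> bytes
assert raw.decode('utf-8') == text  # bytes -> str

# io.StringIO holds text, io.BytesIO holds bytes.
text_buffer = io.StringIO()
text_buffer.write('<city>Amsterdam</city>')
bytes_buffer = io.BytesIO(raw)

# Iterators are advanced with the built-in next(), not a .next() method.
records = iter(['first', 'second'])
first_record = next(records)

# URL helpers live in urllib.parse, request objects in urllib.request.
# The request is only constructed here, never opened.
query = urlencode({'format': 'json', 'limit': 10})
request = Request('https://example.com/api?' + query)
```

None of this needs f-strings, so it stays within the Python 3.4.2 baseline mentioned above.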
Important changes for users
---------------------------
The Stetl tool chain is specified in a configuration file. You can use the Inputs, Filters, and
Outputs that are provided by Stetl, or write your own. If you use Stetl Components in your configuration, you *must*
specify the ``stetl.`` package prefix in the class specification. For example, before Stetl version 2 the input XML
file was specified as ::
[input_xml_file]
class = inputs.fileinput.XmlFileInput
file_path = input/cities.xml
From Stetl version 2 onwards this becomes ::
[input_xml_file]
class = stetl.inputs.fileinput.XmlFileInput
file_path = input/cities.xml
Note the extra ``stetl.`` part in the ``class`` specification.
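If you have many existing configuration files, the prefix can be added mechanically. The sketch below is one possible approach and not a tool shipped with Stetl: ``add_stetl_prefix`` and ``STETL_PACKAGES`` are hypothetical names, and the package list is only inferred from the class paths used in this documentation.

```python
import re

# Top-level Stetl packages seen in the class paths of this document's examples.
STETL_PACKAGES = ('inputs', 'filters', 'outputs')

# Matches "class = inputs.xxx" style lines; lines that already start with
# "stetl." or that point at other packages are left untouched.
_CLASS_LINE = re.compile(
    r'^([ \t]*class[ \t]*=[ \t]*)(?=(?:%s)\.)' % '|'.join(STETL_PACKAGES),
    re.MULTILINE)


def add_stetl_prefix(config_text):
    """Return config_text with 'stetl.' prepended to pre-v2 Stetl class paths."""
    return _CLASS_LINE.sub(r'\1stetl.', config_text)


old_config = (
    '[input_xml_file]\n'
    'class = inputs.fileinput.XmlFileInput\n'
    'file_path = input/cities.xml\n'
)
new_config = add_stetl_prefix(old_config)
print(new_config)
```

The substitution is idempotent, so running it twice does no harm. It would, however, also rewrite a class path from your own package if that package happens to be named ``inputs``, ``filters`` or ``outputs``, so review the result before committing it.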
......@@ -33,20 +33,20 @@ This example takes the input file ``input/cities.xml`` and transforms this file
chains = input_xml_file|transformer_xslt|output_file
[input_xml_file]
class = inputs.fileinput.XmlFileInput
class = stetl.inputs.fileinput.XmlFileInput
file_path = input/cities.xml
[transformer_xslt]
class = filters.xsltfilter.XsltFilter
class = stetl.filters.xsltfilter.XsltFilter
script = cities2gml.xsl
[output_file]
class = outputs.fileoutput.FileOutput
class = stetl.outputs.fileoutput.FileOutput
file_path = output/gmlcities.gml
Most of the sections in this ini-file specify a Stetl component: an Input, Filter or Output component.
Each component is specified by its (Python) class and per-component specific parameters.
For example ``[input_xml_file]`` uses the class :class:`inputs.fileinput.XmlFileInput` reading and parsing the
For example ``[input_xml_file]`` uses the class :class:`stetl.inputs.fileinput.XmlFileInput` reading and parsing the
file ``input/cities.xml`` specified by the ``file_path`` property. ``[transformer_xslt]`` is a Filter that
applies XSLT with the script file ``cities2gml.xsl`` that is in the same directory. The ``[output_file]``
component specifies the output, in this case a file.
......@@ -73,6 +73,9 @@ It is even possible to have both Splitting and Merging together with filtering:
[etl]
chains = (input_http_api_1 | cleaner_filter) (input_http_api_2) | data_transformer | (output_db) (output_file)
Note: since Stetl version 2, the ``class`` value of every component that ships with Stetl must start with the ``stetl.`` package prefix. This is
not necessary for components you write yourself (see `example 7 <https://github.com/geopython/stetl/tree/master/examples/basics/7_mycomponent>`_).
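One way to picture why a full package prefix is needed is to treat the ``class`` value as an ordinary dotted Python import path. The sketch below rests on that assumption and is not Stetl's actual factory code; ``load_class`` is a hypothetical helper, and a standard-library class stands in for a Stetl component so the example runs without Stetl installed.

```python
import importlib


def load_class(class_path):
    """Resolve a dotted path such as 'package.module.ClassName' to a class."""
    module_name, _, class_name = class_path.rpartition('.')
    if not module_name:
        raise ValueError('expected a dotted path, got %r' % class_path)
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# Standard-library stand-in for a path like 'stetl.inputs.fileinput.XmlFileInput'.
ordered_dict_class = load_class('collections.OrderedDict')
print(ordered_dict_class)
```

Under this reading, a path such as ``inputs.fileinput.XmlFileInput`` only resolves if a top-level ``inputs`` package is importable, whereas your own components resolve through whatever package name you gave them.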
Configuring Components
----------------------
......@@ -168,17 +171,17 @@ whatever it gets as input from the previous Filter in the Chain. ::
chains = input_xml_file|transformer_xslt|output_ogr_shape
[input_xml_file]
class = inputs.fileinput.XmlFileInput
class = stetl.inputs.fileinput.XmlFileInput
file_path = input/cities.xml
[transformer_xslt]
class = filters.xsltfilter.XsltFilter
class = stetl.filters.xsltfilter.XsltFilter
script = cities2gml.xsl
# The ogr2ogr command-line. May be split over multiple lines for readability.
# Backslashes not required in that case.
[output_ogr_shape]
class = outputs.ogroutput.Ogr2OgrOutput
class = stetl.outputs.ogroutput.Ogr2OgrOutput
temp_file = temp/gmlcities.gml
ogr2ogr_cmd = ogr2ogr
-overwrite
......@@ -206,6 +209,9 @@ For example within the current directory you may have an ``etl.cfg`` Stetl file:
WORK_DIR=`pwd`
sudo docker run -v ${WORK_DIR}:${WORK_DIR} -w ${WORK_DIR} geopython/stetl:latest stetl -c etl.cfg
# or leaner
sudo docker run --rm -v $(pwd):/work -w /work geopython/stetl:latest stetl -c etl.cfg
A more advanced setup would be (network-)linking to a PostGIS Docker image
like `kartoza/postgis <https://hub.docker.com/r/kartoza/postgis/>`_: ::
......@@ -249,15 +255,15 @@ example `6_cmdargs <https://github.com/geopython/stetl/tree/master/examples/basi
chains = input_xml_file|transformer_xslt|output_file
[input_xml_file]
class = inputs.fileinput.XmlFileInput
class = stetl.inputs.fileinput.XmlFileInput
file_path = {in_xml}
[transformer_xslt]
class = filters.xsltfilter.XsltFilter
class = stetl.filters.xsltfilter.XsltFilter
script = {in_xsl}
[output_file]
class = outputs.fileoutput.FileOutput
class = stetl.outputs.fileoutput.FileOutput
file_path = {out_xml}
Note the symbolic input, xsl and output files. We can now perform
......@@ -430,19 +436,19 @@ Here the Chains are split by using ``()`` in the ETL Chain definition: ::
[input_xml_file]
class = inputs.fileinput.XmlFileInput
class = stetl.inputs.fileinput.XmlFileInput
file_path = input/cities.xml
[transformer_xslt]
class = filters.xsltfilter.XsltFilter
class = stetl.filters.xsltfilter.XsltFilter
script = cities2gml.xsl
[output_file]
class = outputs.fileoutput.FileOutput
class = stetl.outputs.fileoutput.FileOutput
file_path = output/gmlcities.gml
[output_std]
class = outputs.standardoutput.StandardOutput
class = stetl.outputs.standardoutput.StandardOutput
Chain Merging
-------------
......@@ -466,20 +472,20 @@ Outputs: ::
[input_1]
class = inputs.fileinput.XmlFileInput
class = stetl.inputs.fileinput.XmlFileInput
file_path = input1/cities.xml
[input_2]
class = inputs.fileinput.XmlFileInput
class = stetl.inputs.fileinput.XmlFileInput
file_path = input2/cities.xml
[transformer_xslt]
class = filters.xsltfilter.XsltFilter
class = stetl.filters.xsltfilter.XsltFilter
script = cities2gml.xsl
[output_file]
class = outputs.fileoutput.FileOutput
class = stetl.outputs.fileoutput.FileOutput
file_path = output/gmlcities.gml
[output_std]
class = outputs.standardoutput.StandardOutput
class = stetl.outputs.standardoutput.StandardOutput
......@@ -15,18 +15,18 @@ chains = input_json|filter_template_xml|output_xml_file
,input_addresses_csv|convert_record_array_to_struct|filter_template_addresses2inspire|output_inspire_addresses
[input_json]
class = inputs.fileinput.JsonFileInput
class = stetl.inputs.fileinput.JsonFileInput
file_path = input/cities.json
# EXAMPLE 1 - simple Jinja2 templating: JSON to XML
# Simple xml templating
[filter_template_xml]
class = filters.templatingfilter.Jinja2TemplatingFilter
class = stetl.filters.templatingfilter.Jinja2TemplatingFilter
template_file = templates/cities-json2xml.jinja2
[output_xml_file]
class = outputs.fileoutput.FileOutput
class = stetl.outputs.fileoutput.FileOutput
file_path = output/cities.xml
......@@ -35,18 +35,18 @@ file_path = output/cities.xml
# Advanced gml templating with globals for more or less static content
# like contact info etc
[filter_template_gml]
class = filters.templatingfilter.Jinja2TemplatingFilter
class = stetl.filters.templatingfilter.Jinja2TemplatingFilter
template_file = templates/cities-json2gml.jinja2
template_globals_path = input/globals.json
[output_gml_file]
class = outputs.fileoutput.FileOutput
class = stetl.outputs.fileoutput.FileOutput
file_path = output/cities.gml
# EXAMPLE 3 - more advanced Jinja2 templating - GeoJSON to GML - and reading from URL
[input_geojson]
class = inputs.fileinput.JsonFileInput
class = stetl.inputs.fileinput.JsonFileInput
# file_path = input/cities-gjson.json
file_path = https://raw.githubusercontent.com/justb4/stetl/master/examples/basics/10_jinja2_templating/input/cities-gjson.json
output_format = geojson_collection
......@@ -54,26 +54,26 @@ output_format = geojson_collection
# More advanced gml templating with globals for more or less static content
# and GeoJSON to GML geometry conversion
[filter_template_geojson2gml]
class = filters.templatingfilter.Jinja2TemplatingFilter
class = stetl.filters.templatingfilter.Jinja2TemplatingFilter
template_file = templates/cities-gjson2gml.jinja2
template_globals_path = input/globals.json,https://raw.githubusercontent.com/justb4/stetl/master/examples/basics/10_jinja2_templating/input/more-globals.json
input_format = geojson_collection
[output_gml_file2]
class = outputs.fileoutput.FileOutput
class = stetl.outputs.fileoutput.FileOutput
file_path = output/cities-gjson.gml
# EXAMPLE 4 - very advanced Jinja2 templating - local address data (CSV) to INSPIRE Addresses (AD) GML
[input_addresses_csv]
class = inputs.fileinput.CsvFileInput
class = stetl.inputs.fileinput.CsvFileInput
file_path = input/addresses.csv
# We need this, since the Jinja2 template expects a named struct
# So basically we will convert to a JSON-like structure where the root
# member is named "addresses" as an array of records (name/value pairs).
[convert_record_array_to_struct]
class = filters.formatconverter.FormatConverter
class = stetl.filters.formatconverter.FormatConverter
input_format = record_array
output_format = struct
converter_args = {'top_name' : 'addresses'}
......@@ -81,15 +81,15 @@ converter_args = {'top_name' : 'addresses'}
# More advanced gml templating with globals for more or less static content
# and GeoJSON to GML geometry conversion
[filter_template_addresses2inspire]
class = filters.templatingfilter.Jinja2TemplatingFilter
class = stetl.filters.templatingfilter.Jinja2TemplatingFilter
template_file = templates/addresses2inspire-ad.jinja2
template_globals_path = input/addresses-globals.json
[output_inspire_addresses]
class = outputs.fileoutput.FileOutput
class = stetl.outputs.fileoutput.FileOutput
file_path = output/inspire-addresses.gml
# EXTRA
# for testing/debugging only: replace file output with this to directly show results
[output_std]
class = outputs.standardoutput.StandardOutput
class = stetl.outputs.standardoutput.StandardOutput