Commits on Source (4)
Changes
=======
v2.0 - TO BE RELEASED
---------------------
FIRST VERSION SUPPORTING PYTHON3-ONLY!
See closed issues in Milestone 2.0: https://github.com/geopython/stetl/milestone/10?closed=1
These are all related to the Py2 to Py3 migration. Other issues have been moved to later Milestones/releases.
The main PR for the Py2 to Py3 migration is:
https://github.com/geopython/stetl/pull/81
v1.3 - march 20, 2019
---------------------
LAST VERSION SUPPORTING PYTHON2!
See closed issues in Milestone 1.3: https://github.com/geopython/stetl/milestone/9?closed=1
Very few changes; this release mainly serves as a baseline for v2.0 (Python3).
v1.2 - july 7, 2018
-------------------
......
......@@ -87,6 +87,8 @@ project's developers might not want to merge into the project.
Please adhere to the coding conventions used throughout a project (indentation,
accurate comments, etc.) and any other requirements (such as test coverage).
You can run `nose` to execute the unit tests and `flake8` to check the coding
style of your code.
Follow this process if you'd like your work considered for inclusion in the
project:
......
......@@ -10,6 +10,8 @@ Stetl is developed by:
Bas Couwenberg is providing Debian/Ubuntu packaging.
Rob van Loon is preparing the Python3 migration, among other work.
This project would not be possible without the great work of Frank Warmerdam and other
GDAL/OGR developers (http://gdal.org).
......
Metadata-Version: 1.2
Metadata-Version: 2.1
Name: Stetl
Version: 1.2
Version: 1.3
Summary: Transformation and conversion framework (ETL) mainly for geospatial data
Home-page: http://github.com/geopython/stetl
Author: Just van den Broecke
......@@ -98,6 +98,25 @@ Description: # Stetl - Streaming ETL
Changes
=======
v2.0 - TO BE RELEASED
---------------------
FIRST VERSION SUPPORTING PYTHON3-ONLY!
See closed issues in Milestone 2.0: https://github.com/geopython/stetl/milestone/10?closed=1
These are all related to the Py2 to Py3 migration. Other issues have been moved to later Milestones/releases.
The main PR for the Py2 to Py3 migration is:
https://github.com/geopython/stetl/pull/81
v1.3 - march 20, 2019
---------------------
LAST VERSION SUPPORTING PYTHON2!
See closed issues in Milestone 1.3: https://github.com/geopython/stetl/milestone/9?closed=1
Very few changes; this release mainly serves as a baseline for v2.0 (Python3).
v1.2 - july 7, 2018
-------------------
......@@ -211,6 +230,8 @@ Description: # Stetl - Streaming ETL
Bas Couwenberg is providing Debian/Ubuntu packaging.
Rob van Loon is preparing the Python3 migration, among other work.
This project would not be possible without the great work of Frank Warmerdam and other
GDAL/OGR developers (http://gdal.org).
......@@ -231,3 +252,4 @@ Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 2
Classifier: Topic :: Scientific/Engineering :: GIS
Description-Content-Type: text/markdown
1.2
\ No newline at end of file
1.3
\ No newline at end of file
python-stetl (1.2+ds-2) UNRELEASED; urgency=medium
python-stetl (1.3+ds-1~exp1) experimental; urgency=medium
* New upstream release.
* Bump Standards-Version to 4.3.0, no changes.
* Drop autopkgtests to test installability & module import.
* Add lintian override for testsuite-autopkgtest-missing.
* Remove package name from lintian overrides.
-- Bas Couwenberg <sebastic@debian.org> Sun, 05 Aug 2018 20:54:36 +0200
-- Bas Couwenberg <sebastic@debian.org> Wed, 20 Mar 2019 15:25:59 +0100
python-stetl (1.2+ds-1) unstable; urgency=medium
......
......@@ -3,10 +3,11 @@
Installation
============
Stetl currently only runs with Python 2 (2.7+). `Work is underway <https://github.com/geopython/stetl/pull/27>`_ for Python3 support.
Stetl up to and including version 1.3 only runs with Python 2 (2.7+).
Starting with Stetl v2.0, only Python 3 (3.4.2+) will be supported.
The easiest way is to first install the Stetl dependencies (see below) and then
install and maintain Stetl on your system as a Python package (pip is preferred). ::
install and maintain Stetl on your system as a Python package (`pip` is preferred). ::
(sudo) pip install stetl
or
......@@ -106,12 +107,16 @@ choose to install the same packages via `pip` to have more recent versions like
apt-get install python-jinja2
Mac OSX
~~~~~~~
Dependencies can best be installed via `Homebrew <http://brew.sh/>`_.
Tip: installing the GDAL Python bindings can be tricky, as their version must be
compatible with the installed GDAL binaries. To install the matching version you may use: ::
pip install GDAL==`gdalinfo --version | cut -d' ' -f2 | cut -d',' -f1`
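The shell pipeline above takes the second space-separated field of the `gdalinfo --version` output and then drops everything from the first comma. A minimal Python 3 sketch of the same extraction follows, assuming the output looks like `GDAL 2.4.0, released 2018/12/14` (the sample strings are illustrative, and `gdal_version_from_gdalinfo` is a hypothetical helper, not part of Stetl or GDAL).

```python
import shutil
import subprocess


def gdal_version_from_gdalinfo(output):
    """Mimic `cut -d' ' -f2 | cut -d',' -f1` on one `gdalinfo --version` line.

    Assumes the line contains at least one space, e.g. 'GDAL 2.4.0, released ...'.
    """
    second_field = output.split(' ')[1]   # cut -d' ' -f2  -> '2.4.0,'
    return second_field.split(',')[0]     # cut -d',' -f1  -> '2.4.0'


if __name__ == '__main__':
    # Illustrative sample; real output depends on the installed GDAL build.
    print(gdal_version_from_gdalinfo('GDAL 2.4.0, released 2018/12/14'))
    # If gdalinfo is on the PATH, build the matching pip requirement.
    if shutil.which('gdalinfo'):
        out = subprocess.check_output(['gdalinfo', '--version'], universal_newlines=True)
        print('pip install GDAL==%s' % gdal_version_from_gdalinfo(out.strip()))
```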
Windows
~~~~~~~
......
......@@ -42,7 +42,7 @@
<cities:name>Amsterdam</cities:name>
<cities:population>779808</cities:population>
<cities:geometry>
<gml:Point srsName="urn:ogc:def:crs:EPSG::4258" gml:id="point-1"><gml:pos>52.3730454545455 4.89483636363636</gml:pos></gml:Point>
<gml:Point srsName="urn:ogc:def:crs:EPSG::4258" gml:id="point-1"><gml:pos>52.3730454554572 4.89483636363636</gml:pos></gml:Point>
</cities:geometry>
</cities:City>
</gml:featureMember>
......@@ -51,7 +51,7 @@
<cities:name>Bonn</cities:name>
<cities:population>327913</cities:population>
<cities:geometry>
<gml:Point srsName="urn:ogc:def:crs:EPSG::4258" gml:id="point-2"><gml:pos>50.7345545454545 7.09981818181818</gml:pos></gml:Point>
<gml:Point srsName="urn:ogc:def:crs:EPSG::4258" gml:id="point-2"><gml:pos>50.7345545463786 7.09981818181818</gml:pos></gml:Point>
</cities:geometry>
</cities:City>
</gml:featureMember>
......@@ -60,7 +60,7 @@
<cities:name>Rome</cities:name>
<cities:population>2753000</cities:population>
<cities:geometry>
<gml:Point srsName="urn:ogc:def:crs:EPSG::4258" gml:id="point-3"><gml:pos>41.88 12.52</gml:pos></gml:Point>
<gml:Point srsName="urn:ogc:def:crs:EPSG::4258" gml:id="point-3"><gml:pos>41.8800000009378 12.52</gml:pos></gml:Point>
</cities:geometry>
</cities:City>
</gml:featureMember>
......
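The expected GML output above changed only in the trailing digits of the coordinates (for example `41.88` became `41.8800000009378`); the diff does not say why. Where a test should tolerate such last-digit differences, one option is to compare `gml:pos` values numerically instead of as strings. A minimal sketch with hypothetical helpers `parse_pos` and `pos_close`; the tolerance of `1e-8` degrees is an assumption, not a Stetl setting.

```python
import math


def parse_pos(pos_text):
    """Split a gml:pos string such as '41.88 12.52' into floats."""
    return [float(v) for v in pos_text.split()]


def pos_close(expected, actual, abs_tol=1e-8):
    """True if two gml:pos strings agree coordinate-wise within abs_tol."""
    exp, act = parse_pos(expected), parse_pos(actual)
    return len(exp) == len(act) and all(
        math.isclose(e, a, rel_tol=0.0, abs_tol=abs_tol) for e, a in zip(exp, act))


if __name__ == '__main__':
    # Old and new expected values taken from the diff above.
    print(pos_close('52.3730454545455 4.89483636363636',
                    '52.3730454554572 4.89483636363636'))  # True
    print(pos_close('41.88 12.52', '41.8800000009378 12.52'))  # True
    print(pos_close('41.88 12.52', '41.89 12.52'))  # False
```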
......@@ -47,6 +47,7 @@ setup(
maintainer_email='justb4@gmail.com',
url='http://github.com/geopython/stetl',
long_description=readme + "\n" + changes + "\n" + credits,
long_description_content_type="text/markdown",
packages=find_packages(exclude=['tests']),
namespace_packages=['stetl'],
include_package_data=True,
......
......@@ -69,7 +69,7 @@ class ETL:
# Parse unique list of argument names from config file string.
# https://www.machinelearningplus.com/python/python-regex-tutorial-examples/
args_names = list(set(re.findall('{[A-Z|a-z]\w+}', config_str)))
args_names = list(set(re.findall(r'{[A-Z|a-z]\w+}', config_str)))
args_names = [name.split('{')[1].split('}')[0] for name in args_names]
# Optional: expand from equivalent env vars
......
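The change above only turns the pattern into a raw string, so `\w` reaches the regex engine unchanged (newer Python 3 versions warn about invalid escape sequences such as `'\w'` in ordinary string literals). Two details of the pattern itself may be worth knowing: inside a character class `|` is a literal character, so `[A-Z|a-z]` also accepts a leading `|`, and `\w+` requires at least two characters in total. A minimal sketch of the same extraction using a capturing group, which makes the `split('{')`/`split('}')` step unnecessary; `ARG_NAME_RE` and `parse_arg_names` are hypothetical names, not Stetl API.

```python
import re

# Assumption: placeholders look like {name}, i.e. a letter followed by one or
# more word characters, as in the original pattern but without the literal '|'.
ARG_NAME_RE = re.compile(r'{([A-Za-z]\w+)}')


def parse_arg_names(config_str):
    """Return the unique placeholder names in config_str, sorted for stable output."""
    # With one capturing group, findall() returns just the name between the braces.
    return sorted(set(ARG_NAME_RE.findall(config_str)))


if __name__ == '__main__':
    cfg = 'input_file = {in_file}\noutput_file = {out_file}\nagain = {in_file}\n'
    print(parse_arg_names(cfg))  # ['in_file', 'out_file']
```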
......@@ -5,6 +5,7 @@
#
# Author:Just van den Broecke
from stetl.component import Config
from stetl.util import Util, etree
from stetl.filter import Filter
from stetl.packet import FORMAT
......@@ -19,12 +20,19 @@ class XsltFilter(Filter):
consumes=FORMAT.etree_doc, produces=FORMAT.etree_doc
"""
@Config(ptype=str, required=True)
def script(self):
"""
Path to XSLT script file.
"""
pass
# Constructor
def __init__(self, configdict, section):
Filter.__init__(self, configdict, section, consumes=FORMAT.etree_doc, produces=FORMAT.etree_doc)
self.xslt_file_path = self.cfg.get('script')
self.xslt_file = open(self.xslt_file_path, 'r')
self.xslt_file = open(self.script, 'r')
# Parse XSLT file only once
self.xslt_doc = etree.parse(self.xslt_file)
self.xslt_obj = etree.XSLT(self.xslt_doc)
......
......@@ -109,7 +109,7 @@ class OgrInput(Input):
# Report failure if failed
if self.data_source_p is None:
log.error("Cannot open OGR datasource: %s with the following drivers." % self.data_source)
log.error("Cannot open OGR datasource: %s with the following drivers." % Util.safe_string_value(self.data_source))
for iDriver in range(self.ogr.GetDriverCount()):
log.info(" -> " + self.ogr.GetDriver(iDriver).GetName())
......@@ -126,11 +126,11 @@ class OgrInput(Input):
self.layer_count = self.data_source_p.GetLayerCount()
self.layer_idx = 0
log.info("Opened OGR source ok: %s layer count=%d" % (self.data_source, self.layer_count))
log.info("Opened OGR source ok: %s layer count=%d" % (Util.safe_string_value(self.data_source), self.layer_count))
def read(self, packet):
if not self.data_source_p:
log.info("End reading from: %s" % self.data_source)
log.info("End reading from: %s" % Util.safe_string_value(self.data_source))
return packet
if self.layer is None:
......@@ -145,11 +145,11 @@ class OgrInput(Input):
if self.layer is None:
log.error("Could not fetch layer %d" % 0)
raise Exception()
log.info("Start reading from OGR Source: %s, Layer: %s" % (self.data_source, self.layer.GetName()))
log.info("Start reading from OGR Source: %s, Layer: %s" % (Util.safe_string_value(self.data_source), self.layer.GetName()))
else:
# No more Layers left: cleanup
packet.set_end_of_stream()
log.info("Closing OGR source: %s" % self.data_source)
log.info("Closing OGR source: %s" % Util.safe_string_value(self.data_source))
# Destroy not required anymore: http://trac.osgeo.org/gdal/wiki/PythonGotchas
# self.data_source_p.Destroy()
self.data_source_p = None
......@@ -314,7 +314,7 @@ class OgrPostgisInput(Input):
self.cmd = self.cmd.split('|')
def exec_cmd(self):
log.info("start ogr2ogr cmd = %s" % repr(self.cmd))
log.info("start ogr2ogr cmd = %s" % Util.safe_string_value(repr(self.cmd)))
self.ogr_process = subprocess.Popen(self.cmd,
shell=False,
stdout=subprocess.PIPE,
......
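`exec_cmd()` masks `repr(self.cmd)`, i.e. the printed form of the whole argument list. Reading the unquoted `RE_PG_PWD`/`RE_PG_USER` patterns added to the `Util` module further down, `\S*` also consumes the closing quote and comma of a list element when the password is the last key in it, so the logged list loses its separator (the secret itself is still hidden). The sketch below reproduces this with a simplified stand-in for `Util.safe_string_value` (unquoted values only) and shows one alternative: mask each argument first, then take the `repr`. `mask` and `mask_cmd` are hypothetical helpers, not Stetl API.

```python
import re

# Simplified stand-in for Util.safe_string_value: masks only unquoted
# password=/user= values, and only in strings that contain a 'PG:' prefix.
RE_PG_START = re.compile(r'\bPG:', flags=re.IGNORECASE)
RE_PG_PWD = re.compile(r'\bpassword=[^\'"]\S*', flags=re.IGNORECASE)
RE_PG_USER = re.compile(r'\buser=[^\'"]\S*', flags=re.IGNORECASE)


def mask(value, hide_value='***'):
    if RE_PG_START.search(value) is None:
        return value
    value = RE_PG_PWD.sub('password=%s' % hide_value, value)
    return RE_PG_USER.sub('user=%s' % hide_value, value)


def mask_cmd(cmd, hide_value='***'):
    """Mask each argument separately, then repr() the list for logging."""
    return repr([mask(arg, hide_value) for arg in cmd])


if __name__ == '__main__':
    cmd = ['ogr2ogr', '-f', 'PostgreSQL',
           'PG:dbname=gis user=bob password=secret', 'in.gml']
    print(mask(repr(cmd)))  # whole repr masked: "password=***" swallows "',"
    print(mask_cmd(cmd))    # per-element masking keeps the list well-formed
```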
......@@ -48,7 +48,7 @@ class ExecOutput(Output):
try:
os.environ.update(env_vars)
log.info("executing cmd=%s" % cmd)
log.info("executing cmd=%s" % Util.safe_string_value(cmd))
subprocess.call(cmd, shell=True)
log.info("execute done")
finally:
......
......@@ -201,7 +201,7 @@ class OgrOutput(Output):
if self.dest_fd is None:
self.dest_fd = self.dest_driver.CreateDataSource(self.dest_data_source, options=self.dest_create_options)
if self.dest_fd is None:
log.error("%s driver failed to create %s" % (self.dest_format, self.dest_data_source))
log.error("%s driver failed to create %s" % (self.dest_format, Util.safe_string_value(self.dest_data_source)))
raise Exception()
# /* -------------------------------------------------------------------- */
......@@ -218,7 +218,7 @@ class OgrOutput(Output):
self.layer_create_options)
self.feature_def = None
log.info("Opened OGR dest ok: %s " % self.dest_data_source)
log.info("Opened OGR dest ok: %s " % Util.safe_string_value(self.dest_data_source))
def write(self, packet):
......@@ -228,7 +228,7 @@ class OgrOutput(Output):
return packet
if self.layer is None:
log.info("No Layer, end writing to: %s" % self.dest_data_source)
log.info("No Layer, end writing to: %s" % Util.safe_string_value(self.dest_data_source))
return packet
# Assume ogr_feature_array input, otherwise convert ogr_feature to list
......@@ -268,7 +268,7 @@ class OgrOutput(Output):
def write_end(self, packet):
# Destroy not required anymore: http://trac.osgeo.org/gdal/wiki/PythonGotchas
# self.dest_fd.Destroy()
log.info("End writing to: %s" % self.dest_data_source)
log.info("End writing to: %s" % Util.safe_string_value(self.dest_data_source))
self.dest_fd = None
self.layer = None
return packet
......
......@@ -4,9 +4,10 @@
#
# Author:Just van den Broecke
import glob
import logging
import os
import glob
import re
import types
from time import time
from ConfigParser import ConfigParser
......@@ -14,6 +15,15 @@ from ConfigParser import ConfigParser
logging.basicConfig(level=logging.INFO,
format='%(asctime)s %(name)s %(levelname)s %(message)s')
# Constants for precompiled regular expressions
RE_PG_START = re.compile(r'\bPG:', flags=re.IGNORECASE)
RE_PG_PWD = re.compile(r'\bpassword=[^\'"]\S*', flags=re.IGNORECASE)
RE_PG_PWD_DBL = re.compile(r'\bpassword="(?:[^"\\]|\\.)*"', flags=re.IGNORECASE)
RE_PG_PWD_SNG = re.compile(r'\bpassword=\'(?:[^\'\\]|\\.)*\'', flags=re.IGNORECASE)
RE_PG_USER = re.compile(r'\buser=[^\'"]\S*', flags=re.IGNORECASE)
RE_PG_USER_DBL = re.compile(r'\buser="(?:[^"\\]|\\.)*"', flags=re.IGNORECASE)
RE_PG_USER_SNG = re.compile(r'\buser=\'(?:[^\'\\]|\\.)*\'', flags=re.IGNORECASE)
# Static utility methods
class Util:
......@@ -348,6 +358,24 @@ class Util:
return elem
# Hide user names and passwords in string values, like the Postgres connection string as used by GDAL/OGR
# See https://stackoverflow.com/questions/249791/regex-for-quoted-string-with-escaping-quotes for the escaped quotes expressions
@staticmethod
def safe_string_value(value, hide_value='***'):
# PostgreSQL connection strings as used by GDAL/OGR
if RE_PG_START.search(value) is not None:
value = RE_PG_PWD.sub('password=%s' % hide_value, value)
value = RE_PG_PWD_DBL.sub('password="%s"' % hide_value, value)
value = RE_PG_PWD_SNG.sub('password=\'%s\'' % hide_value, value)
value = RE_PG_USER.sub('user=%s' % hide_value, value)
value = RE_PG_USER_DBL.sub('user="%s"' % hide_value, value)
value = RE_PG_USER_SNG.sub('user=\'%s\'' % hide_value, value)
# Add more cases as needed ...
return value
log = Util.get_log("util")
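To see what `safe_string_value()` does to typical GDAL/OGR PostgreSQL connection strings, here is a standalone Python 3 sketch using the same `RE_PG_*` patterns and the same substitution order. One deliberate difference, labelled as an assumption: non-string values are returned unchanged, because `re.search()` raises `TypeError` on, say, an integer, and `ConfigSection` applies the function to every config value.

```python
import re

# Same patterns as the RE_PG_* constants in the diff above.
RE_PG_START = re.compile(r'\bPG:', flags=re.IGNORECASE)
RE_PG_PWD = re.compile(r'\bpassword=[^\'"]\S*', flags=re.IGNORECASE)
RE_PG_PWD_DBL = re.compile(r'\bpassword="(?:[^"\\]|\\.)*"', flags=re.IGNORECASE)
RE_PG_PWD_SNG = re.compile(r'\bpassword=\'(?:[^\'\\]|\\.)*\'', flags=re.IGNORECASE)
RE_PG_USER = re.compile(r'\buser=[^\'"]\S*', flags=re.IGNORECASE)
RE_PG_USER_DBL = re.compile(r'\buser="(?:[^"\\]|\\.)*"', flags=re.IGNORECASE)
RE_PG_USER_SNG = re.compile(r'\buser=\'(?:[^\'\\]|\\.)*\'', flags=re.IGNORECASE)

# (pattern, key, quote) in the same order as the original sub() calls.
SUBSTITUTIONS = (
    (RE_PG_PWD, 'password', ''), (RE_PG_PWD_DBL, 'password', '"'),
    (RE_PG_PWD_SNG, 'password', "'"), (RE_PG_USER, 'user', ''),
    (RE_PG_USER_DBL, 'user', '"'), (RE_PG_USER_SNG, 'user', "'"),
)


def safe_string_value(value, hide_value='***'):
    # Assumption (not in the original): leave non-string values untouched.
    if not isinstance(value, str):
        return value
    # Only strings that look like PostgreSQL connection strings are masked.
    if RE_PG_START.search(value) is not None:
        for pattern, key, quote in SUBSTITUTIONS:
            value = pattern.sub('%s=%s%s%s' % (key, quote, hide_value, quote), value)
    return value


if __name__ == '__main__':
    print(safe_string_value('PG:dbname=gis host=localhost user=bob password=secret'))
    print(safe_string_value('PG:dbname=gis user="bob smith" password=\'se cret\''))
```

The quoted variants matter because a value like `'se cret'` contains a space, which the unquoted pattern (`\S*`) would stop at, leaving part of the password in the log.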
......@@ -488,9 +516,14 @@ class ConfigSection():
# Need to hide some sensitive values, usually used for logging
safe_copy = self.config_dict.copy()
hides = ['passw', 'pasw', 'token', 'user']
hide_value = '<hidden>'
for key in safe_copy:
for hide_key in hides:
if hide_key in key.lower():
safe_copy[key] = '<hidden>'
safe_copy[key] = hide_value
# Also hide usernames/passwords in string values, like Postgres connection strings used by GDAL/OGR
safe_copy[key] = Util.safe_string_value(safe_copy[key], hide_value)
return repr(safe_copy)
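The safe-copy logic above combines two mechanisms: keys whose lowercased name contains one of the `hides` fragments get their whole value replaced, and every value is additionally scanned for credentials embedded in connection strings. A minimal standalone sketch of that combination; `safe_config_repr` and `mask_pg_password` are hypothetical names, and the latter is a simplified stand-in for `Util.safe_string_value` (unquoted passwords only).

```python
import re

RE_PG_START = re.compile(r'\bPG:', flags=re.IGNORECASE)
RE_PG_PWD = re.compile(r'\bpassword=[^\'"]\S*', flags=re.IGNORECASE)

HIDES = ['passw', 'pasw', 'token', 'user']


def mask_pg_password(value, hide_value):
    # Simplified stand-in for Util.safe_string_value (unquoted passwords only).
    if isinstance(value, str) and RE_PG_START.search(value):
        return RE_PG_PWD.sub('password=%s' % hide_value, value)
    return value


def safe_config_repr(config_dict, hide_value='<hidden>'):
    """repr() of a copy of config_dict with sensitive values hidden."""
    safe_copy = config_dict.copy()  # never modify the live configuration
    for key in safe_copy:
        # 1. Hide the whole value if the key name looks sensitive.
        if any(hide_key in key.lower() for hide_key in HIDES):
            safe_copy[key] = hide_value
        # 2. Also mask credentials embedded in values such as PG: strings.
        safe_copy[key] = mask_pg_password(safe_copy[key], hide_value)
    return repr(safe_copy)


if __name__ == '__main__':
    print(safe_config_repr({'input_connection': 'PG:dbname=gis password=secret',
                            'db_user': 'bob', 'layer': 'cities'}))
```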
......@@ -162,7 +162,7 @@ class parser:
self._names.append(self.alias(element))
subpattern = '(\S*)'
subpattern = r'(\S*)'
if hasquotes:
if element == '%r' or findreferreragent.search(element):
......
__version__ = "1.2"
__version__ = "1.3"