import 0.1.2 via 24db32e9366307dcdfbc91d1ba7046b287d7b341

There is no formal versioning; rely on the commit hash instead.

wget https://github.com/firstlookmedia/pdf-redact-tools/archive/24db32e9366307dcdfbc91d1ba7046b287d7b341.zip
# Building PDF Redact Tools
First, get a copy of the source code.
```sh
git clone https://github.com/micahflee/pdf-redact-tools.git
cd pdf-redact-tools
```
### Debian-based Linux (Debian, Ubuntu, Mint, etc.)
Install dependencies:
```sh
sudo apt-get install imagemagick libimage-exiftool-perl python-stdeb python-all fakeroot build-essential
```
Create a .deb and install it:
```sh
./build_deb.sh
sudo dpkg -i deb_dist/pdf-redact-tools_*-1_all.deb
```
### Red Hat-based Linux (Red Hat, Fedora, CentOS, etc.)
Install dependencies:
```sh
sudo dnf install rpm-build ImageMagick perl-Image-ExifTool
```
Create a .rpm and install it:
```sh
./build_rpm.sh
sudo dnf install dist/pdf-redact-tools-*-1.noarch.rpm
```
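Both build scripts (`build_deb.sh` and `build_rpm.sh`) read the version string from the `version` file and use it in the package path they print. A minimal Python 3 sketch of that naming, which may help when locating the built package from another script. The function names are illustrative and not part of the project:

```python
def deb_path(version):
    """Path that build_deb.sh prints for a given version string."""
    return 'deb_dist/pdf-redact-tools_{}-1_all.deb'.format(version)


def rpm_path(version):
    """Path that build_rpm.sh prints for a given version string."""
    return 'dist/pdf-redact-tools-{}-1.noarch.rpm'.format(version)


if __name__ == '__main__':
    # In the repository the version would come from the `version` file,
    # e.g. open('version').read().strip(); a literal is used here.
    print(deb_path('0.1.2'))
    print(rpm_path('0.1.2'))
```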
### Mac OS X
The easiest way to get this working on OS X is by installing dependencies with [Homebrew](http://brew.sh/).
Install dependencies:
```sh
brew install imagemagick exiftool gs
```
Install pdf-redact-tools systemwide:
```sh
sudo cp pdf-redact-tools /usr/local/bin
```
Changelog
# 0.1.2
* Added achromatic option that converts to black and white to remove printer dots
* Added a warning to the readme about ImageMagick vulnerabilities
# 0.1.1
* Added safety check against ImageMagick vulnerability CVE-2016-3714
# 0.1
* Initial release
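
The achromatic option passes `-threshold 75%` to ImageMagick's `convert`. As an illustration of the idea only, here is a Python 3 sketch over a list of 8-bit grayscale values. It assumes values above the cutoff become white and everything else black, so light marks end up white. `threshold` is a hypothetical helper, not part of the tool:

```python
def threshold(pixels, percent=75, max_value=255):
    """Map each grayscale value to black (0) or white (max_value).

    Values strictly above percent% of max_value become white;
    all other values become black.
    """
    cutoff = max_value * percent / 100.0
    return [max_value if p > cutoff else 0 for p in pixels]


if __name__ == '__main__':
    # A dark text pixel, a mid-gray pixel, and two light pixels.
    print(threshold([20, 128, 200, 250]))
```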
include LICENSE
include README.md
include version
# PDF Redact Tools
![PDF Redact Tools](/logo.png)
PDF Redact Tools helps with securely redacting and stripping metadata from documents before publishing.
*Warning:* PDF Redact Tools uses ImageMagick to parse PDFs. While ImageMagick is a versatile tool, it has a history of some [terrible](https://imagetragick.com/) security bugs. A malicious PDF could exploit a bug in ImageMagick to take over your computer. If you're working with potentially malicious PDFs, it's safest to run them through PDF Redact Tools in an isolated environment, such as a virtual machine, or by using a tool such as the [Qubes PDF Converter](https://github.com/QubesOS/qubes-app-linux-pdf-converter) instead.
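
The script itself refuses to process a file unless `file -b --mime-type` reports `application/pdf`. As a rough stand-in for that check (not the script's actual method), here is a Python 3 sketch that only looks for the `%PDF-` marker at the very start of the file. `looks_like_pdf` and `write_sample` are hypothetical helper names. A check like this only filters out files that aren't PDFs at all; it does nothing to make a genuinely malicious PDF safe.

```python
import os
import tempfile


def looks_like_pdf(path):
    """Return True if the file begins with the PDF header marker '%PDF-'."""
    with open(path, 'rb') as f:
        return f.read(5) == b'%PDF-'


def write_sample(directory, name, data):
    """Write bytes to a throwaway file and return its path (demo helper)."""
    path = os.path.join(directory, name)
    with open(path, 'wb') as f:
        f.write(data)
    return path


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        real = write_sample(tmp, 'real.pdf', b'%PDF-1.4\n')
        # A non-PDF payload hiding behind a .pdf extension.
        fake = write_sample(tmp, 'fake.pdf', b'push graphic-context\n')
        print(looks_like_pdf(real), looks_like_pdf(fake))
```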
## Quick Start
### Mac OS X
* Install [Homebrew](http://brew.sh/)
* Open a terminal and type `$ brew install pdf-redact-tools`
### Ubuntu
You can install PDF Redact Tools from this Ubuntu PPA:
```sh
$ sudo add-apt-repository ppa:micahflee/ppa
$ sudo apt-get update
$ sudo apt-get install pdf-redact-tools
```
### Other
PDF Redact Tools isn't packaged in any GNU/Linux distributions yet, but it's easy to install by following the [build instructions](/BUILD.md). I haven't attempted to make this work in Windows.
## How to Use
To use it, convert your original document to a PDF.
Then start by exploding the PDF into PNG files:
```sh
$ pdf-redact-tools --explode example_document.pdf
```
This will create a new folder in the same directory as the PDF called (in this case) `example_document_pages`, with a PNG for each page.
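
Pages are numbered from 0. When the highest page number has more than one digit, the script zero-pads the numbers (`page-00.png`, `page-01.png`, and so on) so that an alphabetical listing matches page order. A sketch of that padding in Python 3, with `padded_names` as an illustrative name rather than a function from the script:

```python
def padded_names(page_count):
    """Return page file names, zero-padded to the width of the highest page number."""
    width = len(str(page_count - 1))
    return ['page-{}.png'.format(str(n).zfill(width)) for n in range(page_count)]


if __name__ == '__main__':
    names = padded_names(11)
    # With padding, alphabetical order and page order agree.
    print(names[0], names[-1], sorted(names) == names)
```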
Edit each page that needs redacting in graphics editing software like GIMP or Photoshop. Note that opening, editing, and saving a PNG will likely make it look slightly different than the other PNGs. For best results, open all PNGs and simply save and close the pages you don't need to edit.
When you're done, combine the PNGs back into a flattened, informationless PDF:
```sh
$ pdf-redact-tools --merge example_document.pdf
```
In this case, the final redacted PDF is called `example_document-final.pdf`.
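
Both names are derived from the input path: the pages folder is the path without its extension plus `_pages`, and the output file is the same base plus `-final.pdf`. A sketch in Python 3, assuming `os.path.splitext` is used for both names; the function name is illustrative and the script's own string handling may differ in edge cases:

```python
import os


def derived_names(pdf_path):
    """Return (pages_dirname, output_filename) for a given input PDF path."""
    base = os.path.splitext(os.path.abspath(pdf_path))[0]
    return base + '_pages', base + '-final.pdf'


if __name__ == '__main__':
    pages, final = derived_names('example_document.pdf')
    print(pages)
    print(final)
```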
If you don't need to redact anything, but you just want a new PDF that definitely doesn't contain malware or metadata, you can simply sanitize it.
```sh
$ pdf-redact-tools --sanitize untrusted.pdf
```
The final document that you can trust is called `untrusted-final.pdf`.
#!/bin/sh
VERSION=`cat version`
# clean up from last build
rm -rf deb_dist
# build binary package
python setup.py --command-packages=stdeb.command bdist_deb
# install it
echo ""
echo "To install, run:"
echo "sudo dpkg -i deb_dist/pdf-redact-tools_$VERSION-1_all.deb"
#!/bin/sh
VERSION=`cat version`
# clean up from last build
rm -rf build
# build binary package
python setup.py bdist_rpm --requires="ImageMagick, perl-Image-ExifTool"
# install it
echo ""
echo "RPM package: dist/pdf-redact-tools-$VERSION-1.noarch.rpm"
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
PDF Redact Tools | https://github.com/micahflee/pdf-redact-tools
Copyright (C) 2014-2015 Micah Lee <micah@micahflee.com>
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
"""
import sys, os, subprocess, argparse, shutil

class PDFRedactTools(object):
    def __init__(self, pdf_filename = None):
        if pdf_filename:
            self.set_pdf_filename(pdf_filename)
        else:
            self.pdf_filename = None
            self.pages_dirname = None

    def set_pdf_filename(self, pdf_filename):
        self.pdf_filename = os.path.abspath(pdf_filename)
        # derive both names from the extension-stripped path, so an upper-case
        # '.PDF' extension can't make the output name equal the input name
        split = os.path.splitext(self.pdf_filename)
        self.output_filename = split[0] + '-final.pdf'
        self.pages_dirname = split[0] + '_pages'
        self.transparent_filename = os.path.join(self.pages_dirname, 'page-transparent.png')

    def explode(self, achromatic = False):
        if not self.pdf_filename:
            print 'Error: you must call set_pdf_filename before calling explode'
            return False

        # make dir for pages
        if os.path.isdir(self.pages_dirname):
            print 'Error: the directory {} already exists, you must delete it before exploding'.format(self.pages_dirname)
            return False
        else:
            os.makedirs(self.pages_dirname, 0700)

        # convert PDF to PNGs
        print 'Converting PDF to PNGs'
        subprocess.call(['convert',
            '-density', '128',
            self.pdf_filename,
            '-quality', '100',
            '-sharpen', '0x1.0',
            self.transparent_filename])

        # flatten all the PNGs, so they don't have transparent backgrounds
        print 'Flattening PNGs'
        filenames = os.listdir(self.pages_dirname)
        for filename in filenames:
            if os.path.splitext(filename)[1].lower() == '.png':
                # one-page exploded PDFs end in "-transparent.png"
                if filename[-16:] == '-transparent.png':
                    new_filename = filename.replace('-transparent', '-0')
                # multipage exploded PDFs end in "-transparent-#.png"
                else:
                    new_filename = filename.replace('-transparent-', '-')

                subprocess.call(['convert',
                    os.path.join(self.pages_dirname, filename),
                    '-flatten',
                    os.path.join(self.pages_dirname, new_filename)])
                os.remove(os.path.join(self.pages_dirname, filename))

        # convert images to achromatic to remove printer dots
        if achromatic:
            print 'Converting colors to achromatic'
            filenames = os.listdir(self.pages_dirname)
            for filename in filenames:
                if os.path.splitext(filename)[1].lower() == '.png':
                    # add '-bw' suffix to temporary file
                    new_filename = filename.replace('.png', '-bw.png')
                    subprocess.call(['convert',
                        os.path.join(self.pages_dirname, filename),
                        '-threshold', '75%',
                        os.path.join(self.pages_dirname, new_filename)])
                    # remove original files
                    os.remove(os.path.join(self.pages_dirname, filename))
                    # rename files with the '-bw.png' suffix to '.png'
                    os.rename(os.path.join(self.pages_dirname, new_filename), os.path.join(self.pages_dirname, filename))

        # rename files to sort alphabetically instead of just numerically
        numbers = []
        filenames = os.listdir(self.pages_dirname)
        filenames.sort()
        filename_template = os.path.join(self.pages_dirname, filenames[0].replace('-0.png', '-{}.png'))
        for filename in filenames:
            n = int(filename.split('.png')[0].split('-')[-1])
            numbers.append(n)
        numbers.sort()

        # zero-pad every page number to the width of the highest one
        digits = len(str(numbers[-1]))
        for n in numbers:
            cur_digits = len(str(n))
            if cur_digits < digits:
                new_n = '0'*(digits - cur_digits) + str(n)
                os.rename(filename_template.format(n), filename_template.format(new_n))

        return True

    def merge(self):
        if not self.pdf_filename:
            print 'Error: you must call set_pdf_filename before calling merge'
            return False

        # make sure pages directory exists
        if not os.path.isdir(self.pages_dirname):
            print "Error: {} is not a directory".format(self.pages_dirname)
            return False

        # convert PNGs to PDF
        print "Converting PNGs to PDF"
        subprocess.call(['convert',
            os.path.join(self.pages_dirname, 'page-*.png'),
            self.output_filename])

        # strip metadata, then delete the backup copy that exiftool leaves behind
        print "Stripping ImageMagick metadata"
        subprocess.call(['exiftool', '-Title=', '-Producer=', self.output_filename])
        os.remove('{0}_original'.format(self.output_filename))

        return True

def parse_arguments():
    def require_pdf(fname):
        ext = os.path.splitext(fname)[1][1:]
        if ext.lower() != 'pdf':
            parser.error("file must be a PDF")
        if not os.path.isfile(fname):
            parser.error("{} does not exist".format(fname))
        return fname

    parser = argparse.ArgumentParser()
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument('-e', '--explode',
                       metavar='filename', dest='explode_filename',
                       type=lambda s:require_pdf(s),
                       help='Explode a PDF into PNGs')
    group.add_argument('-m', '--merge',
                       metavar='filename', dest='merge_filename',
                       type=lambda s:require_pdf(s),
                       help='Merge a folder of PNGs into a PDF')
    group.add_argument('-s', '--sanitize',
                       metavar='filename', dest='sanitize_filename',
                       type=lambda s:require_pdf(s),
                       help='Sanitize a PDF')
    parser.add_argument('-a', '--achromatic',
                        action='store_true',
                        help='Convert to black and white to remove printer dots')
    args = parser.parse_args()
    return args

def valid_pdf(filename):
    return subprocess.check_output(['file',
                                    '-b',
                                    '--mime-type',
                                    filename]).strip() == 'application/pdf'

def main():
    # parse arguments
    args = parse_arguments()
    explode_filename = args.explode_filename
    merge_filename = args.merge_filename
    sanitize_filename = args.sanitize_filename
    achromatic = args.achromatic

    pdfrt = PDFRedactTools()

    # explode
    if explode_filename:
        if valid_pdf(explode_filename):
            pdfrt.set_pdf_filename(explode_filename)
            if pdfrt.explode(achromatic):
                print 'All done, now go edit PNGs in {} to redact and then run: pdf-redact-tools -m {}'.format(pdfrt.pages_dirname, pdfrt.pdf_filename)
        else:
            print '{} does not appear to be a PDF file, will not process'.format(explode_filename)

    # merge
    if merge_filename:
        if valid_pdf(merge_filename):
            pdfrt.set_pdf_filename(merge_filename)
            if pdfrt.merge():
                print "All done, your final output is {}".format(pdfrt.output_filename)
        else:
            print '{} does not appear to be a PDF file, will not process'.format(merge_filename)

    # sanitize
    if sanitize_filename:
        if valid_pdf(sanitize_filename):
            pdfrt.set_pdf_filename(sanitize_filename)
            if pdfrt.explode(achromatic):
                if pdfrt.merge():
                    # delete temp files
                    shutil.rmtree(pdfrt.pages_dirname)
                    print "All done, your final output is {}".format(pdfrt.output_filename)
        else:
            print '{} does not appear to be a PDF file, will not process'.format(sanitize_filename)

if __name__ == '__main__':
    main()
#!/bin/sh
# This script pushes updates to my Ubuntu PPA: https://launchpad.net/~micahflee/+archive/ppa
# If you want to use it, you'll need your own ~/.dput.cf and ssh key.
# More info: https://help.launchpad.net/Packaging/PPA/Uploading
VERSION=`cat version`
rm -rf deb_dist
python setup.py --command-packages=stdeb.command sdist_dsc
cd deb_dist/pdf-redact-tools-$VERSION
dpkg-buildpackage -S
cd ..
dput ppa:micahflee/ppa pdf-redact-tools_$VERSION-1_source.changes
cd ..
from distutils.core import setup
import os

with open('version') as buf:
    version = buf.read().strip()

setup(
    name='pdf-redact-tools',
    version=version,
    author='Micah Lee',
    author_email='micah.lee@firstlook.org',
    platforms=['GNU/Linux'],
    license='GPLv3',
    url='https://github.com/micahflee/pdf-redact-tools',
    description='PDF Redact Tools helps with securely redacting and stripping metadata from documents before publishing',
    long_description="PDF Redact Tools helps with securely redacting and stripping metadata from documents before publishing.",
    scripts=['pdf-redact-tools']
)
[DEFAULT]
Package: pdf-redact-tools
Depends: imagemagick, libimage-exiftool-perl
Suite: trusty