Commit d2dc206f authored by SVN-Git Migration

Imported Upstream version 20101226+dfsg

parent a50af389
......@@ -3,7 +3,7 @@
PACKAGE=pdfminer
PYTHON=python
PYTHON=python2
GIT=git
RM=rm -f
CP=cp -f
......@@ -21,16 +21,14 @@ clean:
distclean: clean test_clean cmap_clean
pack: distclean MANIFEST
sdist: distclean MANIFEST.in
$(PYTHON) setup.py sdist
register: distclean MANIFEST
register: distclean MANIFEST.in
$(PYTHON) setup.py sdist upload register
MANIFEST:
$(GIT) ls-tree --name-only -r HEAD > MANIFEST
WEBDIR=$$HOME/Site/unixuser.org/python/$(PACKAGE)
publish:
$(CP) docs/*.html docs/*.png $(WEBDIR)
$(CP) docs/*.html docs/*.png docs/*.css $(WEBDIR)
CONV_CMAP=$(PYTHON) tools/conv_cmap.py
CMAPSRC=cmaprsrc
......
Metadata-Version: 1.0
Name: pdfminer
Version: 20101017
Version: 20101226
Summary: PDF parser and analyzer
Home-page: http://www.unixuser.org/~euske/python/pdfminer/index.html
Author: Yusuke Shinyama
......
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<title>Mining PDF files</title>
<style type="text/css"><!--
blockquote { background: #eeeeee; }
--></style>
</head><body>
<h1>Mining PDF files</h1>
<p>
<p>
<a href="http://www.unixuser.org/~euske/python/pdfminer/index.html">Homepage</a>
<div align=right class=lastmod>
<!-- hhmts start -->
Last Modified: Sat Nov 14 21:09:01 JST 2009
<!-- hhmts end -->
</div>
<h2>What is PDF?</h2>
<p>
<h3>What PDF is ...</h3>
<ul>
<li> A weird mixture of texts and binaries. (Yikes!)
<li> Generated sequentially, but needs random access to read.
</ul>
<h3>What PDF is not ...</h3>
<ul>
<li> An editable document format (like Word or HTML).
<li> Good from an accessibility point of view.
</ul>
<h2>Structure of PDF</h2>
<p>
From a data-structure point of view, PDF is one of the bigger
messes in computing history. Originally, Adobe had a document
format called PostScript (which is itself more of a "graphics"
format than a text format). It has a nice graphic representation
and can express commercial-quality typesetting. However, a
PostScript file has to target a specific printer, and its size
tends to get bloated because almost everything is represented as
text. PDF is Adobe's attempt to create a less printer-dependent
format with a reduced data size (that's why it was named the
"portable" document format). To some degree, PDF can be seen as a
"compressed" version of PostScript with seekable index tables.
Since its drawing model and concepts (coordinates, color spaces,
etc.) remain pretty much the same as its predecessor's, Adobe
decided to partially reuse the original PostScript notation in
PDF. However, this eclectic approach ended in disaster.
<h3>Format Disaster</h3>
<p>
When designing a data format, there are two different strategies:
using text or using binary. Both have obvious merits and
demerits. The biggest merit of a textual representation is
that it is human readable and can be modified with any text
editor. Its demerit is its bloated size,
especially if you want to embed something like pictures or
multimedia data such as audio or video. Another demerit of a textual
representation is that you need a program to serialize/deserialize
(parse) the data, which can be very complex and buggy. On the
other hand, a binary representation normally doesn't require a
complex parser and takes much less space than text. However,
it is not readable by humans. Now, Adobe decided to take the
good parts from both worlds by making PDF a partially text and
partially binary format, and as a result, PDF inherits the
drawbacks of both worlds without having much of their merits, i.e.
PDF is a human *unreadable* document format that still requires a
complex and error-prone parser and has a bloated file size.
<p>
Adobe has probably been aware of this problem from early on, and
has tried to fix it over the years. They gradually dropped text
representations and leaned more toward binary ones. For example,
PDF specification 1.5 introduced a new notation called the
"object stream" (which is different from the "stream object" that
was already in the specification).
However, by that time there were already tons of PDFs
produced under the original standard, which every PDF
viewer is still required to support.
<h2>Problem of Text Extraction from PDF Documents</h2>
<p>
Many people tend to think that a PDF document is somewhat similar
to a Word or HTML document, which is not true. In fact, the primary
focus of PDF is printing and showing on a computer display, so
it is extremely versatile at describing the exact "look"
of typography, pictures and graphics. All the text in a PDF document is
just a bunch of string objects floating at various locations on a
blank slate. There is no text flow control and no contextual clue
about its content, except for a few special "tagged" PDF documents with
extra annotations that denote headlines or page boundaries, which
require specialized tools to create.
<p>
(OpenOffice, for example, has the ability to create tagged PDF
documents. But the extent of the annotations varies depending
on the implementation, and in many cases it is not possible to
obtain the full layout information from tags alone.)
<p>
Apart from tagged documents, PDF doesn't care about the order in
which text strings are rendered on a page. You can completely jumble
up every string in a PDF and still make it look like a
perfect document on the surface. Even worse, PDF allows a word to
be split in the middle and drawn as multiple unrelated strings in
order to represent precise text positioning. For example, a
certain word processor creates PDFs that split the word
"You" into two separate strings "Y" and "ou" because of the subtle
kerning between the letters.
<p>
So there's a huge problem associated with extracting text properly
from PDF files. Doing so requires almost the same kind of analysis
as optical character recognition (OCR).
<hr noshade>
<address>Yusuke Shinyama</address>
</body>
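The "Y" / "ou" split described in the page above can be undone, in simple cases, by sorting string fragments by position and gluing together neighbours that nearly touch. The following is a minimal Python sketch of that idea, not PDFMiner's actual layout analysis: `merge_fragments`, the `(x0, x1, y, text)` tuple layout and `gap_tolerance` are hypothetical names chosen for illustration, and matching on an exactly equal baseline `y` is a simplification.

```python
def merge_fragments(fragments, gap_tolerance=1.0):
    """Join text fragments that sit on the same baseline and nearly touch.

    fragments: iterable of (x0, x1, y, text) tuples (hypothetical layout),
    where x0/x1 are the left/right edges and y is the baseline.
    """
    merged = []
    # PDF coordinates grow upward, so sort by descending y (top of page
    # first), then left to right within a line.
    for x0, x1, y, text in sorted(fragments, key=lambda f: (-f[2], f[0])):
        last = merged[-1] if merged else None
        if last and last["y"] == y and x0 - last["x1"] <= gap_tolerance:
            # Close enough to the previous fragment: treat as the same word.
            last["text"] += text
            last["x1"] = x1
        else:
            merged.append({"y": y, "x1": x1, "text": text})
    return [m["text"] for m in merged]


# Fragments arrive in arbitrary order, as they may inside a PDF.
words = merge_fragments([
    (12.0, 24.0, 700.0, "ou"),
    (0.0, 11.5, 700.0, "Y"),
    (40.0, 60.0, 700.0, "are"),
])
```

Real documents need a tolerance that scales with font size and a looser baseline comparison, which is part of why the page above compares the task to OCR.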
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN">
<html>
<head>
<link rel="stylesheet" type="text/css" href="style.css">
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<title>Programming with PDFMiner</title>
<style type="text/css"><!--
blockquote { background: #eeeeee; }
.comment { color: darkgreen; }
--></style>
</head><body>
</head>
<body>
<div align=right class=lastmod>
<!-- hhmts start -->
Last Modified: Sun Oct 17 09:18:29 UTC 2010
<!-- hhmts end -->
</div>
<p>
<a href="index.html">[Back to PDFMiner homepage]</a>
<h1>Programming with PDFMiner</h1>
<p>
This document explains how to use PDFMiner as a library
This page explains how to use PDFMiner as a library
from other applications.
<ul>
<li> <a href="#overview">Overview</a>
<li> <a href="#basic">Basic Usage</a>
<li> <a href="#layout">Layout Analysis</a>
<li> <a href="#toc">TOC Extraction</a>
<li> <a href="#more">more</a>
<li> <a href="#tocextract">TOC Extraction</a>
<li> <a href="#extend">Parser Extension</a>
</ul>
<a name="overview">
<hr noshade>
<h2>Overview</h2>
<h2><a name="overview">Overview</a></h2>
<p>
<strong>PDF is evil.</strong> Although it is called a PDF
"document", it's nothing like Word or HTML. PDF is more like a
picture representation. PDF contents are just a bunch of
"document", it's nothing like Word or HTML document. PDF is more
like a graphic representation. PDF contents are just a bunch of
instructions that tell how to place the stuff at each exact
position on a display or paper. In most cases, it has no logical
structure such as sentences or paragraphs and it cannot adapt
......@@ -38,6 +41,13 @@ reconstruct some of those structures by guessing from its
positioning, but there's nothing guaranteed to work. Ugly, I
know. Again, PDF is evil.
<p>
[More technical details about the internal structure of PDF:
"How to Extract Text Contents from PDF Manually"
<a href="http://www.youtube.com/watch?v=k34wRxaxA_c">(part 1)</a>
<a href="http://www.youtube.com/watch?v=_A1M4OdNsiQ">(part 2)</a>
<a href="http://www.youtube.com/watch?v=sfV_7cWPgZE">(part 3)</a>]
<p>
Because a PDF file has such a big and complex structure,
parsing a PDF file as a whole is time and memory consuming. However,
......@@ -61,9 +71,7 @@ Figure 1 shows the relationship between the classes in PDFMiner.
<small>Figure 1. Relationships between PDFMiner classes</small>
</div>
<a name="basic">
<hr noshade>
<h2>Basic Usage</h2>
<h2><a name="basic">Basic Usage</a></h2>
<p>
A typical way to parse a PDF file is the following:
<blockquote><pre>
......@@ -97,9 +105,7 @@ for page in doc.get_pages():
interpreter.process_page(page)
</pre></blockquote>
<a name="layout">
<hr noshade>
<h2>Accessing Layout Objects</h2>
<h2><a name="layout">Accessing Layout Objects</a></h2>
<p>
Here is a typical way to use the layout analysis function:
<blockquote><pre>
......@@ -174,9 +180,10 @@ Could be used for framing another pictures or figures.
<dd> Represents a polygon in a page.
</dl>
<a name="toc">
<hr noshade>
<h2>TOC Extraction</h2>
<p>
Also, check out <a href="http://denis.papathanasiou.org/?p=343">a more complete example by Denis Papathanasiou</a>.
<h2><a name="tocextract">TOC Extraction</a></h2>
<p>
PDFMiner provides functions to access the document's table of contents
("Outlines").
......@@ -205,9 +212,7 @@ way to refer to any in-page object from the outside, there's no
way to tell exactly which part of text these destinations are
referring to.
<a name="more">
<hr noshade>
<h2>More</h2>
<h2><a name="extend">Parser Extension</a></h2>
<p>
You can extend <code>PDFPageInterpreter</code> and <code>PDFDevice</code> class
......
blockquote { background: #eeeeee; }
h1 { border-bottom: solid black 2px; }
h2 { border-bottom: solid black 1px; }
#!/usr/bin/env python
__version__ = '20101017'
#!/usr/bin/env python2
__version__ = '20101226'
if __name__ == '__main__': print __version__
#!/usr/bin/env python
#!/usr/bin/env python2
""" Python implementation of Arcfour encryption algorithm.
......
#!/usr/bin/env python
#!/usr/bin/env python2
""" Python implementation of ASCII85/ASCIIHex decoder (Adobe version).
......
#!/usr/bin/env python
#!/usr/bin/env python2
""" Adobe character mapping (CMap) support.
......@@ -71,6 +71,18 @@ class CMap(object):
d = self.code2cid
return
    def dump(self, out=sys.stdout, code2cid=None, code=None):
        if code2cid is None:
            code2cid = self.code2cid
            code = ()
        for (k,v) in sorted(code2cid.iteritems()):
            c = code+(k,)
            if isinstance(v, int):
                out.write('code %r = cid %d\n' % (c,v))
            else:
                self.dump(out=out, code2cid=v, code=c)
        return
## IdentityCMap
##
......@@ -107,6 +119,11 @@ class UnicodeMap(object):
print >>sys.stderr, 'get_unichr: %r, %r' % (self, cid)
return self.cid2unichr[cid]
    def dump(self, out=sys.stdout):
        for (k,v) in sorted(self.cid2unichr.iteritems()):
            out.write('cid %d = unicode %r\n' % (k,v))
        return
## FileCMap
##
......@@ -394,8 +411,10 @@ def main(argv):
for fname in args:
fp = file(fname, 'rb')
cmap = FileUnicodeMap()
#cmap = FileCMap()
CMapParser(cmap, fp).run()
fp.close()
cmap.dump()
return
if __name__ == '__main__': sys.exit(main(sys.argv))
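The `CMap.dump` method added above walks `code2cid` as a nested dictionary: each level is keyed by one byte of the character code, and a leaf is the integer CID. As a rough illustration of that traversal outside the class, here is a standalone Python 3 sketch. `dump_code2cid` and the sample table are hypothetical, and the sample values are made up rather than taken from any real CMap.

```python
import io
import sys


def dump_code2cid(code2cid, out=sys.stdout, prefix=()):
    """Write one line per leaf of a nested {byte: {byte: ... cid}} table."""
    for key in sorted(code2cid):
        value = code2cid[key]
        code = prefix + (key,)
        if isinstance(value, int):
            # Leaf: the accumulated byte tuple maps to a CID.
            out.write('code %r = cid %d\n' % (code, value))
        else:
            # Inner node: descend, carrying the bytes seen so far.
            dump_code2cid(value, out=out, prefix=code)


# Made-up table: a one-byte code and two two-byte codes sharing a lead byte.
sample = {0x81: {0x40: 633, 0x41: 634}, 0x20: 1}
buf = io.StringIO()
dump_code2cid(sample, out=buf)
```

Passing the prefix down as an immutable tuple mirrors the `code+(k,)` step in the diff and avoids sharing state between sibling branches.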
#!/usr/bin/env python
#!/usr/bin/env python2
import re
from psparser import PSLiteral
......
#!/usr/bin/env python
#!/usr/bin/env python2
""" Font metrics for the Adobe core 14 fonts.
......
#!/usr/bin/env python
#!/usr/bin/env python2
""" Mappings from Adobe glyph names to Unicode characters.
......
#!/usr/bin/env python
#!/usr/bin/env python2
""" Standard encoding tables used in PDF.
......
#!/usr/bin/env python
#!/usr/bin/env python2
import sys
from sys import stderr
try:
......
#!/usr/bin/env python
#!/usr/bin/env python2
from psparser import LIT
......
#!/usr/bin/env python
#!/usr/bin/env python2
import sys
from utils import mult_matrix, translate_matrix
from utils import enc, bbox2str
......@@ -54,11 +54,6 @@ class PDFDevice(object):
##
class PDFTextDevice(PDFDevice):
def handle_undefined_char(self, cidcoding, cid):
if self.debug:
print >>sys.stderr, 'undefined: %r, %r' % (cidcoding, cid)
return '?'
def render_string(self, textstate, seq):
matrix = mult_matrix(textstate.matrix, self.ctm)
font = textstate.font
......
#!/usr/bin/env python
#!/usr/bin/env python2
import sys
try:
from cStringIO import StringIO
......@@ -399,6 +399,8 @@ class TrueTypeFont(object):
else:
for c in xrange(sc, ec+1):
char2gid[c] = (c + idd) & 0xffff
else:
assert 0
# create unicode map
unicode_map = FileUnicodeMap()
for (char,gid) in char2gid.iteritems():
......@@ -422,14 +424,14 @@ class PDFFont(object):
def __init__(self, descriptor, widths, default_width=None):
self.descriptor = descriptor
self.widths = widths
self.fontname = descriptor.get('FontName', 'unknown')
self.fontname = resolve1(descriptor.get('FontName', 'unknown'))
if isinstance(self.fontname, PSLiteral):
self.fontname = literal_name(self.fontname)
self.flags = int_value(descriptor.get('Flags', 0))
self.ascent = num_value(descriptor.get('Ascent', 0))
self.descent = num_value(descriptor.get('Descent', 0))
self.italic_angle = num_value(descriptor.get('ItalicAngle', 0))
self.default_width = default_width or descriptor.get('MissingWidth', 0)
self.default_width = default_width or num_value(descriptor.get('MissingWidth', 0))
self.leading = num_value(descriptor.get('Leading', 0))
self.bbox = list_value(descriptor.get('FontBBox', (0,0,0,0)))
self.hscale = self.vscale = .001
......@@ -668,7 +670,7 @@ class PDFCIDFont(PDFFont):
def main(argv):
for fname in argv[1:]:
fp = file(fname, 'rb')
CFFFont(fp)
font = TrueTypeFont(fname, fp)
fp.close()
return
......
#!/usr/bin/env python
#!/usr/bin/env python2
import re
from sys import stderr
from struct import pack, unpack
......@@ -808,7 +808,8 @@ class PDFPageInterpreter(object):
##
class PDFTextExtractionNotAllowed(PDFInterpreterError): pass
def process_pdf(rsrcmgr, device, fp, pagenos=None, maxpages=0, password=''):
def process_pdf(rsrcmgr, device, fp, pagenos=None, maxpages=0, password='',
check_extractable=True):
# Create a PDF parser object associated with the file object.
parser = PDFParser(fp)
# Create a PDF document object that stores the document structure.
......@@ -820,7 +821,7 @@ def process_pdf(rsrcmgr, device, fp, pagenos=None, maxpages=0, password=''):
# (If no password is set, give an empty string.)
doc.initialize(password)
# Check if the document allows text extraction. If not, abort.
if not doc.is_extractable:
if check_extractable and not doc.is_extractable:
raise PDFTextExtractionNotAllowed('Text extraction is not allowed: %r' % fp)
# Create a PDF interpreter object.
interpreter = PDFPageInterpreter(rsrcmgr, device)
......
#!/usr/bin/env python
#!/usr/bin/env python2
import sys
import re
import struct
......
#!/usr/bin/env python
#!/usr/bin/env python2
import sys
import zlib
from lzw import lzwdecode
......
#!/usr/bin/env python
#!/usr/bin/env python2
""" Python implementation of Rijndael encryption algorithm.
......
#!/usr/bin/env python
#!/usr/bin/env python2
#
# RunLength decoder (Adobe version) implementation based on PDF Reference
# version 1.4 section 3.3.4.
......
#!/usr/bin/env python
#!/usr/bin/env python2
"""
Miscellaneous Routines.
"""
from sys import maxint as INF
from struct import pack, unpack
......@@ -8,7 +11,7 @@ from struct import pack, unpack
MATRIX_IDENTITY = (1, 0, 0, 1, 0, 0)
def mult_matrix((a1,b1,c1,d1,e1,f1), (a0,b0,c0,d0,e0,f0)):
'''Multiplies two matrices.'''
'''Returns the multiplication of two matrices.'''
return (a0*a1+c0*b1, b0*a1+d0*b1,
a0*c1+c0*d1, b0*c1+d0*d1,
a0*e1+c0*f1+e0, b0*e1+d0*f1+f0)
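Note that `mult_matrix` relies on tuple-unpacking parameters, a Python 2 only syntax that PEP 3113 removed in Python 3, which fits this commit's switch to `python2`. For readers who want to check the arithmetic under Python 3, here is a sketch of the same formula with the unpacking moved into the body. The name `mult_matrix_py3` is hypothetical and not part of PDFMiner.

```python
MATRIX_IDENTITY = (1, 0, 0, 1, 0, 0)


def mult_matrix_py3(m1, m0):
    """Return the product of two 6-element affine matrices.

    Same formula as the hunk above: the result applies m1 first, then m0.
    """
    (a1, b1, c1, d1, e1, f1) = m1
    (a0, b0, c0, d0, e0, f0) = m0
    return (a0*a1 + c0*b1,      b0*a1 + d0*b1,
            a0*c1 + c0*d1,      b0*c1 + d0*d1,
            a0*e1 + c0*f1 + e0, b0*e1 + d0*f1 + f0)


translate = (1, 0, 0, 1, 5, 7)   # move by (5, 7)
scale = (2, 0, 0, 2, 0, 0)       # double both axes
combined = mult_matrix_py3(translate, scale)   # (2, 0, 0, 2, 10, 14)
```

The translation components come out doubled because the translation is applied before the scale.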
......@@ -179,6 +182,54 @@ class ObjIdRange(object):
return self.nobjs
## Plane
##
## A data structure for objects placed on a plane.
## Can efficiently find objects in a certain rectangular area.
## It maintains two parallel lists of objects, each of
## which is sorted by its x or y coordinate.
##
class Plane(object):
    def __init__(self, objs):
        self._idxs = {}
        self._xobjs = []
        self._yobjs = []
        return
    def __repr__(self):
        return ('<Plane objs=%r>' % list(self))
    def __iter__(self):
        return self._idxs.iterkeys()
    # add(obj): place an object in a certain area.
    def add(self, obj):
        self._idxs[obj] = len(self._idxs)
        self._xobjs.append((obj.x0, obj))
        self._xobjs.append((obj.x1, obj))
        self._yobjs.append((obj.y0, obj))
        self._yobjs.append((obj.y1, obj))
        return
    # finish()
    def finish(self):
        self._xobjs.sort()
        self._yobjs.sort()
        return
    # find(): finds objects that are in a certain area.
    def find(self, (x0,y0,x1,y1)):
        i0 = bsearch(self._xobjs, x0)[0]
        i1 = bsearch(self._xobjs, x1)[1]
        xobjs = set( obj for (_,obj) in self._xobjs[i0:i1] )
        i0 = bsearch(self._yobjs, y0)[0]
        i1 = bsearch(self._yobjs, y1)[1]
        yobjs = set( obj for (_,obj) in self._yobjs[i0:i1] )
        xobjs.intersection_update(yobjs)
        return sorted(xobjs, key=lambda obj: self._idxs[obj])
# create_bmp
def create_bmp(data, bits, width, height):
info = pack('<IiiHHIIIIII', 40, width, height, 1, bits, 0, len(data), 0, 0, 0, 0)
......
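The `Plane` class added above keeps one sorted list of edge coordinates per axis and intersects the two range lookups. Its `bsearch` helper is not shown in this diff, so the Python 3 sketch below substitutes the standard `bisect` module and assumes inclusive bounds. `PlaneSketch` and `Box` are hypothetical names, and the behaviour at range boundaries may differ from PDFMiner's.

```python
from bisect import bisect_left, bisect_right
from collections import namedtuple

# Hypothetical stand-in for a layout object with a bounding box.
Box = namedtuple("Box", "name x0 y0 x1 y1")


class PlaneSketch:
    """Index objects by their edge coordinates, one sorted list per axis."""

    def __init__(self):
        self._order = {}            # obj -> insertion index
        self._xs, self._ys = [], []
        self._xkeys, self._ykeys = [], []

    def add(self, obj):
        idx = len(self._order)
        self._order[obj] = idx
        # Each object contributes both of its edges on each axis.
        self._xs += [(obj.x0, idx, obj), (obj.x1, idx, obj)]
        self._ys += [(obj.y0, idx, obj), (obj.y1, idx, obj)]

    def finish(self):
        # Sort on (coordinate, insertion index) so objects are never compared.
        self._xs.sort(key=lambda e: e[:2])
        self._ys.sort(key=lambda e: e[:2])
        self._xkeys = [e[0] for e in self._xs]
        self._ykeys = [e[0] for e in self._ys]

    @staticmethod
    def _edges_between(keys, entries, lo, hi):
        # Objects having at least one edge inside [lo, hi] on this axis.
        i0 = bisect_left(keys, lo)
        i1 = bisect_right(keys, hi)
        return {e[2] for e in entries[i0:i1]}

    def find(self, bbox):
        x0, y0, x1, y1 = bbox
        hits = self._edges_between(self._xkeys, self._xs, x0, x1)
        hits &= self._edges_between(self._ykeys, self._ys, y0, y1)
        # Return hits in insertion order, as the original sorts by _idxs.
        return sorted(hits, key=self._order.__getitem__)


a, b, c = Box("a", 0, 0, 10, 10), Box("b", 20, 20, 30, 30), Box("c", 5, 5, 25, 25)
plane = PlaneSketch()
for box in (a, b, c):
    plane.add(box)
plane.finish()
near_origin = plane.find((0, 0, 12, 12))
```

One property of this sketch is worth knowing: an object is only found when one of its edges falls inside the query range on both axes, so a box that completely surrounds a small query rectangle is not reported.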
......@@ -3,8 +3,9 @@
RM=rm -f
#CMP=cmp
CMP=:
PYTHON=python
PDF2TXT=PYTHONPATH=.. $(PYTHON) ../tools/pdf2txt.py -Dx -p1
PYTHON=python2
PDF2TXT=PYTHONPATH=.. $(PYTHON) ../tools/pdf2txt.py -p1
HTMLS=$(HTMLS_FREE) $(HTMLS_NONFREE)
HTMLS_FREE= \
......@@ -48,8 +49,7 @@ XMLS_NONFREE= \
nonfree/naacl06-shinyama.xml \
nonfree/nlp2004slides.xml
all:
$(MAKE) test CMP=cmp
all: test
test: htmls texts xmls
......
......@@ -3,65 +3,13 @@
</head><body>
<span style="position:absolute; border: gray 1px solid; left:0px; top:50px; width:612px; height:792px;"></span>
<div style="position:absolute; top:50px;"><a name="1">Page 1</a></div>
<span style="position:absolute; border: cyan 1px solid; left:100px; top:119px; width:61px; height:27px;"></span>
<span style="position:absolute; border: magenta 1px solid; left:100px; top:119px; width:61px; height:27px;"></span>
<span style="position:absolute; left:100px; top:119px; font-size:27px;">H</span>
<span style="position:absolute; left:117px; top:119px; font-size:27px;">e</span>
<span style="position:absolute; left:130px; top:119px; font-size:27px;">l</span>
<span style="position:absolute; left:136px; top:119px; font-size:27px;">l</span>
<span style="position:absolute; left:141px; top:119px; font-size:27px;">o</span>
<span style="position:absolute; left:154px; top:119px; font-size:27px;"> </span>
<span style="position:absolute; border: cyan 1px solid; left:261px; top:119px; width:62px; height:27px;"></span>
<span style="position:absolute; border: magenta 1px solid; left:261px; top:119px; width:62px; height:27px;"></span>
<span style="position:absolute; left:261px; top:119px; font-size:27px;">W</span>
<span style="position:absolute; left:283px; top:119px; font-size:27px;">o</span>
<span style="position:absolute; left:297px; top:119px; font-size:27px;">r</span>
<span style="position:absolute; left:305px; top:119px; font-size:27px;">l</span>
<span style="position:absolute; left:310px; top:119px; font-size:27px;">d</span>
<span style="position:absolute; border: cyan 1px solid; left:100px; top:219px; width:61px; height:27px;"></span>
<span style="position:absolute; border: magenta 1px solid; left:100px; top:219px; width:61px; height:27px;"></span>
<span style="position:absolute; left:100px; top:219px; font-size:27px;">H</span>
<span style="position:absolute; left:117px; top:219px; font-size:27px;">e</span>
<span style="position:absolute; left:130px; top:219px; font-size:27px;">l</span>
<span style="position:absolute; left:136px; top:219px; font-size:27px;">l</span>
<span style="position:absolute; left:141px; top:219px; font-size:27px;">o</span>
<span style="position:absolute; left:154px; top:219px; font-size:27px;"> </span>
<span style="position:absolute; border: cyan 1px solid; left:261px; top:219px; width:62px; height:27px;"></span>
<span style="position:absolute; border: magenta 1px solid; left:261px; top:219px; width:62px; height:27px;"></span>
<span style="position:absolute; left:261px; top:219px; font-size:27px;">W</span>
<span style="position:absolute; left:284px; top:219px; font-size:27px;">o</span>
<span style="position:absolute; left:297px; top:219px; font-size:27px;">r</span>
<span style="position:absolute; left:305px; top:219px; font-size:27px;">l</span>
<span style="position:absolute; left:310px; top:219px; font-size:27px;">d</span>
<span style="position:absolute; border: cyan 1px solid; left:100px; top:319px; width:111px; height:27px;"></span>
<span style="position:absolute; border: magenta 1px solid; left:100px; top:319px; width:111px; height:27px;"></span>
<span style="position:absolute; left:100px; top:319px; font-size:27px;">H</span>
<span style="position:absolute; left:127px; top:319px; font-size:27px;">e</span>
<span style="position:absolute; left:150px; top:319px; font-size:27px;">l</span>
<span style="position:absolute; left:166px; top:319px; font-size:27px;">l</span>
<span style="position:absolute; left:181px; top:319px; font-size:27px;">o</span>
<span style="position:absolute; left:204px; top:319px; font-size:27px;"> </span>
<span style="position:absolute; border: cyan 1px solid; left:321px; top:319px; width:102px; height:27px;"></span>
<span style="position:absolute; border: magenta 1px solid; left:321px; top:319px; width:102px; height:27px;"></span>
<span style="position:absolute; left:321px; top:319px; font-size:27px;">W</span>
<span style="position:absolute; left:354px; top:319px; font-size:27px;">o</span>
<span style="position:absolute; left:377px; top:319px; font-size:27px;">r</span>
<span style="position:absolute; left:395px; top:319px; font-size:27px;">l</span>
<span style="position:absolute; left:410px; top:319px; font-size:27px;">d</span>
<span style="position:absolute; border: cyan 1px solid; left:100px; top:419px; width:111px; height:27px;"></span>
<span style="position:absolute; border: magenta 1px solid; left:100px; top:419px; width:111px; height:27px;"></span>
<span style="position:absolute; left:100px; top:419px; font-size:27px;">H</span>
<span style="position:absolute; left:127px; top:419px; font-size:27px;">e</span>
<span style="position:absolute; left:150px; top:419px; font-size:27px;">l</span>
<span style="position:absolute; left:165px; top:419px; font-size:27px;">l</span>
<span style="position:absolute; left:181px; top:419px; font-size:27px;">o</span>
<span style="position:absolute; left:204px; top:419px; font-size:27px;"> </span>
<span style="position:absolute; border: cyan 1px solid; left:321px; top:419px; width:102px; height:27px;"></span>
<span style="position:absolute; border: magenta 1px solid; left:321px; top:419px; width:102px; height:27px;"></span>
<span style="position:absolute; left:321px; top:419px; font-size:27px;">W</span>
<span style="position:absolute; left:353px; top:419px; font-size:27px;">o</span>
<span style="position:absolute; left:377px; top:419px; font-size:27px;">r</span>
<span style="position:absolute; left:395px; top:419px; font-size:27px;">l</span>
<span style="position:absolute; left:410px; top:419px; font-size:27px;">d</span>
<div style="position:absolute; top:0px;">Page: <a href="#1">1</a></div>
<div style="position:absolute; border: textbox 1px solid; writing-mode:lr-tb; left:100px; top:119px; width:61px; height:27px;"><span style="font-family: Helvetica; font-size:19px">Hello
<br></span></div><div style="position:absolute; border: textbox 1px solid; writing-mode:lr-tb; left:261px; top:119px; width:62px; height:27px;"><span style="font-family: Helvetica; font-size:19px">World
<br></span></div><div style="position:absolute; border: textbox 1px solid; writing-mode:lr-tb; left:100px; top:219px; width:61px; height:27px;"><span style="font-family: Helvetica; font-size:19px">Hello
<br></span></div><div style="position:absolute; border: textbox 1px solid; writing-mode:lr-tb; left:261px; top:219px; width:62px; height:27px;"><span style="font-family: Helvetica; font-size:19px">World
<br></span></div><div style="position:absolute; border: textbox 1px solid; writing-mode:lr-tb; left:100px; top:319px; width:111px; height:27px;"><span style="font-family: Helvetica; font-size:19px">H e l l o
<br></span></div><div style="position:absolute; border: textbox 1px solid; writing-mode:lr-tb; left:321px; top:319px; width:102px; height:27px;"><span style="font-family: Helvetica; font-size:19px">W o r l d
<br></span></div><div style="position:absolute; border: textbox 1px solid; writing-mode:lr-tb; left:100px; top:419px; width:111px; height:27px;"><span style="font-family: Helvetica; font-size:19px">H e l l o
<br></span></div><div style="position:absolute; border: textbox 1px solid; writing-mode:lr-tb; left:321px; top:419px; width:102px; height:27px;"><span style="font-family: Helvetica; font-size:19px">W o r l d
<br></span></div><div style="position:absolute; top:0px;">Page: <a href="#1">1</a></div>
</body></html>
......@@ -3,42 +3,9 @@
</head><body>
<span style="position:absolute; border: gray 1px solid; left:0px; top:50px; width:612px; height:792px;"></span>
<div style="position:absolute; top:50px;"><a name="1">Page 1</a></div>
<span style="position:absolute; border: cyan 1px solid; left:0px; top:72px; width:218px; height:79px;"></span>
<span style="position:absolute; border: magenta 1px solid; left:0px; top:72px; width:218px; height:79px;"></span>
<span style="position:absolute; left:0px; top:96px; font-size:55px;">H</span>
<span style="position:absolute; left:34px; top:96px; font-size:55px;">e</span>
<span style="position:absolute; left:61px; top:96px; font-size:55px;">l</span>
<span style="position:absolute; left:72px; top:96px; font-size:55px;">l</span>
<span style="position:absolute; left:82px; top:96px; font-size:55px;">o</span>
<span style="position:absolute; left:109px; top:72px; font-size:55px;">H</span>
<span style="position:absolute; left:144px; top:72px; font-size:55px;">e</span>
<span style="position:absolute; left:170px; top:72px; font-size:55px;">l</span>
<span style="position:absolute; left:181px; top:72px; font-size:55px;">l</span>
<span style="position:absolute; left:192px; top:72px; font-size:55px;">o</span>
<span style="position:absolute; border: cyan 1px solid; left:194px; top:136px; width:48px; height:490px;"></span>
<span style="position:absolute; border: magenta 1px solid; left:194px; top:136px; width:48px; height:490px;"></span>
<span style="position:absolute; left:194px; top:136px; font-size:48px;"></span>
<span style="position:absolute; left:194px; top:184px; font-size:48px;"></span>
<span style="position:absolute; left:194px; top:232px; font-size:48px;"></span>
<span style="position:absolute; left:194px; top:280px; font-size:48px;"></span>
<span style="position:absolute; left:194px; top:328px; font-size:48px;"></span>
<span style="position:absolute; left:194px; top:352px; font-size:48px;"></span>
<span style="position:absolute; left:194px; top:400px; font-size:48px;"></span>
<span style="position:absolute; left:194px; top:448px; font-size:48px;"></span>
<span style="position:absolute; left:194px; top:496px; font-size:48px;"></span>
<span style="position:absolute; left:194px; top:544px; font-size:48px;"></span>
<span style="position:absolute; left:218px; top:599px; font-size:27px;">W</span>
<span style="position:absolute; border: cyan 1px solid; left:241px; top:575px; width:102px; height:51px;"></span>
<span style="position:absolute; border: magenta 1px solid; left:281px; top:575px; width:62px; height:27px;"></span>
<span style="position:absolute; left:281px; top:575px; font-size:27px;">W</span>
<span style="position:absolute; left:304px; top:575px; font-size:27px;">o</span>
<span style="position:absolute; left:317px; top:575px; font-size:27px;">r</span>
<span style="position:absolute; left:325px; top:575px; font-size:27px;">l</span>
<span style="position:absolute; left:330px; top:575px; font-size:27px;">d</span>
<span style="position:absolute; border: magenta 1px solid; left:241px; top:599px; width:40px; height:27px;"></span>
<span style="position:absolute; left:241px; top:599px; font-size:27px;">o</span>
<span style="position:absolute; left:254px; top:599px; font-size:27px;">r</span>
<span style="position:absolute; left:262px; top:599px; font-size:27px;">l</span>
<span style="position:absolute; left:268px; top:599px; font-size:27px;">d</span>
<div style="position:absolute; top:0px;">Page: <a href="#1">1</a></div>
<div style="position:absolute; border: textbox 1px solid; writing-mode:lr-tb; left:0px; top:72px; width:218px; height:79px;"><span style="font-family: Helvetica; font-size:38px">HelloHello
<br></span></div><div style="position:absolute; border: textbox 1px solid; writing-mode:tb-rl; left:194px; top:136px; width:48px; height:490px;"><span style="font-family: unknown; font-size:33px">あいうえおあいうえお </span><span style="font-family: Helvetica; font-size:19px">W
<br></span></div><div style="position:absolute; border: textbox 1px solid; writing-mode:lr-tb; left:241px; top:575px; width:102px; height:51px;"><span style="font-family: Helvetica; font-size:19px">World
<br>orld
<br></span></div><div style="position:absolute; top:0px;">Page: <a href="#1">1</a></div>
</body></html>
#!/usr/bin/env python
#!/usr/bin/env python2
from distutils.core import setup
from pdfminer import __version__
......
......@@ -5,4 +5,4 @@ RM=rm -f
all:
clean:
-$(RM) *.pyc *.pyo
-$(RM) *.pyc *.pyo *.cgic *.cgio