Commit 932c1d62 authored by Jakub Wilk

Correct a few grammatical errors in the manual pages.

parent 2f600519
......@@ -2,8 +2,10 @@ pdfminer (20110227+dfsg-1) UNRELEASED; urgency=low
   * New upstream release.
     + Document the -V option in pdf2txt manual page.
+  * Correct a few grammatical errors in the manual pages. Thanks to Stefano
+    Rivera for help.
- -- Jakub Wilk <jwilk@debian.org> Sun, 27 Feb 2011 13:36:21 +0100
+ -- Jakub Wilk <jwilk@debian.org> Sun, 27 Feb 2011 14:44:08 +0100
pdfminer (20101226+dfsg-1) experimental; urgency=low
......
......@@ -37,9 +37,9 @@
<refsection>
<title>Description</title>
<para>
-<command>pdf2txt</command> extracts text contents from a PDF file. It extracts all the texts
-that are to be rendered programmatically, ie. text represented as ASCII or Unicode strings. It
-cannot recognize texts drawn as images that would require optical character recognition. It
+<command>pdf2txt</command> extracts text contents from a PDF file. It extracts all the text
+that is to be rendered programmatically, i.e. text represented as ASCII or Unicode strings. It
+cannot recognize text drawn as images that would require optical character recognition. It
also extracts the corresponding locations, font names, font sizes, writing direction
(horizontal or vertical) for each text portion. You need to provide a password for protected
PDF documents when its access is restricted. You cannot extract any text from a PDF document
......@@ -60,7 +60,7 @@
<term><option>-p <replaceable>pageno</replaceable><replaceable>[,pageno,…]</replaceable></option></term>
<listitem>
<para>Specifies the comma-separated list of the page numbers to be extracted. Page numbers
-are starting from one. By default, it extracts texts from all the pages.</para>
+starts from one. By default, it extracts text from all the pages.</para>
</listitem>
</varlistentry>
<varlistentry>
......@@ -89,7 +89,7 @@
<varlistentry>
<term>xml</term>
<listitem>
-<para>XML format. It provides the most information available.</para>
+<para>XML format. It provides the most information.</para>
</listitem>
</varlistentry>
<varlistentry>
......@@ -137,13 +137,13 @@
<term><option>-W <replaceable>word-margin</replaceable></option></term>
<listitem>
<para>
-These are the parameters used for layout analysis. In an actual PDF file, texts might be
+These are the parameters used for layout analysis. In an actual PDF file, text portions might be
split into several chunks in the middle of its running, depending on the authoring
software. Therefore, text extraction needs to splice text chunks. In the figure below,
two text chunks whose distance is closer than the <replaceable>char-margin</replaceable>
is considered continuous and get grouped into one. Also, two lines whose distance is
closer than the <replaceable>line-margin</replaceable> is grouped as a text box, which
-is a rectangular area that contains a “cluster” of texts. Furthermore, it may be
+is a rectangular area that contains a “cluster” of text portions. Furthermore, it may be
required to insert blank characters (spaces) as necessary if the distance between two
words is greater than the <replaceable>word-margin</replaceable>, as a blank between
words might not be represented as a space, but indicated by the positioning of each word.
......@@ -165,7 +165,7 @@
<varlistentry>
<term><option>-A</option></term>
<listitem>
-<para>Force to perform layout analysis for all the text strings, including texts contained
+<para>Force layout analysis for all the text strings, including text contained
in figures.</para>
</listitem>
</varlistentry>
......