Commits on Source (25)
ENTREZ DIRECT - README
ENTREZ DIRECT: COMMAND LINE ACCESS TO NCBI ENTREZ DATABASES
Entrez Direct (EDirect) is an advanced method for accessing the NCBI's set of interconnected Entrez databases (publication, nucleotide, protein, structure, gene, variation, expression, etc.) from a terminal window. It uses command-line arguments for the query terms and combines individual operations with UNIX pipes.
Searching, retrieving, and parsing data from NCBI databases through the Unix command line.
EDirect also provides an argument-driven function that simplifies the extraction of data from document summaries or other results that are returned in XML format. Queries can move seamlessly between EDirect commands and UNIX utilities or scripts to perform actions that cannot be accomplished entirely within Entrez.
INTRODUCTION
EDirect consists of a set of scripts that are downloaded to the user's computer. If you extract the archive in your home directory, you may need to enter:
Entrez Direct (EDirect) provides access to the NCBI's suite of interconnected databases (biomedical literature, nucleotide and protein sequence, molecular structure, gene, genome assembly, gene expression, clinical variation, etc.) from a Unix terminal window. Search terms are given in command-line arguments. Individual operations are connected with Unix pipes to allow construction of multi-step queries. Selected records can then be retrieved in a variety of formats.
PATH=$PATH:$HOME/edirect
EDirect also includes an argument-driven function that simplifies the extraction of data from document summaries or other results that are in structured XML format. This can eliminate the need for writing custom software to answer ad hoc questions. Queries can move seamlessly between EDirect commands and Unix utilities or scripts to perform actions that cannot be accomplished entirely within Entrez.
in a terminal window to temporarily add EDirect functions to the PATH environment variable so they can be run by name. You can then try EDirect by copying the sample query below and pasting it into the terminal window for execution:
PROGRAMMATIC ACCESS
esearch -db pubmed -query "Beadle AND Tatum AND Neurospora" |
Several underlying network services provide access to different facets of Entrez. These include searching by indexed terms, looking up precomputed neighbors or links, filtering results by date or category, and downloading record summaries or reports. The same functionalities are available on the web or when using programmatic methods.
EDirect navigation programs (esearch, elink, efilter, and efetch) communicate by means of a small structured message, which can be passed invisibly between operations with a Unix pipe. The message includes the current database, so it does not need to be given as an argument after the first step.
All EDirect commands are designed to work on large sets of data. There is no need to write a script to loop over records one at a time. Intermediate results are stored on the Entrez history server. For best performance, obtain an API Key from NCBI, and place the following line in your .bash_profile file:
export NCBI_API_KEY=user_api_key_goes_here
Each program also has a -help command that prints detailed information about available arguments.
NAVIGATION FUNCTIONS
Esearch performs a new Entrez search using terms in indexed fields. It requires a -db argument for the database name and uses -query to obtain the search terms. For PubMed, without field qualifiers, the server uses automatic term mapping to compose a search strategy by translating the supplied query:
esearch -db pubmed -query "selective serotonin reuptake inhibitor"
Search terms can also be qualified with bracketed field names:
esearch -db nucleotide -query "insulin [PROT] AND rodents [ORGN]"
Elink looks up precomputed neighbors within a database, or finds associated records in other databases:
elink -related
elink -target gene
Efilter limits the results of a previous query, with shortcuts that can also be used in esearch:
efilter -molecule genomic -location chloroplast -country sweden
Efetch downloads selected records or reports in a designated format:
efetch -format abstract
ENTREZ EXPLORATION
Individual query commands are connected by a Unix vertical bar pipe symbol:
esearch -db pubmed -query "transposition immunity" | efetch -format medline
PubMed related articles are calculated by a statistical algorithm using the title, abstract, and medical subject headings (MeSH terms). These connections between papers can be used for knowledge discovery.
Lycopene cyclase converts lycopene to beta-carotene, the immediate precursor of vitamin A. An initial search on the enzyme results in 232 articles. Looking up precomputed neighbors returns 14,387 PubMed papers, some of which might be expected to discuss adjacent steps in the biosynthetic pathway:
esearch -db pubmed -query "lycopene cyclase" |
elink -related |
efilter -query "NOT historical article [FILT]" |
efetch -format docsum |
xtract -pattern DocumentSummary -if Author -and Title \
-element Id -first "Author/Name" -element Title |
grep -i -e enzyme -e synthesis |
sort -t $'\t' -k 2,3f |
column -s $'\t' -t |
head -n 10 |
cut -c 1-80
This query returns the PubMed ID, first author name, and article title for PubMed "neighbors" (related citations) of the original publications. It then requires specific words in the resulting rows, sorts alphabetically by author name and title, aligns the columns, and truncates the lines for easier viewing:
2960822 Anton IA A eukaryotic repressor protein, the qa-1S gene prod
5264137 Arroyo-Begovich A In vitro formation of an active multienzyme complex
14942736 BONNER DM Gene-enzyme relationships in Neurospora.
5361218 Caroline DF Pyrimidine synthesis in Neurospora crassa: gene-enz
123642 Case ME Genetic evidence on the organization and action of
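The post-processing steps in this kind of pipeline are ordinary Unix utilities, so they can be tried offline. The sketch below is only an illustration and not part of EDirect: it feeds three rows copied from the table above (titles as truncated there) through the same grep and sort steps, leaving out the column alignment. The function names sample_rows and filter_and_sort are invented for this example, and LC_ALL=C is an added assumption to make the sort order reproducible.

```shell
#!/bin/sh
# Illustrative only: function names are hypothetical, and the rows are
# copied (with truncated titles) from the sample output above.
tab=$(printf '\t')

sample_rows() {
  # printf reuses the format for every group of three arguments.
  printf '%s\t%s\t%s\n' \
    5361218 "Caroline DF" "Pyrimidine synthesis in Neurospora crassa: gene-enz" \
    14942736 "BONNER DM" "Gene-enzyme relationships in Neurospora." \
    5264137 "Arroyo-Begovich A" "In vitro formation of an active multienzyme complex"
}

filter_and_sort() {
  # Keep rows that mention either word, then sort case-insensitively on
  # columns 2 through 3 (author name, then title).
  grep -i -e enzyme -e synthesis |
  LC_ALL=C sort -t "$tab" -k 2,3f
}

# Truncate each line to 80 characters, as the README query does.
sample_rows | filter_and_sort | cut -c 1-80
```

Because the sort key folds case, the all-capitals author name BONNER DM sorts between Arroyo-Begovich A and Caroline DF rather than ahead of both.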
elink -target protein |
efilter -organism mouse |
efetch -format fasta
Linking to the protein database finds 251,887 sequence records, each of which has standardized organism information from the NCBI taxonomy. Limiting to proteins in mice returns 39 records. (Animals do not encode the genes involved in carotene biosynthesis.) Records are then retrieved in FASTA format. As anticipated, the results include the enzyme that splits beta-carotene into two molecules of retinal:
...
>NP_067461.2 beta,beta-carotene 15,15'-dioxygenase isoform 1 [Mus musculus]
MEIIFGQNKKEQLEPVQAKVTGSIPAWLQGTLLRNGPGMHTVGESKYNHWFDGLALLHSFSIRDGEVFYR
SKYLQSDTYIANIEANRIVVSEFGTMAYPDPCKNIFSKAFSYLSHTIPDFTDNCLINIMKCGEDFYATTE
TNYIRKIDPQTLETLEKVDYRKYVAVNLATSHPHYDEAGNVLNMGTSVVDKGRTKYVIFKIPATVPDSKK
KGKSPVKHAEVFCSISSRSLLSPSYYHSFGVTENYVVFLEQPFKLDILKMATAYMRGVSWASCMSFDRED
KTYIHIIDQRTRKPVPTKFYTDPMVVFHHVNAYEEDGCVLFDVIAYEDSSLYQLFYLANLNKDFEEKSRL
TSVPTLRRFAVPLHVDKDAEVGSNLVKVSSTTATALKEKDGHVYCQPEVLYEGLELPRINYAYNGKPYRY
IFAAEVQWSPVPTKILKYDILTKSSLKWSEESCWPAEPLFVPTPGAKDEDDGVILSAIVSTDPQKLPFLL
ILDAKSFTELARASVDADMHLDLHGLFIPDADWNAVKQTPAETQEVENSDHPTDPTAPELSHSENDFTAG
HGGSSL
...
STRUCTURED DATA EXTRACTION
The xtract program uses command-line arguments to direct the conversion of XML data into a tab-delimited table. The -pattern argument divides the results into rows, while placement of data into columns is controlled by -element.
Formatting arguments allow extensive customization of the output. The line break between -pattern objects can be changed with -ret, and the tab character between -element fields can be replaced by -tab.
The -sep argument is used to distinguish multiple elements of the same type, and controls their separation independently of the -tab argument. The -sep value also applies to unrelated -element arguments that are grouped with commas. The query:
efetch -db pubmed -id 6271474,1413997,16589597 -format docsum |
xtract -pattern DocumentSummary -sep "|" -element Id PubDate Name
returns a table with individual author names separated by vertical bars:
6271474 1981 Casadaban MJ|Chou J|Lemaux P|Tu CP|Cohen SN
1413997 1992 Oct Mortimer RK|Contopoulou CR|King JS
16589597 1954 Dec Garber ED
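Because -sep keeps all of the authors inside a single tab-delimited column, downstream tools can still split that column when needed. As a rough sketch (plain awk, not an EDirect feature; the function name count_authors is invented for this example), the following counts the authors per record using the three rows shown above:

```shell
#!/bin/sh
# Hypothetical helper: prints the first column (the PubMed ID) and the
# number of "|"-separated names found in the third tab-delimited column.
count_authors() {
  awk -F '\t' '{ n = split($3, names, "|"); print $1 "\t" n }'
}

# Rows copied from the table above.
printf '%s\t%s\t%s\n' \
  6271474 "1981" "Casadaban MJ|Chou J|Lemaux P|Tu CP|Cohen SN" \
  1413997 "1992 Oct" "Mortimer RK|Contopoulou CR|King JS" \
  16589597 "1954 Dec" "Garber ED" |
count_authors
```

Choosing a -sep character that cannot occur in the data itself is what makes this kind of later splitting reliable.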
Selection arguments are specialized derivatives of -element. Among these are positional commands (-first and -last) and numeric processing operations (including -num, -len, -sum, -min, -max, and -avg). There are also functions that perform sequence coordinate conversion (-0-based, -1-based, and -ucsc-based).
NESTED EXPLORATION
Exploration arguments (-pattern, -group, -block, and -subset) limit data extraction to specified regions of the XML, visiting all relevant objects one at a time. This design allows nested exploration of complex, hierarchical data to be controlled by a linear chain of command-line argument statements.
PubmedArticle XML contains the MeSH terms applied to a publication. Each MeSH term can have its own unique set of qualifiers. A single level of nested exploration within the current pattern:
esearch -db gene -query "beta-carotene oxygenase 1" -organism human |
elink -target pubmed | efilter -released last_year | efetch -format xml |
xtract -pattern PubmedArticle -element MedlineCitation/PMID \
-block MeshHeading \
-pfc "\n" -sep "/" -element DescriptorName,QualifierName
retains the proper association of subheadings for each MeSH term:
30396924
Age Factors
Animals
Cell Cycle Proteins/deficiency/genetics/metabolism
Cellular Senescence/physiology
...
CONDITIONAL EXECUTION
Conditional processing arguments (-if and -unless) restrict exploration by object name and value. These may be used in conjunction with string or numeric constraints:
esearch -db pubmed -query "Casadaban MJ [AUTH]" |
efetch -format xml |
xtract -pattern PubmedArticle -if "#Author" -lt 6 \
-block Author -if LastName -is-not Casadaban \
-sep ", " -tab "\n" -element LastName,Initials |
sort-uniq-count-rank
to select papers with fewer than 6 authors and print a table of the most frequent coauthors:
11 Chou, J
8 Cohen, SN
7 Groisman, EA
...
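The sort-uniq-count-rank script is supplied with EDirect. For readers who want to see the idea in isolation, a rough approximation can be assembled from standard utilities. This is a sketch under that assumption, not the script's actual source, and the function name count_rank is invented here:

```shell
#!/bin/sh
# Hypothetical stand-in for the counting step, built from standard tools.
count_rank() {
  # Group identical lines and count each group, rewrite "  N text" as
  # "N<TAB>text", then list the most frequent first (ties alphabetical).
  LC_ALL=C sort |
  uniq -c |
  awk '{ n = $1; sub(/^ *[0-9]+ /, ""); print n "\t" $0 }' |
  LC_ALL=C sort -t "$(printf '\t')" -k 1,1nr -k 2,2
}

# Made-up input in the same "LastName, Initials" shape as the xtract output.
printf '%s\n' "Chou, J" "Cohen, SN" "Chou, J" "Groisman, EA" "Chou, J" "Cohen, SN" |
count_rank
```

The input must be sorted before uniq -c, since uniq only merges adjacent duplicate lines.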
SAVING DATA IN VARIABLES
A value can be recorded in a variable and used wherever needed. Variables are created by a hyphen followed by a name consisting of a string of capital letters or digits (e.g., -PMID). Values are retrieved by placing an ampersand before the variable name (e.g., "&PMID") in an -element statement:
efetch -db pubmed -id 3201829,6301692,781293 -format xml |
xtract -pattern PubmedArticle -PMID MedlineCitation/PMID \
-block Author -element "&PMID" \
-sep " " -tab "\n" -element Initials,LastName
producing a list of authors, with the PubMed Identifier (PMID) in the first column of each row:
3201829 JR Johnston
3201829 CR Contopoulou
3201829 RK Mortimer
6301692 MA Krasnow
6301692 NR Cozzarelli
781293 MJ Casadaban
The variable can be used even though the original object is no longer visible inside the -block section.
SEQUENCE QUALIFIERS
The NCBI represents sequence records in a data model based on the central dogma of molecular biology. A sequence can have multiple features, which carry information about the biology of a given region, including the transformations involved in gene expression. A feature can have multiple qualifiers, which store specific details about that feature (e.g., name of the gene, genetic code used for translation).
The data hierarchy is easily explored using a -pattern {sequence} -group {feature} -block {qualifier} construct. As a convenience, an -insd helper function is provided for generating the appropriate nested extraction commands from feature and qualifier names on the command line. Processing the results of a search on cone snail venom:
esearch -db protein -query "conotoxin" -feature mat_peptide |
efetch -format gpc |
xtract -insd complete mat_peptide "%peptide" product peptide |
grep -i conotoxin | sort -t $'\t' -u -k 2,2n
returns the accession, length, name, and sequence for a sample of neurotoxic peptides:
ADB43131.1 15 conotoxin Cal 1b LCCKRHHGCHPCGRT
ADB43128.1 16 conotoxin Cal 5.1 DPAPCCQHPIETCCRR
AIC77105.1 17 conotoxin Lt1.4 GCCSHPACDVNNPDICG
ADB43129.1 18 conotoxin Cal 5.2 MIQRSQCCAVKKNCCHVG
ADD97803.1 20 conotoxin Cal 1.2 AGCCPTIMYKTGACRTNRCR
AIC77085.1 21 conotoxin Bt14.8 NECDNCMRSFCSMIYEKCRLK
ADB43125.1 22 conotoxin Cal 14.2 GCPADCPNTCDSSNKCSPGFPG
AIC77154.1 23 conotoxin Bt14.19 VREKDCPPHPVPGMHKCVCLKTC
...
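The final sort step is what reduces the output to a sample: with -u and a key restricted to column 2, sort treats rows of equal length as duplicates and prints only one row per distinct peptide length. A small offline sketch of that behavior follows. The function name one_per_length is invented, and the second and third rows are made-up placeholders, not real accessions:

```shell
#!/bin/sh
# Hypothetical helper: keep a single row for each distinct value in the
# second tab-delimited column, ordered numerically by that column.
one_per_length() {
  sort -t "$(printf '\t')" -u -k 2,2n
}

# The first row is copied from the table above; the PLACEHOLDER rows are
# invented solely to create a duplicate length.
printf '%s\t%s\t%s\t%s\n' \
  ADB43131.1 15 "conotoxin Cal 1b" LCCKRHHGCHPCGRT \
  PLACEHOLDER_1 16 "placeholder peptide" AAAAAAAAAAAAAAAA \
  PLACEHOLDER_2 15 "placeholder peptide" CCCCCCCCCCCCCCC |
one_per_length | cut -f 2
```

Only the lengths are printed here because the choice of which row survives among equal keys is left to sort.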
EDirect will run on UNIX and Macintosh computers that have the Perl language installed, and under the Cygwin UNIX-emulation environment on Windows PCs.
INSTALLATION
EDirect consists of a set of scripts and programs that are downloaded to the user's computer.
EDirect will run on Unix and Macintosh computers that have the Perl language installed, and under the Cygwin Unix-emulation environment on Windows PCs.
To install the EDirect software, copy the following commands and paste them into a terminal window:
cd ~
/bin/bash
perl -MNet::FTP -e \
'$ftp = new Net::FTP("ftp.ncbi.nlm.nih.gov", Passive => 1);
$ftp->login; $ftp->binary;
$ftp->get("/entrez/entrezdirect/edirect.tar.gz");'
gunzip -c edirect.tar.gz | tar xf -
rm edirect.tar.gz
builtin exit
export PATH=${PATH}:$HOME/edirect >& /dev/null || setenv PATH "${PATH}:$HOME/edirect"
./edirect/setup.sh
This downloads several scripts into an "edirect" folder in the user's home directory. The setup.sh script then downloads any missing Perl modules, and may print an additional command for updating the PATH environment variable in the user's configuration file. Copy that command, if present, and paste it into the terminal window to complete the installation process. The editing instructions will look something like:
echo "export PATH=\$PATH:\$HOME/edirect" >> $HOME/.bash_profile
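A command of this form appends a new line every time it is run. If the installation is likely to be repeated, a guard against duplicates may be useful. The following is a minimal sketch, not something setup.sh provides; the function name add_path_line is hypothetical, and the example writes to a scratch file instead of the real .bash_profile:

```shell
#!/bin/sh
# Hypothetical helper, not part of setup.sh: append the PATH line to the
# given profile file only if an identical line is not already present.
add_path_line() {
  profile=$1
  # Single quotes keep $PATH and $HOME literal, as in the echo command above.
  line='export PATH=$PATH:$HOME/edirect'
  # -x matches whole lines only; -F treats the pattern as a fixed string.
  if ! grep -qxF "$line" "$profile" 2>/dev/null
  then
    printf '%s\n' "$line" >> "$profile"
  fi
}

# Example run against a scratch file.
scratch=$(mktemp)
add_path_line "$scratch"
add_path_line "$scratch"   # the second call leaves the file unchanged
cat "$scratch"
rm -f "$scratch"
```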
DOCUMENTATION
Documentation for EDirect is on the web at:
http://www.ncbi.nlm.nih.gov/books/NBK179288
Questions or comments on EDirect may be sent to eutilities@ncbi.nlm.nih.gov.
Instructions for obtaining an API Key are given in this NCBI blog post:
https://ncbiinsights.ncbi.nlm.nih.gov/2017/11/02/new-api-keys-for-the-e-utilities
Questions or comments on EDirect may be sent to info@ncbi.nlm.nih.gov.
......@@ -21,6 +21,7 @@ if [ "$#" -gt 0 ]
then
target="$1"
MASTER=$(cd "$target" && pwd)
CONFIG=${MASTER}
shift
else
if [ -z "${EDIRECT_PUBMED_MASTER}" ]
......@@ -70,7 +71,7 @@ do
mkdir -p "$MASTER/$dir"
done
for dir in Current Indexed Inverted Merged Pubmed
for dir in Indexed Inverted Merged Pubmed
do
mkdir -p "$WORKING/$dir"
done
......@@ -98,3 +99,20 @@ fetch-pubmed -path "$MASTER/Archive" |
xtract -pattern Author -if Affiliation -contains Medicine \
-pfx "Archive is " -element Initials
echo ""
if [ -n "$CONFIG" ]
then
target=bash_profile
if ! grep "$target" "$HOME/.bashrc" >/dev/null 2>&1
then
if [ ! -f $HOME/.$target ] || grep 'bashrc' "$HOME/.$target" >/dev/null 2>&1
then
target=bashrc
fi
fi
echo ""
echo "For convenience, please execute the following to save the archive path to a variable:"
echo ""
echo " echo \"export EDIRECT_PUBMED_MASTER='${CONFIG}'\" >>" "\$HOME/.$target"
echo ""
fi
ncbi-entrez-direct (10.9.20190205+ds-1) unstable; urgency=medium
* New upstream release.
* debian/man/{archive-pubmed,download-pubmed,edirect,efilter,
entrez-phrase-search,esearch,espell,fetch-pubmed,index-pubmed,
phrase-search,rchive,stream-pubmed,transmute,xtract}.1: Update
accordingly.
* debian/man/{local-phrase-search,pm-{clean,current,erase,log,repack,uids,
verify}}.1: Retire per the corresponding scripts.
* debian/rules: Stop installing retired scripts local-phrase-search and
(implicitly) pm-{clean,current,erase,log,repack,uids,verify}.
-- Aaron M. Ucko <ucko@debian.org> Wed, 06 Feb 2019 22:51:27 -0500
ncbi-entrez-direct (10.5.20181204+ds-2) unstable; urgency=medium
* debian/control: Actually build-depend on dh-golang.
......
.TH ARCHIVE-PUBMED 1 2018-10-08 NCBI "NCBI Entrez Direct User's Manual"
.TH ARCHIVE-PUBMED 1 2019-02-06 NCBI "NCBI Entrez Direct User's Manual"
.SH NAME
archive\-pubmed \- populate a local NCBI PubMed archive from scratch
.SH SYNOPSIS
......@@ -34,7 +34,7 @@ Defaults to the primary local archive directory when not set.
.BR download\-pubmed (1),
.BR fetch\-pubmed (1),
.BR index\-pubmed (1),
.BR local\-phrase\-search (1),
.BR phrase\-search (1),
.BR pm\-prepare (1),
.BR pm\-refresh (1),
.BR pm\-stash (1),
......
.TH DOWNLOAD-PUBMED 1 2018-11-12 NCBI "NCBI Entrez Direct User's Manual"
.TH DOWNLOAD-PUBMED 1 2019-02-06 NCBI "NCBI Entrez Direct User's Manual"
.SH NAME
download\-pubmed \- download an NCBI PubMed archive dump
.SH SYNOPSIS
......@@ -20,12 +20,9 @@ Archive section(s) to download.
.BR ftp\-cp (1),
.BR ftp\-ls (1),
.BR index\-pubmed (1),
.BR local\-phrase\-search (1),
.BR pm\-clean (1),
.BR pm\-log (1),
.BR phrase\-search (1),
.BR pm\-prepare (1),
.BR pm\-refresh (1),
.BR pm\-stash (1),
.BR pm\-verify (1),
.BR rchive (1),
.BR transmute (1).
.TH EDIRECT 1 2018-10-08 NCBI "NCBI Entrez Direct User's Manual"
.TH EDIRECT 1 2019-02-06 NCBI "NCBI Entrez Direct User's Manual"
.SH NAME
edirect \- access NCBI Entrez from the command line
.SH SYNOPSIS
......@@ -71,6 +71,10 @@ Print the internal URL query and XML results of each step.
Specify a particular server for quality assurance testing.
.SH ENVIRONMENT
.TP
.B EDIRECT_DO_AUTO_ABBREV
Accept (currently) unambiguous truncated option names,
as in \fB\-verbo\fP for \fB\-verbose\fP.
.TP
.B NCBI_API_KEY
NCBI E\-Utilities API key,
allowing for a higher request rate
......
.TH EFILTER 1 2018-11-18 NCBI "NCBI Entrez Direct User's Manual"
.TH EFILTER 1 2019-02-06 NCBI "NCBI Entrez Direct User's Manual"
NCBI "NCBI Entrez Direct User's Manual"
.SH NAME
efilter \- filter and/or sort NCBI Entrez search results
.SH SYNOPSIS
\fBefilter\fP (\fBefetch \-filter\fP)
[\|\fB\-help\fP\|]
[\|\fB\-query\fP\ \fIstr\fP\|]
[\|\fB\-q\fP\|[\|\fBuery\fP\|] \fIstr\fP\|]
[\|\fB\-sort\fP\ \fIfield\fP\|]
[\|\fB\-days\fP\ \fIN\fP\|]
[\|\fB\-datetype\fP\ \fIfield\fP\|]
......@@ -15,6 +15,8 @@ efilter \- filter and/or sort NCBI Entrez search results
[\|\fB\-pairs\ \fIfield\fP\|]
[\|\fB\-spell\fP\|]
[\|\fB\-pub\fP\ \fItype\fP\|]
[\|\fB\-journal\fP\ \fIname\fP\|]
[\|\fB\-released\fP\ \fIwhen\fP\|]
[\|\fB\-country\fP\ \fIname\fP\|]
[\|\fB\-feature\fP\ \fItype\fP\|]
[\|\fB\-location\fP\ \fItype\fP\|]
......@@ -23,6 +25,7 @@ efilter \- filter and/or sort NCBI Entrez search results
[\|\fB\-source\fP\ \fItype\fP\|]
[\|\fB\-status\ alive\fP\|]
[\|\fB\-type\fP\ \fItype\fP\|]
[\|\fB\-class\fP\ \fIclass\fP\|]
[\|\fB\-label\fP\ \fIname\fP\|]
.SH DESCRIPTION
\fBefilter\fP filters and/or sorts results
......@@ -30,7 +33,7 @@ from a previous \fBedirect\fP(1) search.
.SH OPTIONS
.SS Query Specification
.TP
\fB\-query\fP\ \fIstr\fP
\fB\-q\fP\|[\|\fBuery\fP\|]\fP\ \fIstr\fP
Limit results to those matching the given query string.
.SS Document Order
.TP
......@@ -69,12 +72,19 @@ Correct misspellings in query.
.BR free ,
.BR historical ,
.BR journal ,
.BR last_week ,
.BR last_month ,
.BR last_year ,
.BR preprint ,
.BR review ,
.BR structured .
.TP
\fB\-journal\fP\ \fIname\fP
.BR pnas ,
\fB"j bacteriol"\fP, ...
.TP
\fB\-released\fP\ \fIwhen\fP
.BR last_week ,
.BR last_month ,
.BR last_year ,
.BR prev_years .
.SS Sequence Filters
.TP
\fB\-country\fP\ \fIname\fP
......@@ -134,6 +144,17 @@ Correct misspellings in query.
\fB\-type\fP\ \fItype\fP
.BR coding ,
.BR pseudo .
.SS SNP Filters
.TP
\fB\-class\fP\ \fIclass\fP
.BR acceptor ,
.BR donor ,
.BR frameshift ,
.BR indel ,
.BR intron ,
.BR missense ,
.BR nonsense ,
.BR synonymous .
.SS Miscellaneous Arguments
.TP
\fB\-help\fP
......
.TH ENTREZ-PHRASE-SEARCH 1 2018-09-16 NCBI "NCBI Entrez Direct User's Manual"
.TH ENTREZ-PHRASE-SEARCH 1 2019-02-06 NCBI "NCBI Entrez Direct User's Manual"
.SH NAME
entrez\-phrase\-search \- search NCBI Entrez for phrases
.SH SYNOPSIS
......@@ -45,4 +45,4 @@ Phrase to search for.
.BR edirect (1),
.BR esearch (1),
.BR filter\-stop\-words (1),
.BR local\-phrase\-search (1).
.BR phrase\-search (1).
.TH ESEARCH 1 2017-10-05 NCBI "NCBI Entrez Direct User's Manual"
.TH ESEARCH 1 2019-02-06 NCBI "NCBI Entrez Direct User's Manual"
.SH NAME
esearch \- search an NCBI Entrez database
.SH SYNOPSIS
\fBesearch\fP (\fBedirect\ \-search\fP)
[\|\fB\-help\fP\|]
\fB\-db\fP\ \fIname\fP
\fB\-query\fP\ \fIstr\fP
\fB\-q\fP\|[\|\fBuery\fP\|]\fP\ \fIstr\fP
[\|\fB\-sort\fP\ \fIfield\fP\|]
[\|\fB\-days\fP\ \fIN\fP\|]
[\|\fB\-datetype\fP\ \fIfield\fP\|]
......@@ -23,7 +23,7 @@ esearch \- search an NCBI Entrez database
\fB\-db\fP\ \fIname\fP
Entrez database name.
.TP
\fB\-query\fP\ \fIstr\fP
\fB\-q\fP\|[\|\fBuery\fP\|]\fP\ \fIstr\fP
Query string.
.SS Document Order
.TP
......
.TH ESPELL 1 2017-01-24 NCBI "NCBI Entrez Direct User's Manual"
.TH ESPELL 1 2019-02-06 NCBI "NCBI Entrez Direct User's Manual"
.SH NAME
espell \- spell\-correct an NCBI Entrez query
.SH SYNOPSIS
\fBespell\fP (\fBedirect\ \-spell\fP)
[\|\fB\-help\fP\|]
\fB\-db\fP\ \fIname\fP
\fB\-query\fP\ \fIstr\fP
\fB\-q\fP\|[\|\fBuery\fP\|]\fP\ \fIstr\fP
.SH DESCRIPTION
\fBespell\fP produces an NCBI \fBeSpellResult\fP XML document
indicating what changes, if any,
......@@ -18,7 +18,7 @@ Print usage information.
\fB\-db\fP\ \fIname\fP
Entrez database name.
.TP
\fB\-query\fP\ \fIstr\fP
\fB\-q\fP\|[\|\fBuery\fP\|]\fP\ \fIstr\fP
Query string.
.SH SEE ALSO
.BR edirect (1),
......
.TH FETCH-PUBMED 1 2018-10-08 NCBI "NCBI Entrez Direct User's Manual"
.TH FETCH-PUBMED 1 2019-02-06 NCBI "NCBI Entrez Direct User's Manual"
.SH NAME
fetch\-pubmed \- fetch records from a local NCBI PubMed archive by UID
.SH SYNOPSIS
......@@ -26,7 +26,7 @@ Expected to hold an absolute path;
mandatory when not supplying a path on the command line.
.SH SEE ALSO
.BR archive\-pubmed (1),
.BR local\-phrase\-search (1),
.BR phrase\-search (1),
.BR rchive (1),
.BR stream\-pubmed (1),
.BR xtract (1).
.TH INDEX-PUBMED 1 2018-10-08 NCBI "NCBI Entrez Direct User's Manual"
.TH INDEX-PUBMED 1 2019-02-06 NCBI "NCBI Entrez Direct User's Manual"
.SH NAME
index\-pubmed \- populate and index a local NCBI PubMed archive from scratch
.SH SYNOPSIS
......@@ -35,8 +35,7 @@ Defaults to the primary local archive directory when not set.
.BR archive\-pubmed (1),
.BR download\-pubmed (1),
.BR fetch\-pubmed (1),
.BR local\-phrase\-search (1),
.BR pm\-current (1),
.BR phrase\-search (1),
.BR pm\-index (1),
.BR pm\-invert (1),
.BR pm\-merge (1),
......
.TH LOCAL-PHRASE-SEARCH 1 2018-11-18 NCBI "NCBI Entrez Direct User's Manual"
.SH NAME
local\-phrase\-search, phrase\-search \- search an indexed local NCBI PubMed archive
.SH SYNOPSIS
.RB [\| local\- \|] phrase\-search
[\|\fB\-h\fP|\|\fB\-help\fP|\fB\-\-help\fP\|]
[\|\fB\-path\fP|\fB\-master\fP\ \fIdir\fP\|]
[\|\fB\-count\fP|\fB\-counts\fP|\fB\-search\fP|\fB\-exact\fP|\fB\-query\fP\|]
\fIquery...\fP
.SH DESCRIPTION
.RB [\| local\- \|] phrase\-search
searches an indexed local NCBI PubMed archive
(as prepared by \fBindex\-pubmed\fP(1)).
.SH OPTIONS
.TP
\fB\-h\fP|\|\fB\-help\fP|\fB\-\-help\fP
Print usage information.
.TP
\fB\-path\fP|\fB\-master\fP\ \fIdir\fP
Search the local archive in \fIdir\fP.
.TP
\fB\-count\fP
Print terms and counts, merging wildcards.
.TP
\fB\-counts\fP
Expand wildcards; print individual term counts.
.TP
\fB\-search\fP
Expansive search using stemmed words.
.TP
\fB\-exact\fP
Strict search for article title round\-tripping.
.TP
\fB\-query\fP (default mode)
Search on words or phrases in Boolean formulas.
.TP
\fIquery...\fP
What to search for;
may contain standard Boolean operators
\fBAND\fP, \fBOR\fP, and \fBNOT\fP,
using parentheses as needed for grouping.
.SH ENVIRONMENT
.TP
.B EDIRECT_PUBMED_MASTER
Local archive directory to use
in the absence of \fB\-path\fP/\fB\-master\fP.
Expected to hold an absolute path;
mandatory when not supplying a path on the command line.
.SH SEE ALSO
.BR entrez\-phrase\-search (1),
.BR fetch\-pubmed (1),
.BR index\-pubmed (1),
.BR rchive (1),
.BR xtract (1).
.so man1/local-phrase-search.1
.TH PHRASE-SEARCH 1 2019-02-06 NCBI "NCBI Entrez Direct User's Manual"
.SH NAME
phrase\-search \- search an indexed local NCBI PubMed archive
.SH SYNOPSIS
.RB phrase\-search
[\|\fB\-h\fP|\|\fB\-help\fP|\fB\-\-help\fP\|]
[\|\fB\-path\fP|\fB\-master\fP\ \fIdir\fP\|]
[\|\fB\-count\fP|\fB\-counts\fP|\fB\-search\fP|\fB\-exact\fP|\fB\-query\fP\|]
\fIquery...\fP
.SH DESCRIPTION
.RB phrase\-search
searches an indexed local NCBI PubMed archive
(as prepared by \fBindex\-pubmed\fP(1)).
.SH OPTIONS
.TP
\fB\-h\fP|\|\fB\-help\fP|\fB\-\-help\fP
Print usage information.
.TP
\fB\-path\fP|\fB\-master\fP\ \fIdir\fP
Search the local archive in \fIdir\fP.
.TP
\fB\-count\fP
Print terms and counts, merging wildcards.
.TP
\fB\-counts\fP
Expand wildcards; print individual term counts.
.TP
\fB\-search\fP
Expansive search using stemmed words.
.TP
\fB\-exact\fP
Strict search for article title round\-tripping.
.TP
\fB\-query\fP (default mode)
Search on words or phrases in Boolean formulas.
.TP
\fIquery...\fP
What to search for;
may contain standard Boolean operators
\fBAND\fP, \fBOR\fP, and \fBNOT\fP,
using parentheses as needed for grouping.
.SH ENVIRONMENT
.TP
.B EDIRECT_PUBMED_MASTER
Local archive directory to use
in the absence of \fB\-path\fP/\fB\-master\fP.
Expected to hold an absolute path;
mandatory when not supplying a path on the command line.
.SH SEE ALSO
.BR entrez\-phrase\-search (1),
.BR fetch\-pubmed (1),
.BR index\-pubmed (1),
.BR rchive (1),
.BR xtract (1).
.TH PM-CLEAN 1 2018-10-08 NCBI "NCBI Entrez Direct User's Manual"
.SH NAME
pm\-clean \- clean up NCBI PubMed XML formatting
.SH SYNOPSIS
.B pm\-clean
\fIdir\fP
.SH DESCRIPTION
\fBpm\-clean\fP cleans up the formatting
of the compressed NCBI PubMed XML files in the current directory,
writing its output to correspondingly named files in \fIdir\fP.
.SH OPTIONS
.TP
\fIdir\fP
Place output in \fIdir\fP.
.SH SEE ALSO
.BR download\-pubmed (1),
.BR xtract (1).
.TH PM-CURRENT 1 2018-11-18 NCBI "NCBI Entrez Direct User's Manual"
.SH NAME
pm\-current \- extract latest versions of NCBI PubMed XML records
.SH SYNOPSIS
.B pm\-current
\fItarget\fP
\fIarchive\fP
.SH DESCRIPTION
\fBpm\-current\fP extracts the latest versions
of the compressed NCBI PubMed XML files
archived under \fIarchive\fP,
writing its output to files in \fItarget\fP
with names along the lines of \fBpubmed001.xml.gz\fP,
replacing any existing \fB*.xml.gz\fP files there.
.SH OPTIONS
.TP
\fItarget\fP
Place output in \fItarget\fP.
.TP
\fIarchive\fP
Read XML from \fIarchive\fP.
.SH SEE ALSO
.BR download\-pubmed (1),
.BR index\-pubmed (1),
.BR pm\-index (1),
.BR stream-pubmed (1),
.BR xtract (1).
.TH PM-ERASE 1 2018-09-16 NCBI "NCBI Entrez Direct User's Manual"
.SH NAME
pm\-erase \- remove selected records from a local NCBI PubMed archive
.SH SYNOPSIS
\fBpm\-erase\fP
\fIdir\fP
.SH DESCRIPTION
\fBpm\-erase\fP reads a list of NCBI PubMed identifiers
from standard input
and removes the corresponding records
from the local archive in \fIdir\fP.
.SH SEE ALSO
.BR archive\-pubmed (1),
.BR index\-pubmed (1).
.TH PM-LOG 1 2018-10-08 NCBI "NCBI Entrez Direct User's Manual"
.SH NAME
pm\-log \- summarize contents of NCBI PubMed dumps (updates)
.SH SYNOPSIS
\fBpm\-log\fP
.SH DESCRIPTION
\fBpm\-log\fP examines the NCBI PubMed article sets
in the current directory
and writes a list of added PubMed IDs (as is)
and any deleted PubMed IDs
(with \fBD\fP in a tab\-delimited second column)
to a file named \fBtransactions.txt\fP
in the current directory.
(If such a file already exists, \fBpm\-log\fP will overwrite it.)
.SH SEE ALSO
.BR download\-pubmed (1),
.BR xtract (1).
.TH PM-REPACK 1 2018-09-16 NCBI "NCBI Entrez Direct User's Manual"
.SH NAME
pm\-repack \- repair and repack NCBI PubMed article sets
.SH SYNOPSIS
\fBpm\-repack\fP
.SH DESCRIPTION
\fBpm\-repack\fP replaces
the compressed NCBI PubMed article sets in the current directory
with repaired uncompressed equivalents.
.SH SEE ALSO
.BR rchive (1),
.BR xtract (1).
.TH PM-UIDS 1 2018-09-16 NCBI "NCBI Entrez Direct User's Manual"
.SH NAME
pm\-uids \- summarize contents of NCBI PubMed tries
.SH SYNOPSIS
\fBpm\-uids\fP
\fIdir\fP
.SH DESCRIPTION
\fBpm\-uids\fP lists the NCBI PubMed article IDs
in the \fBrchive\fP\-produced trie in \fIdir\fP.
.SH SEE ALSO
.BR archive\-pubmed (1),
.BR index\-pubmed (1),
.BR pm\-stash,
.BR rchive (1).