NAME
detex - a filter to strip TeX commands from a .tex file.

SYNOPSIS
detex [-?] [-a] [-c] [-e envlist] [-h] [-l] [-m] [-n] [-s] [-v] [-w] [filename[.tex] ...]

DESCRIPTION
detex reads each file in sequence, removes all comments, all TeX control sequences, and all text in inline and display math mode, and writes the remainder on the standard output.
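
The three deletions can be illustrated with a toy Python filter. This is only a rough approximation and not how detex itself works (detex is a full lexical scanner with many special cases). The sketch assumes that comments start at a `%` not escaped as `\%`, that display math is written `$$...$$` and inline math `$...$`, and that a control sequence is a backslash followed by letters or by one other character. The name `toy_detex` is hypothetical.

```python
import re

# Hypothetical toy filter approximating detex's three deletions.
# It is NOT detex's algorithm; it ignores \[...\], environments, etc.

# A comment runs from an unescaped '%' to the end of the line.
COMMENT = re.compile(r"(?<!\\)%.*")
# Display math $$...$$ is removed before inline math $...$ so that
# '$$' is not mistaken for an empty inline formula.
DISPLAY_MATH = re.compile(r"\$\$.*?\$\$", re.DOTALL)
INLINE_MATH = re.compile(r"(?<!\\)\$.*?(?<!\\)\$", re.DOTALL)
# Control word (backslash + letters) or control symbol (backslash + one char).
CONTROL_SEQ = re.compile(r"\\(?:[A-Za-z]+|.)")


def toy_detex(text: str) -> str:
    """Strip comments, math, and control sequences from TeX source."""
    text = COMMENT.sub("", text)
    text = DISPLAY_MATH.sub("", text)
    text = INLINE_MATH.sub("", text)
    text = CONTROL_SEQ.sub("", text)
    # Braces that delimited arguments carry no text of their own.
    return text.replace("{", "").replace("}", "")
```

For example, `toy_detex("The \\emph{cat} sat % a comment\non $x^2$ mats.")` keeps the words and the line break while dropping the comment, the math, and `\emph`.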

By default, detex follows \input commands. If a file cannot be opened, a warning message is printed and the command is ignored. If no input files are given on the command line, detex reads from standard input.

detex assumes that the standard TeX character classes (category codes) are in use, and it allows white space between control sequences and special characters such as `{' when recognizing constructs such as LaTeX environments, so that \begin {figure} is treated like \begin{figure}.

The TEXINPUTS environment variable is used to find \input and \include files.

OPTIONS
Command-line options are single letters, and letter case is ignored. For compatibility with GNU and POSIX conventions, options may be introduced by either a single or a double hyphen: -v and --v are equivalent. Multiple single-letter options can be collapsed into a single multiletter option: -a -c -l and -acl are equivalent.

To avoid confusion with options, a filename that begins with a hyphen must be disguised by a leading absolute or relative directory path, e.g. /tmp/-foo.tex or ./-foo.tex.
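
The option conventions above can be modeled with a short Python sketch. It is hypothetical: `parse_args` and its return value are inventions for illustration, and the sketch assumes that -e takes its envlist from the following command-line argument, which the text does not spell out for collapsed forms such as -le.

```python
import sys

# Hypothetical parser following the conventions described above;
# it is not detex's own code.
KNOWN = set("?achlmnsvw")  # single-letter options other than -e


def parse_args(argv):
    """Return (option letters, envlist or None, filenames)."""
    opts, envlist, files = set(), None, []
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg.startswith("-"):
            # "-x" and "--x" are equivalent; each remaining char is an option.
            letters = arg[2:] if arg.startswith("--") else arg[1:]
            for ch in letters.lower():  # letter case is ignored
                if ch == "e":
                    # Assumption: the envlist is the next argument.
                    i += 1
                    envlist = argv[i] if i < len(argv) else ""
                elif ch in KNOWN:
                    opts.add(ch)
                else:
                    # Unrecognized options are reported, then ignored.
                    print(f"warning: unrecognized option -{ch}", file=sys.stderr)
        else:
            # Files starting with '-' must be written as ./-foo.tex,
            # so any argument without a leading '-' is a filename.
            files.append(arg)
        i += 1
    return opts, envlist, files
```

With this scheme, `parse_args(["-acl", "--V", "./-foo.tex"])` yields the option letters a, c, l, and v plus one filename.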

-?
Display a brief help message on stderr, and then exit immediately with a success status code (0 on UNIX).

-a
Display an author credit on stderr, and then exit immediately with a success status code (0 on UNIX).

-c
In LaTeX mode, echo the arguments to \cite, \ref, and \pageref macros; they are otherwise discarded. This option can be useful when sending the output to a style checker. Besides \cite, \ref, and \pageref, detex also recognizes the authordate1-4, chicago, and harvard citation command variants: \altcite, \citeA, \citeANP, \citeN, \citeNP, \citeyear, \citeyearNP, \fullcite, \fullciteA, \shortcite, and \shortciteA.

-e envlist
Text in certain LaTeX environments is ignored. The default ignored environments are align, alignat, array, eqnarray, equation, figure, gather, multline, picture, table, and verbatim. The -e option can be used to specify a comma-separated list of environments to ignore. The list replaces the defaults, so specifying an empty list effectively prevents all environments from being ignored.
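
The replace-not-extend rule for -e, including the empty-list case, can be sketched in Python. The helper name `ignored_environments` is hypothetical; the default list is copied from the description above. Note that splitting an empty string on commas yields one empty name, which has to be filtered out for an empty list to mean "ignore nothing".

```python
# Hypothetical helper; the default list is taken from the -e description.
DEFAULT_IGNORED = ["align", "alignat", "array", "eqnarray", "equation",
                   "figure", "gather", "multline", "picture", "table",
                   "verbatim"]


def ignored_environments(envlist=None):
    """Return the set of environments whose text is dropped.

    envlist is the -e argument, or None when -e was not given.
    """
    if envlist is None:
        return set(DEFAULT_IGNORED)  # no -e option: use the defaults
    # -e replaces the defaults. "".split(",") gives [""], so empty names
    # are filtered out and an empty list ignores no environments.
    return {name for name in envlist.split(",") if name}
```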

-h
Display a brief help message on stderr, and then exit immediately with a success status code (0 on UNIX).

-l
detex normally assumes that it is dealing with plain TeX or a variant such as extended plain TeX or AmSTeX. However, if the magic sequence \begin{document} appears in the text, an Emacs-style mode comment % -*-LaTeX-*- is found, or the input file has a .ltx extension, then detex assumes it is dealing with LaTeX source and recognizes additional LaTeX constructs, including the \include and \includeonly commands. The -l option forces LaTeX mode, which is useful if the input files would not otherwise be recognized as LaTeX files. An Emacs-style mode comment % -*-TeX-*- turns off LaTeX mode.
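
The mode-detection rules above can be sketched as a line-by-line scan in Python. This is a hypothetical model, not detex's implementation: `latex_mode_by_line` is an invented name, markers are matched as plain substrings, -l (`force_latex`) is assumed only to set the starting mode, and whichever marker appears last is assumed to win.

```python
# Hypothetical sketch of the LaTeX-mode rules described for -l.
# Assumptions: substring matching, -l sets the initial mode only,
# and the most recent marker decides the mode.
LATEX_MARKERS = ("\\begin{document}", "% -*-LaTeX-*-")
TEX_MARKER = "% -*-TeX-*-"


def latex_mode_by_line(filename, lines, force_latex=False):
    """Return, for each line, whether LaTeX mode is on after that line."""
    latex = force_latex or filename.endswith(".ltx")
    modes = []
    for line in lines:
        if TEX_MARKER in line:
            latex = False  # % -*-TeX-*- turns LaTeX mode off
        elif any(marker in line for marker in LATEX_MARKERS):
            latex = True
        modes.append(latex)
    return modes
```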

-m
Instead of completely discarding math mode, citation, and cross-reference text, mark each occurrence with a single placeholder word: [CITE], [LABEL], [MATH], [PAGEREF], or [REF]. This is useful when the output is filtered by the doubled-word utility, dw(1): deleting such text can bring two identical words together, and the placeholder keeps them apart, which reduces the number of bogus warnings.
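
A small Python sketch shows why the placeholders help dw(1): deleting `$x$` from "the $x$ the" leaves a doubled "the", while marking it does not. The function `mark` is hypothetical, and it assumes that inline math is `$...$` and that each command takes one brace-delimited argument with no nested braces.

```python
import re

# Hypothetical sketch of -m style marking, under simplifying assumptions:
# $...$ inline math only, and one un-nested {...} argument per command.
MARKERS = {"cite": "[CITE]", "label": "[LABEL]",
           "pageref": "[PAGEREF]", "ref": "[REF]"}
COMMAND_ARG = re.compile(r"\\(cite|label|pageref|ref)\{[^}]*\}")
INLINE_MATH = re.compile(r"(?<!\\)\$[^$]*\$")


def mark(text: str) -> str:
    """Replace math and citation/cross-reference commands with markers."""
    text = INLINE_MATH.sub("[MATH]", text)
    return COMMAND_ARG.sub(lambda m: MARKERS[m.group(1)], text)
```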

-n
Ignore \input and \include commands. This allows a file to be processed without examining its subsidiary files.

-s
Older versions of detex replaced control sequences with a space character to prevent words from running together. However, this split words containing accents, such as na\"ive, into pieces and generated spurious `spelling errors'. The -s option restores the old behavior.
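
The difference between the current behavior and the old one requested by -s can be illustrated in Python. The function `strip_controls` and its `old_style` flag are hypothetical; real detex handles accents and other control sequences with its own rules.

```python
import re

# Hypothetical illustration: control word (backslash + letters)
# or control symbol (backslash + one other character, e.g. \").
CONTROL_SEQ = re.compile(r"\\(?:[A-Za-z]+|.)")


def strip_controls(text: str, old_style: bool = False) -> str:
    """Delete control sequences; old_style replaces each with a space (like -s)."""
    return CONTROL_SEQ.sub(" " if old_style else "", text)
```

With the default, `na\"ive` becomes one word; with `old_style=True` it breaks into `na ive`, which a spelling checker would flag.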

-v
Print a version number and date on stderr, and then exit immediately with a success status code (0 on UNIX).

-w
Output a word list, one `word' (a string of two or more letters and apostrophes, beginning with a letter) per line, ignoring all other characters.

Without -w, the output follows the original, apart from the deletions described above. Newline characters are preserved where possible so that the lines of output match the input as closely as possible. This helps relate line-numbered warning and error messages back to the original source files when other tools are applied to detex's output.
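
The -w definition of a word translates directly into a regular expression: one letter followed by one or more letters or apostrophes. The Python sketch below assumes ASCII letters only; the name `word_list` is hypothetical.

```python
import re

# A -w "word": a letter, then one or more letters or apostrophes,
# so at least two characters. ASCII-only letters are an assumption.
WORD = re.compile(r"[A-Za-z][A-Za-z']+")


def word_list(text: str) -> str:
    """Return the words of text, one per line, discarding everything else."""
    return "\n".join(WORD.findall(text))
```

Single letters and digits are dropped, so `word_list("Don't split 3 words: a b isn't.")` keeps only four words.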

Nesting of \input commands is allowed, but the number of open files must not exceed the system's limit on simultaneously open files.

detex ignores unrecognized option characters after printing a warning message.

ENVIRONMENT VARIABLES
TEXINPUTS
TeX input directory search path. This is a colon-separated list of directories to search for \input and \include files that lack a directory prefix.
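
A search along a colon-separated TEXINPUTS path might look like the Python sketch below. It is hypothetical (`find_input` is an invented name), and several details are assumptions the text does not specify: the bare name is tried before name.tex, empty path entries are skipped (real TeX installations give them a special meaning), and "." is used when TEXINPUTS is unset.

```python
import os


# Hypothetical lookup following the TEXINPUTS description.
# Assumptions: try "name" then "name.tex", skip empty path entries,
# and default to "." when TEXINPUTS is unset.
def find_input(name, texinputs=None):
    """Return the path of the first existing match for name, or None."""
    if texinputs is None:
        texinputs = os.environ.get("TEXINPUTS", ".")
    candidates = [name] if name.endswith(".tex") else [name, name + ".tex"]
    if os.sep in name:
        # Names with a directory prefix are opened as given, not searched.
        dirs = [""]
    else:
        dirs = [d for d in texinputs.split(":") if d]
    for d in dirs:
        for cand in candidates:
            path = os.path.join(d, cand)  # join("", x) == x
            if os.path.isfile(path):
                return path
    return None
```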

FILES
detex requires no additional files beyond those named on its command line.

SEE ALSO
dw(1), emacs(1), lacheck(1), tex(1).

BUGS
detex is not a complete TeX interpreter, so it can be confused by some constructs. Most errors result in too much, rather than too little, output.

Running LaTeX source that lacks \begin{document} through detex may produce errors, because detex may not recognize the input as LaTeX; the -l option forces LaTeX mode.

Suggestions for improvements are encouraged.

AUTHOR
Daniel Trinkle
Department of Computer Science
Purdue University
1398 Computer Science Building
West Lafayette, IN 47907-1398
USA
WWW URL: http://www.cs.purdue.edu/people/trinkle