MANDOC_ESCAPE(3) | Library Functions Manual | MANDOC_ESCAPE(3) |
mandoc_escape - parse roff escape sequences
#include <sys/types.h>
#include <mandoc.h>
enum mandoc_esc
mandoc_escape(const char **end, const char **start, int *sz);
This function scans a roff(7) escape sequence.
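An escape sequence consists of an initial backslash character (‘\’), a single ASCII character called the escape sequence identifier, and, with only a few exceptions, an argument.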
Arguments can be given in the following forms; some escape sequence identifiers only accept some of these forms as specified below. The first three forms are called the standard forms.
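In brackets: [argument]
Two-character argument short form: (ar, which has the same effect as [ar].
One-character argument short form: a, which has the same effect as [a].
Delimited form: CargumentC, where the same delimiter character C precedes and follows the argument.
Upon function entry, *end is expected to point to the escape sequence identifier. The values passed in as start and sz are ignored and overwritten.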
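The three standard forms can be illustrated with a small standalone scanner. This is only a sketch under stated assumptions: scan_std_arg() is a hypothetical helper, not part of the mandoc library, and it deliberately ignores nested escape sequences and the delimited form. It follows the same calling convention as described above, with *end pointing to the identifier on entry.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of libmandoc: scan an argument given
 * in one of the three standard forms.  On entry, *end points to the
 * escape sequence identifier.  On success, *start and *sz describe
 * the argument, *end points to the character after the sequence, and
 * 0 is returned.  If the input ends before the argument is complete,
 * -1 is returned and the output parameters are left untouched.
 */
int
scan_std_arg(const char **end, const char **start, int *sz)
{
	const char *p, *close;

	assert(end != NULL && *end != NULL);
	assert(start != NULL && sz != NULL);

	if (**end == '\0')
		return -1;
	p = *end + 1;		/* skip the identifier */

	switch (*p) {
	case '\0':
		return -1;
	case '[':		/* in brackets: [argument] */
		close = strchr(p + 1, ']');
		if (close == NULL)
			return -1;
		*start = p + 1;
		*sz = (int)(close - (p + 1));
		*end = close + 1;
		return 0;
	case '(':		/* two-character short form: (ar */
		if (p[1] == '\0' || p[2] == '\0')
			return -1;
		*start = p + 1;
		*sz = 2;
		*end = p + 3;
		return 0;
	default:		/* one-character short form: a */
		*start = p;
		*sz = 1;
		*end = p + 1;
		return 0;
	}
}
```

For the input f[BI]x, the sketch reports an argument of length 2 starting at BI and leaves *end pointing to the x.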
By design, this function cannot handle those roff(7) escape sequences that require in-place expansion, in particular user-defined strings \*, number registers \n, width measurements \w, and numerical expression control \B. These are handled by roff_res(), a private preprocessor function called from roff_parseln(); see the file roff.c.
The function mandoc_escape() is used by the mandoc(1) formatting modules, in particular -Tascii and -Thtml, for formatting purposes; see the files term.c and html.c.
Upon function return, the pointer end is set to the character after the end of the escape sequence, such that the calling higher-level parser can easily continue.
For escape sequences taking an argument, the pointer start is set to the beginning of the argument and sz is set to the length of the argument. For escape sequences not taking an argument, start is set to the character after the end of the sequence and sz is set to 0. Both start and sz may be NULL; in that case, the argument and the length are not returned.
For sequences taking an argument, the function mandoc_escape() returns one of the following values:
ESCAPE_FONT
The escape sequence \f taking an argument in standard form: \f[, \f(, \fa. Two-character arguments starting with the character ‘C’ are reduced to one-character arguments by skipping the ‘C’. More specific values are returned for the most commonly used arguments:
argument | return value |
R or 1 | ESCAPE_FONTROMAN |
I or 2 | ESCAPE_FONTITALIC |
B or 3 | ESCAPE_FONTBOLD |
P | ESCAPE_FONTPREV |
BI | ESCAPE_FONTBI |
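The font table and the ‘C’ skipping rule can be sketched as a standalone lookup. The enum font_kind and the function classify_font() are hypothetical names introduced for illustration only; they mirror the table above and are not the library's own types or code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical enum for this sketch; it only mirrors the table above. */
enum font_kind {
	FONT_OTHER,	/* stands in for the generic ESCAPE_FONT */
	FONT_ROMAN,	/* ESCAPE_FONTROMAN */
	FONT_ITALIC,	/* ESCAPE_FONTITALIC */
	FONT_BOLD,	/* ESCAPE_FONTBOLD */
	FONT_PREV,	/* ESCAPE_FONTPREV */
	FONT_BI		/* ESCAPE_FONTBI */
};

/* Classify a font argument given as a pointer and a length. */
enum font_kind
classify_font(const char *arg, int sz)
{
	assert(arg != NULL && sz >= 0);

	/* Two-character arguments starting with 'C': skip the 'C'. */
	if (sz == 2 && arg[0] == 'C') {
		arg++;
		sz--;
	}
	if (sz == 1) {
		switch (arg[0]) {
		case 'R':
		case '1':
			return FONT_ROMAN;
		case 'I':
		case '2':
			return FONT_ITALIC;
		case 'B':
		case '3':
			return FONT_BOLD;
		case 'P':
			return FONT_PREV;
		default:
			return FONT_OTHER;
		}
	}
	if (sz == 2 && arg[0] == 'B' && arg[1] == 'I')
		return FONT_BI;
	return FONT_OTHER;
}
```

With this sketch, the arguments B, 3, and CB all map to the bold value, while an argument not listed in the table falls back to the generic value.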
ESCAPE_SPECIAL
The escape sequence \C taking an argument delimited with the single quote character and, as a special exception, the escape sequences not having an identifier, that is, those where the argument, in standard form, directly follows the initial backslash: \C', \[, \(, \a. Note that the one-character argument short form can only be used for argument characters that do not clash with escape sequence identifiers.
If the argument matches one of the forms described below under ESCAPE_UNICODE, that value is returned instead.
The ESCAPE_SPECIAL special character escape sequences can be rendered using the functions mchars_spec2cp() and mchars_spec2str() described in the mchars_alloc(3) manual.
ESCAPE_UNICODE
Like ESCAPE_SPECIAL, but with an argument of the forms uXXXX, uYXXXX, or u10XXXX where X and Y are hexadecimal digits and Y is not zero: \C'u, \[u. As a special exception, start is set to the character after the u, and the sz return value does not include the u either.
Such Unicode character escape sequences can be rendered using the function mchars_num2uc() described in the mchars_alloc(3) manual.
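The three Unicode argument forms can be checked with a few lines of standalone C. The function unicode_arg() below is a hypothetical helper, not a mandoc function; it checks only the length and digit rules stated above and, as an assumption of this sketch, takes the argument including the leading u.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical helper: if the sz bytes at arg have one of the forms
 * uXXXX, uYXXXX (Y not zero), or u10XXXX, return the code point;
 * otherwise return -1.
 */
long
unicode_arg(const char *arg, int sz)
{
	char buf[7];	/* up to six hexadecimal digits plus the NUL */
	int i;

	assert(arg != NULL);

	if (sz < 5 || sz > 7 || arg[0] != 'u')
		return -1;
	for (i = 1; i < sz; i++)
		if (!isxdigit((unsigned char)arg[i]))
			return -1;
	if (sz == 6 && arg[1] == '0')	/* Y must not be zero */
		return -1;
	if (sz == 7 && (arg[1] != '1' || arg[2] != '0'))
		return -1;

	/* Copy the digits after the 'u' so strtol() sees a terminator. */
	for (i = 1; i < sz; i++)
		buf[i - 1] = arg[i];
	buf[sz - 1] = '\0';
	return strtol(buf, NULL, 16);
}
```

A caller following the convention described above would then advance start past the u and reduce sz by one before handing the digits to a rendering function.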
ESCAPE_NUMBERED
The escape sequence \N followed by a delimited argument. The delimiter character is arbitrary except that digits cannot be used. If a digit is encountered instead of the opening delimiter, that digit is considered to be the argument and the end of the sequence, and ESCAPE_IGNORE is returned.
Such ASCII character escape sequences can be rendered using the function mchars_num2char() described in the mchars_alloc(3) manual.
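The delimiter rule for \N, including the digit special case, might look as follows in isolation. This is a sketch of the behavior as described in the text, not the code in mandoc.c: scan_numbered() and enum num_result are hypothetical names, and returning NUM_ERROR for an unterminated argument is an assumption of this example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes for this sketch. */
enum num_result {
	NUM_OK,		/* stands in for ESCAPE_NUMBERED */
	NUM_IGNORE,	/* stands in for ESCAPE_IGNORE */
	NUM_ERROR	/* assumption: incomplete input */
};

/*
 * On entry, *end points to the identifier 'N'.  On NUM_OK and
 * NUM_IGNORE, *start and *sz describe the argument and *end points
 * to the character after the sequence.
 */
enum num_result
scan_numbered(const char **end, const char **start, int *sz)
{
	const char *p, *close;

	assert(end != NULL && *end != NULL);
	assert(start != NULL && sz != NULL);

	if (**end == '\0')
		return NUM_ERROR;
	p = *end + 1;		/* skip the identifier */
	if (*p == '\0')
		return NUM_ERROR;

	/* A digit in place of the delimiter is the whole argument. */
	if (isdigit((unsigned char)*p)) {
		*start = p;
		*sz = 1;
		*end = p + 1;
		return NUM_IGNORE;
	}

	/* Otherwise *p is the delimiter; look for its next occurrence. */
	close = strchr(p + 1, *p);
	if (close == NULL)
		return NUM_ERROR;
	*start = p + 1;
	*sz = (int)(close - (p + 1));
	*end = close + 1;
	return NUM_OK;
}
```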
ESCAPE_OVERSTRIKE
The escape sequence \o followed by an argument delimited by an arbitrary character.
ESCAPE_IGNORE
\s followed by an argument in standard form or by an argument delimited by the single quote character: \s', \s[, \s(, \sa. As a special exception, an optional ‘+’ or ‘-’ character is allowed after the ‘s’ for all forms.
\F, \g, \k, \M, \m, \n, \V, and \Y followed by an argument in standard form.
\A, \b, \D, \R, \X, and \Z followed by an argument delimited by an arbitrary character.
\H, \h, \L, \l, \S, \v, and \x followed by an argument delimited by a character that cannot occur in numerical expressions. However, if any character that can occur in numerical expressions is found instead of a delimiter, the sequence is considered to end with that character, and ESCAPE_ERROR is returned.
ESCAPE_ERROR
For sequences that do not take an argument, the function mandoc_escape() returns one of the following values:
ESCAPE_SKIPCHAR
ESCAPE_NOSPACE
ESCAPE_IGNORE
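The identifier groups listed under ESCAPE_IGNORE for sequences taking an argument can be summarized as a lookup from identifier to expected argument form. The sketch below is illustrative only: enum arg_form and ignored_arg_form() are hypothetical names, and the table covers just those identifiers, not the full set handled by the library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical classification of the expected argument form. */
enum arg_form {
	FORM_UNLISTED,		/* not in the ESCAPE_IGNORE groups */
	FORM_STD_OR_QUOTED,	/* standard form or single-quote delimited */
	FORM_STD,		/* standard form only */
	FORM_DELIM_ANY,		/* delimited by an arbitrary character */
	FORM_DELIM_NONNUM	/* delimiter cannot occur in numerical
				 * expressions */
};

/* id points to the escape sequence identifier. */
enum arg_form
ignored_arg_form(const char *id)
{
	assert(id != NULL);

	/* strchr() would match the terminating NUL, so reject it first. */
	if (*id == '\0')
		return FORM_UNLISTED;
	if (*id == 's')
		return FORM_STD_OR_QUOTED;
	if (strchr("FgkMmnVY", *id) != NULL)
		return FORM_STD;
	if (strchr("AbDRXZ", *id) != NULL)
		return FORM_DELIM_ANY;
	if (strchr("HhLlSvx", *id) != NULL)
		return FORM_DELIM_NONNUM;
	return FORM_UNLISTED;
}
```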
This function is implemented in mandoc.c.
This function has been available since mandoc 1.11.2.
Kristaps Dzonsons <kristaps@bsd.lv>
Ingo Schwarze <schwarze@openbsd.org>
The function doesn't cleanly distinguish between sequences that are valid and supported, valid and ignored, valid and unsupported, syntactically invalid, or undefined. For sequences that are ignored or unsupported, it doesn't tell whether that deficiency is likely to cause major formatting problems and/or loss of document content. The function is already rather complicated and still parses some sequences incorrectly.
July 4, 2017 | Linux 6.6.28-gentoo-dist |