re_comp, re_exec, re_subs, re_modw, re_fail \- regular expression handling
Dept. of Computer Science
.B void re_fail(msg, op)
These functions implement
partial regular expressions and supporting facilities.
compiles a pattern string into an internal form (a deterministic finite-state
automaton) to be executed by
returns 0 if the pattern is compiled successfully; otherwise it returns an
error message string. If
is called with a 0 or a \fInull\fR string, it returns without changing the
currently compiled regular expression.
supports the same limited set of
.I regular expressions
.if n .ta 0.8i +0.8i +0.8i
.if t .ta 0.5i +0.5i +0.5i
[1] \fIchar\fR Matches itself, unless it is a special
character (meta-character): \fB. \\ [ ] * + ^ $\fR
[2] \fB.\fR Matches \fIany\fR character.
[3] \fB\\\fR Matches the character following it, except
when followed by a digit 1 to 9, \fB(\fR, \fB)\fR, \fB<\fR or \fB>\fR
(see [7], [8] and [9]). It is used as an escape character for all
other meta-characters, and itself. When used
in a set ([4]), it is treated as an ordinary
[4] \fB[\fIset\fB]\fR Matches one of the characters in the set.
If the first character in the set is \fB^\fR,
it matches a character NOT in the set. A
is used to specify a set of
inclusive. The special
characters \fB]\fR and \fB-\fR have no special
meaning if they appear as the first chars
[a-z] any lowercase alpha
[^]-] any char except ] and -
[^A-Z] any char except
[a-zA-Z0-9] any alphanumeric
[5] \fB*\fR Any regular expression form [1] to [4], followed by
closure char (*) matches zero or more matches of
[6] \fB+\fR Same as [5], except it matches one or more.
[7] A regular expression in the form [1] to [10], enclosed
as \\(\fIform\fR\\) matches what \fIform\fR matches. The enclosure
creates a set of tags, used for [8] and for
pattern substitution in
The tagged forms are numbered
[8] A \\ followed by a digit 1 to 9 matches whatever a
previously tagged regular expression ([7]) matched.
[9] \fB\\<\fR Matches the beginning of a \fIword\fR,
that is, an empty string followed by a
letter, digit, or _ and not preceded by
a letter, digit, or _ .
\fB\\>\fR Matches the end of a \fIword\fR,
that is, an empty string preceded
by a letter, digit, or _ , and not
followed by a letter, digit, or _ .
[10] A composite regular expression
\fIxy\fR where \fIx\fR and \fIy\fR
are in the form of [1] to [10] matches the longest
match of \fIx\fR followed by a match for \fIy\fR.
[11] \fB^ $\fR A regular expression starting with a \fB^\fR character
and/or ending with a \fB$\fR character restricts the
pattern matching to the beginning of the line
and/or the end of the line [anchors]. Elsewhere in the
pattern, \fB^\fR and \fB$\fR are treated as ordinary characters.
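The set notation of form [4] can be illustrated with a small, self-contained C sketch. It is not taken from these routines: the name set_matches and the convention of passing only the text between the brackets are assumptions made for illustration, and cases the list above does not spell out (a reversed range, for instance) are handled arbitrarily.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper, not part of these routines.
 * body is the text between the brackets, e.g. "^]-" for [^]-].
 * Returns 1 if c is matched by the set, 0 otherwise.
 */
int set_matches(const char *body, char c)
{
    unsigned char uc = (unsigned char)c;
    size_t n, i = 0;
    int negate = 0, found = 0;

    assert(body != NULL);
    n = strlen(body);

    /* A leading ^ inverts the set. */
    if (n > 0 && body[0] == '^') {
        negate = 1;
        i = 1;
    }

    while (i < n) {
        unsigned char lo = (unsigned char)body[i];

        if (i + 2 < n && body[i + 1] == '-') {
            /* S-E shorthand: every character from S up to E, inclusive. */
            unsigned char hi = (unsigned char)body[i + 2];
            if (lo <= uc && uc <= hi)
                found = 1;
            i += 3;
        } else {
            /* Single member; a ] or - that is not part of a complete
               S-E triple ends up here and is taken literally. */
            if (uc == lo)
                found = 1;
            i += 1;
        }
    }
    return negate ? !found : found;
}
```

With this sketch, the body "^]-" rejects both ] and - and accepts any other character, which agrees with the [^]-] entry in the list of examples under [4].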
executes the internal form produced by
and searches the argument string for the regular expression described
returns 1 if the last regular expression pattern is matched within the string,
0 if no match is found. In case of an internal error (corrupted internal
calls the user-supplied
The strings passed to both
may have trailing or embedded newline characters. The strings
must be terminated by nulls.
pattern substitution, after a successful match is found by
The source string parameter to
is copied to the destination string with the following interpretation:
[1] & Substitute the entire matched string in the destination.
[2] \\\fIn\fR Substitute the substring matched by a tagged subpattern
numbered \fIn\fR, where \fIn\fR is between 1 and 9, inclusive.
[3] \\\fIchar\fR Treat the next character literally,
unless the character is a digit ([2]).
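The three substitution rules above can be sketched in C as follows. This is an illustration under stated assumptions, not the source of re_subs: expand_subst, append, and the groups array (entry 0 holding the whole match, entries 1 to 9 the tagged substrings, unset entries NULL) are hypothetical, as are the return convention and the choice to copy a backslash-zero pair or a trailing backslash literally.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Append len bytes of s to dst, always leaving room for the final '\0'. */
static int append(char *dst, size_t cap, size_t *used,
                  const char *s, size_t len)
{
    if (*used + len + 1 > cap)
        return 0;
    memcpy(dst + *used, s, len);
    *used += len;
    dst[*used] = '\0';
    return 1;
}

/*
 * Hypothetical stand-in for the copy step, not the code of re_subs.
 * groups[0] is the whole match, groups[1..9] the tagged substrings
 * (NULL when a tag did not take part in the match).
 * Returns 1 on success, 0 on an unset tag or a full destination.
 */
int expand_subst(const char *src, const char *const groups[10],
                 char *dst, size_t cap)
{
    size_t used = 0;

    assert(src != NULL && groups != NULL && dst != NULL);
    if (cap == 0)
        return 0;
    dst[0] = '\0';

    while (*src != '\0') {
        const char *piece;
        size_t len;

        if (*src == '&') {
            /* [1] the entire matched string */
            piece = groups[0];
            len = piece != NULL ? strlen(piece) : 0;
        } else if (*src == '\\' && src[1] >= '1' && src[1] <= '9') {
            /* [2] the substring matched by tagged subpattern n */
            piece = groups[*++src - '0'];
            len = piece != NULL ? strlen(piece) : 0;
        } else {
            /* [3] backslash before anything else: take the next
               character literally; plain characters are copied as is */
            if (*src == '\\' && src[1] != '\0')
                src++;
            piece = src;
            len = 1;
        }
        if (piece == NULL || !append(dst, cap, &used, piece, len))
            return 0;
        src++;
    }
    return 1;
}
```

Passing the pieces of a match in as plain strings keeps the sketch independent of how the matcher records tag positions internally, which this page does not describe.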
If the copy operation with the substitutions is successful,
If the source string is corrupted, or the last call to
add new characters into an internal table to
change re_exec's understanding of what
a \fIword\fR should look like, when matching with \fB\\<\fR and \fB\\>\fR
constructs. If the string parameter is 0 or a null string,
the table is reset back to the default, which contains \fBA-Z a-z 0-9 _\fR.
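The word table and the word-boundary tests of form [9] might be modelled as in the C sketch below. Every name in it (word_tab, word_reset, word_modify, is_word_start, is_word_end) is hypothetical, and the layout of the real internal table is not described on this page. The sketch only mirrors the documented behaviour: the default table holds A-Z, a-z, 0-9 and _, extra characters can be added, and a 0 or empty argument restores the default.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One flag per character value: nonzero means "belongs to a word". */
static unsigned char word_tab[256];

static void word_reset(void)
{
    const char *defaults =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_";

    memset(word_tab, 0, sizeof word_tab);
    for (; *defaults != '\0'; defaults++)
        word_tab[(unsigned char)*defaults] = 1;
}

/* Add the characters of extra to the table; NULL or "" restores the
   default. Call it once (for example with NULL) before the tests below. */
void word_modify(const char *extra)
{
    static int ready = 0;

    if (!ready || extra == NULL || *extra == '\0') {
        word_reset();
        ready = 1;
    }
    if (extra != NULL)
        for (; *extra != '\0'; extra++)
            word_tab[(unsigned char)*extra] = 1;
}

/* Is s[i] inside the string and a word character? */
static int word_at(const char *s, size_t len, size_t i)
{
    return i < len && word_tab[(unsigned char)s[i]] != 0;
}

/* Beginning of a word: followed by a word character, not preceded by one. */
int is_word_start(const char *s, size_t i)
{
    size_t len = strlen(s);

    assert(i <= len);
    return word_at(s, len, i) && !(i > 0 && word_at(s, len, i - 1));
}

/* End of a word: preceded by a word character, not followed by one. */
int is_word_end(const char *s, size_t i)
{
    size_t len = strlen(s);

    assert(i <= len);
    return i > 0 && word_at(s, len, i - 1) && !word_at(s, len, i);
}
```

In this model, adding - to the table makes foo-bar a single word, so the position before bar no longer counts as the beginning of a word until the table is reset.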
is a user-supplied routine to handle internal errors.
with an error message string, and the opcode character that caused the error.
routine simply prints the message and the opcode character to
In the examples below, the
describes the internal form after the pattern is compiled. For additional
details, refer to the sources.
nfaform: CHR f CHR o CLO CHR o END CLO ANY END END
matches: \fIfo foo fooo foobar fobar foxx ...\fR
nfaform: CHR f CHR o CCL 2 o b CHR a CCL 2 r z END
matches: \fIfobar fooar fobaz fooaz\fR
nfaform: CHR f CHR o CHR o CHR \\ CLO CHR \\ END END
matches: \fIfoo\\ foo\\\\ foo\\\\\\ ...\fR
\\(foo\\)[1-3]\\1 (same as foo[1-3]foo, but takes less internal space)
nfaform: BOT 1 CHR f CHR o CHR o EOT 1 CCL 3 1 2 3 REF 1 END
matches: \fIfoo1foo foo2foo foo3foo\fR
nfaform: BOT 1 CHR f CHR o CLO ANY END EOT 1 CHR - REF 1 END
matches: \fIfoo-foo fo-fo fob-fob foobar-foobar ...\fR
returns one of the following strings if an error occurs:
\fINo previous regular expression,
Undetermined reference,
Null pattern inside \e(\e),
Null pattern inside \e<\e>,
Too many \e(\e) pairs,
\fISoftware Tools\fR Kernighan & Plauger
\fISoftware Tools in Pascal\fR Kernighan & Plauger
\fIGrep sources\fR [rsx-11 C dist] David Conroy
\fIEd - text editor\fR Unix Programmer's Manual
\fIAdvanced editing on Unix\fR B. W. Kernighan
\fIRegExp sources\fR Henry Spencer
.SH "HISTORY AND NOTES"
These routines are derived from various implementations
books, and David Conroy's
They are NOT derived from licensed/restricted software.
For more interesting/academic/complicated implementations,
as well as their licensed counterparts, sometimes better.
In very few instances, they
are about 10% to 15% slower.
usenet: utzoo!yetti!oz
bitnet: oz@yusol || oz@yuyetti
ed(1), ex(1), egrep(1), fgrep(1), grep(1), regex(3)
These routines are \fIPublic Domain\fR. You can get them
The internal storage for the \fInfa form\fR is not checked for
overflows. Currently, it is 1024 bytes.