Eneboo - Developer Documentation
The QRegExp class provides pattern matching using regular expressions.
#include <qregexp.h>
Public Types
enum | CaretMode { CaretAtZero, CaretAtOffset, CaretWontMatch } |
Public Member Functions
QRegExp () | |
QRegExp (const QString &pattern, bool caseSensitive=TRUE, bool wildcard=FALSE) | |
QRegExp (const QRegExp &rx) | |
~QRegExp () | |
QRegExp & | operator= (const QRegExp &rx) |
bool | operator== (const QRegExp &rx) const |
bool | operator!= (const QRegExp &rx) const |
bool | isEmpty () const |
bool | isValid () const |
QString | pattern () const |
void | setPattern (const QString &pattern) |
bool | caseSensitive () const |
void | setCaseSensitive (bool sensitive) |
bool | wildcard () const |
void | setWildcard (bool wildcard) |
bool | minimal () const |
void | setMinimal (bool minimal) |
bool | exactMatch (const QString &str) const |
int | match (const QString &str, int index=0, int *len=0, bool indexIsStart=TRUE) const |
int | search (const QString &str, int offset=0) const |
int | search (const QString &str, int offset, CaretMode caretMode) const |
int | searchRev (const QString &str, int offset=-1) const |
int | searchRev (const QString &str, int offset, CaretMode caretMode) const |
int | matchedLength () const |
int | numCaptures () const |
QStringList | capturedTexts () |
QString | cap (int nth=0) |
int | pos (int nth=0) |
QString | errorString () |
Static Public Member Functions
static QString | escape (const QString &str) |
Regular expressions, or "regexps", provide a way to find patterns within text. This is useful in many contexts, for example:
Validation: A regexp can be used to check whether a piece of text meets some criteria, e.g. is an integer or contains no whitespace.
Searching: Regexps provide a much more powerful means of searching text than simple string matching does. For example, we can create a regexp which says "find one of the words 'mail', 'letter' or 'correspondence' but not any of the words 'email', 'mailman', 'mailer', 'letterbox' etc."
Search and Replace: A regexp can be used to replace a pattern with a piece of text, for example replace all occurrences of '&' with '&amp;' except where the '&' is already followed by 'amp;'.
String Splitting: A regexp can be used to identify where a string should be split into its component fields, e.g. splitting tab-delimited strings.
We present a very brief introduction to regexps, a description of Qt's regexp language, some code examples, and finally the function documentation itself. QRegExp is modeled on Perl's regexp language and fully supports Unicode. QRegExp can also be used in the weaker 'wildcard' (globbing) mode, which works in a similar way to command shells. A good text on regexps is "Mastering Regular Expressions: Powerful Techniques for Perl and Other Tools" by Jeffrey E. Friedl, ISBN 1565922573.
Experienced regexp users may prefer to skip the introduction and go directly to the relevant information.
In case of multi-threaded programming, note that QRegExp depends on QThreadStorage internally. For that reason, QRegExp should only be used with threads started with QThread, i.e. not with threads started with platform-specific APIs.
enum QRegExp::CaretMode |
The CaretMode enum defines the different meanings of the caret (^) in a regular expression. The possible values are:
CaretAtZero The caret corresponds to index 0 in the searched string.
CaretAtOffset The caret corresponds to the start offset of the search.
CaretWontMatch The caret never matches.
QRegExp::QRegExp | ( | ) |
Constructs an empty regexp.
QRegExp::QRegExp | ( | const QString & | pattern, bool | caseSensitive = TRUE, bool | wildcard = FALSE | ) |
Constructs a regular expression object for the given pattern string. The pattern must be given using wildcard notation if wildcard is TRUE (default is FALSE). The pattern is case sensitive, unless caseSensitive is FALSE. Matching is greedy (maximal), but can be changed by calling setMinimal().
QRegExp::QRegExp | ( | const QRegExp & | rx | ) |
Constructs a regular expression as a copy of rx.
QRegExp::~QRegExp | ( | ) |
Destroys the regular expression and cleans up its internal data.
QString QRegExp::cap | ( | int | nth = 0 | ) |
Returns the text captured by the nth subexpression. The entire match has index 0 and the parenthesized subexpressions have indices starting from 1 (excluding non-capturing parentheses).
    QRegExp rxlen( "(\\d+)(?:\\s*)(cm|inch)" );
    int pos = rxlen.search( "Length: 189cm" );
    if ( pos > -1 ) {
        QString value = rxlen.cap( 1 );  // "189"
        QString unit = rxlen.cap( 2 );   // "cm"
        // ...
    }
The order of elements matched by cap() is as follows. The first element, cap(0), is the entire matching string. Each subsequent element corresponds to the next capturing left parenthesis. Thus cap(1) is the text of the first capturing parentheses, cap(2) is the text of the second, and so on.
Some patterns may lead to a number of matches which cannot be determined in advance, for example:
    QRegExp rx( "(\\d+)" );
    QString str = "Offsets: 12 14 99 231 7";
    QStringList list;
    int pos = 0;
    while ( pos >= 0 ) {
        pos = rx.search( str, pos );
        if ( pos > -1 ) {
            list += rx.cap( 1 );
            pos += rx.matchedLength();  // continue after this match
        }
    }
    // list contains "12", "14", "99", "231", "7"
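The same "collect every match" idea can be sketched outside Qt. The block below is only an illustration and assumes std::regex from the C++ standard library as a stand-in for QRegExp; the function name collectNumbers is hypothetical and not part of any Qt API.

```cpp
#include <regex>
#include <string>
#include <vector>

// Collects every run of digits in `text`, in order of appearance.
// std::regex stands in for QRegExp here; this is not Qt code.
std::vector<std::string> collectNumbers(const std::string &text)
{
    static const std::regex digits("(\\d+)");
    std::vector<std::string> found;
    // The iterator resumes after each match, which plays the role of
    // pos += rx.matchedLength() in the QRegExp loop.
    for (std::sregex_iterator it(text.begin(), text.end(), digits), end; it != end; ++it)
        found.push_back((*it)[1].str());
    return found;
}
```

Calling collectNumbers("Offsets: 12 14 99 231 7") should yield the five numbers in order, mirroring the list built by the cap() loop.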
QStringList QRegExp::capturedTexts | ( | ) |
Returns a list of the captured text strings.
The first string in the list is the entire matched string. Each subsequent list element contains a string that matched a (capturing) subexpression of the regexp.
For example:
    QRegExp rx( "(\\d+)(\\s*)(cm|inch(es)?)" );
    int pos = rx.search( "Length: 36 inches" );
    QStringList list = rx.capturedTexts();
    // list is now ( "36 inches", "36", " ", "inches", "es" )
The above example also captures elements that may be present but which we have no interest in. This problem can be solved by using non-capturing parentheses:
    QRegExp rx( "(\\d+)(?:\\s*)(cm|inch(?:es)?)" );
    int pos = rx.search( "Length: 36 inches" );
    QStringList list = rx.capturedTexts();
    // list is now ( "36 inches", "36", "inches" )
Note that if you want to iterate over the list, you should iterate over a copy, e.g.
    QStringList list = rx.capturedTexts();
    QStringList::Iterator it = list.begin();
    while ( it != list.end() ) {
        myProcessing( *it );
        ++it;
    }
Some regexps can match an indeterminate number of times. For example, if the input string is "Offsets: 12 14 99 231 7" and the regexp, rx, is (\d+)+, we would hope to get a list of all the numbers matched. However, after calling rx.search(str), capturedTexts() will return the list ( "12", "12" ), i.e. the entire match was "12" and the first subexpression matched was "12". The correct approach is to use cap() in a loop.
The order of elements in the string list is as follows. The first element is the entire matching string. Each subsequent element corresponds to the next capturing left parenthesis. Thus capturedTexts()[1] is the text of the first capturing parentheses, capturedTexts()[2] is the text of the second, and so on (corresponding to $1, $2, etc., in some other regexp languages).
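To see how capturing and non-capturing parentheses change the resulting list, here is a minimal sketch. It assumes std::regex (ECMAScript syntax) in place of QRegExp, and the helper name capturedTexts is reused purely for illustration; it is a free function, not the Qt member.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Returns the whole match followed by each captured group, or an empty
// vector when `pattern` is not found in `text`.
// std::regex stands in for QRegExp here; this is not Qt code.
std::vector<std::string> capturedTexts(const std::string &pattern, const std::string &text)
{
    std::vector<std::string> out;
    std::smatch m;
    if (std::regex_search(text, m, std::regex(pattern))) {
        for (std::size_t i = 0; i < m.size(); ++i)
            out.push_back(m[i].str());
    }
    return out;
}
```

With the input "Length: 36 inches", the all-capturing pattern from the first example should give five entries, while the pattern using (?:...) should give three, because the whitespace and "es" groups no longer capture.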
bool QRegExp::caseSensitive | ( | ) | const |
Returns TRUE if case sensitivity is enabled; otherwise returns FALSE. The default is TRUE.
QString QRegExp::errorString | ( | ) |
Returns a text string that explains why a regexp pattern is invalid, if that is the case; otherwise returns "no error occurred".
static QString QRegExp::escape | ( | const QString & | str | ) |
Returns the string str with every regexp special character escaped with a backslash. The special characters are $, (, ), *, +, ., ?, [, \, ], ^, {, | and }.
Example:
    QString s1 = QRegExp::escape( "bingo" );  // s1 == "bingo"
    QString s2 = QRegExp::escape( "f(x)" );   // s2 == "f\\(x\\)"
This function is useful to construct regexp patterns dynamically:
QRegExp rx( "(" + QRegExp::escape(name) + "|" + QRegExp::escape(alias) + ")" );
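The escaping rule described above is simple enough to sketch by hand. The following is not the Qt implementation; it is a hypothetical helper, escapeRegexp, written against std::string (so it works on bytes rather than Unicode characters) and using exactly the special-character list given in this section.

```cpp
#include <string>

// Prefixes each regexp special character in `str` with a backslash.
// The character set is the one listed for QRegExp::escape():
// $ ( ) * + . ? [ \ ] ^ { | }
std::string escapeRegexp(const std::string &str)
{
    static const std::string special = "$()*+.?[\\]^{|}";
    std::string quoted;
    quoted.reserve(str.size() * 2);
    for (char c : str) {
        if (special.find(c) != std::string::npos)
            quoted += '\\';
        quoted += c;
    }
    return quoted;
}
```

For instance, escapeRegexp("f(x)") produces the five-character-plus-two-backslash string f\(x\), and a string without special characters such as "bingo" comes back unchanged.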
bool QRegExp::exactMatch | ( | const QString & | str | ) | const |
Returns TRUE if str is matched exactly by this regular expression; otherwise returns FALSE. You can determine how much of the string was matched by calling matchedLength().
For a given regexp string, R, exactMatch("R") is the equivalent of search("^R$") since exactMatch() effectively encloses the regexp in the start of string and end of string anchors, except that it sets matchedLength() differently.
For example, if the regular expression is blue, then exactMatch() returns TRUE only for input blue. For inputs bluebell, blutak and lightblue, exactMatch() returns FALSE and matchedLength() will return 4, 3 and 0 respectively.
Although const, this function sets matchedLength(), capturedTexts() and pos().
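The difference between matching the whole string and finding a match somewhere inside it can be illustrated as follows. This sketch assumes std::regex as a stand-in for QRegExp: std::regex_match plays the role of exactMatch() and std::regex_search the role of search(). The helper names are hypothetical, and std::regex offers no counterpart to the partial-match length that matchedLength() reports after a failed exactMatch().

```cpp
#include <regex>
#include <string>

// True only if the whole of `text` matches `pattern`
// (comparable to exactMatch()).
bool matchesWhole(const std::string &pattern, const std::string &text)
{
    return std::regex_match(text, std::regex(pattern));
}

// True if `pattern` matches anywhere inside `text`
// (comparable to search() returning something other than -1).
bool matchesAnywhere(const std::string &pattern, const std::string &text)
{
    return std::regex_search(text, std::regex(pattern));
}
```

With the pattern blue, matchesWhole() accepts only the input blue, whereas matchesAnywhere() also accepts bluebell and lightblue; anchoring the pattern as ^blue$ makes the search behave like the whole-string match.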
bool QRegExp::isEmpty | ( | void | ) | const |
Returns TRUE if the pattern string is empty; otherwise returns FALSE.
If you call exactMatch() with an empty pattern on an empty string it will return TRUE; otherwise it returns FALSE since it operates over the whole string. If you call search() with an empty pattern on any string it will return the start offset (0 by default) because the empty pattern matches the 'emptiness' at the start of the string. In this case the length of the match returned by matchedLength() will be 0.
See QString::isEmpty().
bool QRegExp::isValid | ( | void | ) | const |
Returns TRUE if the regular expression is valid; otherwise returns FALSE. An invalid regular expression never matches.
The pattern [a-z is an example of an invalid pattern, since it lacks a closing square bracket.
Note that the validity of a regexp may also depend on the setting of the wildcard flag, for example *.html is a valid wildcard regexp but an invalid full regexp.
int QRegExp::match | ( | const QString & | str, int | index = 0, int * | len = 0, bool | indexIsStart = TRUE | ) | const |
Attempts to match in str, starting from position index. Returns the position of the match, or -1 if there was no match.
The length of the match is stored in *len, unless len is a null pointer.
If indexIsStart is TRUE (the default), the position index in the string will match the start of string anchor, ^, in the regexp, if present. Otherwise, position 0 in str will match.
Use search() and matchedLength() instead of this function.
int QRegExp::matchedLength | ( | ) | const |
Returns the length of the last matched string, or -1 if there was no match.
bool QRegExp::minimal | ( | ) | const |
Returns TRUE if minimal (non-greedy) matching is enabled; otherwise returns FALSE.
int QRegExp::numCaptures | ( | ) | const |
Returns the number of captures contained in the regular expression.
bool QRegExp::operator!= | ( | const QRegExp & | rx | ) | const |
Returns TRUE if this regular expression is not equal to rx; otherwise returns FALSE.
QRegExp & QRegExp::operator= | ( | const QRegExp & | rx | ) |
Copies the regular expression rx and returns a reference to the copy. The case sensitivity, wildcard and minimal matching options are also copied.
bool QRegExp::operator== | ( | const QRegExp & | rx | ) | const |
Returns TRUE if this regular expression is equal to rx; otherwise returns FALSE.
Two QRegExp objects are equal if they have the same pattern strings and the same settings for case sensitivity, wildcard and minimal matching.
QString QRegExp::pattern | ( | ) | const |
Returns the pattern string of the regular expression. The pattern has either regular expression syntax or wildcard syntax, depending on wildcard().
int QRegExp::pos | ( | int | nth = 0 | ) |
Returns the position of the nth captured text in the searched string. If nth is 0 (the default), pos() returns the position of the whole match.
Example:
    QRegExp rx( "/([a-z]+)/([a-z]+)" );
    rx.search( "Output /dev/null" );  // returns 7 (position of /dev/null)
    rx.pos( 0 );                      // returns 7 (position of /dev/null)
    rx.pos( 1 );                      // returns 8 (position of dev)
    rx.pos( 2 );                      // returns 12 (position of null)
For zero-length matches, pos() always returns -1. (For example, if cap(4) would return an empty string, pos(4) returns -1.) This is due to an implementation tradeoff.
int QRegExp::search | ( | const QString & | str, int | offset, CaretMode | caretMode | ) | const |
Attempts to find a match in str from position offset (0 by default). If offset is -1, the search starts at the last character; if -2, at the next to last character; etc.
Returns the position of the first match, or -1 if there was no match.
The caretMode parameter can be used to instruct whether ^ should match at index 0 or at offset.
You might prefer to use QString::find(), QString::contains() or even QStringList::grep(). To replace matches use QString::replace().
Example:
    QString str = "offsets: 1.23 .50 71.00 6.00";
    QRegExp rx( "\\d*\\.\\d+" );  // primitive floating point matching
    int count = 0;
    int pos = 0;
    while ( (pos = rx.search(str, pos)) != -1 ) {
        count++;
        pos += rx.matchedLength();
    }
    // pos will be 9, 14, 18 and finally 24; count will end up as 4
Although const, this function sets matchedLength(), capturedTexts() and pos().
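int QRegExp::searchRev | ( | const QString & | str, int | offset, CaretMode | caretMode | ) | const |
Attempts to find a match backwards in str from position offset. If offset is -1 (the default), the search starts at the last character; if -2, at the next to last character; etc.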
Returns the position of the first match, or -1 if there was no match.
The caretMode parameter can be used to instruct whether ^ should match at index 0 or at offset.
Although const, this function sets matchedLength(), capturedTexts() and pos().
void QRegExp::setCaseSensitive | ( | bool | sensitive | ) |
Sets case sensitive matching to sensitive.
If sensitive is TRUE, \.txt$ matches readme.txt but not README.TXT.
void QRegExp::setMinimal | ( | bool | minimal | ) |
Enables or disables minimal matching. If minimal is FALSE, matching is greedy (maximal) which is the default.
For example, suppose we have the input string "We must be <b>bold</b>, very <b>bold</b>!" and the pattern <b>.*</b>. With the default greedy (maximal) matching, the match is "<b>bold</b>, very <b>bold</b>", which runs from the first <b> to the last </b>. With minimal (non-greedy) matching, the first match is the first "<b>bold</b>" and the second match is the second "<b>bold</b>". In practice we might use the pattern <b>[^<]+</b> instead, although this will still fail for nested tags.
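The greedy versus minimal behaviour can be reproduced with a small sketch. It assumes std::regex (ECMAScript syntax) rather than QRegExp; std::regex has no per-object switch comparable to setMinimal(), so the lazy quantifier .*? is used here as a stand-in for minimal matching. The helper name firstMatch is hypothetical.

```cpp
#include <regex>
#include <string>

// Returns the first substring of `text` matched by `pattern`,
// or an empty string if there is no match.
// std::regex stands in for QRegExp here; this is not Qt code.
std::string firstMatch(const std::string &pattern, const std::string &text)
{
    std::smatch m;
    if (std::regex_search(text, m, std::regex(pattern)))
        return m.str(0);
    return std::string();
}
```

Applied to the input string from the example, the greedy pattern <b>.*</b> should return the span covering both bold elements, while <b>.*?</b> should return only the first one.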
void QRegExp::setPattern | ( | const QString & | pattern | ) |
Sets the pattern string to pattern. The case sensitivity, wildcard and minimal matching options are not changed.
void QRegExp::setWildcard | ( | bool | wildcard | ) |
Sets the wildcard mode for the regular expression. The default is FALSE.
Setting wildcard to TRUE enables simple shell-like wildcard matching. (See wildcard matching (globbing).)
For example, r*.txt matches the string readme.txt in wildcard mode, but does not match readme.
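One way to picture wildcard mode is as a translation from a shell-style pattern into an ordinary regexp. The sketch below is an assumption about how such a translation could look, not a description of QRegExp's internals: it handles only ? and *, ignores bracket sets, and relies on std::regex instead of Qt. Both function names are hypothetical.

```cpp
#include <regex>
#include <string>

// Translates a shell-style wildcard into an ECMAScript regexp:
// '?' becomes '.', '*' becomes '.*', everything else is matched literally.
// Bracket sets such as [abc] are not handled in this sketch.
std::string wildcardToRegexp(const std::string &wildcard)
{
    static const std::string special = "$()+.[\\]^{|}";
    std::string rx;
    for (char c : wildcard) {
        if (c == '?') {
            rx += '.';
        } else if (c == '*') {
            rx += ".*";
        } else {
            if (special.find(c) != std::string::npos)
                rx += '\\';  // escape regexp metacharacters
            rx += c;
        }
    }
    return rx;
}

// True if the wildcard matches the whole of `text`.
bool wildcardMatch(const std::string &wildcard, const std::string &text)
{
    return std::regex_match(text, std::regex(wildcardToRegexp(wildcard)));
}
```

Under these assumptions, wildcardMatch("r*.txt", "readme.txt") holds and wildcardMatch("r*.txt", "readme") does not, which agrees with the example above.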
bool QRegExp::wildcard | ( | ) | const |
Returns TRUE if wildcard mode is enabled; otherwise returns FALSE. The default is FALSE.