Delicious Bookmark this on Delicious Share on Facebook SlashdotSlashdot It! Digg! Digg



PHP : Function Reference : Regular Expression Functions (Perl-Compatible)

Regular Expression Functions (Perl-Compatible)

Introduction

The syntax for patterns used in these functions closely resembles Perl. The expression should be enclosed in the delimiters, a forward slash (/), for example. Any character can be used for delimiter as long as it's not alphanumeric or backslash (\). If the delimiter character has to be used in the expression itself, it needs to be escaped by backslash. Since PHP 4.0.4, you can also use Perl-style (), {}, [], and <> matching delimiters. See Pattern Syntax for detailed explanation.

The ending delimiter may be followed by various modifiers that affect the matching. See Pattern Modifiers.

PHP also supports regular expressions using a POSIX-extended syntax using the POSIX-extended regex functions.

Note:

This extension maintains a global per-thread cache of compiled regular expressions (up to 4096).

Warning:

You should be aware of some limitations of PCRE. Read » http://www.pcre.org/pcre.txt for more info.

Requirements

No external libraries are needed to build this extension.

Installation

Beginning with PHP 4.2.0 these functions are enabled by default. You can disable the pcre functions with --without-pcre-regex. Use --with-pcre-regex=DIR to specify DIR where PCRE's include and library files are located, if not using bundled library. For older versions you have to configure and compile PHP with --with-pcre-regex[=DIR] in order to use these functions.

The windows version of PHP has built in support for this extension. You do not need to load any additional extension in order to use these functions.

Runtime Configuration

The behaviour of these functions is affected by settings in php.ini.

Table 243. PCRE Configuration Options

Name Default Changeable Changelog
pcre.backtrack_limit "100000" PHP_INI_ALL Available since PHP 5.2.0.
pcre.recursion_limit "100000" PHP_INI_ALL Available since PHP 5.2.0.


For further details and definitions of the PHP_INI_* constants, see the Appendix I, php.ini directives.

Here's a short explanation of the configuration directives.

pcre.backtrack_limit integer

PCRE's backtracking limit.

pcre.recursion_limit integer

PCRE's recursion limit. Please note that if you set this value to a high number you may consume all the available process stack and eventually crash PHP (due to reaching the stack size limit imposed by the Operating System).

Resource Types

This extension has no resource types defined.

Predefined Constants

The constants below are defined by this extension, and will only be available when the extension has either been compiled into PHP or dynamically loaded at runtime.

Table 244. PREG constants

constant description
PREG_PATTERN_ORDER Orders results so that $matches[0] is an array of full pattern matches, $matches[1] is an array of strings matched by the first parenthesized subpattern, and so on. This flag is only used with preg_match_all().
PREG_SET_ORDER Orders results so that $matches[0] is an array of first set of matches, $matches[1] is an array of second set of matches, and so on. This flag is only used with preg_match_all().
PREG_OFFSET_CAPTURE See the description of PREG_SPLIT_OFFSET_CAPTURE. This flag is available since PHP 4.3.0.
PREG_SPLIT_NO_EMPTY This flag tells preg_split() to return only non-empty pieces.
PREG_SPLIT_DELIM_CAPTURE This flag tells preg_split() to capture parenthesized expression in the delimiter pattern as well. This flag is available since PHP 4.0.5.
PREG_SPLIT_OFFSET_CAPTURE If this flag is set, for every occurring match the appendant string offset will also be returned. Note that this changes the return values in an array where every element is an array consisting of the matched string at offset 0 and its string offset within subject at offset 1. This flag is available since PHP 4.3.0 and is only used for preg_split().
PREG_NO_ERROR Returned by preg_last_error() if there were no errors. Available since PHP 5.2.0.
PREG_INTERNAL_ERROR Returned by preg_last_error() if there was an internal PCRE error. Available since PHP 5.2.0.
PREG_BACKTRACK_LIMIT_ERROR Returned by preg_last_error() if backtrack limit was exhausted. Available since PHP 5.2.0.
PREG_RECURSION_LIMIT_ERROR Returned by preg_last_error() if recursion limit was exhausted. Available since PHP 5.2.0.
PREG_BAD_UTF8_ERROR Returned by preg_last_error() if the last error was caused by malformed UTF-8 data (only when running a regex in UTF-8 mode). Available since PHP 5.2.0.
PCRE_VERSION PCRE version and release date (e.g. "7.0 18-Dec-2006"). Available since PHP 5.2.4.


Examples

Example 1713. Examples of valid patterns

  • /<\/\w+>/
  • |(\d{3})-\d+|Sm
  • /^(?i)php[34]/
  • {^\s+(\s+)?$}


Example 1714. Examples of invalid patterns

  • /href='(.*)' - missing ending delimiter
  • /\w+\s*\w+/J - unknown modifier 'J'
  • 1-\d3-\d3-\d4| - missing starting delimiter


Table of Contents

Pattern Modifiers — Describes possible modifiers in regex patterns
Pattern Syntax — Describes PCRE regex syntax
preg_grep — Return array entries that match the pattern
preg_last_error — Returns the error code of the last PCRE regex execution
preg_match_all — Perform a global regular expression match
preg_match — Perform a regular expression match
preg_quote — Quote regular expression characters
preg_replace_callback — Perform a regular expression search and replace using a callback
preg_replace — Perform a regular expression search and replace
preg_split — Split string by a regular expression

Code Examples / Notes » ref.pcre

richardh

There's a printable PDF PCRE cheat sheet available here:
http://www.phpguru.org/article.php?ne_id=67
Has the common metacharacters, quantifiers, pattern modifiers, character classes and assertions with short explanations.


steve

Something to bear in mind is that regex is actually a declarative programming language like prolog : your regex is a set of rules which the regex interpreter tries to match against a string.   During this matching, the interpreter will assume certain things, and continue assuming them until it comes up against a failure to match, which then causes it to backtrack.  Regex assumes "greedy matching" unless explicitly told not to, which can cause a lot of backtracking.  A general rule of thumb is that the more backtracking, the slower the matching process.
It is therefore vital, if you are trying to optimise your program to run quickly (and if you can't do without regex), to optimise your regexes to match quickly.
I recommend the use of a tool such as "The Regex Coach" to debug your regex strings.
http://weitz.de/files/regex-coach.exe (Windows installer) http://weitz.de/files/regex-coach.tgz (Linux tar archive)


nickspring

Regular Expressions Tutorial on russian language is accessible on http://www.pcre.ru

biju


misc

PCRE faster than POSIX RE? Not always.
In a recent search-engine project here at Cynergi, I had a simple loop with a few cute ereg_replace() functions that took 3min to process data. I changed that 10-line loop into a 100-line hand-written code for replacement and the loop now took 10s to process the same data! This opened my eye to what can *IN SOME CASES* be very slow regular expressions.
Lately I decided to look into Perl-compatible regular expressions (PCRE). Most pages claim PCRE are faster than POSIX, but a few claim otherwise. I decided on bechmarks of my own.
My first few tests confirmed PCRE to be faster, but... the results were slightly different than others were getting, so I decided to benchmark every case of RE usage I had on a 8000-line secure (and fast) Webmail project here at Cynergi to check it out.
The results? Inconclusive! Sometimes PCRE *are* faster (sometimes by a factor greater than 100x faster!), but some other times POSIX RE are faster (by a factor of 2x).
I still have to find a rule on when are one or the other faster. It's not only about search data size, amount of data matched, or "RE compilation time" which would show when you repeated the function often: one would *always* be faster than the other. But I didn't find a pattern here. But truth be said, I also didn't take the time to look into the source code and analyse the problem.
I can give you some examples, though. The POSIX RE
([0-9]{4})/([0-9]{2})/([0-9]{2})[^0-9]+
([0-9]{2}):([0-9]{2}):([0-9]{2})
is 30% faster in POSIX than when converted to PCRE (even if you use \d and \D and non-greedy matching). On the other hand, a similarly PCRE complex pattern
/[0-9]{1,2}[ \t]+[a-zA-Z]{3}[ \t]+[0-9]{4}[ \t]+[0-9]{1,2}:[0-9]{1,2}(:[0-9]{1,2})?[ \t]+[+-][0-9]{4}/
is 2.5x faster in PCRE than in POSIX RE. Simple replacement patterns like
ereg_replace( "[^a-zA-Z0-9-]+", "", $m );
are 2x faster in POSIX RE than PCRE. And then we get confused again because a POSIX RE pattern like
(^|\n|\r)begin-base64[ \t]+[0-7]{3,4}[ \t]+......
is 2x faster as POSIX RE, but the case-insensitive PCRE
/^Received[ \t]*:[ \t]*by[ \t]+([^ \t]+)[ \t]/i
is 30x faster than its POSIX RE version!
When it comes to case sensitivity, PCRE has so far seemed to be the best option. But I found some really strange behaviour from ereg/eregi. On a very simple POSIX RE
(^|\r|\n)mime-version[ \t]*:
I found eregi() taking 3.60s (just a number in a test benchmark), while the corresponding PCRE took 0.16s! But if I used ereg() (case-sensitive) the POSIX RE time went down to 0.08s! So I investigated further. I tried to make the POSIX RE case-insensitive itself. I got as far as this:
(^|\r|\n)[mM][iI][mM][eE]-vers[iI][oO][nN][ \t]*:
This version also took 0.08s. But if I try to apply the same rule to any of the 'v', 'e', 'r' or 's' letters that are not changed, the time is back to the 3.60s mark, and not gradually, but immediatelly so! The test data didn't have any "vers" in it, other "mime" words in it or any "ion" that might be confusing the POSIX parser, so I'm at a loss.
Bottom line: always benchmark your PCRE / POSIX RE to find the fastest!
Tests were performed with PHP 5.1.2 under Windows, from the command line.
Pedro Freire
cynergi.com


stronk7

One comment about 5.2.x and the pcre.backtrack_limit:
Note that this setting wasn't present under previous PHP releases and the behaviour (or limit) under those releases was, in practise,  higher so all these PCRE functions were able to "capture" longer strings.
With the arrival of the setting, defaulting to 100000 (less than 100K), you won't be able to match/capture strings over that size using, for example "ungreedy" modifiers.
So, in a lot of situations, you'll need to raise that (very small IMO) limit.
The worst part is that PHP simply won't match/capture those strings over pcre.backtrack_limit and will it be 100% silent about that (I think that throwing some NOTICE/WARNING if raised could help a lot to developers).
There is a lot of people suffering this changed behaviour from I've read on forums, bugs and so on).
Hope this note helps, ciao :-)


hrz

If you're venturing into new regular expression territory with a lack of useful examples then it would pay to get familiar with this page:
http://www.pcre.org/man.txt


ned baldessin

If you want to perform regular expressions on Unicode strings, the PCRE functions will NOT be of any help. You need to use the Multibyte extension : mb_ereg(), mb_eregi(), pb_ereg_replace() and so on. When doing so, be carefull to set the default text encoding to the same encoding used by the text you are searching and replacing in. You can do that with the mb_regex_encoding() function. You will probably also want to set the default encoding for the other mb_* string functions with mb_internal_encoding().
So when dealing with, say, french text, I start with these :
<?php
mb_internal_encoding('UTF-8');
mb_regex_encoding('UTF-8');
setlocale(LC_ALL, 'fr-fr');
?>


lgandras

I read this part, but i couldn't undertand a single word beacause before i must know Basic regular expression. Somebody put a link for PERL that is almost like PHP but here is one totally dedicated to PHP:
http://weblogtoolscollection.com/regex/regex.php


gokul

I came accross this nice tutorial for regural expression in perl
http://perldoc.perl.org/perlretut.html


tabac

Hello bermi
<?php
 if(preg_match("/((a+)?)+/", "a")){
   echo "Matched";
 }
?>
Segfault is always bad, but realize what you are asking here:
"Is there one or more occurrences of zero or one sequences of one or more 'a' ?"
Considering the backtracking algorithm used, the RE engine must consider if an infinite sequence of sub matches of which all but one has a length of zero.
This is a bug, but it is in line with the famous "ls -l /usr/../*/../*/../*/../*/../*" bug


hfuecks

Good PCRE tutorial at http://www.tote-taste.de/X-Project/regex/ - well explained but still in depth

Change Language


Follow Navioo On Twitter
.NET Functions
Apache-specific Functions
Alternative PHP Cache
Advanced PHP debugger
Array Functions
Aspell functions [deprecated]
BBCode Functions
BCMath Arbitrary Precision Mathematics Functions
PHP bytecode Compiler
Bzip2 Compression Functions
Calendar Functions
CCVS API Functions [deprecated]
Class/Object Functions
Classkit Functions
ClibPDF Functions [deprecated]
COM and .Net (Windows)
Crack Functions
Character Type Functions
CURL
Cybercash Payment Functions
Credit Mutuel CyberMUT functions
Cyrus IMAP administration Functions
Date and Time Functions
DB++ Functions
Database (dbm-style) Abstraction Layer Functions
dBase Functions
DBM Functions [deprecated]
dbx Functions
Direct IO Functions
Directory Functions
DOM Functions
DOM XML Functions
enchant Functions
Error Handling and Logging Functions
Exif Functions
Expect Functions
File Alteration Monitor Functions
Forms Data Format Functions
Fileinfo Functions
filePro Functions
Filesystem Functions
Filter Functions
Firebird/InterBase Functions
Firebird/Interbase Functions (PDO_FIREBIRD)
FriBiDi Functions
FrontBase Functions
FTP Functions
Function Handling Functions
GeoIP Functions
Gettext Functions
GMP Functions
gnupg Functions
Net_Gopher
Haru PDF Functions
hash Functions
HTTP
Hyperwave Functions
Hyperwave API Functions
i18n Functions
IBM Functions (PDO_IBM)
IBM DB2
iconv Functions
ID3 Functions
IIS Administration Functions
Image Functions
Imagick Image Library
IMAP
Informix Functions
Informix Functions (PDO_INFORMIX)
Ingres II Functions
IRC Gateway Functions
PHP / Java Integration
JSON Functions
KADM5
LDAP Functions
libxml Functions
Lotus Notes Functions
LZF Functions
Mail Functions
Mailparse Functions
Mathematical Functions
MaxDB PHP Extension
MCAL Functions
Mcrypt Encryption Functions
MCVE (Monetra) Payment Functions
Memcache Functions
Mhash Functions
Mimetype Functions
Ming functions for Flash
Miscellaneous Functions
mnoGoSearch Functions
Microsoft SQL Server Functions
Microsoft SQL Server and Sybase Functions (PDO_DBLIB)
Mohawk Software Session Handler Functions
mSQL Functions
Multibyte String Functions
muscat Functions
MySQL Functions
MySQL Functions (PDO_MYSQL)
MySQL Improved Extension
Ncurses Terminal Screen Control Functions
Network Functions
Newt Functions
NSAPI-specific Functions
Object Aggregation/Composition Functions
Object property and method call overloading
Oracle Functions
ODBC Functions (Unified)
ODBC and DB2 Functions (PDO_ODBC)
oggvorbis
OpenAL Audio Bindings
OpenSSL Functions
Oracle Functions [deprecated]
Oracle Functions (PDO_OCI)
Output Control Functions
Ovrimos SQL Functions
Paradox File Access
Parsekit Functions
Process Control Functions
Regular Expression Functions (Perl-Compatible)
PDF Functions
PDO Functions
Phar archive stream and classes
PHP Options&Information
POSIX Functions
Regular Expression Functions (POSIX Extended)
PostgreSQL Functions
PostgreSQL Functions (PDO_PGSQL)
Printer Functions
Program Execution Functions
PostScript document creation
Pspell Functions
qtdom Functions
Radius
Rar Functions
GNU Readline
GNU Recode Functions
RPM Header Reading Functions
runkit Functions
SAM - Simple Asynchronous Messaging
Satellite CORBA client extension [deprecated]
SCA Functions
SDO Functions
SDO XML Data Access Service Functions
SDO Relational Data Access Service Functions
Semaphore
SESAM Database Functions
PostgreSQL Session Save Handler
Session Handling Functions
Shared Memory Functions
SimpleXML functions
SNMP Functions
SOAP Functions
Socket Functions
Standard PHP Library (SPL) Functions
SQLite Functions
SQLite Functions (PDO_SQLITE)
Secure Shell2 Functions
Statistics Functions
Stream Functions
String Functions
Subversion Functions
Shockwave Flash Functions
Swish Functions
Sybase Functions
TCP Wrappers Functions
Tidy Functions
Tokenizer Functions
Unicode Functions
URL Functions
Variable Handling Functions
Verisign Payflow Pro Functions
vpopmail Functions
W32api Functions
WDDX Functions
win32ps Functions
win32service Functions
xattr Functions
xdiff Functions
XML Parser Functions
XML-RPC Functions
XMLReader functions
XMLWriter Functions
XSL functions
XSLT Functions
YAZ Functions
YP/NIS Functions
Zip File Functions
Zlib Compression Functions
eXTReMe Tracker