



token_get_all

Split given source into PHP tokens (PHP 4 >= 4.2.0, PHP 5)
array token_get_all ( string source )

token_get_all() parses the given source string into PHP language tokens using the Zend engine's lexical scanner.

For a list of parser tokens, see Appendix S, List of Parser Tokens, or use token_name() to translate a token value into its string representation.

Parameters

source

The PHP source to parse.

Return Values

An array of token identifiers. Each individual token identifier is either a single character (e.g. ;, ., >, !), or a three-element array containing the token index in element 0, the string content of the original token in element 1 and the line number in element 2.
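
Because the result mixes plain strings and arrays, calling code usually has to check is_array() on every token. As a minimal sketch, assuming PHP 5.2.2 or later (so that element 2 is present), the hypothetical helper normalize_tokens() below gives every token the same array(id, text, line) shape. The name and the use of null as the id of single-character tokens are illustrative choices, not part of the tokenizer extension.

```php
<?php
// Hypothetical helper (not part of the tokenizer extension): return every
// token as array(id, text, line). Single-character tokens get a null id.
// Assumes PHP >= 5.2.2, where array tokens carry their line in element 2.
function normalize_tokens($source)
{
    $normalized = array();
    $line = 1; // line on which the next token starts
    foreach (token_get_all($source) as $token) {
        if (is_array($token)) {
            list($id, $text, $startLine) = $token;
            $normalized[] = array($id, $text, $startLine);
            // The next token starts on the line where this one ends.
            $line = $startLine + substr_count($text, "\n");
        } else {
            // Single characters never contain a newline themselves.
            $normalized[] = array(null, $token, $line);
        }
    }
    return $normalized;
}

// Usage: print a readable name, the text and the line of each token.
foreach (normalize_tokens("<?php\necho 1\n;") as $t) {
    $name = ($t[0] === null) ? 'CHAR' : token_name($t[0]);
    echo $name, ' ', var_export($t[1], true), ' line ', $t[2], "\n";
}
```

The line of a single-character token is derived from the preceding array token, since the tokenizer does not report one for it.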

Examples

Example 2559. token_get_all() examples

<?php
$tokens = token_get_all('<?php echo; ?>'); /* => array(
                                                 array(T_OPEN_TAG, '<?php'),
                                                 array(T_ECHO, 'echo'),
                                                 ';',
                                                 array(T_CLOSE_TAG, '?>') ); */

/* Note in the following example that the string is parsed as T_INLINE_HTML
  rather than the otherwise expected T_COMMENT (T_ML_COMMENT in PHP <5).
  This is because no open/close tags were used in the "code" provided.
  This would be equivalent to putting a comment outside of <?php ?> tags in a normal file. */
$tokens = token_get_all('/* comment */'); // => array(array(T_INLINE_HTML, '/* comment */'));
?>


ChangeLog

Version   Description
5.2.2     Line numbers are returned in element 2

Code Examples / Notes » token_get_all

bishop

You may want to know the line and column number at which a token begins (or ends). The tokenizer interface never reports columns (and, before PHP 5.2.2, does not report line numbers either), so you have to track them manually, like below:
<?php
function update_line_and_column_positions($c, &$line, &$col)
{
   // update line count
   $numNewLines = substr_count($c, "\n");
   if (1 <= $numNewLines) {
       // have new lines, add them in
       $line += $numNewLines;
       $col  =  1;
       // skip to right past the last new line, as it won't affect the column position
       $c = substr($c, strrpos($c, "\n") + 1);
       if ($c === false) {
           $c = '';
       }
   }
   // update column count
   $col += strlen($c);
}
?>
Now use it, something like:
<?php
$line = 1;
$col  = 1;
foreach ($tokens as $token) {
   if (is_array($token)) {
       list ($token, $text) = $token;
   } else if (is_string($token)) {
       $text = $token;
   }
   update_line_and_column_positions($text, $line, $col);
}
?>
Note that this assumes your desired coordinate system is 1-based (e.g. (1,1) is the upper left). Zero-based is left as an exercise for the reader.
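
For readers who want the two snippets above in one self-contained piece, here is a rough sketch of the same idea. The function name token_start_positions() and its $base parameter are hypothetical, introduced only for illustration; $base selects 1-based or 0-based coordinates. Columns are counted in bytes, as with strlen() above, so multibyte characters count more than once.

```php
<?php
// Hypothetical helper: record the (line, column) at which each token starts.
// $base is 1 for (1,1)-style coordinates, or 0 for zero-based coordinates.
// Columns are byte offsets within the line, not character counts.
function token_start_positions($source, $base = 1)
{
    $line = $base;
    $col  = $base;
    $positions = array();
    foreach (token_get_all($source) as $token) {
        $text = is_array($token) ? $token[1] : $token;
        // Store the position before consuming the token's text.
        $positions[] = array($text, $line, $col);
        $lastNewline = strrpos($text, "\n");
        if ($lastNewline === false) {
            // No newline: the column simply advances.
            $col += strlen($text);
        } else {
            // Newlines: advance the line, restart the column after the last one.
            $line += substr_count($text, "\n");
            $col = $base + strlen($text) - $lastNewline - 1;
        }
    }
    return $positions;
}

// Usage: list every token with its starting line and column (1-based).
foreach (token_start_positions("<?php\necho 1;") as $p) {
    echo var_export($p[0], true), ' starts at line ', $p[1], ', column ', $p[2], "\n";
}
```

Unlike the loop above, this records the position before each token is consumed, so the stored coordinates are where the token begins rather than where it ends.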


leon atkinson

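This function parses PHP code.  Here's an example of its use. (Full <?php tags are used so that the result does not depend on the short_open_tag setting.)
<?php
   $code = '<?php $a = 3; ?>';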
   foreach(token_get_all($code) as $c)
   {
       if(is_array($c))
       {
           print(token_name($c[0]) . ": '" . htmlentities($c[1]) . "'\n");
       }
       else
       {
           print("$c\n");
       }
   }
?>


phpcomments

Regarding bertrand at toggg dot com's comment:  curly braces have one more use in PHP, as a string index, and token_get_all() returns the same '{' token for it as for a code block.  Example:
<?php
$text = "Hello";
if ($text{ 0 } == 'H') {
   echo "This example uses { for both a PHP block and a string index.";
}
?>
Just in case some people were wondering.  Because PHP uses the same token for both, parsing gets a little more interesting: you can't assume that { ... } is a code block, since it may instead enclose the index of a character in a string.


bertrand

If you want to retrieve PHP blocks, count up on each opening curly brace '{' and down on each closing one '}' (a counter of zero means the block is finished).
CAUTION: the opening curly brace can appear as three different tokens:
1) '{' for all PHP code blocks,
2) T_CURLY_OPEN for "protected" variables within strings, as in "{$var}",
3) T_DOLLAR_OPEN_CURLY_BRACES for the extended format "${var}".
The closing token, on the other hand, is always '}'!
So counting up must take place on all three tokens:
'{', T_CURLY_OPEN and T_DOLLAR_OPEN_CURLY_BRACES
Have fun with the PHP tokenizer!
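
The counting rule above could be sketched roughly as follows. The function name first_block_text() is hypothetical and only meant as an illustration: it returns the source text of the first balanced { ... } block, counting up on all three opening tokens and down on the plain '}' token.

```php
<?php
// Hypothetical helper: return the text of the first balanced { ... } block
// in $source, or null when no complete block is found.
function first_block_text($source)
{
    $depth = 0;
    $block = '';
    foreach (token_get_all($source) as $token) {
        $id   = is_array($token) ? $token[0] : null;
        $text = is_array($token) ? $token[1] : $token;

        // The three opening forms; all of them are closed by a plain '}'.
        $opens = ($id === null && $text === '{')
            || $id === T_CURLY_OPEN
            || $id === T_DOLLAR_OPEN_CURLY_BRACES;
        if ($opens) {
            $depth++;
        }
        if ($depth > 0) {
            $block .= $text;
            // Only the single-character '}' token closes a brace; a '}'
            // inside a string literal is part of an array token instead.
            if ($id === null && $text === '}') {
                $depth--;
                if ($depth === 0) {
                    return $block;
                }
            }
        }
    }
    return null;
}

// Usage: the braces inside the string do not end the block early.
$src = '<?php function f($v) { if ($v) { return "{$v} and ${v}"; } return "}"; } echo 1;';
echo first_block_text($src), "\n";
```

A stray '}' seen before any opening brace is ignored here rather than driving the counter negative; whether that is the right policy depends on what the caller wants to do with malformed input.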

