Tidy Functions

Introduction

Tidy is a binding for the Tidy HTML clean and repair utility which allows you to not only clean and otherwise manipulate HTML documents, but also traverse the document tree.

Requirements

To use Tidy, you will need libtidy installed, available on the tidy homepage » http://tidy.sourceforge.net/.

Installation

Tidy is currently available for PHP 4.3.x and PHP 5 as a PECL extension from » http://pecl.php.net/package/tidy.

Note:

Tidy 1.0 works only with PHP 4.3.x, while Tidy 2.0 works only with PHP 5.

If PEAR is available on your *nix-like system, you can install the tidy extension with the following command: pecl install tidy.

You can always download the tar.gz package and install tidy by hand:

Example 2532. tidy install by hand in PHP 4.3.x

gunzip tidy-xxx.tgz
tar -xvf tidy-xxx.tar
cd tidy-xxx
phpize
./configure && make && make install


Windows users can download the extension dll from » http://pecl4win.php.net/ext.php/php_tidy.dll.

In PHP 5 you need only to compile using the --with-tidy option.
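
Once the extension is installed, a script can confirm that PHP actually sees it. The following is a minimal sketch; the helper name tidy_summary() is made up for this illustration and is not part of the extension.

```php
<?php
// Hypothetical helper: reports whether the tidy extension is loaded
// and, if so, which libtidy release it is using.
function tidy_summary()
{
    if (!extension_loaded('tidy')) {
        return 'tidy extension is not loaded';
    }
    // tidy_get_release() returns the release date (version) of the Tidy library.
    return 'tidy extension loaded, libtidy release: ' . tidy_get_release();
}

echo tidy_summary(), "\n";
```

If the first message appears, check that the extension line is present in php.ini or that PHP was configured with --with-tidy.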

Runtime Configuration

The behaviour of these functions is affected by settings in php.ini.

Table 321. Tidy Configuration Options

Name                 Default  Changeable      Changelog
tidy.default_config  ""       PHP_INI_SYSTEM  Available since PHP 5.0.0.
tidy.clean_output    "0"      PHP_INI_USER    PHP_INI_PERDIR in PHP 5. Available since PHP 5.0.0.


For further details and definitions of the PHP_INI_* constants, see Appendix I, php.ini directives.

Here's a short explanation of the configuration directives.

tidy.default_config string

Default path for tidy config file.

tidy.clean_output boolean

Turns on/off the output repairing by Tidy.

Warning:

Do not turn on tidy.clean_output if you are generating non-html content such as dynamic images.
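
To see which values are in effect at runtime, the two directives can be read with ini_get(). This is only a sketch; tidy_ini_settings() is an illustrative name, not a function provided by the extension.

```php
<?php
// Hypothetical helper: returns the current values of the two tidy directives.
// ini_get() returns false for a directive that is not registered,
// which is what happens when the tidy extension is not loaded.
function tidy_ini_settings()
{
    return array(
        'tidy.default_config' => ini_get('tidy.default_config'),
        'tidy.clean_output'   => ini_get('tidy.clean_output'),
    );
}

foreach (tidy_ini_settings() as $name => $value) {
    echo $name, ' = ', var_export($value, true), "\n";
}
```

A script that emits binary data such as an image could check tidy.clean_output this way before sending its output.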

Resource Types

This extension has no resource types defined.

Predefined Classes

tidyNode

Methods

Properties

  • value - the value of the node (e.g. the html text)

  • name - the name of the tag (e.g. html, a, etc..)

  • type - the type of the node (one of the nodetype constants listed below, e.g. TIDY_NODETYPE_PHP)

  • line* - the line where the node starts

  • column* - the column where the node starts

  • proprietary* - TRUE if the node refers to a proprietary tag

  • id - the ID of the tag (one of the tag constants listed below, e.g. TIDY_TAG_FRAME)

  • attribute - an array with the attributes of the current node, or NULL if there aren't any

  • child - an array with the child tidyNodes, or NULL if there aren't any

Note:

The properties marked with * are available only as of PHP 5.1.0.
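
The name and child properties are enough to walk a parsed document. The sketch below assumes PHP 5 or later with the tidy extension loaded; outline_nodes() is a hypothetical helper written for this illustration.

```php
<?php
// Hypothetical helper: returns one indented line per node under $node,
// using the tag name for elements and "#text" for text nodes.
function outline_nodes($node, $depth = 0)
{
    $label = $node->isText() ? '#text' : $node->name;
    $lines = array(str_repeat('  ', $depth) . $label);

    // child is NULL for leaf nodes, so test hasChildren() before iterating.
    if ($node->hasChildren()) {
        foreach ($node->child as $child) {
            $lines = array_merge($lines, outline_nodes($child, $depth + 1));
        }
    }
    return $lines;
}

if (extension_loaded('tidy')) {
    $tidy = new tidy;
    $tidy->parseString('<html><body><p>Hi <b>there</b></p></body></html>', array(), 'utf8');
    // Start at <body> to skip the root and <head> nodes.
    echo implode("\n", outline_nodes($tidy->body())), "\n";
}
```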

Predefined Constants

The constants below are defined by this extension, and will only be available when the extension has either been compiled into PHP or dynamically loaded at runtime.

Each TIDY_TAG_XXX represents an HTML tag. For example, TIDY_TAG_A represents a <a href="XX">link</a> tag. Each TIDY_ATTR_XXX represents an HTML attribute. For example, TIDY_ATTR_HREF would represent the href attribute in the previous example.

The following constants are defined:

Table 322. tidy tag constants

constant
TIDY_TAG_UNKNOWN
TIDY_TAG_A
TIDY_TAG_ABBR
TIDY_TAG_ACRONYM
TIDY_TAG_ALIGN
TIDY_TAG_APPLET
TIDY_TAG_AREA
TIDY_TAG_B
TIDY_TAG_BASE
TIDY_TAG_BASEFONT
TIDY_TAG_BDO
TIDY_TAG_BGSOUND
TIDY_TAG_BIG
TIDY_TAG_BLINK
TIDY_TAG_BLOCKQUOTE
TIDY_TAG_BODY
TIDY_TAG_BR
TIDY_TAG_BUTTON
TIDY_TAG_CAPTION
TIDY_TAG_CENTER
TIDY_TAG_CITE
TIDY_TAG_CODE
TIDY_TAG_COL
TIDY_TAG_COLGROUP
TIDY_TAG_COMMENT
TIDY_TAG_DD
TIDY_TAG_DEL
TIDY_TAG_DFN
TIDY_TAG_DIR
TIDY_TAG_DIV
TIDY_TAG_DL
TIDY_TAG_DT
TIDY_TAG_EM
TIDY_TAG_EMBED
TIDY_TAG_FIELDSET
TIDY_TAG_FONT
TIDY_TAG_FORM
TIDY_TAG_FRAME
TIDY_TAG_FRAMESET
TIDY_TAG_H1
TIDY_TAG_H2
TIDY_TAG_H3
TIDY_TAG_H4
TIDY_TAG_H5
TIDY_TAG_H6
TIDY_TAG_HEAD
TIDY_TAG_HR
TIDY_TAG_HTML
TIDY_TAG_I
TIDY_TAG_IFRAME
TIDY_TAG_ILAYER
TIDY_TAG_IMG
TIDY_TAG_INPUT
TIDY_TAG_INS
TIDY_TAG_ISINDEX
TIDY_TAG_KBD
TIDY_TAG_KEYGEN
TIDY_TAG_LABEL
TIDY_TAG_LAYER
TIDY_TAG_LEGEND
TIDY_TAG_LI
TIDY_TAG_LINK
TIDY_TAG_LISTING
TIDY_TAG_MAP
TIDY_TAG_MARQUEE
TIDY_TAG_MENU
TIDY_TAG_META
TIDY_TAG_MULTICOL
TIDY_TAG_NOBR
TIDY_TAG_NOEMBED
TIDY_TAG_NOFRAMES
TIDY_TAG_NOLAYER
TIDY_TAG_NOSAVE
TIDY_TAG_NOSCRIPT
TIDY_TAG_OBJECT
TIDY_TAG_OL
TIDY_TAG_OPTGROUP
TIDY_TAG_OPTION
TIDY_TAG_P
TIDY_TAG_PARAM
TIDY_TAG_PLAINTEXT
TIDY_TAG_PRE
TIDY_TAG_Q
TIDY_TAG_RP
TIDY_TAG_RT
TIDY_TAG_RTC
TIDY_TAG_RUBY
TIDY_TAG_S
TIDY_TAG_SAMP
TIDY_TAG_SCRIPT
TIDY_TAG_SELECT
TIDY_TAG_SERVER
TIDY_TAG_SERVLET
TIDY_TAG_SMALL
TIDY_TAG_SPACER
TIDY_TAG_SPAN
TIDY_TAG_STRIKE
TIDY_TAG_STRONG
TIDY_TAG_STYLE
TIDY_TAG_SUB
TIDY_TAG_TABLE
TIDY_TAG_TBODY
TIDY_TAG_TD
TIDY_TAG_TEXTAREA
TIDY_TAG_TFOOT
TIDY_TAG_TH
TIDY_TAG_THEAD
TIDY_TAG_TITLE
TIDY_TAG_TR
TIDY_TAG_TT
TIDY_TAG_U
TIDY_TAG_UL
TIDY_TAG_VAR
TIDY_TAG_WBR
TIDY_TAG_XMP


Table 323. tidy attribute constants

constant
TIDY_ATTR_UNKNOWN
TIDY_ATTR_ABBR
TIDY_ATTR_ACCEPT
TIDY_ATTR_ACCEPT_CHARSET
TIDY_ATTR_ACCESSKEY
TIDY_ATTR_ACTION
TIDY_ATTR_ADD_DATE
TIDY_ATTR_ALIGN
TIDY_ATTR_ALINK
TIDY_ATTR_ALT
TIDY_ATTR_ARCHIVE
TIDY_ATTR_AXIS
TIDY_ATTR_BACKGROUND
TIDY_ATTR_BGCOLOR
TIDY_ATTR_BGPROPERTIES
TIDY_ATTR_BORDER
TIDY_ATTR_BORDERCOLOR
TIDY_ATTR_BOTTOMMARGIN
TIDY_ATTR_CELLPADDING
TIDY_ATTR_CELLSPACING
TIDY_ATTR_CHAR
TIDY_ATTR_CHAROFF
TIDY_ATTR_CHARSET
TIDY_ATTR_CHECKED
TIDY_ATTR_CITE
TIDY_ATTR_CLASS
TIDY_ATTR_CLASSID
TIDY_ATTR_CLEAR
TIDY_ATTR_CODE
TIDY_ATTR_CODEBASE
TIDY_ATTR_CODETYPE
TIDY_ATTR_COLOR
TIDY_ATTR_COLS
TIDY_ATTR_COLSPAN
TIDY_ATTR_COMPACT
TIDY_ATTR_CONTENT
TIDY_ATTR_COORDS
TIDY_ATTR_DATA
TIDY_ATTR_DATAFLD
TIDY_ATTR_DATAPAGESIZE
TIDY_ATTR_DATASRC
TIDY_ATTR_DATETIME
TIDY_ATTR_DECLARE
TIDY_ATTR_DEFER
TIDY_ATTR_DIR
TIDY_ATTR_DISABLED
TIDY_ATTR_ENCODING
TIDY_ATTR_ENCTYPE
TIDY_ATTR_FACE
TIDY_ATTR_FOR
TIDY_ATTR_FRAME
TIDY_ATTR_FRAMEBORDER
TIDY_ATTR_FRAMESPACING
TIDY_ATTR_GRIDX
TIDY_ATTR_GRIDY
TIDY_ATTR_HEADERS
TIDY_ATTR_HEIGHT
TIDY_ATTR_HREF
TIDY_ATTR_HREFLANG
TIDY_ATTR_HSPACE
TIDY_ATTR_HTTP_EQUIV
TIDY_ATTR_ID
TIDY_ATTR_ISMAP
TIDY_ATTR_LABEL
TIDY_ATTR_LANG
TIDY_ATTR_LANGUAGE
TIDY_ATTR_LAST_MODIFIED
TIDY_ATTR_LAST_VISIT
TIDY_ATTR_LEFTMARGIN
TIDY_ATTR_LINK
TIDY_ATTR_LONGDESC
TIDY_ATTR_LOWSRC
TIDY_ATTR_MARGINHEIGHT
TIDY_ATTR_MARGINWIDTH
TIDY_ATTR_MAXLENGTH
TIDY_ATTR_MEDIA
TIDY_ATTR_METHOD
TIDY_ATTR_MULTIPLE
TIDY_ATTR_NAME
TIDY_ATTR_NOHREF
TIDY_ATTR_NORESIZE
TIDY_ATTR_NOSHADE
TIDY_ATTR_NOWRAP
TIDY_ATTR_OBJECT
TIDY_ATTR_OnAFTERUPDATE
TIDY_ATTR_OnBEFOREUNLOAD
TIDY_ATTR_OnBEFOREUPDATE
TIDY_ATTR_OnBLUR
TIDY_ATTR_OnCHANGE
TIDY_ATTR_OnCLICK
TIDY_ATTR_OnDATAAVAILABLE
TIDY_ATTR_OnDATASETCHANGED
TIDY_ATTR_OnDATASETCOMPLETE
TIDY_ATTR_OnDBLCLICK
TIDY_ATTR_OnERRORUPDATE
TIDY_ATTR_OnFOCUS
TIDY_ATTR_OnKEYDOWN
TIDY_ATTR_OnKEYPRESS
TIDY_ATTR_OnKEYUP
TIDY_ATTR_OnLOAD
TIDY_ATTR_OnMOUSEDOWN
TIDY_ATTR_OnMOUSEMOVE
TIDY_ATTR_OnMOUSEOUT
TIDY_ATTR_OnMOUSEOVER
TIDY_ATTR_OnMOUSEUP
TIDY_ATTR_OnRESET
TIDY_ATTR_OnROWENTER
TIDY_ATTR_OnROWEXIT
TIDY_ATTR_OnSELECT
TIDY_ATTR_OnSUBMIT
TIDY_ATTR_OnUNLOAD
TIDY_ATTR_PROFILE
TIDY_ATTR_PROMPT
TIDY_ATTR_RBSPAN
TIDY_ATTR_READONLY
TIDY_ATTR_REL
TIDY_ATTR_REV
TIDY_ATTR_RIGHTMARGIN
TIDY_ATTR_ROWS
TIDY_ATTR_ROWSPAN
TIDY_ATTR_RULES
TIDY_ATTR_SCHEME
TIDY_ATTR_SCOPE
TIDY_ATTR_SCROLLING
TIDY_ATTR_SELECTED
TIDY_ATTR_SHAPE
TIDY_ATTR_SHOWGRID
TIDY_ATTR_SHOWGRIDX
TIDY_ATTR_SHOWGRIDY
TIDY_ATTR_SIZE
TIDY_ATTR_SPAN
TIDY_ATTR_SRC
TIDY_ATTR_STANDBY
TIDY_ATTR_START
TIDY_ATTR_STYLE
TIDY_ATTR_SUMMARY
TIDY_ATTR_TABINDEX
TIDY_ATTR_TARGET
TIDY_ATTR_TEXT
TIDY_ATTR_TITLE
TIDY_ATTR_TOPMARGIN
TIDY_ATTR_TYPE
TIDY_ATTR_USEMAP
TIDY_ATTR_VALIGN
TIDY_ATTR_VALUE
TIDY_ATTR_VALUETYPE
TIDY_ATTR_VERSION
TIDY_ATTR_VLINK
TIDY_ATTR_VSPACE
TIDY_ATTR_WIDTH
TIDY_ATTR_WRAP
TIDY_ATTR_XML_LANG
TIDY_ATTR_XML_SPACE
TIDY_ATTR_XMLNS


Table 324. tidy nodetype constants

constant description
TIDY_NODETYPE_ROOT root node
TIDY_NODETYPE_DOCTYPE doctype
TIDY_NODETYPE_COMMENT HTML comment
TIDY_NODETYPE_PROCINS Processing Instruction
TIDY_NODETYPE_TEXT Text
TIDY_NODETYPE_START start tag
TIDY_NODETYPE_END end tag
TIDY_NODETYPE_STARTEND empty tag
TIDY_NODETYPE_CDATA CDATA
TIDY_NODETYPE_SECTION XML section
TIDY_NODETYPE_ASP ASP code
TIDY_NODETYPE_JSTE JSTE code
TIDY_NODETYPE_PHP PHP code
TIDY_NODETYPE_XMLDECL XML declaration


Examples

This simple example shows basic Tidy usage.

Example 2533. Basic Tidy usage

<?php
ob_start();
?>
<html>a html document</html>
<?php
$html = ob_get_clean();

// Specify configuration
$config = array(
    'indent'       => true,
    'output-xhtml' => true,
    'wrap'         => 200);

// Tidy
$tidy = new tidy;
$tidy->parseString($html, $config, 'utf8');
$tidy->cleanRepair();

// Output
echo $tidy;
?>
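
Besides repairing markup, Tidy records what it found wrong with the input. The following sketch, which assumes the PHP 5 form of the functions, reads those diagnostics; tidy_report() is a name invented for this illustration.

```php
<?php
// Hypothetical helper: parses $html and returns tidy's error and warning
// counts together with the raw message buffer.
function tidy_report($html)
{
    $tidy = tidy_parse_string($html, array(), 'utf8');
    return array(
        'errors'   => tidy_error_count($tidy),
        'warnings' => tidy_warning_count($tidy),
        'messages' => tidy_get_error_buffer($tidy),
    );
}

if (extension_loaded('tidy')) {
    // Deliberately sloppy markup: no doctype, no title, unclosed <b>.
    $report = tidy_report('<html><body><p>Hello <b>world</p></body></html>');
    echo $report['errors'], ' error(s), ', $report['warnings'], " warning(s)\n";
    echo $report['messages'], "\n";
}
```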


Table of Contents

ob_tidyhandler — ob_start callback function to repair the buffer
tidy_access_count — Returns the number of Tidy accessibility warnings encountered for the specified document
tidy_clean_repair — Execute configured cleanup and repair operations on parsed markup
tidy_config_count — Returns the number of Tidy configuration errors encountered for the specified document
tidy::__construct — Constructs a new tidy object
tidy_diagnose — Run configured diagnostics on parsed and repaired markup
tidy_error_count — Returns the number of Tidy errors encountered for the specified document
tidy_get_body — Returns a tidyNode Object starting from the <body> tag of the tidy parse tree
tidy_get_config — Get current Tidy configuration
tidy_get_error_buffer — Return warnings and errors which occurred parsing the specified document
tidy_get_head — Returns a tidyNode Object starting from the <head> tag of the tidy parse tree
tidy_get_html_ver — Get the detected HTML version for the specified document
tidy_get_html — Returns a tidyNode Object starting from the <html> tag of the tidy parse tree
tidy_get_opt_doc — Returns the documentation for the given option name
tidy_get_output — Return a string representing the parsed tidy markup
tidy_get_release — Get release date (version) for Tidy library
tidy_get_root — Returns a tidyNode object representing the root of the tidy parse tree
tidy_get_status — Get status of specified document
tidy_getopt — Returns the value of the specified configuration option for the tidy document
tidy_is_xhtml — Indicates if the document is an XHTML document
tidy_is_xml — Indicates if the document is a generic (non HTML/XHTML) XML document
tidy_load_config — Load an ASCII Tidy configuration file with the specified encoding
tidy_node->get_attr — Return the attribute with the provided attribute id
tidy_node->get_nodes — Return an array of nodes under this node with the specified id
tidy_node->next — Returns the next sibling to this node
tidy_node->prev — Returns the previous sibling to this node
tidy_parse_file — Parse markup in file or URI
tidy_parse_string — Parse a document stored in a string
tidy_repair_file — Repair a file and return it as a string
tidy_repair_string — Repair a string using an optionally provided configuration file
tidy_reset_config — Restore Tidy configuration to default values
tidy_save_config — Save current settings to named file
tidy_set_encoding — Set the input/output character encoding for parsing markup
tidy_setopt — Updates the configuration settings for the specified tidy document
tidy_warning_count — Returns the number of Tidy warnings encountered for the specified document
tidyNode->hasChildren — Returns true if this node has children
tidyNode->hasSiblings — Returns true if this node has siblings
tidyNode->isAsp — Returns true if this node is ASP
tidyNode->isComment — Returns true if this node represents a comment
tidyNode->isHtml — Returns true if this node is part of an HTML document
tidyNode->isJste — Returns true if this node is JSTE
tidyNode->isPhp — Returns true if this node is PHP
tidyNode->isText — Returns true if this node represents text (no markup)
tidyNode::getParent — Returns the parent node of the current node

Code Examples / Notes » ref.tidy

shuster

Valid XHTML STRICT
<?php
if (function_exists('tidy_repair_string'))
{
    $xhtml = tidy_repair_string($xhtml, array(
        'output-xhtml'                => true,
        'show-body-only'              => true,
        'doctype'                     => 'strict',
        'drop-font-tags'              => true,
        'drop-proprietary-attributes' => true,
        'lower-literals'              => true,
        'quote-ampersand'             => true,
        'wrap'                        => 0), 'raw');
}
?>


tonygambone

Using PHP 5.1.2 on Win32/IIS, I noticed that even with "output-xhtml: yes," tidy was adding the deprecated name attribute to form tags (using the value of the id attribute).  Grabbing the latest dll from the snaps link at the top of the page fixed this.

mohan

For those who need to install libtidy on Mac OS X, here is a guide that worked for me:
If you're on Mac OS X, you'll need to tell the Makefile that you use
ranlib:
   $ export RANLIB=ranlib
Change to the directory with the Makefile in it, and run make.
This example uses the GNU make Makefile.
   $ cd tidy/build/gmake/
   $ make
   if [ ! -d ./obj ]; then mkdir ./obj; fi
   gcc -o obj/access.o ...
   ... etc etc etc ...
Install the libs, headers and the tidy executable:
   $ sudo make install
If you're on Mac OS X, you'll have to run ranlib again on the installed
lib:
   $ sudo ranlib /usr/local/lib/libtidy.a


guillaume

To install Tidy correctly for PHP 5 on Ubuntu, follow this link:
http://ubuntuforums.org/showthread.php?t=195636
In fact, you need to run a "make clean" before the commands "make" and "make install"


paul cook

To get libtidy and PHP 5.0.5 compiled on OS X Tiger this is what I needed to do:
1) download and unpack the tidy source.
2) cd tidy-source-dir
3) >> /bin/sh build/gnuauto/setup.sh
4) then you can configure/make/make install as normal
PHP build generates errors because of tidy so I needed to edit the platform.h file like this (use your favorite command line editor):
5) >> sudo emacs /usr/local/include/platform.h
6) comment out line 508 which was causing the 'duplicate "unsigned" ' error in the PHP build.
7) configure/make/make install PHP as normal using --with-tidy=/usr/local
Restart apache and everything works now.  HTH someone.


19-feb-2005 11:47

There is an HTML/XHTML validator based on tidy at http://validator.aborla.net/
It is released under LGPL.


jon dowland bugs

Rough installation instructions for debian/testing:
Use debian's apt package manager to install the required development packages
$ apt-get install php4-dev php4-pear libtidy-dev
Then use pear to install tidy
$ pear install tidy
Note: I did /not/ have success installing the tarball locally. Only using this method was the .so put in the correct place.
I also had to add an entry to the php.ini
$ echo extension=tidy.so >> /etc/php4/apache/php.ini
$ apachectl restart
...and you're done.


13-jan-2005 07:20

Just in case anyone else has been having problems using the tidy extension in PHP 4 (v4.3.10), here is a working example:
$html = '<HTML><HEAD></HEAD><BODY>Hello World</BODY></HTML>';
$config = array('indent'=> TRUE,
               'output-xhtml' => TRUE,
               'wrap' => 80);
tidy_set_encoding('UTF8');
foreach ($config as $key => $value) {
  tidy_setopt($key,$value);
}
tidy_parse_string($html);
tidy_clean_repair();
echo tidy_get_output();
Resultant HTML should be similar to:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
   "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title></title>
</head>
<body>
Hello World
</body>
</html>


tom

It should be noted that the examples on this page apply ONLY to PHP5. None of the functions in the manual apply to PHP4. The names are the same but arguments are different on some of them (tidy_parse_string).
If you wish to use tidy in PHP 4.3.x you can use the following example instead:
<?php
$tidyhtml = ob_get_contents();
if( function_exists( 'tidy_parse_string' ) ) {
       tidy_set_encoding('iso-8859-1');
       tidy_parse_string($tidyhtml);
       tidy_setopt('output-xhtml', TRUE);
       tidy_setopt('indent', TRUE);
       tidy_setopt('indent-spaces', 2);
       tidy_setopt('wrap', 200);
       tidy_clean_repair();
       $tidyhtml = tidy_get_output();
}
ob_end_clean();
echo $tidyhtml;
?>
Hope that helps somebody.
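
For comparison, the same options can be passed in a single call under PHP 5, where the configuration is an argument and not global state. This is a sketch under that assumption; tidy_buffer() is a made-up helper name, and 'latin1' is Tidy's own name for the ISO-8859-1 encoding.

```php
<?php
// Hypothetical helper: repairs $html with the options used above and
// returns the input unchanged when the tidy extension is missing.
function tidy_buffer($html)
{
    if (!function_exists('tidy_repair_string')) {
        return $html;
    }
    $config = array(
        'output-xhtml'  => true,
        'indent'        => true,
        'indent-spaces' => 2,
        'wrap'          => 200,
    );
    // PHP 5 signature: markup, configuration array, encoding name.
    return tidy_repair_string($html, $config, 'latin1');
}

echo tidy_buffer('<p>Hello world'), "\n";
```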


bill dot mccuistion

Installing tidy on Fedora Core 2 required three libraries:
tidy...
tidy-devel...
libtidy...
All of which I found at http://rpm.pbone.net
Then, finally, could "./configure --with-tidy"
Hope this helps someone out.  This was "REALLY" hard (for me) to figure out, as it was not clearly documented anywhere else.


doodleelephant

I'm installing PHP 5.0.2 on Redhat Linux (I forget the version. Enterprise WS 3 I think) I had troubles installing the libtidy. It consistently complained that it could not find 'libtidy'. I finally got a clue into how to install it (in build/gnuauto/readme.txt). This is how I finally got it to install (after lots of trial and error):
First, don't get the binary distribution off of tidy.sf.net. It's not what you want. You need the source distribution.
Command by command this is what I did:
=======
wget http://tidy.sourceforge.net/src/tidy_src.tgz
tar -xzf tidy_src.tgz
cd tidy
/bin/sh build/gnuauto/setup.sh
./configure --prefix=/usr
make
make install
cd [php source directory]
./configure --with-tidy=/usr --[other extensions]
make
make install
=======
Tada. Finally it doesn't complain when I configure PHP about the installation. The info I needed was stuck in that build/gnuauto/readme.txt file in the tidy directory.
Took me a while. Hope my trials can help others save time.
Doodleelephant


info att tcknetwork doot com

I have been searching for an easy way to check an entire website for HTML/XHTML validity (no errors, compliant markup, etc.). Tidy is very useful for that:
<?php
/** already checked pages */
$e = array();
/** web pages to check */
$t = array("/web/test.com/");
/** forbidden extensions (typically linked resources) */
$x = explode(",", "jpg,gif,png,doc,xls,pdf");
echo "<pre>";
// loop until the queue is empty (testing $t[0] directly fails on an empty array)
while (!empty($t)) {
  $c = array_shift($t);
  // already checked or a resource => skip
  if (in_array($c, $e) || in_array(substr($c, -3), $x)) continue;
  $e[] = $c;
  $t = array_merge($t, ck($c));
}
echo "</pre>";
/**
check_validity($url,$server)
return : list of the internal links of the page
*/
function ck($u, $s = "http://127.0.0.1") {
  $c = array("indent" => 1, "output-xhtml" => 1, "accessibility-check" => 3);
  $t = tidy_parse_string(file_get_contents($s . $u), $c);
  tidy_clean_repair($t);
  if (tidy_error_count($t)) { // we have errors, display them
    echo "FAIL " . htmlentities($u) . " (" . tidy_error_count($t) . " errors)\n";
    echo htmlentities(tidy_get_error_buffer($t)) . "\n";
  } else { // all right
    echo "OK " . htmlentities($u) . "\n";
  }
  // return all the links inside the page
  return gl(tidy_get_root($t), substr($u, -1) == "/" ? $u : dirname($u) . "/");
}
/**
get_links($tidynode,$baseurl)
return : list of the links
*/
function gl($t, $b) {
  $r = array();
  // child is NULL for leaf nodes, so stop here when there is nothing to scan
  if (!$t->hasChildren()) return $r;
  foreach ($t->child as $e) {
    if ($e->name == "a") { // a link
      // anchors without an href attribute are ignored
      $h = isset($e->attribute["href"]) ? $e->attribute["href"] : "";
      if ($h != "" && substr($h, 0, 4) != "http") { // prevent external links
        $r[] = sp(substr($h, 0, 1) == "/" ? $h : $b . $h);
      }
    } else { // not a link, search recursively inside
      $r = array_merge($r, gl($e, $b));
    }
  }
  return $r;
}
/**
simplify_path($path)
return : simplified path
*/
function sp($p) {
  // repeat until the path no longer changes
  do {
    $o = $p;
    $p = str_replace(array("//", "/./"), "/", $p);
    // collapse "/dir/../" into "/" (the dots are escaped so they match literally)
    $p = preg_replace("/\/[^\/]+\/\.\.\//", "/", $p);
  } while ($o != $p);
  return $p;
}
?>
Limitation: this does not detect JavaScript-generated links. Consider calling set_time_limit(0) if you have a lot of web pages to check.


matteo dot contri

I had many problems with tidy and a JavaScript that captures mouse events on images.
I found this solution:
'output-xhtml'  => false
and everything works again!


patatraboum

<?php
//
//Prints the tidy tree of your favorite page!
//For PHP 5 (CGI)
//Thanks to john@php.net
//
$file="http://www.php.net";
//
$cns=get_defined_constants(true);
$tidyCns=array("tags"=>array(),"types"=>array());
foreach($cns["tidy"] as $cKey=>$cVal){
if($cPos=strpos($cKey,$cStr="TAG")) $tidyCns["tags"][$cVal]="$cStr : ".substr($cKey,$cPos+strlen($cStr)+1);
elseif($cPos=strpos($cKey,$cStr="TYPE")) $tidyCns["types"][$cVal]="$cStr : ".substr($cKey,$cPos+strlen($cStr)+1);
}
$tidyNext=array();
//
echo "<html><head><meta http-equiv='Content-Type' content='text/html; charset=windows-1252'><title>Tidy Tree :: $file</title></head>";
echo "<body><pre>";
//
tidyTree(tidy_get_root(tidy_parse_file($file)),0);
//
function tidyTree($tidy,$level){
global $tidyCns,$tidyNext;
$tidyTab=array();
$tidyKeys=array("type","value","id","attribute");
foreach($tidy as $pKey=>$pVal){
if(in_array($pKey,$tidyKeys)) $tidyTab[array_search($pKey,$tidyKeys)]=$pVal;
}
ksort($tidyTab);
foreach($tidyTab as $pKey=>$pVal){
switch($pKey){
case 0 :
if($pVal==4) $value=true; else $value=false;
echo indent(true,$level).$tidyCns["types"][$pVal]."\n"; break;
case 1 :
if($value){
echo indent(false,$level)."VALUE : ".str_replace("\n","\n".indent(false,$level),$pVal)."\n";
}
break;
case 2 :
echo indent(false,$level).$tidyCns["tags"][$pVal]."\n"; break;
case 3 :
if($pVal!=NULL){
echo indent(false,$level)."ATTRIBUTES : ";
foreach ($pVal as $aKey=>$aVal) echo "$aKey=$aVal "; echo "\n";
}
}
}
if($tidy->hasChildren()){
$level++; $i=0;
$tidyNext[$level]=true;
echo indent(false,$level)."\n";
foreach($tidy->child as $child){
$i++;
if($i==count($tidy->child)) $tidyNext[$level]=false;
tidyTree($child,$level);
}
}
else echo indent(false,$level)."\n";
}
//
function indent($tidyType,$level){
global $tidyNext;
$indent="";
for($i=1;$i<=$level;$i++){
if($i<$level||!$tidyType){
if($tidyNext[$i]) $str="|  "; else $str="   ";
}
else $str="+--";
$indent=$indent.$str;
}
return $indent;
}
//
echo "</pre></body></html>";
//
?>

