Delicious Bookmark this on Delicious Share on Facebook SlashdotSlashdot It! Digg! Digg



PHP : Function Reference : DOM Functions : DOMDocument->loadHTML()

DOMDocument->loadHTML()

Load HTML from a string ()

DOMDocument{
boolloadHTML(stringsource);
}

The function parses the HTML contained in the string source. Unlike loading XML, HTML does not have to be well-formed to load. This function may also be called statically to load and create a DOMDocument object. The static invocation may be used when no DOMDocument properties need to be set prior to loading.

Parameters

source

The HTML string.

Return Values

Returns TRUE on success or FALSE on failure.

Errors/Exceptions

If an empty string is passed as the source, a warning will be generated. This warning is not generated by libxml and cannot be handled using libxml's error handling functions.

Examples

Example522.Creating a Document

<?php
$doc
= new DOMDocument();
$doc->loadHTML("<html><body>Test<br></body></html>");
echo
$doc->saveHTML();
?>


Related Examples ( Source code ) » dom domdocument loadhtml



Code Examples / Notes » dom domdocument loadhtml

hanhvansu

When using loadHTML() to process UTF-8 pages, you may meet the problem that the output of dom functions are not like the input. For example, if you want to get "Cạnh tranh", you will receive "Cạnh tranh".  I suggest we use mb_convert_encoding before load UTF-8 page :
<?php
$pageDom = new DomDocument();
$searchPage = mb_convert_encoding($htmlUTF8Page, 'HTML-ENTITIES', "UTF-8");
@$pageDom->loadHTML($htmlUTF8Page);
?>


ard

The comment from bigtree at DONTSPAM dot 29a dot nl
26-Apr-2005 11:15 was helpful.
In addition I noted that if your doctype declaration is not valid, DomDocument::loadHtml won't respect your charset=utf-8. It made me crazy. Beware!


bigtree

Pay attention when loading html that has a different charset than iso-8859-1. Since this method does not actively try to figure out what the html you are trying to load is encoded in (like most browsers do), you have to specify it in the html head. If, for instance, your html is in utf-8, make sure you have a meta tag in the html's head section:
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
</head>
If you do not specify the charset like this, all high-ascii bytes will be html-encoded. It is not enough to set the dom document you are loading the html in to UTF-8.


romain dot lalaut

Note that the elements of such document will have no namespace even with <html xmlns="http://www.w3.org/1999/xhtml">

xuanbn

If you use loadHTML() to process utf HTML string (eg in Vietnamese), you may experience result in garbage text, while some files were OK. Even your HTML already have meta charset  like
 <meta http-equiv="content-type" content="text/html; charset=utf-8">
I have discovered that, to help loadHTML() process utf file correctly, the meta tag should come first, before any utf string appear. For example, this HTML file
<html>
<head>
   <meta http-equiv="content-type" content="text/html; charset=utf-8">
   <title> Vietnamese - Tiếng Việt</title>
 </head>
<body></body>
</html>
will be OK with loadHTML() when <meta> tag appear <title> tag.
But the file below will not regcornize by loadHTML() because <title> tag contains utf string appear before <meta> tag.
<html>
<head>
   <title> Vietnamese - Tiếng Việt</title>
   <meta http-equiv="content-type" content="text/html; charset=utf-8">
 </head>
<body></body>
</html>


Change Language


Follow Navioo On Twitter
DOMAttr->__construct()
DOMAttr->isId()
DOMCharacterData->appendData()
DOMCharacterData->deleteData()
DOMCharacterData->insertData()
DOMCharacterData->replaceData()
DOMCharacterData->substringData()
DOMComment->__construct()
DOMDocument->__construct()
DOMDocument->createAttribute()
DOMDocument->createAttributeNS()
DOMDocument->createCDATASection()
DOMDocument->createComment()
DOMDocument->createDocumentFragment()
DOMDocument->createElement()
DOMDocument->createElementNS()
DOMDocument->createEntityReference()
DOMDocument->createProcessingInstruction()
DOMDocument->createTextNode()
DOMDocument->getElementById()
DOMDocument->getElementsByTagName()
DOMDocument->getElementsByTagNameNS()
DOMDocument->importNode()
DOMDocument->load()
DOMDocument->loadHTML()
DOMDocument->loadHTMLFile()
DOMDocument->loadXML()
DOMDocument->normalizeDocument()
DOMDocument->registerNodeClass()
DOMDocument->relaxNGValidate()
DOMDocument->relaxNGValidateSource()
DOMDocument->save()
DOMDocument->saveHTML()
DOMDocument->saveHTMLFile()
DOMDocument->saveXML()
DOMDocument->schemaValidate()
DOMDocument->schemaValidateSource()
DOMDocument->validate()
DOMDocument->xinclude()
DOMDocumentFragment->appendXML()
DOMElement->__construct()
DOMElement->getAttribute()
DOMElement->getAttributeNode()
DOMElement->getAttributeNodeNS()
DOMElement->getAttributeNS()
DOMElement->getElementsByTagName()
DOMElement->getElementsByTagNameNS()
DOMElement->hasAttribute()
DOMElement->hasAttributeNS()
DOMElement->removeAttribute()
DOMElement->removeAttributeNode()
DOMElement->removeAttributeNS()
DOMElement->setAttribute()
DOMElement->setAttributeNode()
DOMElement->setAttributeNodeNS()
DOMElement->setAttributeNS()
DOMElement->setIdAttribute()
DOMElement->setIdAttributeNode()
DOMElement->setIdAttributeNS()
DOMEntityReference->__construct()
DOMImplementation->__construct()
DOMImplementation->createDocument()
DOMImplementation->createDocumentType()
DOMImplementation->hasFeature()
DOMNamedNodeMap->getNamedItem()
DOMNamedNodeMap->getNamedItemNS()
DOMNamedNodeMap->item()
DOMNode->appendChild()
DOMNode->cloneNode()
DOMNode->hasAttributes()
DOMNode->hasChildNodes()
DOMNode->insertBefore()
DOMNode->isDefaultNamespace()
DOMNode->isSameNode()
DOMNode->isSupported()
DOMNode->lookupNamespaceURI()
DOMNode->lookupPrefix()
DOMNode->normalize()
DOMNode->removeChild()
DOMNode->replaceChild()
DOMNodelist->item()
DOMProcessingInstruction->__construct()
DOMText->__construct()
DOMText->isWhitespaceInElementContent()
DOMText->splitText()
DOMXPath->__construct()
DOMXPath->evaluate()
DOMXPath->query()
DOMXPath->registerNamespace()
dom_import_simplexml
eXTReMe Tracker