Extract HTML with PHP

Do not use regular expressions to parse HTML; use DOMDocument instead.
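Here is a minimal sketch of the idea. The markup and the id `greeting` are made up for illustration. DOMDocument parses the HTML into a tree, so an element is looked up by its id instead of being matched with a pattern over the raw text.

```php
<?php

// Sample markup (made up); in practice this would come from file_get_contents()
$html = '<html><body><p id="greeting">Hello, <b>world</b>!</p></body></html>';

$dom = new DOMDocument();
// Collect libxml warnings about imperfect markup instead of printing them
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

// Look the element up by id, then read its text without any tags
$p = $dom->getElementById('greeting');
echo $p->textContent, "\n"; // Hello, world!
```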

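The following function fetches a page and returns the inner HTML of the level-2 div inside the element with id "node":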

<?php

function parseHTMLURL($url = 'http://www.google.com') {
    $content = file_get_contents($url);
    if ($content === false || $content === '') {
        return '';
    }

    // Remove an attribute that made the parser report an error on the
    // target page (for example a duplicate id)
    $content = str_replace('id="special-offer-block"', '', $content);

    $dom = new DOMDocument();
    // Collect any remaining parser warnings instead of printing them
    libxml_use_internal_errors(true);
    $dom->loadHTML($content);
    libxml_clear_errors();

    $node = $dom->getElementById('node');
    if ($node === null) {
        return '';
    }

    // Go to the level-2 div: take the first <div> inside #node,
    // then the first <div> inside that one
    for ($level = 0; $level < 2; $level++) {
        $node = $node->getElementsByTagName('div')->item(0);
        if ($node === null) {
            return '';
        }
    }

    // Build the element's innerHTML by serializing each child node
    $innerHTML = '';
    foreach ($node->childNodes as $child) {
        $innerHTML .= $dom->saveHTML($child);
    }

    return $innerHTML;
}
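Because parseHTMLURL() fetches a live URL, it is hard to try out in isolation. The sketch below applies the same steps to an HTML string instead. The helper name `innerHTMLOfNestedDiv` and its `$depth` parameter are hypothetical, not part of the original post.

```php
<?php

/**
 * Hypothetical helper (not from the original post): start at the element
 * with the given id, descend $depth times into the first <div> found inside
 * the current node, and return that element's innerHTML.
 * Returns null if the id or any of the nested divs is missing.
 */
function innerHTMLOfNestedDiv(string $html, string $id, int $depth = 2): ?string
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // keep parser warnings quiet
    $dom->loadHTML($html);
    libxml_clear_errors();

    $node = $dom->getElementById($id);
    for ($i = 0; $node !== null && $i < $depth; $i++) {
        // getElementsByTagName() searches all descendants in document order
        $node = $node->getElementsByTagName('div')->item(0);
    }
    if ($node === null) {
        return null;
    }

    // Serialize each child node and concatenate the results
    $inner = '';
    foreach ($node->childNodes as $child) {
        $inner .= $dom->saveHTML($child);
    }
    return $inner;
}

$sample = '<div id="node"><div class="level-1"><div class="level-2">'
        . '<p>Hi</p><span>there</span></div></div></div>';
echo innerHTMLOfNestedDiv($sample, 'node'), "\n";
```

Because getElementsByTagName() also matches nested descendants, "level-2" here means the second div reached by always taking the first match. If the page requires direct children only, a DOMXPath query such as `//*[@id="node"]/div/div` expresses that more strictly.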

Note: DOMDocument is provided by PHP's DOM extension; if it is missing, install the php-xml package.
