
I have a situation in which I think I may have to use regex to alter HTML tag content or `src` attributes based on the class attribute.

The documents I will be parsing will be either well-formed HTML, partial HTML, or PHP files.

For example, in fileX.php I would need to change or fill these tags with new inner content:

<?php
echo <<<_END
<div class="identifyingClass1"></div>
<div class="identifyingClass2"><span>holding content</span></div>
<img src='http://source.com/to/change' class='identifyingClass3' alt='descrip'/>
_END;

The resulting fileX.php would be:

<?php
echo <<<_END
<div class="identifyingClass1">New content jsd soisvkbsdv</div>
<div class="identifyingClass2">More new content</div>
<img src='new/source.tiff' class='identifyingClass3' alt='descrip'/>
_END;

The HTML could be a complete document, be interleaved with PHP, appear as-is, or sit inside a heredoc...

Is regex the best way to achieve this, or has anyone seen or used a class for this kind of thing?

  • You would probably benefit from the canonical answer on this topic. Commented Nov 13, 2012 at 22:53
  • -1 for poor research effort: there are hundreds (??) of SO questions with nearly this same title. Commented Nov 13, 2012 at 22:57
  • None of the DOMDocument-style parsers say whether they can handle PHP files, though; clean HTML is fine, but not PHP templates. Commented Nov 13, 2012 at 23:49

2 Answers


Regex is a bad fit for this case; you are better off working on the generated HTML. Here's how.

Enable output buffering and pass your own callback to ob_start(). Inside that callback, process the generated HTML with DOMDocument. Something like this:

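function my_handler($contents) {
    // loadHTML() is an instance method, so create the document first
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of printing them
    libxml_use_internal_errors(true);
    $doc->loadHTML($contents);
    libxml_clear_errors();
    // change your document here and return it later
    return $doc->saveHTML();
}
ob_start('my_handler');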
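A sketch of what the "change your document here" step could look like for the classes in the question. `find_by_class` is a hypothetical helper (not part of DOMDocument) that selects elements whose class attribute contains the given name as a whole token via DOMXPath; the replacement text and `src` value are placeholders. It also passes empty buffers through untouched, since `loadHTML()` rejects an empty string.

```php
<?php
// Hypothetical helper: every element whose class attribute contains
// $class as a whole, space-separated token.
function find_by_class(DOMXPath $xpath, string $class): DOMNodeList
{
    return $xpath->query(
        "//*[contains(concat(' ', normalize-space(@class), ' '), ' $class ')]"
    );
}

function my_handler(string $contents): string
{
    // loadHTML() rejects an empty string, so pass empty buffers through
    if (trim($contents) === '') {
        return $contents;
    }
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // collect warnings about sloppy markup
    $doc->loadHTML($contents);
    libxml_clear_errors();
    $xpath = new DOMXPath($doc);

    // Replace the children of each identifyingClass1 element with new text
    foreach (find_by_class($xpath, 'identifyingClass1') as $node) {
        while ($node->firstChild !== null) {
            $node->removeChild($node->firstChild);
        }
        $node->appendChild($doc->createTextNode('New content'));
    }
    // Point each identifyingClass3 element at a new source
    foreach (find_by_class($xpath, 'identifyingClass3') as $node) {
        $node->setAttribute('src', 'new/source.tiff');
    }
    return $doc->saveHTML();
}

ob_start('my_handler');
```

This runs on the rendered output at request time; it does not change the PHP source files themselves.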

4 Comments

I've edited the original question; will DOMDocument or any of the other parsers be able to handle that?
If you output an HTML fragment, it is appended to the output buffer. When the HTTP request finishes you want to render a full HTML page, don't you? That's when this callback fires, so you can write any HTML fragment along the way. It's not a problem as long as the fragments are parts of a larger HTML document. If the result isn't a valid HTML document it will still work, but it will generate a lot of warnings.
No, I need to edit the actual PHP file itself: open the PHP file without running it, change the content of some of the HTML, then save the changes back to the PHP file.
Then you can use sed. Note, though, that there will be a combination of patterns; it's better to grep the sources first to see which patterns occur, then write the regular expression for sed.
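For editing the source files directly, a sed-style regex pass can work when the markup is as narrowly shaped as in the question. Below is a minimal PHP sketch of that idea (PHP instead of sed, to stay in one language). It assumes class attributes are quoted with `'` or `"` and that a target element contains no nested element with the same tag name; `open_tag_re`, `set_inner`, `set_attr`, and `edit_file` are hypothetical names. The file is read with `file_get_contents`, so the PHP is never executed.

```php
<?php
// Regex fragment for an opening tag whose class attribute contains $class
// as a whole word. Assumes quoted attribute values; note \b also treats
// "-" as a boundary, so "foo" would match class="foo-bar".
function open_tag_re(string $class): string
{
    return '<(?<tag>\w+)\b[^<>]*\bclass=(?<q>[\'"])[^\'"]*\b'
        . preg_quote($class, '~')
        . '\b[^\'"]*\k<q>[^<>]*>';
}

// Replace the inner content of each matching element. Assumes the target
// contains no nested element with the same tag name.
function set_inner(string $src, string $class, string $content): string
{
    $re = '~(?<open>' . open_tag_re($class) . ')(?<inner>.*?)(?<close></\k<tag>>)~s';
    return preg_replace_callback($re, fn(array $m): string => $m['open'] . $content . $m['close'], $src);
}

// Set one attribute's value inside each matching opening tag.
function set_attr(string $src, string $class, string $attr, string $value): string
{
    $attrRe = '~(?<=\s)' . preg_quote($attr, '~') . '=([\'"])[^\'"]*\1~';
    return preg_replace_callback(
        '~' . open_tag_re($class) . '~',
        fn(array $m): string => preg_replace_callback(
            $attrRe,
            fn(array $a): string => $attr . '=' . $a[1] . $value . $a[1],
            $m[0]
        ),
        $src
    );
}

// Read the file as plain text (the PHP is never executed), edit, write back.
function edit_file(string $path, callable $edit): void
{
    $src = file_get_contents($path);
    if ($src === false) {
        throw new RuntimeException("Cannot read $path");
    }
    file_put_contents($path, $edit($src));
}
```

Applying `set_inner` for identifyingClass1 and identifyingClass2 and `set_attr($s, 'identifyingClass3', 'src', 'new/source.tiff')` to the question's fileX.php produces the resulting file shown in the question. Markup with nested same-name tags or unquoted attributes will need a real parser instead.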

As already stated, regex is not recommended for this kind of thing. Look at this excellent answer. My personal favourite is Simple HTML DOM, which provides a jQuery-like syntax and makes working with HTML in PHP actually joyful ;).

2 Comments

I've edited the original question; will DOMDocument or any of the other parsers be able to handle that?
<?php
include "simple_html_dom.php";
$html = file_get_html('yourfile.php');
echo $html->innertext;
Can't you just do it like that?
