I need to replace some strings in the text content of my HTML page. I can't call replace() on the whole file because only the data between tags should change; tags and attributes must stay untouched. I used HTMLParser for this, and I can parse the page and get at the data content to make the necessary changes, but I am stuck on writing the result back. How do I put the modified data back into my HTML file?

Please help. Here is my code:

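import HTMLParser

class EntityHTML(HTMLParser.HTMLParser):
    def __init__(self, filename):
        HTMLParser.HTMLParser.__init__(self)
        # Read the whole file and parse it; the with-block closes the file.
        with open(filename) as f:
            self.feed(f.read())

    def handle_starttag(self, tag, attrs):
        """Needn't do anything here"""
        pass

    def handle_data(self, data):
        print data
        # This only rebinds the local name; the result is never stored or written out.
        data = data.replace(",", "&sbquo")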
    Please indent your code properly. Commented Sep 7, 2011 at 18:57

1 Answer

HTMLParser doesn't build any in-memory representation of your HTML file. You could build one yourself in the handle_*() methods, but a simpler way is to use BeautifulSoup:

>>> import re
>>> from BeautifulSoup import BeautifulSoup
>>> soup = BeautifulSoup('<a title=,>,</a>')
>>> print soup
<a title=",">,</a>
>>> comma = re.compile(',')
>>> for t in soup.findAll(text=comma): t.replaceWith(t.replace(',', '&sbquo'))
>>> print soup
<a title=",">&sbquo</a>
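
For the do-it-yourself route through the handle_*() methods, here is a minimal sketch, assuming Python 3's html.parser module rather than the Python 2 HTMLParser module used in the question. The names DataRewriter, rewrite_html and rewrite_file are made up for illustration. The idea is to re-emit every piece of markup as it is parsed and to run the replacement only on text nodes. End tags are rebuilt from the lowercased tag name, so their original spelling may not survive exactly.

```python
from html.parser import HTMLParser


class DataRewriter(HTMLParser):
    """Re-emit a document piece by piece, transforming only text nodes."""

    def __init__(self, transform):
        # convert_charrefs=False routes entities such as &amp; to
        # handle_entityref/handle_charref so they can be copied through.
        super().__init__(convert_charrefs=False)
        self.transform = transform
        self.parts = []
        self.in_raw = False  # True while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        # get_starttag_text() returns the tag exactly as written in the source.
        self.parts.append(self.get_starttag_text())
        self.in_raw = tag in ("script", "style")

    def handle_startendtag(self, tag, attrs):
        self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.parts.append("</%s>" % tag)
        self.in_raw = False

    def handle_data(self, data):
        # Leave script and style bodies alone; they are code, not page text.
        self.parts.append(data if self.in_raw else self.transform(data))

    def handle_entityref(self, name):
        self.parts.append("&%s;" % name)

    def handle_charref(self, name):
        self.parts.append("&#%s;" % name)

    def handle_comment(self, data):
        self.parts.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.parts.append("<!%s>" % decl)

    def handle_pi(self, data):
        self.parts.append("<?%s>" % data)


def rewrite_html(html, transform):
    """Return html with transform applied to its text nodes only."""
    parser = DataRewriter(transform)
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)


def rewrite_file(path, transform, encoding="utf-8"):
    """Rewrite the file at path in place (the encoding is an assumption)."""
    with open(path, encoding=encoding) as f:
        html = f.read()
    with open(path, "w", encoding=encoding) as f:
        f.write(rewrite_html(html, transform))


def to_sbquo(text):
    # The trailing semicolon makes this a well-formed entity reference.
    return text.replace(",", "&sbquo;")


print(rewrite_html('<a title=",">a, b</a>', to_sbquo))
# <a title=",">a&sbquo; b</a>
```

The comma inside the title attribute is untouched because the start tag is copied verbatim from the source. Calling rewrite_file("page.html", to_sbquo), with a file name of your own, would apply the same change on disk.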
2 Comments

But when I try to write the soup to a file, it gives me an error: TypeError: expected a character buffer object
f = open(filename, "rw")
f.write(soup)
