python HTMLParser to replace some strings in the data of the html file

Question

I need to replace some strings in the text content of my HTML page. I can't call replace() on the whole file, because only the text data should change; tags and attributes must stay untouched. I used HTMLParser for this, and I can parse the page and get at the data I want to change, but I am stuck on writing the result back. How do I put the modified data back into my HTML file?

Please help. Here is my code:

import HTMLParser

class EntityHTML(HTMLParser.HTMLParser):
    def __init__(self, filename):
        HTMLParser.HTMLParser.__init__(self)
        with open(filename) as f:
            self.feed(f.read())

    def handle_starttag(self, tag, attrs):
        """Needn't do anything here"""
        pass

    def handle_data(self, data):
        print data
        # The replaced text is only a local value; nothing writes it back yet.
        data = data.replace(",", "&sbquo;")

Please indent your code properly.

– S.Lott, Commented Sep 7, 2011 at 18:57

jfs · Accepted Answer · 2011-09-07 20:03:11Z

3

HTMLParser doesn't build an in-memory representation of your HTML file. You could build one yourself in the handle_*() methods, but a simpler way is to use BeautifulSoup:

>>> import re
>>> from BeautifulSoup import BeautifulSoup
>>> soup = BeautifulSoup('<a title=,>,</a>')
>>> print soup
<a title=",">,</a>
>>> comma = re.compile(',')
>>> for t in soup.findAll(text=comma): t.replaceWith(t.replace(',', '&sbquo;'))
...
>>> print soup
<a title=",">&sbquo;</a>

edited Sep 7, 2011 at 20:03

answered Sep 7, 2011 at 19:55


2 Comments

Divya Over a year ago

But when I try to write the soup to a file, I get this error: TypeError: expected a character buffer object

Divya Over a year ago

f = open(filename,"rw")
f.write(soup)
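A likely cause of that TypeError is that write() expects a string, while soup is a parser object; converting it with str() first (which is what print soup relies on) and opening the file in "w" mode should avoid it. The sketch below shows the idea under stated assumptions: it targets Python 3, save_markup is a hypothetical helper, and FakeSoup is only a stand-in so the example runs without BeautifulSoup installed.

```python
import os
import tempfile


def save_markup(path, markup, encoding="utf-8"):
    """Hypothetical helper: serialize markup with str() and write it as text."""
    text = str(markup)  # write() needs a string, not a parser object
    with open(path, "w", encoding=encoding) as f:  # "w", not "rw"
        f.write(text)


class FakeSoup(object):
    """Stand-in for a parsed document; only __str__ matters here."""

    def __str__(self):
        return '<a title=",">&sbquo;</a>'


path = os.path.join(tempfile.mkdtemp(), "page.html")
save_markup(path, FakeSoup())

with open(path, encoding="utf-8") as f:
    written = f.read()
print(written)
```

With a real soup object, the call would be save_markup(filename, soup).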
