
I have a string like the one below, which contains Chinese text:

'<span class=H>宜家</span><span class=H>同款</span> 世纪宝贝儿童餐椅婴儿餐椅宝宝餐椅婴儿吃饭椅'

I would like to remove all HTML tags from this string, so that the result is:

'宜家同款世纪宝贝儿童餐椅婴儿餐椅宝宝餐椅婴儿吃饭椅'

How can I do this with Python and the re module? Thanks a lot!

2 Answers


This is trivial to solve with the BeautifulSoup HTML parser:

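>>> from bs4 import BeautifulSoup
>>>
>>> data = '<span class=H>宜家</span><span class=H>同款</span> 世纪宝贝儿童餐椅婴儿餐椅宝宝餐椅婴儿吃饭椅'
>>> # Name the parser explicitly; without it bs4 picks one itself and emits a warning.
>>> soup = BeautifulSoup(data, 'html.parser')
>>> soup.text
'宜家同款 世纪宝贝儿童餐椅婴儿餐椅宝宝餐椅婴儿吃饭椅'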

1 Comment

It looks like a good solution. I had only thought of using regex and didn't get a correct solution. Thanks a lot, I will try this.

For a simple solution that uses just regex, you can search the following pattern and replace all occurrences of it with an empty string:

\s*<[^>]+>\s*

For instance:

import re
p = re.compile(r'\s*<[^>]+>\s*')  # raw string, so \s reaches the regex engine unchanged
p.sub('', '<span class=H>宜家</span><span class=H>同款</span> 世纪宝贝儿童餐椅婴儿餐椅宝宝餐椅婴儿吃饭椅')  # '宜家同款世纪宝贝儿童餐椅婴儿餐椅宝宝餐椅婴儿吃饭椅'

Disclaimer: This will by no means handle every possible variation of legal HTML, but as long as all of the input data is as simple as the data in your example, it will work. You could adjust the pattern as necessary to handle slightly more complex inputs. However, if you intend to handle any well-formed HTML document as input, you should use an actual HTML parser rather than regex.
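
If adding a third-party dependency is not an option, the "actual HTML parser" route can be sketched with the standard library's html.parser module. The names TextExtractor and strip_tags below are hypothetical helpers made up for this illustration, and dropping every whitespace character at the end is an assumption based on the desired output in the question:

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called by the parser for every run of text between tags.
        self.chunks.append(data)


def strip_tags(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = "".join(parser.chunks)
    # Assumption: the question wants no whitespace at all in the result,
    # so split on whitespace and glue the pieces back together.
    return "".join(text.split())


data = '<span class=H>宜家</span><span class=H>同款</span> 世纪宝贝儿童餐椅婴儿餐椅宝宝餐椅婴儿吃饭椅'
result = strip_tags(data)
print(result)
```

Unlike the regex, this leaves the job of finding tag boundaries to the parser, which should make it more tolerant of inputs such as attribute values that themselves contain a > character.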

2 Comments

Including \s like this, /\s*<[^>]+>\s*/g, will also eliminate the spaces next to the tags in the result.
@PedroPinheiro Good point. I didn't notice that the desired output in the OP did have the spaces removed. I'll update my answer accordingly. However, the bookend slashes are not necessary in Python. Also, re.sub replaces all occurrences by default, so the g is also unnecessary.
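
To make the points in these comments concrete, here is a small sketch comparing the pattern with and without \s, and showing the count argument of re.sub. The variable names are only illustrative:

```python
import re

data = '<span class=H>宜家</span><span class=H>同款</span> 世纪宝贝儿童餐椅婴儿餐椅宝宝餐椅婴儿吃饭椅'

# Tags only: whitespace that was part of the text is preserved.
tags_only = re.sub(r'<[^>]+>', '', data)

# Tags plus adjacent whitespace: the space after the last </span> disappears too.
tags_and_space = re.sub(r'\s*<[^>]+>\s*', '', data)

# count=1 stops after the first match; the default, count=0, replaces every match,
# which is why no "global" flag is needed.
first_only = re.sub(r'<[^>]+>', '', data, count=1)

print(tags_only)
print(tags_and_space)
print(first_only)
```

Note that the \s variant only removes whitespace that touches a tag; spaces elsewhere in the text would be left alone.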
