
I am trying to replace a few URLs in a long string.

Here is a sample:

s = 'https://www.yellowpages.ca/bus/Alberta/Edmonton/MNS-Enterprise-\nLtd/8114324.html, https://411.ca/business/profile/13300641'

Because there is a newline character inside the first URL, the match always stops at \n. I tried

re.sub(r'(https?://[\S]*)', 'website__', s, re.DOTALL)

but the result still breaks at \n:

'website__\nLtd/8114324.html, website__'

1 Answer


You can add \n to the character class and use

re.sub(r'https?://[\n\S]+\b', '<URL>', s)

See the regex demo. Details:

  • https?:// - http:// or https://
  • [\n\S]+ - one or more newline or non-whitespace chars
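  • \b - a word boundary; since + is greedy, the match ends at the rightmost word boundary it can reach, so trailing punctuation such as the comma is left out.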
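
To make the effect of the trailing \b concrete, here is a minimal sketch that runs the pattern with and without it on the sample string from the question. The variable names without_boundary and with_boundary are only illustrative.

```python
import re

s = 'https://www.yellowpages.ca/bus/Alberta/Edmonton/MNS-Enterprise-\nLtd/8114324.html, https://411.ca/business/profile/13300641'

# Without \b, the greedy class [\n\S]+ also consumes the comma after ".html".
without_boundary = re.findall(r'https?://[\n\S]+', s)

# With \b, the regex backtracks until the match ends on a word character,
# so the trailing comma is excluded.
with_boundary = re.findall(r'https?://[\n\S]+\b', s)

print(without_boundary[0][-6:])  # last characters of the first match, comma included
print(with_boundary[0][-5:])     # last characters of the first match, comma excluded
```

Keep in mind that [\n\S]+ treats every newline as part of the URL, so a URL that ends a line and is directly followed by text on the next line would absorb that text as well. This works for the sample here, but it is worth checking against your real data.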

See the Python demo:

import re
s = 'https://www.yellowpages.ca/bus/Alberta/Edmonton/MNS-Enterprise-\nLtd/8114324.html, https://411.ca/business/profile/13300641'
print(re.sub(r'https?://[\n\S]+\b', 'website__', s))
# => website__, website__
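
As a side note on why the attempt in the question did not help: re.DOTALL only changes what . matches and has no effect on \S, which never matches a newline. In addition, the fourth positional parameter of re.sub is count, not flags, so flags should be passed by keyword. The sketch below passes the flag correctly and shows that the result is unchanged; the names attempt and fixed are illustrative.

```python
import re

s = 'https://www.yellowpages.ca/bus/Alberta/Edmonton/MNS-Enterprise-\nLtd/8114324.html, https://411.ca/business/profile/13300641'

# re.sub(pattern, repl, string, count=0, flags=0): flags must be given by keyword,
# otherwise the value lands in the count parameter.
# Even with the flag passed correctly, \S still refuses to match "\n",
# because DOTALL only affects the "." metacharacter.
attempt = re.sub(r'(https?://[\S]*)', 'website__', s, flags=re.DOTALL)
print(repr(attempt))  # 'website__\nLtd/8114324.html, website__'

# Allowing "\n" explicitly in the character class is what fixes the match.
fixed = re.sub(r'https?://[\n\S]+\b', 'website__', s)
print(repr(fixed))    # 'website__, website__'
```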