I have the following line:

b = re.sub('^xMain (\S+)/y1,/y0 (\S+ )(.*)$', 'xMain \2\1\3', a)

where a is:

xMain Buchan/y1,/y0 Angus Sub1

Why does b come out as 'xMain \x02\x01\x03'? My intention is to de-invert a name. This works in RegexBuddy, but not in Python 2.7.

  • It does not come out as "xMain". It comes out as "xMain" followed by a space followed by three unprintable characters. Commented Nov 12, 2013 at 15:11

1 Answer


You see unprintable characters because \2, \1 and \3 also have a meaning in a regular Python string literal, where they are octal escape codes:

>>> '\2'
'\x02'
>>> 'xMain \2\1\3'
'xMain \x02\x01\x03'

The escapes are resolved when Python parses the string literal, so the back-references never reach the re.sub() function as written.
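
To make this concrete, here is a small sketch that inspects the replacement literal from the question before it is handed to anything else. The variable names are illustrative only and do not appear in the question.

```python
replacement = 'xMain \2\1\3'  # non-raw literal, exactly as typed in the question

# Python has already turned each backslash-digit pair into one control character,
# so the string is shorter than it looks in the source code.
length = len(replacement)
tail_codes = [ord(c) for c in replacement[6:]]  # code points after 'xMain '

print(length)      # number of characters actually stored
print(tail_codes)  # numeric values of the three trailing characters
```

The three trailing values are the control characters U+0002, U+0001 and U+0003, with no backslash left for the re module to interpret.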

Use a raw string literal instead:

b = re.sub('^xMain (\S+)/y1,/y0 (\S+ )(.*)$', r'xMain \2\1\3', a)

Note the r'...' string. In a raw string literal \... escape codes are not interpreted, leaving the back-references in place for the re module to use:

>>> r'xMain \2\1\3'
'xMain \\2\\1\\3'

The alternative would be to double the backslashes, escaping the escape:

b = re.sub('^xMain (\S+)/y1,/y0 (\S+ )(.*)$', 'xMain \\2\\1\\3', a)
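
As a quick sanity check, the two spellings can be compared directly; this is only a sketch with illustrative variable names, showing that both literals produce the same string object contents.

```python
raw = r'xMain \2\1\3'         # raw string literal
doubled = 'xMain \\2\\1\\3'   # regular literal with escaped backslashes

same = (raw == doubled)
backslashes = raw.count('\\')  # each back-reference keeps its own backslash

print(same, len(raw), backslashes)
```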

Either way, your replacement pattern now works as expected:

>>> import re
>>> a = 'xMain Buchan/y1,/y0 Angus Sub1'
>>> re.sub('^xMain (\S+)/y1,/y0 (\S+ )(.*)$', r'xMain \2\1\3', a)
'xMain Angus BuchanSub1'
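
If you would rather avoid back-reference strings altogether, re.sub() also accepts a function as the replacement argument. The following is a minimal sketch under that approach; the helper name deinvert is hypothetical, and the pattern is written as a raw string so that \S is passed to the re module unchanged.

```python
import re

a = 'xMain Buchan/y1,/y0 Angus Sub1'

def deinvert(match):
    # Reassemble the captured groups in the same order as r'xMain \2\1\3'.
    return 'xMain ' + match.group(2) + match.group(1) + match.group(3)

b = re.sub(r'^xMain (\S+)/y1,/y0 (\S+ )(.*)$', deinvert, a)
print(b)
```

Because the groups are accessed through the match object, no backslash sequences appear in the replacement at all.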