2

I have a file in which the segments of each word are written in the format <+segment1 segment2 segment3 segment4+>. I want output in which the segments are joined into one word, that is, with the spaces between the segments and the surrounding <+ +> markers removed. For example:

Input:

<+play ing+> <+game s .+>

Output:

playing games. 

I first tried detecting the pattern with \<\+(.*?)\+\>, but I can't work out how to remove the spaces.

2 Answers

3

Use this Python code:

import re
line = '<+play ing+> <+game s .+>'
line = re.sub(r'<\+\s*(.*?)\s*\+>', lambda z: z.group(1).replace(" ", ""), line)
print(line)

Results: playing games.

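The lambda passed to re.sub receives each match and returns the captured group with its spaces removed, so every <+ ... +> group is replaced by its joined segments.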

REGEX EXPLANATION

--------------------------------------------------------------------------------
  <                        '<'
--------------------------------------------------------------------------------
  \+                       '+'
--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    .*?                      any character except \n (0 or more times
                             (matching the least amount possible))
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  \+                       '+'
--------------------------------------------------------------------------------
  >                        '>'
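
Since the question mentions a file, the same substitution can be wrapped in a small function and applied line by line. This is only a sketch: the name join_segments and the sample lines are made up for illustration, and the pattern is the one explained above.

```python
import re

# Same pattern as in the answer, compiled once for reuse.
SEGMENT_RE = re.compile(r'<\+\s*(.*?)\s*\+>')

def join_segments(text):
    """Drop the <+ +> markers and the spaces inside each marked group.

    Hypothetical helper name; text outside the markers is left untouched.
    """
    return SEGMENT_RE.sub(lambda m: m.group(1).replace(" ", ""), text)

# Sample lines standing in for the contents of the file.
lines = ['<+play ing+> <+game s .+>', 'plain text <+un chang ed+>']
for line in lines:
    print(join_segments(line))
```

Because only the text inside each match is rewritten, spaces between groups and in unmarked text survive.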

0

I assume that spaces can be converted to empty strings except when they are preceded by '>' and are followed by '<'. That is, the space in the string '> <' is not to be replaced by an empty string.

You can replace each match of the following regular expression with an empty string:

<\+|\+>|(?<!>) | (?!<)

This expression can be broken down as follows.

<\+     # Match '<+'
|       # or
\+>     # Match '+>'
|       # or
(?<!>)  # Negative lookbehind asserts current location is not preceded by '>'
[ ]     # Match a space
|       # or
[ ]     # Match a space
(?!<)   # Negative lookahead asserts current location is not followed by '<'

I've placed each space in a character class above so it is visible.
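
In Python, the replacement could look like the sketch below. The function name strip_markers is hypothetical, and the code relies on the same assumption stated at the top of this answer.

```python
import re

# Alternation from the answer: the two markers, a space not preceded
# by '>', or a space not followed by '<'.
PATTERN = re.compile(r'<\+|\+>|(?<!>) | (?!<)')

def strip_markers(text):
    """Replace every match of PATTERN with an empty string (hypothetical helper)."""
    return PATTERN.sub('', text)

print(strip_markers('<+play ing+> <+game s .+>'))
```

Note that this approach also deletes spaces outside the markers, because only a space sitting directly between '>' and '<' is kept. For example, 'a b <+c d+>' becomes 'abcd', so it fits best when the whole line consists of marked groups.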
