Java Regex-based string replacement

Question

I'm looking for a regex-based String replacement in Java for the use case below. I'm doing some Groovy-based XML processing and, because of some custom processing (I won't go into detail on this), the resulting XML has some invalid tags, for example:

<?xml version='1.0' encoding='UTF-8'?>
<Customer id="xyz" xmlns='http://abc.com'>
<order orderGroup="mock">
    <entry>
        <key>test</key>
    </entry>
</order orderGroup="mock">
</Customer id="xyz">

Note that the end tags of elements that have attributes are messed up: each one repeats the attributes of its start tag. The XML is treated as a plain string, so I just want to fix such end tags with a regex-based string replacement. For example, replace

</order orderGroup="mock"> with </order>,
</Customer id="xyz"> with </Customer>

Is there a quick Java String regex I can use for such replacements?

Thanks.

What happens when you load the javadoc, hit Ctrl-F, and type "regex"? Why don't you fix the faulty "custom processing" that generates garbage instead of trying to work around the problem?
– JB Nizet, Commented Mar 22, 2013 at 23:51

Evgeniy Dorofeev · Accepted Answer · 2013-03-23 00:15:47Z

5

Try this:

    xml = xml.replaceAll("</([^ >]+).*?>", "</$1>");
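To see the one-liner run against the sample from the question, here is a minimal sketch. The class name `ClosingTagFix` and the method name `stripClosingTagAttributes` are made up for illustration; only the `replaceAll` call comes from the answer.

```java
public class ClosingTagFix {

    // Drops everything between the tag name and '>' in each closing tag.
    // The pattern and replacement are the ones from the answer above.
    public static String stripClosingTagAttributes(String xml) {
        return xml.replaceAll("</([^ >]+).*?>", "</$1>");
    }

    public static void main(String[] args) {
        String xml = "<Customer id=\"xyz\" xmlns='http://abc.com'>\n"
                + "<order orderGroup=\"mock\">\n"
                + "    <entry>\n"
                + "        <key>test</key>\n"
                + "    </entry>\n"
                + "</order orderGroup=\"mock\">\n"
                + "</Customer id=\"xyz\">";
        System.out.println(stripClosingTagAttributes(xml));
    }
}
```

Start tags are left alone because the pattern only matches text beginning with `</`. Well-formed closing tags such as `</key>` also match, but they are rewritten to themselves.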

answered Mar 23, 2013 at 0:15


2 Comments

Alan Moore Over a year ago

+1, but I would have used </([^\s>]+)[^>]+>. .*? is a fickle friend; why put yourself at its mercy when you can so easily say exactly what you want?

Evgeniy Dorofeev Over a year ago

I agree about \\s, but it seems that regex converts "<e1><e2></e2></e1>" into "<e1><e2></e></e>".
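The behaviour described in this comment can be reproduced with the sketch below. `[^>]+` needs at least one character, so on a clean tag such as `</e2>` the engine backtracks and takes that character from the end of the captured name. The second pattern, which requires whitespace right after the name, is an assumption about one way to avoid this and was not proposed in the thread; the class and method names are placeholders.

```java
public class ClosingTagPatterns {

    // Pattern from the comment: [^>]+ must consume at least one character.
    public static String withCommentPattern(String xml) {
        return xml.replaceAll("</([^\\s>]+)[^>]+>", "</$1>");
    }

    // Assumed variant: only touch closing tags with whitespace after the name.
    public static String withWhitespaceRequired(String xml) {
        return xml.replaceAll("</([^\\s>]+)\\s[^>]*>", "</$1>");
    }

    public static void main(String[] args) {
        String clean = "<e1><e2></e2></e1>";
        System.out.println(withCommentPattern(clean));     // <e1><e2></e></e>
        System.out.println(withWhitespaceRequired(clean)); // <e1><e2></e2></e1>
        System.out.println(withWhitespaceRequired("</order orderGroup=\"mock\">")); // </order>
    }
}
```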

Vivin Paliath · Accepted Answer · 2013-03-23 00:00:08Z

2

The easiest solution is to fix your custom XML processing and have it generate valid XML.

The easy solution is to use something like JTidy to clean up your XML.

If you must use regex, you could try something like this:

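import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Matches a closing tag that carries attributes and captures the tag name.
Pattern pattern = Pattern.compile("</([A-Za-z]+) [^>]+>");
Matcher matcher = pattern.matcher(xml);

// Caution: group(1) is the bare name from the first match only, so every
// match is replaced by that name with no "</" or ">" around it.
if(matcher.find()) {
   xml = matcher.replaceAll(matcher.group(1));
}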

I haven't tested this out, so keep that in mind. There might be a few issues.

Explanation of the regex:

<         -> The opening angle bracket of the tag
/         -> The / that marks a closing tag
(         -> Start of a capturing group. We want to capture the actual ending tag.
[A-Za-z]+ -> One or more alphabetic characters (upper and lowercase)
)         -> End of the capturing group.
(space)   -> A literal space.
[^>]+     -> One or more of anything that is not a closing angle-bracket.
>         -> The closing angle bracket of the tag.
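To make the breakdown concrete, this sketch runs the same pattern over a string and collects whatever the capturing group picks up. The class and method names are placeholders chosen for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CapturedTagNames {

    private static final Pattern CLOSING_WITH_ATTRS =
            Pattern.compile("</([A-Za-z]+) [^>]+>");

    // Returns group 1 (the tag name) of every closing tag that carries attributes.
    public static List<String> capturedNames(String xml) {
        List<String> names = new ArrayList<>();
        Matcher matcher = CLOSING_WITH_ATTRS.matcher(xml);
        while (matcher.find()) {
            names.add(matcher.group(1));
        }
        return names;
    }

    public static void main(String[] args) {
        String xml = "<order orderGroup=\"mock\"><key>test</key>"
                + "</order orderGroup=\"mock\"></Customer id=\"xyz\">";
        System.out.println(capturedNames(xml)); // [order, Customer]
    }
}
```

Because the name part is `[A-Za-z]+`, tag names containing digits, hyphens, or a namespace prefix would not be matched by this pattern.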

edited Mar 23, 2013 at 0:00

answered Mar 22, 2013 at 23:54


3 Comments

codehammer Over a year ago

Thanks Vivin! That works to an extent. Only issue is, it replaced even the start and ending angle brackets. In other words, it results in Customer instead of <Customer>

Alan Moore Over a year ago

As Evgeniy's answer shows, this solution is much more verbose than it needs to be. In particular, it's never necessary to call find() before doing the substitution. replaceAll() does that itself, and if there are no matches it returns the original string unchanged. You don't need to call methods like group(n) for the replacement string, either. If there happen to be any dollar signs or backslashes in the string you'll get a runtime exception; that's not a problem if you use "$1".

Vivin Paliath Over a year ago

Yes, his answer is much better.
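Putting the comments above together, a corrected version of this answer's code might look like the sketch below. It keeps the answer's pattern but uses `"</$1>"` as the replacement and drops the `find()` call. The class name, the method names, and the `replaceWithLiteral` helper are illustrative assumptions, not part of the original answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ClosingTagCleaner {

    // Same pattern as in the answer, compiled once for reuse.
    private static final Pattern CLOSING_WITH_ATTRS =
            Pattern.compile("</([A-Za-z]+) [^>]+>");

    // "$1" is resolved per match, so each tag keeps its own name and brackets.
    public static String clean(String xml) {
        return CLOSING_WITH_ATTRS.matcher(xml).replaceAll("</$1>");
    }

    // Only relevant when arbitrary text is used as the replacement:
    // quoteReplacement escapes '$' and '\' so they are taken literally.
    public static String replaceWithLiteral(String xml, String literal) {
        return CLOSING_WITH_ATTRS.matcher(xml)
                .replaceAll(Matcher.quoteReplacement(literal));
    }

    public static void main(String[] args) {
        String xml = "</order orderGroup=\"mock\">\n</Customer id=\"xyz\">";
        System.out.println(clean(xml));
    }
}
```

If no closing tag matches, `replaceAll` returns the input unchanged, so no guard is needed around the call.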
