
I have to parse an XML document whose elements have attributes with multiline values, containing spaces and line breaks.

I am parsing it with minidom, but the multiline attribute values I get back no longer contain the line breaks.

How can I get such values with minidom? If minidom cannot do it, which other library supports such attributes?


2 Answers


This is not a matter of minidom or any other DOM library. It is the XML standard itself that specifies how an attribute value is normalized:

For a white space character (#x20, #xD, #xA, #x9), append a space character (#x20) to the normalized value

https://www.w3.org/TR/2008/REC-xml-20081126/#attdecls

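That means a literal \n (line feed), \r (carriage return) or \t (tab) typed into an XML attribute will never reach you in its value, at least if your parser follows the rules. The only way such a character survives is if the document writes it as a character reference (for example &#10; for a line feed), because referenced characters are appended to the normalized value unchanged.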
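To make the effect concrete, here is a minimal sketch using minidom. The element name `item`, the attribute name `note`, and the helper `attr_value` are made up for illustration; only `minidom.parseString`, `getElementsByTagName` and `getAttribute` come from the standard library.

```python
from xml.dom import minidom


def attr_value(xml_text, element="item", attribute="note"):
    """Parse xml_text and return one attribute of the first matching element.

    Hypothetical helper for this example only.
    """
    doc = minidom.parseString(xml_text)
    return doc.getElementsByTagName(element)[0].getAttribute(attribute)


# A literal line break inside the attribute value: the parser normalizes it.
literal = '<root><item note="first line\nsecond line"/></root>'

# The same line break written as a character reference: it is preserved.
escaped = '<root><item note="first line&#10;second line"/></root>'

print(repr(attr_value(literal)))
print(repr(attr_value(escaped)))
```

So if you control how the XML is produced, writing line breaks as `&#10;` in attribute values is one way to keep them; if you do not, the information is already gone by the time any conforming parser hands you the value.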


According to the XML specification, section 3.3.3 Attribute-Value Normalization, literal newlines in an attribute value are not preserved; they get replaced by spaces.

Before the value of an attribute is passed to the application or checked for validity, the XML processor must normalize the attribute value by applying the algorithm below, or by using some other method such that the value passed to the application is the same as that produced by the algorithm.

  1. All line breaks must have been normalized on input to #xA as described in 2.11 End-of-Line Handling, so the rest of this algorithm operates on text normalized in this way.

  2. Begin with a normalized value consisting of the empty string.

  3. For each character, entity reference, or character reference in the unnormalized attribute value, beginning with the first and continuing to the last, do the following:

    • For a character reference, append the referenced character to the normalized value.

    • For an entity reference, recursively apply step 3 of this algorithm to the replacement text of the entity.

    • For a white space character (#x20, #xD, #xA, #x9), append a space character (#x20) to the normalized value.

    • For another character, append the character to the normalized value.

(emphasis mine)
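The algorithm quoted above can be sketched in a few lines of Python. This is a simplified illustration, not what any real parser runs: the function name `normalize_attribute` is hypothetical, only numeric character references are handled, general entity references (the recursive case in step 3) are left out, and the additional trimming the spec applies to non-CDATA attribute types is not modeled.

```python
import re

# Matches a numeric character reference such as &#10; or &#xA;
_CHAR_REF = re.compile(r"&#(?:x([0-9A-Fa-f]+)|([0-9]+));")


def normalize_attribute(raw):
    """Simplified attribute-value normalization (hypothetical helper).

    Entity references such as &amp; are NOT handled in this sketch.
    """
    # Step 1: end-of-line handling, so CRLF and lone CR become LF.
    raw = raw.replace("\r\n", "\n").replace("\r", "\n")

    out = []  # Step 2: start with an empty normalized value.
    pos = 0
    while pos < len(raw):  # Step 3: walk the unnormalized value.
        match = _CHAR_REF.match(raw, pos)
        if match:
            # Character reference: append the referenced character as-is.
            hex_digits, dec_digits = match.groups()
            code = int(hex_digits, 16) if hex_digits else int(dec_digits)
            out.append(chr(code))
            pos = match.end()
        elif raw[pos] in " \r\n\t":
            # Literal white space: append a single space instead.
            out.append(" ")
            pos += 1
        else:
            # Any other character is copied unchanged.
            out.append(raw[pos])
            pos += 1
    return "".join(out)
```

The two branches are the whole story for the question: a literal line break goes through the white-space branch and becomes a space, while `&#10;` goes through the character-reference branch and stays a line feed.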

See also the open "bug" report: xml.dom.minidom does not escape CR, LF and TAB characters within attribute values.
