ActionScript, E4X, and White Space

english mobile

I've been reading XML into ActionScript 3 for many years, and I still haven't solved all the issues with it. Given that my searches for solutions to these issues turn up so little, I decided to share the solutions I have found, especially when it comes to white space.

I think at least part of the reason that these problems are so hard to solve is that the XML documentation for many of the properties and methods we need to use is frustratingly circular. So let me start by laying out what some of these do in practice.

ignoreWhiteSpace

I think ignoreWhiteSpace is the property we first try to use to correct some of the issues with E4X. This property will, in fact, fix issues that are caused by having "extra" children that you weren't expecting to be parsing when you use normal formatting, like new lines and tabs, in your XML. However, it then causes other white spaces that might have been pretty important to disappear.

An alternative to using ignoreWhiteSpace is to use xmlVar.elements() instead of xmlVar.children(). This works relatively well unless you need to access something that's not an element, like a comment, or you need to pass the XML to Adobe code (like new DataProvider()) that for some reason is not equipped to handle the vagaries of their own E4X implementation.

prettyPrinting

We're all familiar with the problem where the default XML parsing adds even more formatting than we included in the original XML file, so that

<root>
 <p>The quick brown fox jumped over the
 <b>lazy</b> dog.</p>
</root>
becomes
<root>
 <p>The quick brown fox jumped over the 
  <b>lazy</b> 
 dog.</p>
</root>

I'm sure there are some versions of Flash where setting prettyPrinting to false fixes this problem, but in the version of the Flash player we target, setting this property has no effect. The good news is that TextField, the Flash Label component, and the Flex MX Label component support the condenseWhite property, which gets rid of this issue for text you want to display. I guess if you're using a spark label, the skin will need to use something that supports condenseWhite if you need this.

Wrap in CDATA

One of the reasons XML parsing is so thorny is that we're using the XML to store external content, often HTML-formatted. CDATA does work to preserve HTML formatting, but it has its own issues. One issue is that there's no real documentation about how to get the text out of CDATA.

Funny story--I was once on a Flex project where it turned out I needed to wrap the entire XML package I was building in CDATA. Try though I might, I couldn't find anything that told me how to get at the "stuff" inside the CDATA. With deadline fast approaching, I "hacked" it by giving the text to a Spark label (which did know how to do this), then reading the text out again and casting it to XML.

Some months later, I ultimately discovered that String(myNode) will retrieve the contents of a node, whether its contents are wrapped in CDATA or not. Unless the contents are not wrapped in CDATA but contain HTML formatting anyway. This is something that is quite likely to happen on my team, as the people who produce the XML see no reason they shoulld start wrapping everything in CDATA just because we're using ActionScript 3 now.

In addition, even when things are wrapped in CDATA when they should be, you have to dump your text in one line into CDATA because it faithfully reproduces all those \n's and \t's, and you wind up with a bunch of junk nodes if you have to cast the contents back to XML. This results in XML that isn't readable, and readability is one reason to use XML in the first place.

hasComplexContent

So, if you're getting XML and you know that sometimes certain nodes will contain HTML and sometimes that HTML will be wrapped in CDATA and sometimes it won't, what do you do? One thing you can do is create a function that checks for hasComplexContent and calls toString() if it returns false and children().toXMLString() if it returns true.

This works well, but one issue with it is you have to have access to this function wherever you're parsing XML. And since you can't really extend the XML Class, due to its dependence on static methods and properties, that means you either need to use a static method yourself (not my favorite thing) or you need to violate DRY, unless your application lends itself to only parsing XML nodes in exactly one place.

normalize()

This is probably my favorite method, because it gets rid of all the weird child nodes that contain "\n\t\t". This means that I can now parse formatted XML and not have to allow for weird extra junk in there. More importantly, I can pass that XML to "dumber" code, such as the Flash DataProvider constructor, and it works. It also does not take out the spaces around inline HTML tags, so you can normalize() the XML to make for easy parsing that you should get from ignoreWhiteSpace (but don't, due to the removed spaces), then you can call String(myXMLNode) on the node whether it has complex content or not, and it will work in a TextField with condenseWhite set to true. FTW!

0 comments: