Attaching non-text data in SOAP messages
June 19, 2006

XML is a fairly good format for exchanging documents since it is plain text, human readable and, best of all, well structured. It has won a lot of advocates through its ability to be a “simultaneously human and machine-readable format”. What’s more, many people still consider XML self-describing, a further reason to stand behind XML’s overriding dictum: XML is text, interop is everything.

XML is well suited to, and widely used for, data transfer. For example, SOAP messaging in Web services is based on XML (well, technically speaking, SOAP 1.2 is based on the XML Infoset). With SOAP messaging becoming more widespread as adoption grows within organisations, the challenge of how to send non-text data along with your message is becoming more important. Many organisations now have “image and workflow” type applications, for example, where a JPEG (say a scanned insurance claim) needs to be sent between applications.

The challenge is not simply how the non-text data (let’s call it binary data for argument’s sake) is embedded in or attached to the XML, but whether that data is understood by the various applications. The tools and standards for describing and manipulating XML (parsers, XSLT, XML Schema, etc.) were not designed to work with binary data. They need text.

Generally the answer is to embed the binary data in an XML document by encoding it as text using Base 64, a serialization that has been around for decades, is easy to implement and consequently has out-of-the-box interoperability across platforms. It even has support in XML Schema via the xs:base64Binary datatype.

Of course, solving technical problems is rarely that easy. There tends to be a tradeoff, and Base 64 is no different: there is a performance impact. Base 64 encodes your binary data into a textual representation that can squeeze into an XML document.
It works by taking your binary data and translating it into a series of ASCII characters: three octets (bytes) of eight bits each, 24 bits in all, are encoded at a time and represented as four printable ASCII characters, each carrying six bits.
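The regrouping of three octets into four characters can be sketched in a few lines of Python. This is only an illustration: the helper name `encode_three_octets` and the sample data are made up for the example, padding of incomplete groups is left out, and the standard library's `base64` module is used as a cross-check.

```python
import base64

# The 64-character Base 64 alphabet: A-Z, a-z, 0-9, '+' and '/'.
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def encode_three_octets(chunk: bytes) -> str:
    """Encode exactly three bytes as four Base 64 characters (hypothetical helper)."""
    if len(chunk) != 3:
        raise ValueError("this sketch only handles a full 3-byte group")
    # Join the three 8-bit octets into one 24-bit integer.
    bits = int.from_bytes(chunk, "big")
    # Split the 24 bits into four 6-bit values, most significant first.
    indices = [(bits >> shift) & 0x3F for shift in (18, 12, 6, 0)]
    return "".join(ALPHABET[i] for i in indices)


sample = b"Man"
manual = encode_three_octets(sample)
library = base64.b64encode(sample).decode("ascii")
print(manual, library)  # both are "TWFu"

# Size cost on a larger payload: every 3 bytes become 4 characters.
payload = bytes(range(256)) * 12          # 3072 bytes of stand-in binary data
encoded = base64.b64encode(payload)
print(len(payload), len(encoded))         # 3072 4096
```

Each 6-bit value can only take 64 different values, which is why a 64-character alphabet is enough.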
Note the 4:3 ratio of characters to bytes. Effectively, Base 64 uses 2^6 (64) ASCII characters to represent the binary data, hence the name. This works well, since the 64 characters are a subset of printable ASCII that every platform can encode and decode. No special characters need to be dealt with.

But there's an obvious drawback. Performance takes a noticeable hit for larger messages. Firstly, Base 64 adds roughly 33% to the size of the original binary data due to the 4 (characters) : 3 (binary bytes) ratio, incurring greater latency over the wire. Secondly, there is some expensive encoding and decoding to be done at either end. The decoding is particularly costly, and some tests indicate performance three to four times slower. However, for the most part it is certainly an option for smaller messages and is guaranteed to have excellent interoperability. For larger messages and applications that require speedy operation, Base 64 is not the solution.

People recognized fairly early on after the birth of SOAP that they needed a way to attach a binary file to a SOAP message. This was exactly the approach of the first “attachment” specification that came about. But it failed.

Microsoft and HP came up with the first SOAP attachment specification and wrote a short paper submitted as a “note” to the W3C. It was aptly named “SOAP with Attachments”, or SwA for short. The basic idea was that the binary message part, the “attachment”, would be thought of as a MIME attachment. MIME, short for Multipurpose Internet Mail Extensions, is as the name suggests a widely implemented specification for formatting non-ASCII mail message attachments. The SwA specification sets out how the SOAP body can contain a reference to the MIME message part (the attachment) simply with a URI. In effect the binary part is attached by reference. See here for an example.

SwA had problems and it never got past the “note” stage at the W3C. Effectively it is now a dead standard, although still widely used.
SwA had two major failings. The first was usability and interoperability. SOAP infrastructure was created around the SOAP envelope and just didn’t cater well for attachments. An attachment by way of SwA meant two data models in one message. As alluded to earlier, existing XML technology cannot operate across both data models. As Mark Nottingham succinctly says:

“Much of the value of XML and Web services resides in the ability to use generic XML tools - like XPath, XQuery, XSLT, XML encryption and digital signature and XML schema - to work with content. These tools don't work with non-XML content; if you want to index into such content, query it, transform it, encrypt it, sign it or describe it, you need to use a different mechanism, or even invent a new one.”

Other examples of usability and interoperability failings include WSDL’s documented problems with describing multipart MIME messages and ambiguities in how SOAP intermediary nodes deal with the MIME parts. There have been so many issues that there is even a WS-I interoperability profile for SwA.

The second and probably more important failing was that SwA does not work with the “composable” character of SOAP. Basically, the WS-* standards such as WS-Security were not written to work with attachments. WS-Security needs to work on all the data that needs to be digitally signed or encrypted, and that means all the data in the attachment. If it can’t access that data, it won’t work, and the signature is effectively invalid.

The problems with SwA meant it clearly was not an ideal solution. In fact, by comparison, simple Base 64 encoding was a better approach to attaching binary data since it maintains good interoperability. However, the hunt was on for a better solution that melds the interoperability and composability benefits of Base 64 with the performance benefits of SwA.
Work started on a solution in the form of the Proposed Addendum to SOAP Messages with Attachments, which was then the basis for the W3C XML Protocol Group to issue the Message Transmission Optimization Mechanism (MTOM) and XML-binary Optimized Packaging (XOP) specifications. MTOM sets the framework for XOP, and is described in the specification as follows:

“The Abstract SOAP Transmission Optimization Feature enables SOAP bindings to optimize the transmission and/or wire format of a SOAP message by selectively encoding portions of the message, whilst still presenting an XML Infoset to the SOAP application.”

Two phrases in that description are important. Firstly, “presenting an XML Infoset” means we’ll only ever have one Infoset, and therefore one data model, which maintains interoperability and usability with XML toolsets. Secondly, “by selectively” means we only need to work on the parts of the message which are non-text.

MTOM solves the challenge of the one data model (Infoset) and lets us bind to SOAP. But we still need to somehow attach the binary data, and that is the job of XOP. Once again XOP uses MIME and Base 64. XOP is described in the specification as:

“A XOP package is created by placing a serialization of the XML Infoset inside of an extensible packaging format (such a MIME Multipart/Related, see [RFC 2387]). Then, selected portions of its content that are base64-encoded binary data are extracted and re-encoded (i.e., the data is decoded from base64) and placed into the package. The locations of those selected portions are marked in the XML with a special element that links to the packaged data using URIs.”

Using Base 64 solves the second failing of SwA, that is, working with the composable character of SOAP messages. Coupled together, MTOM and XOP allow us to select which parts of the message need to be sent over the wire as binary while still maintaining the Infoset.
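To make the packaging step concrete, here is a rough Python sketch that assembles a two-part MIME Multipart/Related body by hand: a root part holding the SOAP envelope with an `xop:Include` element, and a second part holding the raw bytes. It is not taken from any SOAP stack. The function name `build_xop_package`, the `claim` and `scan` elements, the `urn:example:claims` namespace, the Content-ID values and the boundary string are all assumptions made for illustration, and the outer HTTP headers are omitted.

```python
import base64

# Namespace of the xop:Include element, as defined by the XOP specification.
XOP_NS = "http://www.w3.org/2004/08/xop/include"


def build_xop_package(binary: bytes, content_id: str, boundary: str) -> bytes:
    """Assemble a simplified XOP-style MIME body (hypothetical helper).

    Assumes the boundary string does not occur inside the binary data.
    """
    # Root part: the SOAP envelope, with the binary content replaced by a
    # xop:Include element that points at the second part via a cid: URI.
    envelope = (
        '<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope">'
        '<soap:Body><claim xmlns="urn:example:claims"><scan>'
        f'<xop:Include xmlns:xop="{XOP_NS}" href="cid:{content_id}"/>'
        '</scan></claim></soap:Body></soap:Envelope>'
    ).encode("utf-8")
    delim = b"--" + boundary.encode("ascii")
    return b"\r\n".join([
        delim,
        b'Content-Type: application/xop+xml; charset=UTF-8; type="application/soap+xml"',
        b"Content-Transfer-Encoding: 8bit",
        b"Content-ID: <envelope@example.org>",
        b"",
        envelope,
        delim,
        b"Content-Type: image/jpeg",
        b"Content-Transfer-Encoding: binary",
        b"Content-ID: <" + content_id.encode("ascii") + b">",
        b"",
        binary,                      # raw octets, not Base 64 text
        delim + b"--",               # closing delimiter
        b"",
    ])


# Stand-in for a scanned JPEG: 300 bytes starting with a JPEG-like marker.
scan = bytes([0xFF, 0xD8, 0xFF, 0xE0]) + bytes(296)
package = build_xop_package(scan, "scan-1@example.org", "MIME_boundary")

# What the same data would cost if it stayed inline as Base 64 text.
inline_text = base64.b64encode(scan)
print(len(scan), len(inline_text))   # 300 400
```

The binary part travels as raw octets, while a receiving stack that understands XOP can still hand the application the Base 64 view of the `scan` element, so the message keeps a single Infoset.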
The combination lets us attach binary data outside of the SOAP envelope as a message part but, unlike SwA, this time the binary data is treated very much as if it were within the SOAP envelope: one Infoset. This has added advantages, as Mark points out:

“From an API perspective, XOP has some interesting implications. If an XML stack understands XOP encoding, your application doesn't need to be changed at all; when it wants to access the picture (for example), it can still get the character value of the content as a base64-encoded string. If XOP is in use, the implementation can automatically encode it on the fly.”

The question is, how does this all work? Matt Powell’s excellent Web Services, Opaque Data, and the Attachments Problem has a very good diagram of the process. The SOAP processing engine performs a temporary Base 64 encoding of the binary data just before the message hits the wire. This allows the SOAP processor to work on the Base 64 data, so that, for example, a WS-Security signature of the data can be taken and placed into the header. There is no need for expensive decoding at the other end, and the process works in reverse. An excellent example of all of the above can be found within the XOP specification.

XOP allows us to maintain the data model of the XML message, as the attachment is effectively treated as Base 64 encoded data. If you are after implementations of MTOM and XOP, they are already available in Java (JAX-WS) and .NET (WSE).

Source: Web Services.org