Object Interconnections: CORBA and XML, Part 1: Versioning

Douglas C. Schmidt and Steve Vinoski



A frequently asked question in CORBA circles these days is how XML (the Extensible Markup Language) [1] relates to CORBA applications. Since there's so much hype surrounding XML, it's hard to know what its real benefits are for CORBA-based systems. We've actually heard people say, "Why use CORBA when you can use XML?" Statements like this show there is significant confusion about what XML is, what it isn't, and specifically where it overlaps with CORBA, if at all.

In this column, we discuss how XML fits into the overall CORBA picture. To keep the column focused, we assume you have basic knowledge of XML. If you don't possess such knowledge, please refer to our favorite XML book, Essential XML [2]. As books go, it's one of those rare gems that can serve as both a tutorial and a reference.

XML — A Strongly-Hyped Language?

Some consider XML to be the next silver bullet. The sensible and experienced among us know, however, that no single tool can solve all the world's vexing technology challenges — not even CORBA! Like all other hype-ridden computing fads, XML is good for solving some things, but it can't solve everything. Unfortunately, the hype surrounding XML often conceals its real utility.

Comparing CORBA and XML is a classic case of comparing apples and oranges. XML is neither a distributed computing system nor a transport protocol. XML doesn't provide messaging or remote procedure calls. It doesn't allow you to invoke methods on distributed objects, nor does it dispatch incoming requests to programming language servants. It certainly doesn't address the inherent complexities of distributed systems, such as latency hiding, partial failures, causal ordering, or information assurance. On the other hand, CORBA is not a markup language. Its primary purpose is not the structured representation and definition of information, though it does provide support for such tasks via IDL.

XML and CORBA are mostly complementary. Fundamentally, CORBA is oriented around method invocation, while XML is all about describing data. They solve different problems. It's therefore not unreasonable to want to use them together, or even to hope that the whole will be greater than the sum of its parts. In this column, we begin exploring how the two relate by using versioning to compare and contrast CORBA and XML.

Versioning in CORBA

One of the biggest drawbacks of OMG IDL is that it lacks practical versioning support. For example, assume you're writing a bug tracking system, and you define a Bug type in IDL as follows:

struct Bug {
    long bugnum;
    string synopsis;
    string owner;
};

You proceed to write your bug tracking server application along with clients that allow you to submit new bugs as well as modify and browse your existing bugs.

As more users adopt your bug tracking system, they'll naturally want to extend it. One feature they'll undoubtedly want is to know who reported a given bug. So you go back to your Bug struct and change it by adding a new reported_by field:

struct Bug {
    long bugnum;
    string synopsis;
    string owner;
    string reported_by;  // added field
};

You recode your server application to take this new field into account and upgrade your running bug server. Unfortunately, all of your client applications break immediately, and your users start ringing your phone off its hook wondering why their bug tracking system no longer works. Assuming it's not feasible to apply the famous demotivational dictum [3] "Apathy: if we don't take care of the customer, maybe they'll stop bugging us," you'll need to devise a versioning strategy.

Some might point out that OMG IDL supports a versioning pragma. For example, when we add the field to the Bug struct, we might modify its version like this:

struct Bug { ... };  // new, updated version

#pragma version Bug 1.1

This changes the version of the Bug struct from the default 1.0 (major version 1, minor version 0) to version 1.1. Unfortunately, this does absolutely nothing to address the problem. First, no version information for data types is passed between applications as part of requests or replies, so an application is unable to tell what version of a type it's sending or receiving. Second, the CORBA specification does not define how this version number should be modified when the data type changes. In other words, it fails to explain what constitutes a major modification, which requires a change in the major version number, versus a minor modification, which requires a change only in the minor version number. It also fails to define compatibility rules between different version numbers. In short, the version pragma is useless for all practical purposes.

The CORBA standard interoperability protocol, GIOP, is also partly to blame for versioning problems in CORBA applications. One of the fundamental design considerations of GIOP is that it carries no type information. It expects the target of a request to understand the types of all data being sent to it in that request. This design approach makes sense, given that an object always implicitly knows its interface and thus knows all types used by the operations and attributes of that interface. After all, if the object didn't know its own interface, it wouldn't be able to respond to client requests. GIOP takes advantage of this fact to avoid sending unneeded type information with every request. However, that choice means that types are not identified on the network (except for the value inside an Any), so applications are unable to receive types they do not expect or understand.

In our example, assume that a bug tracking client application invokes a request to get the details of bug number 49938. The bug tracking server application creates a Bug struct, fills in the details of bug number 49938 including the reported_by field, and returns the Bug struct to the caller. Unfortunately, the client was built with the original definition of the Bug struct, so it doesn't expect the reply to contain a value for the new reported_by field. As a result of this extra data in the reply, the client's ORB is unable to properly unmarshal it, so it raises a CORBA::MARSHAL exception back up to the client application. The net result is that clients are unable to communicate properly with the bug tracking server until they've all been recompiled with the new definition of the Bug struct and redeployed. Thus, all your users are dead in the water until they upgrade.
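To make the failure concrete, here is a toy C++ sketch of an untyped, GIOP-style encoding. It is not real CDR (alignment, byte-order flags, and the terminating NUL that CDR puts on strings are all omitted), and the names put_long, put_string, Reader, marshal_bug_v2, and unmarshal_bug_v1 are hypothetical. The point it illustrates is that the bytes carry only values, never field names or types, so a client built against the old Bug can at best notice that the reply is longer than it expected. Our toy decoder reports that as an error, in the spirit of the MARSHAL exception described above.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Toy, untyped wire format: values are written back to back with no field
// names or type tags (real CDR also handles alignment and byte order).
using Buffer = std::vector<unsigned char>;

void put_long(Buffer& b, std::int32_t v) {
    const auto u = static_cast<std::uint32_t>(v);
    for (int i = 0; i < 4; ++i)
        b.push_back(static_cast<unsigned char>((u >> (8 * i)) & 0xFFu));
}

// Simplified string encoding: a length followed by the characters.
void put_string(Buffer& b, const std::string& s) {
    put_long(b, static_cast<std::int32_t>(s.size()));
    b.insert(b.end(), s.begin(), s.end());
}

// Reads values in whatever order the caller *believes* they were written.
struct Reader {
    const Buffer& buf;
    std::size_t pos = 0;

    std::int32_t get_long() {
        if (buf.size() - pos < 4) throw std::runtime_error("MARSHAL: truncated long");
        std::uint32_t u = 0;
        for (int i = 0; i < 4; ++i)
            u |= static_cast<std::uint32_t>(buf[pos + i]) << (8 * i);
        pos += 4;
        return static_cast<std::int32_t>(u);
    }

    std::string get_string() {
        const std::int32_t n = get_long();
        if (n < 0 || buf.size() - pos < static_cast<std::size_t>(n))
            throw std::runtime_error("MARSHAL: truncated string");
        std::string s(buf.begin() + pos, buf.begin() + pos + n);
        pos += static_cast<std::size_t>(n);
        return s;
    }
};

// Server side: already upgraded, so it marshals the new Bug with reported_by.
Buffer marshal_bug_v2(std::int32_t bugnum, const std::string& synopsis,
                      const std::string& owner, const std::string& reported_by) {
    Buffer b;
    put_long(b, bugnum);
    put_string(b, synopsis);
    put_string(b, owner);
    put_string(b, reported_by);  // the old client has no idea this is here
    return b;
}

// Client side: built with the original three-field Bug.
struct BugV1 {
    std::int32_t bugnum;
    std::string synopsis;
    std::string owner;
};

BugV1 unmarshal_bug_v1(const Buffer& reply) {
    Reader r{reply};
    BugV1 bug{r.get_long(), r.get_string(), r.get_string()};  // braced init runs left to right
    if (r.pos != reply.size())
        throw std::runtime_error("MARSHAL: unexpected data after Bug");
    return bug;
}
```

Feeding the output of marshal_bug_v2 to unmarshal_bug_v1 throws, while a reply containing only the original three fields decodes cleanly. If the Bug had been followed by further data in the same message, the old client would instead misread the reported_by bytes as the start of the next value.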

Obviously, directly changing the Bug struct was a bad idea, as it resulted in your clients and servers having different definitions for the same data type. Fortunately, though, there are other techniques for handling these types of versioning problems in CORBA. For example, rather than directly changing the definition of Bug, you can create a new type with the extra field:

struct Bug2 {  // new type
    long bugnum;
    string synopsis;
    string owner;
    string reported_by;
};

Leaving the original Bug type unchanged, you instead augment the bug tracking interface to accept both the original Bug type and the new Bug2 type. However, if you change the interface directly, you wind up once again needing to recompile and redeploy all your client applications so that they all have the same interface definition as the server. To augment an interface without changing it, you can apply a common CORBA versioning strategy based on inheritance and the Extension Interface pattern [4]. For example, assume you defined the original interface as follows:

interface BugTracker {
    exception NoSuchBug { long bugnum; };
    Bug get_details(in long bugnum) raises(NoSuchBug);
    // ...
};

To augment it to also handle the new Bug2 type, you derive a new interface, like this:

interface BugTracker2 : BugTracker {
    // Remember, CORBA IDL doesn't allow operation
    // overloading, so we must rename this operation.
    Bug2 get_details2(in long bugnum) raises(NoSuchBug);
    // ...
};

You recode your server application to make your servants support the BugTracker2 interface, and due to inheritance, they also still support the original BugTracker interface. Any clients that know only the original BugTracker interface continue to work and need not be rewritten, recompiled, or redeployed. New clients built specifically to use the new BugTracker2 interface work as well. Both types of clients continue to work because each type of client can successfully narrow to the specific interface it understands. Old clients narrow to BugTracker and rely only on its operations and data types, while new clients narrow to BugTracker2 and, because of inheritance, can still make use of the base BugTracker interface as well.
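The shape of the Extension Interface approach can be sketched in plain C++, without an ORB. In this sketch, abstract classes stand in for the IDL-generated skeletons and dynamic_cast stands in for the _narrow() call a real CORBA client would use; the servant class BugTrackerImpl, the helper who_reported, and the bug data are all hypothetical.

```cpp
#include <stdexcept>
#include <string>

// Plain C++ stand-ins for the types an IDL compiler would generate.
struct Bug {
    long bugnum;
    std::string synopsis;
    std::string owner;
};

struct Bug2 {
    long bugnum;
    std::string synopsis;
    std::string owner;
    std::string reported_by;
};

struct NoSuchBug : std::runtime_error {
    explicit NoSuchBug(long n) : std::runtime_error("no such bug: " + std::to_string(n)) {}
};

// The original interface, which old clients were built against.
class BugTracker {
public:
    virtual ~BugTracker() = default;
    virtual Bug get_details(long bugnum) = 0;
};

// The extension interface: adds get_details2 without touching BugTracker.
class BugTracker2 : public BugTracker {
public:
    virtual Bug2 get_details2(long bugnum) = 0;
};

// Hypothetical servant implementing the most derived interface, and thus both.
class BugTrackerImpl : public BugTracker2 {
public:
    Bug get_details(long bugnum) override {
        const Bug2 b = get_details2(bugnum);
        return Bug{b.bugnum, b.synopsis, b.owner};  // old clients never see reported_by
    }

    Bug2 get_details2(long bugnum) override {
        if (bugnum != 49938) throw NoSuchBug(bugnum);
        return Bug2{49938, "DynStruct broken", "doug", "elvis"};  // hypothetical data
    }
};

// A new client "narrows" to BugTracker2; dynamic_cast stands in for _narrow().
std::string who_reported(BugTracker& tracker, long bugnum) {
    if (auto* t2 = dynamic_cast<BugTracker2*>(&tracker))
        return t2->get_details2(bugnum).reported_by;
    return "unknown (server predates BugTracker2)";
}
```

An old client holding only a BugTracker reference keeps calling get_details() exactly as before, while a new client probes for BugTracker2 and falls back if the server hasn't been upgraded. Note that BugTrackerImpl has to maintain both operations, and every further extension adds another.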

At first glance, this approach doesn't appear so bad. After all, by applying the Extension Interface pattern, we added the necessary feature to the system without breaking any existing clients. Unfortunately, it's not so simple. What if the users now want to add another Bug field, this time to keep track of the date each bug was reported? Each such change requires a new struct type and a new derived interface, with all-new operations to handle the new struct type. In our example, we chose poor names for our new types by simply tacking a "2" onto the end of the original type names. Although long-time COM programmers might find the Extension Interface pattern intuitive, most CORBA/C++ programmers would be confused if they had to deal with four, five, or more versions simultaneously.

Brittleness is one of the biggest problems with data types defined in IDL. By "brittle" we mean that the types are hard to modify or version in practice without finding and changing all applications that use them. If you're lucky, you might get away with some simple changes like we did using the Extension Interface pattern in our example above, but most real-world applications aren't so lucky.

Another Versioning Approach

One way to try to avoid the brittleness explained above is to explicitly make your data types flexible and extensible to begin with. For example, consider the following classic type:

struct NamedValue {
    string name;
    any value;
};

This type is capable of holding any named data. CORBA tends to use this data type (or a sequence thereof) wherever it has to be able to handle any data type, such as in the parameter list for a request invoked through the DII (Dynamic Invocation Interface).

Using this flexible type, we can redefine our original Bug type:

typedef sequence<NamedValue> Bug;

Instead of a struct, Bug is now a sequence of named values. We've adopted the convention that each member of the sequence corresponds to a field in the original struct. In our original versioning problem, we wanted to add a new string member to our Bug struct named reported_by. With this new approach, adding a new field is trivial because we need only add a new item to our sequence:

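// my_bug is an existing Bug, i.e., a sequence<NamedValue>.
CORBA::ULong len = my_bug.length();
my_bug.length(len + 1);                               // grow by one element
my_bug[len].name = CORBA::string_dup("reported_by");  // field name
my_bug[len].value <<= "elvis";                        // field value, inserted as a string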

As long as all clients and servers using the Bug type follow the convention of indexing through a Bug field by field and operating only on those fields whose names they understand, all is well. You can add any field of any type you want to a Bug, and as long as you give that field a unique name, no properly coded application using the Bug type will break.
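The convention can be sketched in standard C++17, using std::any as a stand-in for CORBA::Any and std::vector as a stand-in for the IDL sequence; the KnownFields struct and read_known_fields function are hypothetical. An application written before reported_by existed walks the sequence, picks out the names it knows, and skips everything else.

```cpp
#include <any>
#include <string>
#include <vector>

// Stand-ins for the IDL definitions: std::any plays the role of CORBA::Any,
// and std::vector the role of sequence<NamedValue>.
struct NamedValue {
    std::string name;
    std::any value;
};
using Bug = std::vector<NamedValue>;

// What an application written before reported_by existed knows about a Bug.
struct KnownFields {
    long bugnum = 0;
    std::string synopsis;
    std::string owner;
    int ignored = 0;  // fields skipped because their names were not recognized
};

// Walk the Bug field by field, acting only on names we understand (and only
// when the value has the expected type), and skip everything else.
KnownFields read_known_fields(const Bug& bug) {
    KnownFields k;
    for (const NamedValue& nv : bug) {
        if (nv.name == "bugnum") {
            if (const long* p = std::any_cast<long>(&nv.value)) k.bugnum = *p;
        } else if (nv.name == "synopsis") {
            if (const auto* p = std::any_cast<std::string>(&nv.value)) k.synopsis = *p;
        } else if (nv.name == "owner") {
            if (const auto* p = std::any_cast<std::string>(&nv.value)) k.owner = *p;
        } else {
            ++k.ignored;  // e.g. a newer "reported_by" field
        }
    }
    return k;
}
```

Appending {"reported_by", std::string("elvis")} to a Bug changes nothing for read_known_fields: the new field is simply counted as ignored, and the three known fields are read as before.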

Unfortunately, this approach to type extensibility also has problems:

  • It lacks compile-time type safety. Any application can set the Any portion of a NamedValue to any type it likes, and all applications must therefore check the type encoded in that Any before attempting to use it.
  • It lacks structure. Any application can create a Bug using any names and values for the fields. Two applications might decide to add a field with the same name, but with different types for the value.
  • It lacks capability. The C++ mapping of the IDL sequence type [5] is essentially a vector that lacks operations for appending or inserting at a given index. Creating a more capable type, such as a linked list or hash table, requires that you use IDL value types [6], which not all ORBs support.
  • It's inefficient. Any data types generally incur greater time and space overhead than other IDL data types. Although some ORBs optimize away some of this overhead, the Any-based approach will undoubtedly be bigger and slower than the original struct-based solution.
  • It requires additional programming effort. Perhaps worst of all, this approach requires a custom library built around the IDL sequence type to achieve type extensibility. The library must be able to manipulate fields by name and needs type information from the application to perform these manipulations correctly. The DynAny type [7] could come in handy for such a library. If you're smart or lucky enough, you might recognize in advance that this approach is useful for more than just your bug tracking system and create a general-purpose library to manipulate lists of NamedValues.
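Several of these drawbacks show up as soon as you start writing such a library. The following C++17 sketch uses std::any in place of CORBA::Any (DynAny is not modeled), and find_field, get_field, and set_field are hypothetical names. Every read has to check the stored type at run time, and nothing prevents two writers from disagreeing about what a field holds.

```cpp
#include <any>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Stand-ins for the IDL types (std::any in place of CORBA::Any).
struct NamedValue {
    std::string name;
    std::any value;
};
using Bug = std::vector<NamedValue>;

// Hypothetical library helper: locate a field by name, or return nullptr.
const NamedValue* find_field(const Bug& bug, const std::string& name) {
    for (const NamedValue& nv : bug)
        if (nv.name == name) return &nv;
    return nullptr;
}

// Typed read. The stored type can only be checked at run time, so the caller
// gets an empty optional both when the field is missing and when it holds
// some type other than T.
template <typename T>
std::optional<T> get_field(const Bug& bug, const std::string& name) {
    const NamedValue* nv = find_field(bug, name);
    if (nv == nullptr) return std::nullopt;
    if (const T* p = std::any_cast<T>(&nv->value)) return *p;
    return std::nullopt;
}

// Typed write: overwrite an existing field or append a new one. Nothing here
// (or anywhere else) enforces which type a given name is supposed to carry.
template <typename T>
void set_field(Bug& bug, const std::string& name, T value) {
    for (NamedValue& nv : bug) {
        if (nv.name == name) {
            nv.value = std::move(value);
            return;
        }
    }
    bug.push_back(NamedValue{name, std::any(std::move(value))});
}
```

For example, after set_field(bug, "reported_by", std::string("elvis")), a second application that assumes reported_by holds a numeric user ID gets an empty std::optional from get_field<long>. Even a single application can trip over the convention: set_field(bug, "synopsis", "DynStruct broken") stores a const char*, not a std::string, so get_field<std::string> on that field also comes back empty.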

Versioning with XML

Unlike IDL, XML is designed to be extensible. Due to its flexibility, XML has become the "lingua franca" for data definition. This common language has two simple facts in its favor:

  1. XML can describe essentially any type of structured information in a regular and recognizable fashion. Data is represented using elements and attributes, with elements arranged hierarchically to impose structure.
  2. A set of standard, domain-independent tools exists for dealing with XML data definitions. Tools such as the DOM (Document Object Model) [8], SAX (Simple API for XML) [9], and XSLT (XSL Transformations) [10] provide standard ways for creating, examining, and manipulating XML definitions, regardless of the data they describe.

XML frees you from creating custom data definitions for your applications, as well as custom tools to create, read, manipulate, and write those definitions. This benefit is significant. The guts of many applications are devoted primarily to creating and manipulating data, so if we can avoid creating and maintaining our own custom data formats and the custom code that manipulates them, we can significantly reduce the cost of developing our software.

Since XML elements are tagged, they are often touted as being "self-describing." Of course, this is only partially true. Obviously, the tags identify the element data, but anyone dealing with the element must still understand the semantics implied by the tags, as well as the semantics of the data itself. Note, by the way, that CORBA applications that exchange Any data types have exactly the same issues.

For our bug tracking example, we might instead define our bug data in XML. The following shows an example of a bug defined in XML:

    <synopsis>DynStruct broken</synopsis>

For now, we won't focus on the details of this definition, though it should be intuitive to anyone who's seen XML (or even HTML) before. Instead, we want to focus on how manipulating this bug definition compares with manipulating the IDL Bug type based on the NamedValue type.

  • The IDL NamedValue approach relies on convention. Because the NamedValue-based Bug is a non-encapsulated data structure, chances are that some application will get the conventions wrong and manipulate the data incorrectly. You could, of course, wrap the sequence into a value type to encapsulate it, but then you risk reintroducing the GIOP versioning problems described above if your value type definition contains private data members that could change as a result of system evolution.

    With the XML approach, convention isn't needed. It's easy to see how the XML bug definition corresponds to our original Bug struct, as the two even have similar appearances. We can't say the same for the NamedValue approach. Also, unlike with the NamedValue approach, the element names (e.g., "reported_by" and "bugnum") of the XML bug are not just strings chosen by convention, but are part of the type definition.

  • The IDL approach requires custom coding to manipulate the Bug sequence, as we described above.

    With the XML approach, we can use standard tools, such as DOM or SAX, to manipulate bug instances in our applications. The DOM approach builds an in-memory tree representation of the bug structure, allowing our application to navigate the tree to examine and change values. SAX provides an event-driven API that allows us to register interest in certain XML parsing events, such as when a particular element starts and ends. For example, we could register callback handlers for each of the bug fields in order to process an incoming bug report. Implementations of both DOM and SAX are easy to find, such as the free tools of the Apache XML Project (available at http://xml.apache.org/). Once you learn to use them for one application, they work the same for your other applications, even if the XML data for those applications is completely different.

  • Versioning with the IDL approach, as described above, is problematic.

    With the XML approach, the bug definition can be extended without much trouble. For example, to add another field that records the date on which the bug was filed, we need only add a new <reported_on> element. Applications that don't understand that element can ignore it, while those that do understand it can make use of it. This explanation is somewhat simplified, as it's possible to be much more strict about this by validating a bug instance against a schema definition or DTD. (We have not shown a schema definition or DTD for our bug definition, but it would be similar to a type definition in a regular programming language.) However, unlike in the IDL case where it's simply impossible to fool GIOP into allowing a struct to be extended, the choice of whether to use schema validation is entirely up to the XML-based application.
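The event-driven style and the ignore-what-you-don't-understand rule can be illustrated together. The following C++ sketch is a deliberately tiny, hypothetical reader, not SAX and not a real XML parser: it assumes a flat document with one root element whose children contain only text, and it does not handle attributes, entities, comments, CDATA, or namespaces. Real applications should use a proper SAX or DOM implementation, such as those mentioned above. The Handler type and read_children function are our own inventions.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>

// Callback invoked with the text content of one child element.
using Handler = std::function<void(const std::string& text)>;

// Toy event-driven reader (NOT SAX, NOT a real XML parser). It assumes a flat
// document like <bug><bugnum>49938</bugnum>...</bug>: one root element whose
// children hold plain text. Attributes, entities, comments, CDATA, and deeper
// nesting are not supported. Returns the number of child elements skipped.
int read_children(const std::string& xml,
                  const std::map<std::string, Handler>& handlers) {
    std::size_t pos = xml.find('>');  // end of the root start tag
    if (pos == std::string::npos) throw std::runtime_error("no root element");
    ++pos;
    int skipped = 0;
    for (;;) {
        const std::size_t open = xml.find('<', pos);
        if (open == std::string::npos) throw std::runtime_error("root not closed");
        if (xml.compare(open, 2, "</") == 0) break;  // root end tag reached
        const std::size_t close = xml.find('>', open);
        if (close == std::string::npos) throw std::runtime_error("bad start tag");
        const std::string name = xml.substr(open + 1, close - open - 1);
        const std::string end_tag = "</" + name + ">";
        const std::size_t end = xml.find(end_tag, close + 1);
        if (end == std::string::npos) throw std::runtime_error("missing " + end_tag);
        const std::string text = xml.substr(close + 1, end - close - 1);

        const auto it = handlers.find(name);
        if (it != handlers.end())
            it->second(text);  // event for an element this client understands
        else
            ++skipped;         // unknown element, e.g. a newer <reported_on>: ignore it
        pos = end + end_tag.size();
    }
    return skipped;
}
```

An old client registers handlers for bugnum, synopsis, and owner only; given a bug that also carries reported_by and reported_on elements, read_children fires the three known handlers and reports two elements skipped. A newer client simply registers one more handler, which is a purely local change that leaves the old client's code untouched.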

Using XML to define your data types provides more flexibility for applications that need it, but note that you still need interfaces so that you can pass the XML data between applications. CORBA systems using XML end up using IDL to define the interfaces and methods supported by the system and using XML to define the data that's manipulated and exchanged within the system. Thus, we end up with the best of both worlds. Of course, the use of XML can still have some drawbacks because it, like the CORBA Any data types, generally incurs non-trivial time and space overhead for encoding/decoding, transmission, and parsing. While many business applications can afford this overhead, for other applications (particularly performance-sensitive real-time and embedded systems) it's a showstopper.

The final key to the puzzle is how best to pass XML between CORBA applications. One obvious approach is to simply send XML documents as strings or wstrings, and in fact many CORBA applications do this today. However, it's not always the best approach, and we'll discuss why in our next column.

Concluding Remarks

In this column, we investigated the OMG IDL versioning problem. We showed that IDL data definitions are brittle in the face of extension and evolution. We presented several approaches to handling versioning in CORBA systems, including how XML fits into the picture. Overall, applications whose data types change frequently should avoid defining those types in IDL and should instead define them in XML. This results in a system that uses IDL to define the interfaces and methods — the system control points — and uses XML to flexibly define the data that the system manipulates. Such systems play to the strengths of both IDL and XML, while avoiding many of their weaknesses.

Our next column will investigate more of the relationship between CORBA and XML, including a look at the OMG XMLDOM specification, as well as a discussion of SOAP, CORBA, and web services. We look forward to it and hope you do too.

Please feel free to comment on this column, or on any of our previous columns, by emailing us at object_connect@cs.wustl.edu.


[1] Extensible Markup Language (XML) 1.0 (Second Edition). W3C Recommendation. World Wide Web Consortium, http://www.w3.org/TR/REC-xml, October 6, 2000.

[2] D. Box, A. Skonnard, and J. Lam. Essential XML: Beyond Markup (Addison Wesley, 2000).

[3] Apathy, http://www.cs.wustl.edu/~schmidt/demotivation.html.

[4] D. C. Schmidt, M. Stal, H. Rohnert, and F. Buschmann. Pattern-Oriented Software Architecture: Patterns for Concurrent and Networked Objects, Volume 2 (Wiley & Sons, 2000), http://www.posa.uci.edu/.

[5] Object Management Group. IDL C++ Language Mapping. http://www.omg.org/technology/documents/formal/c++.htm, 1999.

[6] Object Management Group. CORBA 2.3.1 Specification. http://www.omg.org/technology/documents/formal/corba_2.htm, 1999.

[7] M. Henning and S. Vinoski. Advanced CORBA Programming with C++ (Addison Wesley, 1999).

[8] Document Object Model (DOM) Level 2 Core Specification. W3C Recommendation. World Wide Web Consortium, http://www.w3.org/TR/DOM-Level-2-Core/, November 13, 2000.

[9] SAX 2.0: The Simple API for XML. http://www.megginson.com/SAX/.

[10] XSL Transformations (XSLT). W3C Recommendation. World Wide Web Consortium, http://www.w3.org/TR/xslt, November 16, 1999.

About the Authors

Steve Vinoski is chief architect and vice president of Platform Technologies for IONA Technologies and is also an IONA Fellow. A frequent speaker at technical conferences, he has been giving CORBA tutorials around the globe since 1993. Steve helped put together several important OMG specifications, including CORBA 1.2, 2.0, 2.2, and 2.3; the OMG IDL C++ Language Mapping; the ORB Portability Specification; and the Objects By Value Specification. In 1996, he was a charter member of the OMG Architecture Board. He is currently the chair of the OMG IDL C++ Mapping Revision Task Force. He and Michi Henning are the authors of Advanced CORBA Programming with C++, published in January 1999 by Addison Wesley Longman.

Doug Schmidt is an associate professor at the University of California, Irvine. His research focuses on patterns, optimization principles, and empirical analyses of object-oriented techniques that facilitate the development of high-performance, real-time distributed object computing middleware on parallel processing platforms running over high-speed networks and embedded system interconnects. He is the lead author of the book Pattern-Oriented Software Architecture: Patterns for Concurrent and Networked Objects, published in 2000 by Wiley and Sons. He can be contacted at schmidt@uci.edu.