WSDL: A New XML-Based Web Site Description Language

Final

G. Wade Johnson

Advisor

Dr. Stephen Huang

Acknowledgements

I would like to express my sincere gratitude to my advisor, Dr. Stephen Huang, for his guidance and patience during this research. I would also like to thank my co-workers at Telescan, Inc., in particular Dr. Richard Carlin, Rick Hoselton, Tuyen Tran, John Gallagher, Julie Liu, Kathy Hoang, and Julie Carroll, all of whom provided some insight into this problem and its solution. Most importantly, I'd like to thank my wife, Debbie Campbell. Without her urging, I would not have started this degree, and without her support and encouragement, I would not have completed it.

Abstract

As web sites become larger and more dynamic, the difficulty of developing and maintaining them becomes more apparent. This thesis explores using higher-level abstractions to design web sites. In particular, a new Web Site Description Language (WSDL) is defined for describing the structure of a web site at a high level of abstraction. A Page Layout Language (PLL) is used to describe the general presentation of individual pages. Both of these languages are defined as Extensible Markup Language (XML) applications, allowing them to benefit from tools and libraries designed to support XML. These two high-level descriptions are combined with the content of the pages in a compilation phase that creates the entire web site. This extra phase allows for good separation of web site structure, page presentation, and content, with little cost at request time. In a commercial environment, earlier versions of the system have shown improvements in the areas of web site design, implementation, and maintenance. Preliminary testing shows that WSDL may generate even further improvements.

Chapter 1: Introduction

When the World Wide Web first began, most web sites were small. They could be built and maintained by a small number of people. But the Web has been growing at an ever increasing rate. The number of large web sites is also growing. Many of these web sites are constantly being revised. As web sites grow larger and more time and resources are applied to them, it becomes obvious that the old, ad hoc method of design is not working.

Some of the problem may be caused by an incorrect focus. Much of the development on the World Wide Web treats web sites simply as collections of web pages. In this context, a web page is a single HyperText Markup Language (HTML) document with its included images and style sheet information. But a web site is more than a collection of pages. The interconnections and navigation among the web pages and consistency of presentation and design of the pages determine the user experience and usability of the web site.

The Web Site Description Language (WSDL) uses XML to address page presentation and site structure. This web site description can be used on top of systems that use external content sources, such as databases, to build dynamic pages. The result is a system for generating flexible, maintainable, realistic web sites. Most importantly, the output from this system can be displayed by current browser technology, because it is pure HTML.

1.1: Comparison to Other Systems

Many other systems have been designed to generate a dynamic, robust web experience. These solutions range from fully dynamic pages using CGI scripts, through various forms of include technology, to server-side scripting. Almost every one of these systems focuses on page generation. The only site-level support most of these systems provide is the ability to include common code or HTML in all pages.

WSDL, on the other hand, focuses on the web site as a whole. The output from the WSDL processor may even be used as input into one of the other systems. WSDL is not a complete replacement for any of these other technologies. It is an enhancement to any form of web site development. In addition, WSDL is designed to scale smoothly to larger, more complicated web sites. Finally, WSDL is designed to allow easy replication of site structure and page presentation. This feature is very useful for developers who must maintain or develop multiple semi-independent web sites.

1.1.1: CGI Scripts

Common Gateway Interface (CGI) scripts are a solution to a different problem. The CGI specification was developed at a time when most of the content on most web sites was static. The CGI script would then provide a limited amount of dynamic content on a fraction of the site.

The advantage of CGI was the relative ease of adding dynamic content, compared to altering the web server itself. The two main disadvantages of CGI are speed and the need for a separate program. Many current web developers are not programmers and, therefore, are wary of approaches that require programming knowledge.

CGI scripts are still useful and will probably continue to be useful for years to come. However, they are not suitable for the construction of a large production site.

1.1.2: Server Side Includes

Server Side Includes (SSIs) were created as a way to include boilerplate text in web pages. Early in the evolution of the web, the need for common sections of HTML on many pages in a site was recognized. This was an early attempt to begin creating web sites instead of web pages.

However, the focus of SSI is still web pages. Later, as other functionality was added, SSI evolved into the various forms of Server Side Scripting.

1.1.3: Server Side Scripting

Most of the major approaches to dynamic content in use today are based on Server Side Scripting. Examples of this approach include ASP, JSP, PHP, XSP, Cold Fusion, and many others. The basic idea is simple: place a scripting language inside special markers in the HTML content. The web server recognizes these markers and interprets the content of these markers to generate the dynamic portion of the page.

Most scripting languages are chosen to be simplified forms of a general purpose programming language. The hope is that a mere web developer can understand these, without the need for advanced programming skill.

In practice, the developer does need to be able to program in order to do anything significant. The interpretation phase slows the response of the server as the number of page requests in a given time goes up. And last, but not least, this approach is still page-centric.

1.1.4: Cocoon

The Cocoon system is being developed as part of the Apache project. It has a few things in common with the WSDL system[12]. There is a strong focus on separation of XML generation, processing, and presentation. As described in Example: Cocoon, Cocoon relies mostly on convention to keep consistency across generated pages. Like most web site production systems today, Cocoon is definitely page-centric.

One interesting possibility for further research is the use of WSDL and the skeleton presentation system to generate the templates that some of the Cocoon layers use as input. This would help to enforce consistency where possible, yet still gain the impressive run-time benefits of the Cocoon system.

1.1.5: Proprietary Web Servers

From the very beginnings of the Web, there was always an alternative when functionality beyond the capability of normal web servers was needed. A web server is not a terribly difficult program to create. Some people used this fact to create special-purpose web servers to support the functionality they needed.

This approach gives all of the flexibility that anyone could require. However, the cost is pretty high. The developer must maintain the server himself. He does not get the benefit of upgrades provided by the server vendor. He must build any needed support, unless a standard server can be used as a starting point.

In spite of this disadvantage, some companies have built their own web servers connected to proprietary databases and processing solutions on the back end. If designed well, these systems can outperform a standard server on some kinds of requests. They can also perform functions that a normal web server cannot match.

These kinds of systems do not compare with WSDL at all, because they are solving a very different problem. However, if a proprietary system supports some form of template system that allows it to read HTML from disk to use in formatting the output, WSDL can simplify the customization of the output of this proprietary web server.

1.2: Thesis Overview

Quite a bit of background material is necessary to present WSDL. Small web sites of a dozen or so pages might not benefit as much from a tool like WSDL. An Overview of Web Development describes some of the problems inherent in large, production web sites. These problems are the reason WSDL was created. WSDL is built on a relatively new technology, the Extensible Markup Language (XML). A large amount of hype and misinformation surrounds XML at this time. An Overview of XML attempts to explain the issues surrounding this language. XML in Current Use describes some of the ways that XML is being used at present.

A High-Level Approach to Web Site Design describes the design goals and assumptions that are part of the WSDL system. Any complicated system goes through multiple iterations of design and testing. WSDL is no exception. The Evolution of WSDL describes the various early implementations of the system that would eventually become WSDL.

The appendices fully document the WSDL and PLL languages, along with complete Document Type Definitions (DTDs) for each. The Source for the Example Site gives the source for an example site which shows WSDL and PLL in use on a small, but functional, web site.

Chapter 2: An Overview of Web Development

In order to see the need for WSDL, it is necessary to examine the problem more closely. Some of the difficulties with developing large web sites can be related to a lack of software development experience, misunderstanding of scale, and misestimation of maintenance cost.

2.1: Lack of Experience

Most people designing and constructing web sites today have little or no experience in software engineering or information architecture[7]. Quite a few have no background in presentation or graphic design either. In many cases, a web developer's sole qualification for the job is access to Microsoft FrontPage or some other WYSIWYG tool.

This lack of experience often shows in the developer's focus. A beginning web developer tends to focus on individual pages. Depending on the developer's background, most of the emphasis is placed either on content or presentation. At this stage, navigation or structure of the site is almost always an afterthought. Consistency, if it happens at all, is usually an accident.

As most web developers gain experience, they become more adept at the various display tricks used to deliver more interesting-looking pages. Fortunately, a few begin to realize that site structure is also important. However, even at this stage, most web developers consider the site structure and page layout to be entwined as the concept of presentation. As a side effect, the developer usually spends too little time on the overall site structure and too much time on the details of the site's look[4].

2.2: Web Sites vs. Web Pages

Most people, including most web developers, tend to think of a web site as a collection of pages[7]. This viewpoint is a natural one considering all they see and build is the pages. This approach to web site design is fine when designing sites of less than a dozen pages. However, this approach does not scale to web sites containing hundreds or even thousands of pages. As a web site grows in size, the designer must focus more on the structure and consistency of the overall web site. Users are not as impressed with an individual page if they cannot find what they need on the web site.

When designing a web site, the developer must focus on the intended audience, the information to be presented, and the relationships between various pieces of information. These issues give rise to the overall look of the site, as well as its navigation model[4].

In Web Navigation: Designing the User Experience, Jennifer Fleming describes several different kinds of web sites, including shopping sites, community sites, entertainment sites, and information sites[4]. Each of these general kinds of site has a different goal and approach that should be supported by a different design. This level of design is difficult to maintain across all of the pages of a web site that is changing over time. Without some way of capturing this high-level design, developers usually do not have a chance of maintaining consistency in the face of change.

2.3: Maintaining the Web Site

In many production web sites, the change requests begin the moment the web site launches. Some may be issues that were deferred until Phase Two. Others may involve perceived "cool" features seen on other sites. As the developer makes these changes, the design of the web site, which was so firm only a few days before, can begin to blur. Maybe good business reasons or politics obscure the clarity of the original vision. In any case, changes can soon override the original design.

If no solid reference for the design exists, it is very hard to adapt the design to change. It is even harder to adapt the design to requested changes if no one remembers the design itself. In effect, if no one can point to the design, one can argue that it does not exist. Unfortunately, in many instances where there is a design document, it is not part of the code. Over time, the code diverges from the document, returning the project to a state where no design exists.

When a web site is maintained without a concrete design, any changes made during maintenance are likely to cause inconsistencies in the site. Part of this effect is the entropy affecting any complex system. But the more important portion of the problem is the difficulty of recognizing the inconsistencies as changes are made. With the details that describe the overall design spread out in a thousand files, it is extremely hard to see the design of the web site breaking down, but the symptoms are there. Bug fixes or corrections do not show up everywhere. Different portions of the site are not consistent. Lastly, changes that should take ten minutes consume hours of development time.

2.4: WSDL as a Possible Solution

A high-level Web Site Description Language (WSDL) could help to alleviate these issues. If the overall design of the site is described in one place, a single senior developer or a small group of senior developers can provide the insight and experience to design the web site. This approach allows the experience of the senior developers to be spread across more projects and still use more junior developers for most of the development work.

By describing the web site at a higher level, WSDL encourages the developers to focus on the web site as well as the individual pages. By reviewing the WSDL description of the web site, the developers can get an overview of the entire system. Issues such as structure and site consistency are easier to judge at this higher level of abstraction.

WSDL helps to document the overall design of the site in the best of all possible locations, the actual source code for the site. The advantage of this approach is the fact that the source and documentation cannot diverge, because there is only one copy of the design. Also, by keeping the design in this fashion, it is possible to go back to read the source and determine what the designer of the site intended. The ability to read the overall design in the source helps the maintainer of the web site to retain consistency when possible, and to adapt the design in other cases.

A language designed for this purpose should be declarative: it should define what is needed, not how to implement it. It should be easy to write, parse, and verify. The next chapter describes the Extensible Markup Language (XML), which offers an appropriate syntax for defining such a language.

Chapter 3: An Overview of XML

The Extensible Markup Language (XML) is not a language for marking up text. It is not a replacement for the HyperText Markup Language (HTML). XML is a standard syntax for in-line markup for use in text documents. XML also includes facilities for defining a set of markup elements that are used together as an application. To really explain XML, however, requires a little history.

3.1: Origins of XML

In order to discuss the origins of XML, it is necessary to review two other markup standards, the Standard Generalized Markup Language (SGML) and HTML. Both of these standards influenced the design of XML in many ways.

3.1.1: SGML

SGML was created in an effort to standardize markup systems[25]. In order to handle all of the features of the markup systems of the time, the SGML design was comprehensive, with many optional features and shortcuts. These features made for a very powerful system. These same features and shortcuts make SGML tools difficult to program.

SGML is not actually a markup language; it is a meta-markup language[3]. SGML provides a language to support the definition of new markup languages that are called applications. All SGML applications have a similar structure. Unlike many proprietary systems, all SGML markup uses normal printable characters. Markup is defined in terms of elements and text. Elements are made up of tags and attributes. All elements, tags, and attributes must follow a well-defined format[32]. This simplifies validation and translation of the documents.

Separate from the issue of format is the validity of the tags used in a particular document. SGML describes the use of a Document Type Definition (DTD) to specify which SGML application pertains to the document. A processing system could then use information from the DTD to validate a particular document.

3.1.2: HTML

The HyperText Markup Language (HTML) has been an important factor in the success of the World Wide Web. HTML was based on SGML but was not as formalized in the beginning. Contrary to popular usage, HTML was designed to describe the structure of a document, not its presentation. To quote the HTML Home Page:

For most people the look of a document - the color, the font, the margins - are as important as the textual content of the document itself. But make no mistake! HTML is not designed to be used to control these aspects of document layout.[31]

However, the first browsers defined a default presentation for many of the structural elements. As a result, people focused on the presentation aspects of HTML and the structural meaning of most tags was forgotten. When the primitive presentation support in HTML was found to be inadequate, vendors, like Netscape and Microsoft, extended HTML to support more presentation control.

In addition, the original browsers were defined to be extremely forgiving in their dealings with HTML[39]. This tended to support a large number of people being able to create HTML with little or no training. Many people began to rely on the side effects of the browser's interpretation of invalid HTML. This scenario has led to very complex software that tries to recover from almost any mistake the HTML author may make.

Another disadvantage of this forgiving approach to HTML interpretation is the difficulty of creating and maintaining an HTML parser and display system. When the World Wide Web first became popular, many online companies had their own browsers. Unfortunately, these browsers were all inconsistent in their handling of HTML. As HTML evolved, each company had to make changes in their proprietary code to deal with the new functionality. One by one, these vendors dropped out of the market, leaving only a handful. Today, that handful is mostly competing in how nonstandard they can be. If HTML had been more standard, it would have been possible to build standard parsing libraries that could be used by multiple browsers.

3.1.3: XML

The design of XML tries to take the best features of both SGML and HTML while leaving behind their worst disadvantages. Like SGML, XML is a language for defining new markup languages. But XML uses a much simpler feature set than SGML, making it easier to parse. Unlike HTML, all XML-based documents must be well-formed. This means that XML parsers and viewers must report any errors detected when reading a document. They are not allowed to guess what the XML author meant and then continue. This makes parsers and other tools much easier to build[21]. Moreover, the presentation aspects of an XML document are defined in a separate file. This reduces the temptation to modify an XML document for the sake of presentation.

One of the major roadblocks to SGML spreading across the Web is the difficulty of implementation of tools that fully support SGML[21]. Unlike SGML, XML was designed with simplicity and implementation in mind. Many of the optional features of SGML have been dropped. This has already resulted in a much simpler job for developers who wish to build XML tools.

3.2: XML Structure

The structure of an XML document is similar to that of an HTML or SGML document. The following is a fragment of an XML document:

<para type="example" align="none">This is a <em>short</em> example
paragraph, containing three elements.<xref ref="examples"/></para>

An element is defined by start and end tags and the content they surround. The whole example above is one para element. The content of an element can be text, markup, or both. In the example, the content of the para element is two pieces of text, an em element, and an xref element. The content of the em element is the text short. The xref element has no content.

The start tag is distinguished from the rest of the text by starting with the character < and ending with the character >. The start tag begins with a name and may have optional attributes before the closing >. In the example, the para element has two attributes, type and align.

The end tag for an element starts with the two character combination </ and ends with the > character. Nothing is allowed in the end tag except the name used in the start tag.

An element with no content can be denoted by an empty-element tag, like the xref element in the example. An empty-element tag is just like a start tag except that it ends in the two characters /> and has no matching end tag[21].

Attributes can only appear in start tags or empty-element tags. All attributes take the form of a name followed by an equal sign (=) followed by a value in either double or single quotes. An attribute may only appear once in a given tag. Unlike HTML, the quotes are required in XML attributes[16].
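The terms above can be made concrete with a short sketch. It is written in Python using the standard xml.etree.ElementTree module; both are assumptions of this illustration and are not part of the system described in this thesis. The sketch parses the example fragment and prints its elements, attributes, and content.

```python
import xml.etree.ElementTree as ET

# The example fragment from the text, unchanged.
FRAGMENT = (
    '<para type="example" align="none">This is a <em>short</em> example\n'
    'paragraph, containing three elements.<xref ref="examples"/></para>'
)

para = ET.fromstring(FRAGMENT)

# The outermost element and its two attributes.
print(para.tag)            # para
print(para.get("type"))    # example
print(para.get("align"))   # none

# The child elements of para, in document order.
child_names = [child.tag for child in para]
print(child_names)         # ['em', 'xref']

em, xref = para
print(em.text)             # short
print(xref.get("ref"))     # examples

# An empty-element tag yields an element with no text and no children.
print(xref.text is None and len(xref) == 0)   # True
```

ElementTree stores the text that precedes the first child in para.text and the text that follows a child in that child's tail attribute, which is why the two pieces of text in the para element do not appear as separate nodes.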

Elements are also required to nest properly. If an element begins inside another element, it must also end inside the same element. This requirement for well-formed markup is probably one of the most commonly violated rules in HTML.
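The following sketch shows how a conforming parser treats the nesting and quoting rules. It assumes Python and its standard xml.etree.ElementTree module, and the helper name is_well_formed is hypothetical. The parser reports an error for improperly nested elements and for unquoted attribute values rather than guessing at the author's intent.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical sample markup, chosen only to exercise the rules.
properly_nested = "<p>Some <b>bold and <i>italic</i></b> text.</p>"
badly_nested = "<p>Some <b>bold and <i>italic</b></i> text.</p>"
unquoted_attribute = "<p align=left>text</p>"

print(is_well_formed(properly_nested))     # True
print(is_well_formed(badly_nested))        # False
print(is_well_formed(unquoted_attribute))  # False
```

Many HTML browsers would render all three samples without complaint, which is the forgiving behavior that XML deliberately leaves behind.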

3.3: Rendering XML

One suggested approach to the use of XML involves serving XML in exactly the way web sites currently serve HTML. However, unlike HTML, an XML document probably contains elements that the browser does not know how to display. In fact, many XML documents will consist of markup that the browser does not know how to interpret. In order to display XML as a web document, some form of style sheet is needed to explain formatting information to the browser[17].

There are several standards for style sheet languages available on the Web. The two most commonly associated with XML are Cascading Style Sheets (CSS) and Extensible Stylesheet Language (XSL) style sheets[24]. CSS style sheets are currently in widespread use on the Web for HTML. Although not consistently supported by the major browsers, CSS is the most standard way of separating formatting information from HTML content.

XSL, on the other hand, was developed exclusively for use with XML. XSL is a very ambitious system including support for extensive rewriting of the XML input using the XSL Transformations (XSLT)[19] subset, as well as a comprehensive formatting model. The biggest problem with XSL at the moment is the lack of browser support.

3.4: XML as Document Markup

XML can be used to define markup languages for specific kinds of documents. A good example is Extensible HyperText Markup Language (XHTML). XHTML is a new version of HTML rewritten to conform to the rules of XML. In general, XHTML documents can be displayed by HTML browsers except for a few minor issues. These include the proper termination of empty elements (<br/> instead of <br>) and non-minimized attributes (nowrap="nowrap" instead of nowrap).

Any group can define an XML application that describes markup for the particular kinds of documents they use. This markup describes the structure of the documents in a way that makes sense for them. For example, instead of forcing their widgets catalog to fit HTML, which is a format defined for internal documentation at CERN,[15] they can use a reasonable catalog format specified in XML.

Another good example of XML used for document markup is DocBook[11]. DocBook is an SGML application for writing many kinds of documents. After the XML recommendation was released, the maintainers of DocBook began a conversion process to make DocBook fully XML-compliant.

In some cases, it is useful to include portions of one XML application in another XML application to reuse a working design. For example, several XML applications include XHTML (or just HTML) as content for some elements in places where they need simple presentation markup. People who are familiar with HTML will recognize the tags and have little problem formatting text for those areas.

3.5: XML as Data

One of the more interesting uses of XML has absolutely nothing to do with documents. XML is actually a very good format for describing complex data structures[5]. Nested data is often difficult to transfer between two programs, systems, or even separate runs of the same program. An application does not need to be extremely data intensive to have this problem. For example, manipulating the configuration information for a complex program can be a relatively major undertaking on its own. Unfortunately, that time is often better spent on the main purpose for the program.

XML supports a natural format for hierarchical data. Because it follows a standardized format, many people have written parsers and libraries for reading and parsing XML[21]. As a result, a programmer can use this format without writing all of the code needed to read it. As more code is written to support both reading and writing XML, it will become easier for programmers to use XML as a data interchange and storage format.

Since XML is mostly human-readable, it is easier to verify and transform data in XML than in some proprietary binary formats[5]. Moreover, XML is inherently extensible. It is relatively easy to write code that ignores unrecognized elements and attributes. This makes it easier to write programs that can deal with newer versions of the data than they were written to support. Newer versions of the code can also deal with older files without a huge amount of code being devoted to transforming the data.
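The extensibility argument can be illustrated with a minimal sketch. It assumes Python and its standard xml.etree.ElementTree module. The book record format, the field names, and the function read_book are all hypothetical and were invented for this illustration. The reader looks up only the elements it knows about, so a newer file with an extra element and an extra attribute is read without any change to the code.

```python
import xml.etree.ElementTree as ET

# Hypothetical field names understood by this version of the reader.
KNOWN_FIELDS = ("title", "author")


def read_book(xml_text):
    """Collect the known child elements of a record, ignoring anything else."""
    root = ET.fromstring(xml_text)
    record = {}
    for name in KNOWN_FIELDS:
        child = root.find(name)
        if child is not None:
            record[name] = child.text
    return record


# A file in the format the reader was written for.
OLD_FORMAT = "<book><title>XML Basics</title><author>A. Writer</author></book>"

# A later format that adds an attribute and an element the reader has never seen.
NEW_FORMAT = (
    '<book edition="2"><title>XML Basics</title>'
    "<author>A. Writer</author><pages>200</pages></book>"
)

print(read_book(OLD_FORMAT) == read_book(NEW_FORMAT))   # True
```

The same lookup style also handles older files: a field that is missing from the input is simply absent from the record, and the caller can supply a default.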

Probably one of the most exciting twists on this idea is using XML as a serialization format for objects. Serialization is a process in which objects are converted to a form that survives the termination of the current process. This often involves writing the object to disk. Each element can have not only a value, but also type and context information that can help in reconstructing objects. Some have even suggested the possibility of serializing an object from one language and reconstructing an equivalent object in another language[10].
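A small sketch may help show how type and context information can travel with each value. It assumes Python and its standard library. The Point class, the element and attribute names (object, field, name, type), and the functions to_xml and from_xml are hypothetical choices made for this illustration and do not follow any particular serialization standard.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Point:
    """A hypothetical object to serialize."""
    x: int
    y: int
    label: str


def to_xml(obj):
    """Write each attribute of obj as a field element carrying its name and type."""
    root = ET.Element("object", {"class": type(obj).__name__})
    for name, value in vars(obj).items():
        field = ET.SubElement(
            root, "field", {"name": name, "type": type(value).__name__}
        )
        field.text = str(value)
    return ET.tostring(root, encoding="unicode")


# Only the types used in this sketch are supported.
CONVERTERS = {"int": int, "str": str}


def from_xml(text, cls):
    """Rebuild an instance of cls from the output of to_xml."""
    root = ET.fromstring(text)
    kwargs = {}
    for field in root.findall("field"):
        convert = CONVERTERS[field.get("type")]
        kwargs[field.get("name")] = convert(field.text or "")
    return cls(**kwargs)


original = Point(3, 4, "corner")
serialized = to_xml(original)
restored = from_xml(serialized, Point)
print(restored == original)   # True
```

Because the serialized form is plain text with the type of each field recorded as an attribute, a program written in another language could, in principle, read the same text and build an equivalent object of its own.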

XML as a data-interchange format has become such an important idea that several efforts are underway to develop a reasonable data-typing system in XML. These schema languages were conceived to overcome a fundamental shortcoming in XML. XML has only one real data type: text. Since it was designed to mark up text documents, this is not surprising. However, now that XML deals with other kinds of data, a more complete typing system is required.

3.6: XML Parsers

XML defines languages in which documents and data are written. In order to make use of this information, a program uses a parser to convert the text input into a form more suitable for processing.

3.6.1: Standard Parsers

Several standard XML parsers have been built in different languages, including C, Java, and Perl[21]. In general, these implementations include full Unicode support and may support validation against a Document Type Definition (DTD). Since many of these parsers are freely available, there is very little need to write another XML parser.

Standard parsers can be divided into four types. A parser can be validating or non-validating. This term refers to whether or not the parser compares a document against its DTD to ensure that it conforms to the definition. In addition, the programming interface to the parser can be either tree-based or event-based[9]. Tree-based parsers parse XML and return a tree of objects that corresponds to the XML. Event-based parsers, on the other hand, call event handlers when certain important events occur in the processing of the XML, such as when a start tag is encountered or when text content has been identified.
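The difference between the two programming interfaces can be seen in a short sketch. It assumes Python and two modules from its standard library: xml.etree.ElementTree as a tree-based interface and xml.sax as an event-based one. Both are used here in non-validating mode. The sample document and the ItemHandler class are hypothetical.

```python
import xml.etree.ElementTree as ET
import xml.sax

# Hypothetical sample document.
DOC = "<list><item>one</item><item>two</item></list>"

# Tree-based: the parser returns a tree of objects to be navigated afterwards.
root = ET.fromstring(DOC)
tree_items = [item.text for item in root.findall("item")]


# Event-based: the parser calls handler methods as it reads the input.
class ItemHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.items = []
        self._buffer = None   # collects text while inside an item element

    def startElement(self, name, attrs):
        if name == "item":
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several pieces, so it is accumulated.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "item":
            self.items.append("".join(self._buffer))
            self._buffer = None


handler = ItemHandler()
xml.sax.parseString(DOC.encode("utf-8"), handler)

print(tree_items)      # ['one', 'two']
print(handler.items)   # ['one', 'two']
```

The tree-based version is shorter, but it holds the whole document in memory. The event-based version keeps only the state it needs, which matters for very large inputs.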

3.6.2: Ad Hoc Parsers

Since all XML must be well-formed, it is not difficult to build a quick parser that supports a subset of XML. In particular, if a programmer only plans to deal with straight ASCII text and work with relatively small XML files, he may decide to construct a quick parser on his own. Unlike HTML, XML makes this relatively easy. This means that even small applications that do not need the overhead of Unicode and validation can benefit from XML[42].

Although writing an XML parser from scratch is usually not necessary, some programmers will probably do it for small applications. In some cases, the programmer may reason that the overhead of learning and integrating an available parser is not worth the benefit. In other cases, the programmer may not be aware of the available offerings. Whatever the reason for this decision, it was one of the fundamental design criteria of XML. Design goal number 4 from the XML specification states: "It shall be easy to write programs which process XML documents."[3]
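To give a sense of the scale of such an effort, the following is a minimal sketch of an ad hoc parser, written in Python using only the standard re module. It is an illustration under stated assumptions and not a conforming XML parser: it handles only elements, quoted attributes, and character data, and it does not support entity references, comments, CDATA sections, processing instructions, or DTDs. The names TOKEN, ATTR, and parse are hypothetical.

```python
import re

# One token is either a tag or a run of character data.
TOKEN = re.compile(r"""
    <(/?)                        # tag start, with a slash for an end tag
    ([A-Za-z_][\w.-]*)           # element name
    ((?:\s+[A-Za-z_][\w.-]*\s*=\s*(?:"[^"<]*"|'[^'<]*'))*)   # quoted attributes
    \s*(/?)>                     # tag end, with a slash for an empty-element tag
    |([^<]+)                     # or a run of character data
""", re.VERBOSE)

ATTR = re.compile(r"""([A-Za-z_][\w.-]*)\s*=\s*(?:"([^"<]*)"|'([^'<]*)')""")


def parse(text):
    """Parse a small XML subset into nested (name, attributes, children) tuples."""
    root = None
    stack = []   # elements that have been opened but not yet closed
    pos = 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"not well-formed at offset {pos}")
        pos = match.end()
        closing, name, attr_text, empty, chars = match.groups()
        if chars is not None:
            if stack:
                stack[-1][2].append(chars)
            elif chars.strip():
                raise ValueError("text outside the root element")
            continue
        if closing:
            # An end tag must carry only the name of the innermost open element.
            if attr_text or empty or not stack or stack[-1][0] != name:
                raise ValueError(f"unexpected end tag </{name}>")
            stack.pop()
            continue
        attrs = {}
        for key, double_quoted, single_quoted in ATTR.findall(attr_text):
            if key in attrs:
                raise ValueError(f"duplicate attribute {key}")
            attrs[key] = double_quoted or single_quoted
        node = (name, attrs, [])
        if stack:
            stack[-1][2].append(node)
        elif root is None:
            root = node
        else:
            raise ValueError("more than one root element")
        if not empty:
            stack.append(node)
    if stack or root is None:
        raise ValueError("unclosed element or empty document")
    return root


tree = parse('<para type="example">A <em>short</em> one.<xref ref="examples"/></para>')
print(tree[0])   # para
print(tree[1])   # {'type': 'example'}
```

A parser of this size is only reasonable because well-formedness errors can simply be reported. A comparable HTML reader would also need rules for recovering from missing end tags, unquoted attributes, and improper nesting.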

3.7: XML Editors

XML editors are available from many companies. Examples include Merlot by ChannelPoint, Swish by Zveno, Epic Editor by Epic, XML Pro by Vervet Logic, and WordPerfect by Corel. There are three categories of XML editors: text editors with XML features, structure editors, and editors that support both.

Text editors with XML features allow editing of XML as a text file. This gives the maximum amount of control over the layout of the XML. This kind of editor may validate against a DTD and may support other features such as element completion. Another very useful feature supported by some of these editors is well-formedness checking. This is similar to grammar or spell checking in a word processor. It does not guarantee that the output makes sense, but at least it looks like it makes sense.

Structure editors usually display a tree-like structure of elements and attributes. The editor provides the ability to change the values of attributes on an element or add or modify elements and text. These kinds of editors do not need a check for well-formedness, because there is no way to enter data that is not well formed.

XML editors that support both text and structure editing usually contain multiple views that allow editing in either style. Some of the more advanced editors support both views open at the same time, with the two views kept synchronized during an edit session.

Chapter 4: XML in Current Use

4.1: The Hype

XML has certainly been a magnet for hype lately. The pitch normally goes something like this:

Not only is XML going to replace HTML, but it will also solve almost any documentation or data problem in existence today. XML will soon make any form of data exchange painless and trivial. In fact, XML will quickly replace every data and documentation format in existence.
XML is such a great idea that if it had been available 40 years ago, there would have been no worries of a Y2K bug.
Soon all browsers will support XML and you will be able to view the Web however you like. Intelligent agents will search the Web for you and bring you only the content that interests you, without the useless information or annoying ads.
Because XML is a self-describing format, search engines will be able to perfectly match the search criteria you provide. Search engines will be able to use XML markup to make certain that the keywords you requested really pertain to the subject you are searching for. Never again will you wade through hundreds of useless links to find the few that actually pertain to your subject of interest.

4.2: The Real Advantages

In reality, XML does give the potential for better definition of Web content. Instead of pages devoted to nothing but display tricks, XML could potentially give Web content that is much more searchable and customizable[22]. However, the key word here is potential. Much careful work is needed for even a fraction of this potential to be realized.

In order to realize the potential of XML, people need to develop useful application definitions. Many of the applications written today are either huge standards that attempt to do everything or small special-purpose languages used inside a single company or group. As XML matures, more people should develop general purpose applications that are small enough to implement and work with. In addition, the markup is not particularly useful if no one can understand it. As XML applications become more commonplace and well-specified, search engines may be able to tailor a search based on the content of particular pieces of markup. Until then, the engines will probably continue to search all of the text, regardless of markup.

XML does provide a promising format for dealing with database records that need to be placed online. The ability to create specialized document formats is definitely an advantage to those who are using the Web for online catalogs and commerce. The web sites that do provide their output in well-designed XML will definitely be easier to search. These web sites will also be able to provide much more usable and useful content to their users.

The benefits of XML to businesses that wish to distribute data to other businesses on the web are beginning to be recognized. XML can be used to describe and transfer data nicely. If rendering is not an issue, XML can still be used as an effective trade language between many different kinds of systems.

4.3: Disadvantages

The major disadvantage of XML is its verbosity. The text-based format specified by XML is much more verbose than a binary format for the same data[10]. Most XML documentation goes on to suggest that this disadvantage can be reduced by data compression of some kind.
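The size difference can be observed with a small experiment. The sketch below encodes the same made-up records as XML text and as a fixed-width binary layout, and then compresses the XML. The record structure and element names are assumptions chosen for illustration, and the exact sizes depend entirely on the data.

```python
import struct
import zlib

# Hypothetical data set: 1000 (id, price) records.
records = [(i, i * 1.5) for i in range(1000)]

# Text-based XML encoding of the records.
xml_text = "<records>" + "".join(
    f'<record id="{i}"><price>{p}</price></record>' for i, p in records
) + "</records>"
xml_bytes = xml_text.encode("utf-8")

# Fixed-width binary encoding: a 4-byte int plus an 8-byte double per record.
binary_bytes = b"".join(struct.pack("<id", i, p) for i, p in records)

# General-purpose compression applied to the XML text.
compressed_xml = zlib.compress(xml_bytes)

print("XML:", len(xml_bytes), "bytes")
print("binary:", len(binary_bytes), "bytes")
print("compressed XML:", len(compressed_xml), "bytes")
```

For this data the XML is several times larger than the binary form, and compression recovers much of that difference. Whether compressed XML is smaller than the binary form depends on how repetitive the data is.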

4.4: Roadblocks to Widespread Use of XML

XML is technically superior to HTML and easier to process. Unfortunately, as has been shown in many parts of the computer industry, technical superiority does not ensure automatic acceptance and widespread use. If most web sites continue to use HTML exclusively, XML's technical superiority does not matter.

Ironically, the biggest roadblock to XML's acceptance may be its processibility. A large portion of the driving force of the Web today is commercial. Many sites either promote a particular company or display ads that are sold to generate revenue to finance the site. With XML's ability to describe content, it would be easy to build more applications for filtering out the ads and only returning the meat of the site. Moreover, it will become even easier to parse content off of a company's web site and display it on a different site. For these reasons, companies might shy away from XML, or they would need to explore new business models and ways to protect their data. This is likely to slow the acceptance of XML by the larger commercial web sites.

Another roadblock to XML's widespread use is the relative inexperience of Web developers. Many developers who have experience with finessing HTML to generate the perfect output will be reluctant to move to this new format where their tricks are irrelevant. They also may rebel against giving up any fancy editing tools they have become accustomed to. This effect will certainly put a damper on many web sites' change to XML.

Even if developers do begin to switch to XML, there is another potential problem. Not everyone will design reasonable XML applications. As is apparent with HTML, some web sites will be designed using the worst XML possible. There are two likely reasons for this: either the designer used an inappropriate XML application for the job, or the designer created a new XML application without knowing what he was doing. For example, in Building XML Applications[10], the authors try to show how to use XML to lay out a web site. They choose a standard XML application to demonstrate reuse. However, the application they choose, CDF, is not really appropriate for this use. They end up ignoring required elements in the language and using other elements for uses that do not quite match the language design. This example misuses XML in the same way that many people, including the authors, complain that HTML is misused.

One of the discouraging predictions made by skeptics of XML is the Tower of Babel scenario. The basic form of this prediction is simple:

Everyone embraces XML.
Everyone creates his/her own XML application.
No one uses anyone else's XML application unless forced.

The result is that all XML applications become proprietary and we are back where we started from. However, now things are worse. Not only is it difficult to convert between different people's formats, but all of the files are now bigger because they are not using compact binary formats.

4.5: XML Compiled into HTML

Many of the sites that currently use XML are compiling an XML description of web pages into HTML or Active Server Pages. Most of these web sites are using XML as a richer version of HTML that can be rendered in different ways. For example, the XML source for a help site may be converted to HTML for browsing and also converted to a format more convenient for print distribution.

Although some sites convert XML to HTML at request time, many others convert the XML into HTML and serve the HTML directly. Most of these use special purpose code to determine the format of the HTML output. Since this is a new field, many different implementations and approaches are being tried.

4.5.1: Advantages of Compiled XML

By using a compiled XML approach, a developer can easily develop and maintain a consistent look on a web site. Unlike equivalent solutions using server-side includes, ASP, or CGI scripts, there is no run-time penalty when a page on the site is accessed. Unlike these other technologies that are often used for content reuse, this approach combines each of the pieces of the page off-line before they are accessed.

This approach actually works very well with dynamic pages such as ASP. Only the parts of each page that really must be dynamic are left as ASP scripting code. Any static text that would have been included is inserted using the compiled XML approach. In practice, this can significantly improve the performance of a site.

4.5.2: Disadvantages of Compiled XML

The major disadvantage of compiled XML involves retraining of HTML authors. Unlike most programmers, HTML authors are not usually comfortable with a compile stage. They usually want to make a few changes and see them instantly. Moreover, it is quite difficult to convince them not to change the output HTML. Changes to the HTML output will of course be lost any time the HTML is regenerated from XML.

The other major disadvantage is caused by the fact that the authors are no longer editing straight HTML. This means that the HTML authors usually cannot use the normal slick WYSIWYG editing tools that they may prefer, like Microsoft FrontPage. This often generates resistance to change. XML editing tools do exist, but they are not in general WYSIWYG. This is because the presentation of the XML is not specified by the document being edited.

4.5.3: Examples of Compiled XML

4.5.3.1: Example: gxml2html

The generic XML to HTML conversion tool (gxml2html) is a relatively simple system for compiling XML[18]. In many ways, its simplicity is its greatest strength. Josh Carter, the author of gxml2html, makes the argument that content and presentation should be separate. This is not new -- that sentiment is the whole reason behind style sheets. Unlike other systems, the author does not attempt an all-powerful solution that can take any XML and convert it into arbitrary HTML. Instead, gxml2html uses HTML templates, which are snippets of HTML used to replace a given XML element.

Carter goes on to argue that systems like XSL, while much more powerful, are too difficult to learn. There is also the suggestion that XSL may be overkill for many applications. As a proof of concept, the entire site describing gxml2html is built using the tool[18]. Like most XML compiling systems, gxml2html focuses on single page conversions, although it does support converting an entire working directory at a time.
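The template idea described above can be sketched in a few lines. This is not gxml2html's actual implementation or template syntax. The element names, the TEMPLATES table, and the {content} placeholder are all assumptions made for illustration. Each XML element is replaced by an HTML snippet, with the converted children placed where the placeholder appears.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical templates: one HTML snippet per XML element name.
# "{content}" marks where the converted content of the element goes.
TEMPLATES = {
    "article": "<html><body>{content}</body></html>",
    "title": "<h1>{content}</h1>",
    "para": "<p>{content}</p>",
}


def render(element):
    """Recursively replace an element with its HTML template."""
    parts = [escape(element.text or "")]
    for child in element:
        parts.append(render(child))
        parts.append(escape(child.tail or ""))
    content = "".join(parts)
    # Elements without a template simply pass their content through.
    template = TEMPLATES.get(element.tag, "{content}")
    return template.format(content=content)


source = "<article><title>News</title><para>Sales &amp; support</para></article>"
print(render(ET.fromstring(source)))
```

Because each template is just a snippet of HTML, changing the look of every title on a site means editing one entry rather than every page.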

4.5.3.2: Example: Cocoon

Another compiled XML approach has been developed by the Apache Software Foundation. The Cocoon project seeks to separate creation, rendering, and serving of Web content[12]. This is based on a three stage approach:

XML Creation
XML Processing
XSL Rendering

Cocoon's approach to separating these stages involves multiple stages of XSLT processing. A dynamic caching system is used to reduce the run-time overhead of this approach. The documentation also suggests that Cocoon can be run in an off-line mode to compile the HTML to disk for later delivery by a web server[13].

Preliminary study of the Cocoon system shows that it is possible to perform good separation between the content and presentation. In general, separating presentation and content is a good idea. Unfortunately, there does not appear to be any particular mechanism in place to enforce this separation. Additionally, Cocoon is still very page-centric in its design.

4.6: Other XML Applications on the Web

There are already several XML applications available on the Web. Some of these applications have been developed by committees attempting to standardize some form of document. Others have been developed by companies attempting to make a name for themselves in this new field. Still others have been developed by one, or a handful, of developers trying to solve a particular problem. Because this field is so new, it is quite easy for a lone programmer working on a personal project to contribute as much as a group of multinational standards organizations.

Current XML applications cover fields ranging from molecular description and modeling (the Chemical Markup Language, CML) and mathematical formulae (MathML) to multimedia applications (Synchronized Multimedia Integration Language, SMIL) and graphics (Vector Markup Language, VML and Precision Graphics Markup Language, PGML).

Many XML applications are connected directly to Web development (The Channel Definition Format, CDF[1] and The Wireless Application Protocol, WAP[23]). Others were developed for different fields completely, such as GedML for genealogy, and RELML for Real Estate Listings, as well as a host of others.

4.7: Hybrid XML Approaches

One promising area of current development is a hybrid of the above-mentioned approaches. Using this hybrid approach, XML and HTML are used together, possibly even mixed in the same file.

Using Microsoft's Internet Explorer, developers can embed XML data directly into their HTML[10]. The embedded XML is then processed by scripts or applets for presentation to the user. The compiled XML tool gxml2html, described above, uses HTML templates to help in the conversion from XML to HTML.

The possibility explored in this thesis uses the XML-compiler approach to generate templates that define the overall look of a site. The main data of the site is provided through other mechanisms. This could be static HTML or from a database accessed using XML. A run-time process then combines this data with the templates to produce output pages.

These approaches allocate resources well. A large portion of each HTML page on a web site is devoted to common elements such as navigation, logos, and footer information. There is no need to generate this text from the raw XML on every page request. On the other hand, some portion of many pages on a web site contains dynamic information. This portion does benefit from evaluation at the time of the user's request.
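The division of work between compile time and request time can be sketched as follows. This is only an illustration of the general idea, not the system defined later in this thesis. The layout string, the function names, and the placeholder names are assumptions. The common elements are merged once, off-line, and only the dynamic portion is filled in for each request.

```python
from string import Template

# Hypothetical page layout. "$$body" survives the compile phase as "$body".
LAYOUT = "<html><body>$header<div>$$body</div>$footer</body></html>"


def compile_template(header, footer):
    """Compile phase (off-line): fill in the static parts once."""
    return Template(LAYOUT).substitute(header=header, footer=footer)


def serve(compiled, body):
    """Request phase (run time): substitute only the dynamic portion."""
    return Template(compiled).substitute(body=body)


compiled_page = compile_template("<h1>Example Site</h1>", "<p>Contact the Webmaster</p>")
print(compiled_page)
print(serve(compiled_page, "Quote of the moment: 42"))
```

The header and footer are processed once no matter how many requests arrive, so the per-request cost is limited to the part of the page that actually changes.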

4.8: XML Storage

XML can be used as a storage format. Although not much progress has been made in establishing XML as a standard storage format, there are two approaches currently being researched. These involve using XML to implement either an entire file system or a small database.

The average file system consists of directories (equivalent to elements) and files (equivalent to text content or unparsed entities). It is possible that a file system could be constructed on top of XML[10]. With suitable mechanisms defined for returning parts of an XML document, this would give all of the functionality of current file systems. In addition, XML attributes would allow much richer metadata to be stored with the files and directories. One drawback to this approach is XML's verbosity.

Classic Database Management Systems (DBMSs) are similar to XML in that both systems are designed to structure data in some way to supply a context. One group at Stanford University has developed a full DBMS based on XML called Lore[47]. Lore supports a query language (Lorel) and other features that take advantage of XML's unique capabilities.

Although XML's verbosity will probably prevent it from replacing most database applications, various efforts are underway to produce a standardized query language for extracting data from XML documents. This would allow XML to replace some proprietary flat-file databases and most text-based configuration or preference databases[10] [46].

One of the current initiatives that is being considered by W3C is the XPath specification. The XPath specification is intended to provide a standardized way to reference data inside an XML document[20]. The notation used by XPath looks very much like the directory structure notation used under UNIX.
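The resemblance to directory paths is easy to see in a short example. The sketch below uses the limited XPath subset supported by Python's standard xml.etree.ElementTree module, which is not a full XPath implementation. The document and its element names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the element and attribute names are illustrative only.
doc = ET.fromstring(
    "<site>"
    "<section name='news'><page>today.html</page><page>archive.html</page></section>"
    "<section name='help'><page>faq.html</page></section>"
    "</site>"
)

# Path steps are separated by "/", much like a UNIX directory path.
all_pages = [p.text for p in doc.findall("./section/page")]

# A predicate in square brackets narrows the match by attribute value.
help_pages = [p.text for p in doc.findall("./section[@name='help']/page")]

print(all_pages)   # ['today.html', 'archive.html', 'faq.html']
print(help_pages)  # ['faq.html']
```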

4.9: XML as an Interface Description Language

One of the most useful features of XML is the ease with which it can describe complex data structures. For this reason, XML has found great popularity in the description of interfaces.

In the realm of distributed computing, people found that there was much repetitive work involved in building procedures that called code on other machines. In order to simplify this process, standard Interface Definition Languages (IDLs) were created.

The Web Interface Definition Language (WIDL) is an XML format for describing an API for dealing with Web pages[45]. One of the unfulfilled dreams of the Web was the concept of Intelligent Agents. An Intelligent Agent is a program that roams the Web on behalf of a user, looking for information and web sites relevant to that user. Unlike search engines, this program would perform its task automatically, without intervention from the user.

The main reason that Intelligent Agents never materialized in the mainstream is the difficulty of extracting usable information from an HTML page. In general, the agent must specify the URL request, retrieve the page, parse the HTML, extract the data, and return it for processing. Then the agent can begin the useful part. The first portion of this process has been extremely hard to automate due to the nature of most of the HTML on the Web. WIDL is a method for specifying the mechanical part of this transaction.
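To make the mechanical part concrete, the sketch below performs the parse and extract steps by hand using Python's standard html.parser module. It is not WIDL and does not use WIDL syntax. The class name, the sample page, and the convention that prices appear in a table cell with class "price" are all assumptions. The point is that this kind of code must be written, and rewritten, for each page layout.

```python
from html.parser import HTMLParser


class PriceExtractor(HTMLParser):
    """Collect the text of every <td class="price"> cell."""

    def __init__(self):
        super().__init__()
        self.in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs.
        if tag == "td" and ("class", "price") in attrs:
            self.in_price = True

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_price = False

    def handle_data(self, data):
        if self.in_price:
            self.prices.append(data.strip())


# Stand-in for a retrieved page; a real agent would fetch this over HTTP.
page = "<table><tr><td>ACME</td><td class='price'>12.50</td></tr></table>"
extractor = PriceExtractor()
extractor.feed(page)
print(extractor.prices)  # ['12.50']
```

If the site changes its markup, the extractor silently stops working, which is the fragility that a declarative description of the transaction is meant to reduce.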

XML has also been used to replace the IDL normally used in CORBA applications. Although the normal IDL used with CORBA covers the input and output specifications required, XML also provides a useful format for serializing data sent and received remotely[43].

Another very interesting use of XML is XML-RPC[35]. XML-RPC is a markup language designed to aid in marshaling Remote Procedure Calls (RPCs). The idea behind RPC is simple enough: treat a call to a remote computer exactly the same as a procedure call in the current process. Unfortunately, the parameters passed to this remote procedure may include addresses and other parameters that would not survive the transfer. The marshaling process converts the parameters of a procedure call into a message to be sent to a remote server. Software on that server converts this message back into a normal procedure call. Finally, the return values from the called procedure are made into a return message that goes back to the original machine.

The advantage of RPC is simplicity of use. The major downside is the marshaling process. Fortunately, the creation of the marshaling code can be automated. XML-RPC uses XML in two different capacities. First, an XML application is defined which serves as an IDL for the RPC calls that XML-RPC implements. Second, XML is used to encode the data that is transferred to the remote process and back again[35].
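The marshaling step can be seen directly with the xmlrpc.client module in Python's standard library. In the sketch below, the procedure name add and its arguments are assumptions, and no server is contacted. The code only converts a call into an XML message and back again.

```python
import xmlrpc.client

# Marshal a call to a hypothetical remote procedure add(2, 3) into an XML message.
request = xmlrpc.client.dumps((2, 3), methodname="add")
print(request)

# The receiving side unmarshals the message back into parameters and a method name.
params, method = xmlrpc.client.loads(request)
print(method, params)  # add (2, 3)

# The return value is marshaled the same way, flagged as a method response.
response = xmlrpc.client.dumps((5,), methodresponse=True)
result, _ = xmlrpc.client.loads(response)
print(result)  # (5,)
```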

4.10: XML as a Conversion Format

XML is currently being used as a format for transporting data from one system, say a SQL database, to a program that needs the data. The goal of XML as a conversion format is helping to decouple the programs from the specific database and data format used for storage.

4.10.1: Advantages of XML as a Conversion Format

An advantage of using XML as a conversion format is the same advantage compiler builders get from intermediate code. If there are m different database formats to read from and n different programs that need the data, doing direct translation would result in m × n different translators. When using an intermediate format, such as XML, the number of translators is reduced to m + n. This benefit is not exclusive to XML. Any intermediate format would provide this benefit.
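The arithmetic is worth working through for a few cases: m sources and n consumers need m × n direct translators, but only m + n when every translator reads or writes the intermediate format. The short sketch below tabulates both counts. The function name and the sample values are chosen only for illustration.

```python
def translator_counts(m, n):
    """Return (direct, via_intermediate) counts for m sources and n consumers."""
    return m * n, m + n


for m, n in [(2, 2), (3, 4), (5, 10)]:
    direct, via_intermediate = translator_counts(m, n)
    print(f"m={m}, n={n}: {direct} direct, {via_intermediate} via intermediate format")
```

With two sources and two consumers the counts are equal, so the saving appears only as the number of formats or programs grows. With five sources and ten consumers, it is 50 translators against 15.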

An additional benefit of using XML as a conversion format is readability. Unlike binary interchange formats, the developer needs no special tools to read XML. A well-designed XML interchange format is self-describing, making recovery of information easier, even if the original program is lost[5].

The text-based format makes manipulation easier as well. A text editor provides the minimum functionality needed to view XML. There are also XML editors that give a more structured view of the XML document. Additionally, text manipulation tools can be used to perform maintenance on the XML in between phases, possibly resulting in a much more powerful translation system with relatively little work. Since XML is a text-based format, machine dependencies such as byte order and floating point formats would not be an issue.

Standard tools are available for reading and writing XML. Therefore, the developer can spend more time on the specifics of the problem instead of developing the translation tools.

4.10.2: Disadvantages of XML as a Conversion Format

The main disadvantage of XML as a conversion format is its verbosity. A straightforward binary translation of the data could be substantially smaller, though harder to read.

If all that is needed is a one-time conversion from one format to another, the overhead of the conversion to XML may not be worth the time. It might be possible to put together a one-shot program in less time. However, the debugging time involved should not be underestimated.

In addition, a potential for loss of information exists in the conversion from a binary representation to a text-based XML representation and back again. Any conversion program should be carefully designed not to lose any information in the conversion process.
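Floating-point values are a common place for this kind of loss. The following sketch, which uses an arbitrary sample value, shows that writing a number with a fixed number of decimal places does not survive the round trip, while writing it with enough digits does.

```python
value = 1.0 / 3.0

# A fixed number of decimal places silently drops precision.
lossy_text = f"{value:.6f}"

# repr() emits enough digits to recover the same double exactly.
exact_text = repr(value)

print(lossy_text, float(lossy_text) == value)  # 0.333333 False
print(exact_text, float(exact_text) == value)  # 0.3333333333333333 True
```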

The time needed to convert binary data into an appropriate text form and back was once considered a potential problem. However, given the speed of modern computers, this issue is not as important as it once was. In general, accessing a hard disk or another computer across a network takes much more time than this conversion.

Chapter 5: A High-Level Approach to Web Site Design

5.1: Describing a Web Site

If we were to attempt to describe a web site using the three orthogonal concepts of content, page presentation, and site structure, we would need a good notation for describing each part. This thesis defines a system which uses separate XML applications for the three separate parts. A program can then combine the pieces and compile them into a web site.

HTML can do a relatively good job of describing content and page presentation. SSI[37], ASP, PHP, HTML::Mason, and others do a good job of factoring out common portions of pages to reduce maintenance. Tools like Cold Fusion and Apache::DBI simplify access to databases. ASP, PHP, Cold Fusion, mod_perl[8] and many others allow run-time changes to web pages at the point of request. However, none of these technologies are geared toward overall site design. Any overall design and consistency work is performed directly by the developer.

All of these technologies can be used to generate consistent, fast and well-designed web sites. However, the consistency and overall design must be maintained through the direct efforts of the developer. The tools do not work at a high enough level of abstraction to reduce the developer's burden. In many ways, this is similar to the contrast between assembly language and high-level computer languages. It is possible to write very well-designed, structured, readable code in assembly language. However, high-level languages supply constructs that make writing code with these attributes much easier.

How then can we gain the benefits of a high-level abstraction when building our web sites? One way is to take the assembly vs. high-level language analogy one step further and create a high-level language for site construction. This language could supply the higher-level constructs needed to simplify a site-centric view of web sites.

Another important benefit of this approach is focusing the developer's attention on the higher level constructs. This change in focus could result in better web site design. By helping the developer focus on the higher level issues, the developer has an incentive to think about and experiment with these issues without being bogged down in the actual details of implementation.

5.2: Requirements for a Web Site Description System

In order to be useful, a web site description system must meet certain minimum requirements. To be used in the construction and maintenance of a production web site, the system must meet even more requirements. As with any design, there are certain issues that are not intended to be addressed, and these non-requirements must be spelled out to reduce confusion.

5.2.1: Minimum Requirements

The minimum requirements are simply those that describe functionality that cannot be ignored and still have a functional system. Most of these requirements are not a part of other site-building tools.

The system must be capable of describing the overall structure of a site:: Most web site construction tools ignore this requirement. Those tools are based around creating pages. In a medium to large site, navigation and overall site structure are actually much more important and more difficult than page design.
The system must be capable of describing the presentation of individual pages:: Most web site construction tools address this issue as their primary focus.
The system must be able to effectively describe common portions of a page with minimal redundancy:: Although most web site building tools focus on implementation of web pages, they normally focus on single pages. Most tools that support some form of common subpage factoring rely on run-time inclusion of the common components. Although this method does work, it does not scale particularly well on high-traffic sites.
The system should support the separation of page presentation, site structure, and content:: This is the primary message of this thesis.
The system must be general enough to use on different kinds of web sites:: A tool that is only useful on one, or a handful of kinds of web sites, is a curiosity at best. In order to be truly useful, the tool must help design the web site the developer wants to build.
Replicating the structure of a web site should not require copying almost every file in the site:: In the normal approach to web site design, the details of site structure and overall look are spread throughout all of the files in the site. In order to replicate this structure, large amounts of the site must be copied to the new site and edited to add the new content. This cut and paste style of development has been found to be a problem in software construction[2].

5.2.2: Production Requirements

Presentation and site structure generalization should pose minimal run-time penalties:: For large, high-traffic sites, this may be the most important requirement of the system. Common site components should be easily maintained with little or no run-time cost.
The system should support conditional compilation-type functionality for testing variations of a current site from one set of source:: In a production environment, there are often parallel sets of changes in development at any one time. This feature reduces the impact of releasing some changes without releasing them all. Improperly used, this feature can generate a maintenance nightmare.
Overall site consistency and maintainability are considered more important than page by page tweaks:: This requirement is probably the most controversial. However, when designing large or heavy traffic web sites, this may be one of the most important requirements.

5.2.3: Non-Requirements

The system is not optimized for speed of the compilation phase:: In timing tests, an earlier design similar to this system generated around 150 pages in under 5 seconds. A fair portion of that time was spent starting Perl and reading configuration information with a primitive XML parser.
The content format portion is not intended to be usable without programming:: This system is designed for use by people trying to solve problems where the normal drag-drop approach fails. Many systems created over the years for use without programming do not scale well to difficult problems.
Support by WYSIWYG editing tools is not a priority:: If this system proves useful and someone decides to build WYSIWYG tools to support it, that would be great. However, the goal of this system is to make large, complex sites possible to build and maintain.

5.3: The Payoff

Even without a complete architecture, it is possible to develop a consistent, usable web site through sheer sweat and determination. However, like any other form of software development, the initial development is only a small fraction of the total time spent working on the web site. Shortly after the initial launch, the first change requests come in. As more and more changes accumulate, the design of the site can begin to break down, as described in Maintaining the Web Site.

By collecting the high-level description into a single file, WSDL makes measuring the consistency of the site easier. Given reasonable consistency measures, one can tell if the web site is getting less consistent. This can be a major advantage when a single developer is in charge of several sites, or if she must change a web site she has not looked at in six months. Some of these measures are simple enough that one can gauge them by looking at the WSDL description. Others can be calculated using a fairly simple script.

Some of the metrics that one might find useful in evaluating a web site are

the number of pages
the number of groups
the total number of navigational elements
the number of layouts
the number of headers
the number of footers
the number of web sites inside the main site
the number of code elements devoted to navigation
the number of text elements devoted to navigation
the number of stylesheets
the number of elements directly under a single navigation
the number of elements directly under a single group
the deepest level of nesting under the navigation

These metrics and the numbers derived from them can help to evaluate a web site. Unfortunately, these numbers do not directly indicate whether a web site is in trouble. However, the change in these numbers over time can give an indication of where problems may arise.
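A script of the kind mentioned above might look like the following sketch. The element names used here (website, navigation, group, page, layout) are assumptions made for this example and should not be read as the actual WSDL syntax. The script counts elements and measures nesting depth under the navigation.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical site description; element names are assumptions for illustration.
DESCRIPTION = """
<website>
  <layout name="main"/>
  <layout name="special"/>
  <navigation>
    <group name="news"><page src="today"/><page src="archive"/></group>
    <group name="help"><page src="faq"/></group>
  </navigation>
</website>
"""


def depth(element):
    """Number of element levels nested below this element."""
    return max((1 + depth(child) for child in element), default=0)


def site_metrics(xml_text):
    """Compute a few simple consistency metrics from a site description."""
    root = ET.fromstring(xml_text)
    counts = Counter(el.tag for el in root.iter())
    nav = root.find("navigation")
    return {
        "pages": counts["page"],
        "groups": counts["group"],
        "layouts": counts["layout"],
        "top_level_groups": len(nav.findall("group")) if nav is not None else 0,
        "navigation_depth": depth(nav) if nav is not None else 0,
    }


print(site_metrics(DESCRIPTION))
```

Running such a script after each release and recording the results gives the change over time, which is the useful signal.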

For example, if the web site was originally defined to have five major sections and all of them have the same presentation except one, one would expect the number of group elements under the main navigation to be five. The expected number of layout elements would be two. Suppose that after several changes the number of group elements under the main navigation is six and the number of layout elements is ten. The developer probably already knows that one new section was added, but he may not have realized that the overall consistency of the sections appears to be getting worse. There are now more different styles of page presentation than there are major sections. It would be a good idea to investigate this issue and either correct the problem or find an acceptable reason for the divergence.

Without this indicator, the first clue that the presentations have diverged is the difficulty in making a single change to a section. A request comes in for a change to the header on the third section. The change is made and it appears on half the pages in the section. The developer tracks down the pages that do not match and makes a change there. That change affects one of the pages in the fourth section, and so on. By having all of this information in one place, it is easier to recognize the decline before the developers are spending all of their time correcting the effects of the last change.

5.4: The Web Site Description Language (WSDL)

At the highest level, the purpose of the Web Site Description Language (WSDL) is the description of a whole web site in enough detail for a program to generate it. This language must be able to describe the structure of a web site and its navigation in a clear, concise format. Another issue to consider is what makes this language different from other similar languages. Other web content creation languages focus on page creation. The site is then defined as a collection of pages with appropriate navigation built onto the pages.

5.4.1: The Pieces

In order to describe a web site, the pieces that need to be described must first be identified. Any web site can be partitioned into several kinds of data. Some of it is explicit, such as the individual pages and images that make up the site. Some of the data is implicit.

5.4.1.1: Implicit Web Site Data

What is meant by implicit data? Implicit data includes data about the site (meta-data) and common pieces of information that occur throughout the site. Some pieces of meta-data are as follows:

the root of the site
the directory structure
the navigation model
the target audience
server names (e.g., www.uh.edu)
link type (relative or absolute)

What common pieces of information are associated with the web site? This is an area that people have explored with a view to reducing page maintenance. Some of these items include the following:

disclaimers
header/banner information
footer information
the Webmaster contact information

Some of the above information is factored out of various pages and put into separate files that are included somehow to reduce maintenance. For example, the footer of all of the pages in a web site may have the same look and text. This would often be located in one file and then included as needed. Other parts of this data are maintained only by conventions enforced by the developers. For example, the presentation of side-bar text is expected to be consistent throughout the site, but the actual side-bar must be different on every page. The developers may then establish conventions for how this piece of HTML is constructed when it is needed. In many cases, none of this information is documented or really understood as an important part of the web site.

5.4.1.2: Explicit Web Site Data

The explicit data that makes up a web site may, on the surface, seem easier to define. Obviously, the pages that make up the site are all necessary parts of the site, as are all of the images. However, there are many other resources on a web site that are not part of the navigable HTML. These resources include:

applets
downloadable programs or documents
client-side script files
CGI scripts
style sheets
server-side image map files

Obviously, this set of data is as varied as any other portion of the web site. In order to define a high-level description of the web site, these pieces must be classified. These classifications not only support the construction of the web site; they also improve the ability to analyze the web site for inconsistencies and maintenance problems.

5.4.1.3: Benefits of This Level of Detail

Some people might ask why they need to care about this level of detail. After all, people have been building web sites for years without making these distinctions, so how are these distinctions helpful? In addition to the standard software design arguments for this level of detail, there are some serious practical, real-world considerations as well.

Assume someone is building web sites for external customers instead of a site for their personal use or their company. How should that developer respond to the following client questions?

I like the design, but our corporate policy does not allow Java applets. What parts of the site will be affected and how long will it take to remove them?
We would like a site containing your A, B, and C sections, but none of the others. How does that affect the navigation?
Everything is just perfect... except we would like to rearrange the major sections and split the last one into two pieces. How long will it take to do that?
The site looks great, but the president of the company hates JavaScript. Could you just fix the navigation to not use it anywhere? I mean, how hard can that be?

If the developer happens to have put the web site together in just the right way and the clients ask for just the right modifications, it's no problem. But other seemingly innocent suggestions become a nightmare to implement. Unfortunately, similar issues can arise with internal customers as well.

As with any other form of software design, web site design is a study in managing change and risk. The risks have to do with browser incompatibilities, download times, and platform inconsistencies. Managing change involves Internet time, which used to be called Rapid Application Development. The clients want faster changes and newer technologies, without sacrificing download speed, user experience, or web site stability. Unfortunately, if the major design components of a given web site are spread among 500 pages, simple changes are no longer simple.

Even using some form of inclusion technology does not solve the problem. Run-time cost is only part of the issue. The consistency of the site depends on including the same boilerplate HTML in every file in the web site. This approach is far from perfect. If this boilerplate HTML is not included in a file, that file is not consistent with the rest of the web site. If the boilerplate HTML is changed in one file, or if the included HTML is similar to, but not the same as, the standard boilerplate, that file will also be inconsistent with the rest of the web site.

The inconsistencies that may arise through errors in the inclusion technique are obviously a problem. But there is a more important problem hidden by the obvious one. There is no way to find any of these problems without looking at every file in the web site. On a large site, this can become a serious maintenance problem.

5.4.2: WSDL as a Design Aid

The design of WSDL is geared toward providing tools to solve these problems. It is not a panacea, however. A flexible, usable web site still requires careful thought and design. WSDL just helps collect the necessary information in a way that simplifies certain kinds of changes. It can also be used as a guide to direct the thought process.

WSDL provides a way to keep common page components in one place without the run-time costs of server-based include systems. WSDL also provides a single place to describe the structure of a site, instead of the standard practice of scattering that information through every file on the web site. In some environments, the structure is described in an external document that developers can reference when there is a question. However, external documentation is rarely synchronized with the code.

5.4.3: Conditional Elements

The goal of the conditional elements is to allow multiple versions of a configuration to exist at one time without the need for separate versions of the file. Since the implementation language, at present, is Perl, the simplest solution is to use the Boolean expressions supported by Perl. The Boolean expression can be any Perl expression.

Conditional expressions are modeled on the ones supported by XSLT[19]. The only difference is that the WSDL conditionals use Perl syntax for the test attribute and Perl's definition of true and false. Remember also that the < symbol is illegal in an attribute value. You must use &lt; to encode it. Likewise, the & character must be encoded as &amp;. The if element is replaced by its contents if the test attribute evaluates to true. Otherwise, the element and its contents are removed. In the example below, if the $debug variable is true, the debug page is defined; otherwise it is not.

<if test="$debug">
  <page id="debug" title="Debug Page" href="/debug.html"/>
</if>

Example of an if element

The choose element allows for multiple tests and a default. The content of the first when element whose test succeeds replaces the entire choose element. If none of the when tests succeeds and there is an otherwise element, the content of the otherwise element replaces the choose element. If no otherwise exists, the choose and its contents are removed. In the example below, the value of the variable $feature determines the level of functionality provided.

<choose>
  <when test="1==$feature">
    <page id="cool" title="Cool Page" href="/feature1.html"/>
  </when>
  <when test="2==$feature">
    <page id="cool" title="Cool Page" href="/feature2.html"/>
  </when>
  <when test="3==$feature">
    <page id="cool" title="Cool Page" href="/feature3.html"/>
  </when>
  <otherwise>
    <page id="cool" title="Cool Page" href="/unavailable.html"/>
  </otherwise>
</choose>

Example of a choose element
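The replacement rules for if and choose can be sketched in a few lines. The following is only an illustration in Python, not the Perl code of the WSDL processor. The names resolve_conditionals and make_evaluator are hypothetical, and the small evaluator stands in for Perl: it understands only a bare variable name and the form N==$name. The sketch also assumes that an otherwise element, if present, comes last, as in XSLT.

```python
import xml.etree.ElementTree as ET

SITE = """
<website>
  <if test="$debug">
    <page id="debug" title="Debug Page" href="/debug.html"/>
  </if>
  <choose>
    <when test="1==$feature">
      <page id="cool" title="Cool Page" href="/feature1.html"/>
    </when>
    <when test="2==$feature">
      <page id="cool" title="Cool Page" href="/feature2.html"/>
    </when>
    <otherwise>
      <page id="cool" title="Cool Page" href="/unavailable.html"/>
    </otherwise>
  </choose>
</website>
"""


def make_evaluator(variables):
    """Build a toy stand-in for Perl's evaluation of a test attribute."""
    def is_true(test):
        if "==" in test:
            left, right = (part.strip() for part in test.split("=="))
            return int(left) == variables.get(right)
        return bool(variables.get(test.strip()))
    return is_true


def resolve_conditionals(parent, is_true):
    """Replace each if/choose child of parent with the content it selects."""
    resolved = []
    for child in list(parent):
        if child.tag == "if":
            chosen = child if is_true(child.get("test")) else None
        elif child.tag == "choose":
            # First when whose test succeeds, else the otherwise (assumed last).
            chosen = next(
                (branch for branch in child
                 if branch.tag == "otherwise"
                 or (branch.tag == "when" and is_true(branch.get("test")))),
                None,
            )
        else:
            resolve_conditionals(child, is_true)
            resolved.append(child)
            continue
        if chosen is not None:
            resolve_conditionals(chosen, is_true)  # handle nested conditionals
            resolved.extend(chosen)                # keep only the contents
    parent[:] = resolved
    return parent


root = ET.fromstring(SITE)
resolve_conditionals(root, make_evaluator({"$debug": 0, "$feature": 2}))
hrefs = [page.get("href") for page in root]
print(hrefs)  # ['/feature2.html']
```

With $debug false and $feature equal to 2, the debug page disappears and only the second when branch survives, which matches the behavior described above.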

A good use for this feature is changing some of the capabilities of the WSDL file based on command line parameters. Any parameter of the form name=value sets an entry in the hash %CmdLine. As expected, the value is stored in the hash keyed by name. For example, if the command line contains the item PLL=1, the value of $CmdLine{PLL} would be 1.
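The mapping from command line items to hash entries can be illustrated as follows. This is a sketch in Python rather than the processor's Perl, and the function name parse_cmdline and the file name site.wsdl are hypothetical. It assumes that items without an equals sign are simply not recorded.

```python
def parse_cmdline(arguments):
    """Collect name=value items into a dictionary keyed by name."""
    cmd_line = {}
    for argument in arguments:
        name, separator, value = argument.partition("=")
        if separator and name:      # skip items that are not name=value
            cmd_line[name] = value
    return cmd_line


cmd_line = parse_cmdline(["site.wsdl", "PLL=1", "debug=0"])
print(cmd_line)  # {'PLL': '1', 'debug': '0'}
```

One difference is worth remembering when reading this sketch: in Perl the string "0" is false, so a test such as $CmdLine{debug} fails for debug=0, whereas the Python string "0" is truthy.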

The first version of the program does not prevent a Boolean expression from referencing or changing any value in the program. This behavior may change in later versions of the program.

5.4.4: Directory Attributes

Several elements have attributes that reference files or directories on the local disk, or URLs located in the output web site. The rules for dealing with these directories are fairly straightforward. However, they sometimes result in surprises. The purpose of the rules is to reduce the amount of redundant data entered into the WSDL file.

Any directory or URL that is absolute after macro expansion is used as is. For example, if the value of the src attribute of a file element is {{root}}/stuff.pdf, which resolves to /docs/stuff.pdf, that file path is used as is. If a directory or URL is relative, it is resolved relative to its parent element's equivalent attribute. In the case of URLs, the root attribute of the parent is considered to be equivalent to the href attribute of the child. For example, given the WSDL fragment below:

<group root="/news">
  <page href="today.html" title="Today's News"/>
</group>

Attribute inheritance

The URL for the Today's News page is /news/today.html, combining information from the group and the page.

If the src or dest attribute of an element does not contain a filename, but the href attribute does, the filename in the href is used.
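These rules can be summarized in a short sketch. It is written in Python for illustration only. The function names are hypothetical, a leading slash is taken as the sign of an absolute path, and a trailing slash is assumed to mark a path that does not contain a filename; the actual processor may decide these cases differently.

```python
import posixpath


def resolve(parent_value, child_value):
    """Resolve a child's path or URL against the parent's equivalent value."""
    if child_value.startswith("/"):     # absolute values are used as is
        return child_value
    return posixpath.join(parent_value, child_value)


def fill_filename(path, href):
    """Borrow the filename from href when path names only a directory.

    Assumption: a trailing slash means the path has no filename.
    """
    if path.endswith("/"):
        return path + posixpath.basename(href)
    return path


print(resolve("/news", "today.html"))        # /news/today.html
print(resolve("/news", "/docs/stuff.pdf"))   # /docs/stuff.pdf
print(fill_filename("/build/news/", "/news/today.html"))  # /build/news/today.html
```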

5.4.5: Element Inclusions

There is one portion of the design of WSDL that is very different from most XML applications. The normal XML approach to inclusion of data from another file is through the use of general entities. To include a file or other resource using a general entity is a two-step process. First, the entity referencing the resource must be created in the DTD. Next, the entity must be referenced in the appropriate place in the code.

One major advantage of this approach is the ability to include any kind of resource, not just files. Also, because it is standardized, it may be supported directly by an XML parser. Major disadvantages include the learning curve of this separate mechanism and the fact that not all non-validating parsers support the feature.

In order to allow an easy path towards reusing pieces of a web site description, some of the larger WSDL elements support a special attribute named include. This attribute names a file that is to be read for the content of the current element. This approach was chosen primarily because this method seemed easier to understand than an approach relying on general entities.

Nothing in the WSDL design prevents using general entities to include external data in a WSDL file. The include attribute mechanism is just simpler to learn for someone with less XML knowledge and experience.

Any element that uses the include attribute cannot have any content. This is most easily accomplished by making it an empty element when the include attribute is used.

There are a few restrictions on the attributes of including and included elements. In general, the attributes of the resulting element are the union of the attributes of the including and included elements. If an attribute exists in both, the value from the including element takes precedence over the value from the included element. The one exception to this model is the id attribute (see The WSDL Element Reference). If the id attribute exists in both the including and included element, it must have the same value.
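The attribute rules for inclusion amount to a dictionary merge with one extra check. The sketch below is in Python and is only an illustration; the name merge_attributes and the sample attribute values are hypothetical. It leaves the include attribute itself in the result, since the text does not say whether the processor keeps it.

```python
def merge_attributes(including, included):
    """Union of both attribute sets; the including element wins on conflicts."""
    if ("id" in including and "id" in included
            and including["id"] != included["id"]):
        raise ValueError("id must have the same value in both elements")
    merged = dict(included)     # start from the included element
    merged.update(including)    # the including element takes precedence
    return merged


including = {"id": "news", "include": "news.wsdl", "title": "Daily News"}
included = {"id": "news", "title": "News", "root": "/news"}
merged = merge_attributes(including, included)
print(merged["title"], merged["root"])  # Daily News /news
```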

5.4.6: Page Properties

In an earlier version of our system, any extra data to be associated with the page was added as new attributes on the page element. Since that system was not intended to be validated and had no DTD, this was not a major problem.

However, this approach does not scale well to multiple, different kinds of web sites. In a large project, the ability to validate the WSDL file would become critical to the maintenance of the site. Unfortunately, there is no way to develop a single generic page description that covers all possible scenarios. Even if there were, it would be too large and complicated to use.

In order to correct this deficiency in the earlier design, Page Properties were added. These properties are supported through the prop element. Using these properties, the developer can add application-specific data to each page element in the web site. Just as importantly, a parser can still validate the resulting WSDL file using the WSDL DTD.

In the earlier version of this system, these properties were used in many different ways. Some ideas for use might include:

A flag to tell whether or not to wrap the URL call for this page in a JavaScript function call in the navigation.
An indicator for the amount of time to wait before refreshing the page if a refresh is needed.
The parameters used in a call to an ad server to specify which ads should appear on this screen.

Including this feature allows much more flexibility when creating the presentation files. The properties approach allows a particular application to add specific features in a standard way. This gives us the advantages of the original approach without any of the disadvantages.

One possible enhancement to this system would be to add prop elements to groups and websites. At present, the definition of WSDL does not include prop elements as children of these other elements. Only a few of the implications of this change have been considered, without any solid conclusions.

5.4.6.1: An Example: Banner Images

Let's say a group of developers is asked to build a web site that has a consistent look across all of the pages. After careful consideration, they design the web site and manage to get all of the pages based on the same presentation file. A week later, a new requirement comes in. Some of the pages need a different banner image at the top of the page. Because they are in a hurry, they duplicate the layout page into two versions, one with the first banner and one with the second. A few days later, another request comes in for a third banner to be placed on other pages.

At this point, the developers should realize that splitting the presentation file again is likely to create maintenance problems. What they really need is a single presentation file and a parameter that specifies which banner image to insert. First, they go back to a single presentation file. They replace the banner image name with a reference to a page property, such as prop[banner]. In each page in the WSDL file, they now add the prop element with a name of banner and a value of the appropriate banner image name.

After this change, any changes to the banner image names result in changes to the WSDL file and nothing else. This applies whether the client decides to go with a single image again, a different image on each page, or anything in between. Now they have a more flexible design and they have arrested the slide into maintenance chaos, this time.
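To make the banner scenario concrete, the sketch below reads the properties of one page into a dictionary. It is written in Python for illustration. It assumes that prop carries its data in name and value attributes, which the description above suggests but does not state, and the page, the image file name, and the function page_properties are all hypothetical.

```python
import xml.etree.ElementTree as ET

PAGE = """
<page id="sports" title="Sports" href="/sports.html">
  <prop name="banner" value="sports_banner.gif"/>
</page>
"""


def page_properties(page):
    """Map each prop name to its value for one page element."""
    return {prop.get("name"): prop.get("value")
            for prop in page.findall("prop")}


props = page_properties(ET.fromstring(PAGE))
print(props["banner"])  # sports_banner.gif
```

A single presentation file can then look up the banner for whichever page it is building, so a new banner means a new prop value rather than a new presentation file.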

5.5: Presentation Files

The WSDL file describes a web site at a high level, but it does not contain enough information to generate actual files. Most of the missing information is a description of the presentation of the individual pages. The presentation of individual pages is described by special presentation files that are referenced by the layout elements in the WSDL file. These presentation files contain information on laying out individual pages described in one of two formats: the Page Layout Language or skeleton-based presentation.

5.5.1: The Page Layout Language (PLL)

The Page Layout Language is a high-level markup language designed to help structure the individual pieces of a set of HTML pages. The language describes components of the pages at a high level to abstract the presentation knowledge from the pages themselves. For a list of the elements supported by PLL, see The PLL Element Reference.

The PLL presentation approach is based on a very high-level description of the presentation of a page. The WSDL processor determines how each of these high-level elements is converted into actual HTML. Since the PLL elements must be well-defined, the processor can guarantee valid HTML is generated. Moreover, the processor can sanity check the PLL description to verify that all of the proper elements are in the presentation, in the right contexts. The processor can also verify attributes.

5.5.2: The Skeleton-based Presentation

The skeleton-based presentation files contain the actual HTML to be used to lay out the pages. Macro commands are embedded in this HTML that supply the content specific to this page. This system gives the complete flexibility needed to generate any desired HTML output. Unfortunately, the WSDL processor cannot verify the HTML.

This system does give the largest amount of flexibility. It also carries the danger that content may creep into the presentation files, defeating some of the purpose of the WSDL system. However, it is easier to learn for people with a background in ASP, PHP, or plain HTML.

5.5.3: Comparison of the Two

As described above, much of the benefit of the PLL format comes from the high-level nature of the format. This format can generate better HTML over time as the WSDL processor is refined. However, this format is not as flexible as the complete control over the HTML afforded by the skeleton-based presentation system.

Although the skeleton-based presentation files are more flexible, they suffer from a relatively low level of abstraction. The developer must control every aspect of the presentation explicitly. The processor cannot provide default presentation behavior or validate the HTML.

This tradeoff is very much like the one between high-level languages and assembly language. There is nothing that can be done in a high-level language that cannot be done in assembly language. Moreover, assembly language is much more flexible than any high-level language. But, in spite of these facts, high-level languages allow better productivity because they hide many details that most programmers do not need (or want) to know.

Just like the tradeoff between high-level languages and assembly, it is useful to have both the high-level and low-level options. In many cases, the high-level representation is good enough to do what is needed. But every now and then, only assembly code will do. For this reason, PLL is provided for regular pages requiring relatively straightforward presentation. Skeleton files are available for pages that require an unusual format that does not conform to the simple model provided by PLL.

5.5.4: Macro Support

If the presentation files were nothing but static text, they would not serve any purpose in WSDL. However, both presentation file formats support macro commands that allow programmable functionality at the time the presentation file is used to build a page. The full list of macro commands available is described in The WSDL Macro Reference.

Most pieces of text in the WSDL system that are used to generate output pages are subject to macro expansion. The process is relatively straightforward. The output string is searched for a string of the form {{some_string}}, where some_string does not contain the string }}. Then, some_string is evaluated as described in The WSDL Macro Reference. The result of this evaluation replaces the {{some_string}} in the output and the expansion continues.
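The search-and-replace loop can be sketched with a regular expression. The following Python is an illustration only. The name expand_macros is hypothetical, and the dictionary lookup stands in for the real evaluation rules in The WSDL Macro Reference. The sketch makes a single left-to-right pass and does not rescan replaced text; whether the actual processor rescans is not stated here, so that is an assumption.

```python
import re

# Non-greedy match, so the captured text never contains "}}".
MACRO = re.compile(r"\{\{(.*?)\}\}", re.DOTALL)


def expand_macros(text, evaluate):
    """Replace every {{some_string}} with evaluate(some_string)."""
    return MACRO.sub(lambda match: evaluate(match.group(1)), text)


values = {"root": "/docs"}   # stand-in for real macro evaluation
expanded = expand_macros("{{root}}/stuff.pdf", lambda name: values[name])
print(expanded)  # /docs/stuff.pdf
```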

These macro commands can be embedded in the content of text elements and any content files that are included into output pages, including skeleton files. Used carefully, this macro system can reduce the amount of content that creeps into the skeleton files.

PLL files handle macro commands slightly differently from the rest of the WSDL system. First of all, the text and code elements may be referenced directly using the appropriate PLL elements. The macro commands may only be placed in the content of the prelayout element, in the results of the text and content elements, or in the arguments to a code routine call. The last of these gives a feature that is not yet possible anywhere else in the system. A macro command can be used to specify the value of an argument to a code routine. Supporting this feature in general is a topic for further research.

5.6: The code Element

The code element provides a large portion of the power of the WSDL system. This feature allows arbitrarily powerful custom functionality to be added to the system and run at the time the site is compiled. This means that it incurs no run-time penalty.

The code element contains Perl code that is executed as part of the WSDL process at times specified in the WSDL description. The begin and end classes are relatively simple. All begin class code elements are evaluated at the time they are encountered in processing the WSDL file. All end class code elements are evaluated in the order they were encountered just before the WSDL processor shuts down.
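The difference in timing between the begin and end classes can be shown with a small scheduler. This is a Python sketch, not the Perl implementation; the class CodeScheduler and its method names are hypothetical, and each code element is represented by a plain callable. Routine class code is left out because it runs only on demand.

```python
class CodeScheduler:
    """Run begin code immediately and defer end code until shutdown."""

    def __init__(self):
        self._deferred = []

    def encounter(self, code_class, action):
        if code_class == "begin":
            action()                        # evaluated when encountered
        elif code_class == "end":
            self._deferred.append(action)   # kept in encounter order

    def shutdown(self):
        for action in self._deferred:
            action()
        self._deferred.clear()


log = []
scheduler = CodeScheduler()
scheduler.encounter("end", lambda: log.append("end 1"))
scheduler.encounter("begin", lambda: log.append("begin 1"))
scheduler.encounter("end", lambda: log.append("end 2"))
scheduler.shutdown()
print(log)  # ['begin 1', 'end 1', 'end 2']
```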

The routine class code elements, or code routines, are only executed when necessary during the construction of specific pages. In order to simplify this process, several variables are passed to the code routine, including

$curr: This is a reference to the current page object.
$ancestors: This is a reference to an array of objects listing the ancestors of the current page. This page's parent is $ancestors->[0].
%args: This is a hash containing the name=value pairs passed to the code routine.

Additionally, these global variables are available to the code routine:

$wsdl: This is a reference to the whole WSDL tree.
$website: This is a reference to the outer-most website in the WSDL tree.
%CmdLine: This is a hash containing any name=value pairs passed on the command line.

5.6.1: The WSDL Object Model

The WSDL object contains a DataDoc object as well as an XML::Parser object. The member functions that are most likely to be useful in code routines are described below:

$wsdl->website: Return the outermost website element.
$wsdl->byID( "id" ): Return the element identified by the supplied id string or undef if none matches.
$wsdl->IDedElementsByType( "type" ): Return a list of all elements that have ids and match the specified type.

See The DataDoc Object Reference for complete documentation of the DataDoc object model. When the DataDoc sub-object in the WSDL object is created, several portions of the input file are filtered out for efficiency. The DataDoc will not include any processing instructions or comments. In addition, any ignorable whitespace is removed from elements that do not support text content.

5.6.2: Utility Functions

In order to simplify working with these internal objects in code routines, several utility functions are provided.

NthContent: When called with an element, an index, and arguments to pass to Content, returns the single content item at that index from the list.
resolve_macros: Returns the string resulting from resolving the macro commands found in the passed string.
resolve_macro: Returns the string resulting from resolving the supplied macro command without the {{ }} delimiters.
child_of: Returns true if the second argument is a child of the first argument. The arguments must be XMLelement objects.
descendent_of: Returns true if the second argument is a descendant of the first argument. The arguments must be XMLelement objects.
ancestor_of: Returns true if the first argument is in the ancestor list referenced by the second argument.
parent_of: Returns true if the first argument is the parent of the current element according to the ancestor list referenced by the second argument.
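The two ancestor-list helpers follow directly from the convention that the parent is the first entry in the list. The sketch below mirrors their described behavior in Python; it is not the Perl source, and the Node class is a hypothetical stand-in for the element objects.

```python
class Node:
    """Minimal stand-in for an element object."""

    def __init__(self, name):
        self.name = name


def parent_of(element, ancestors):
    """True if element is the parent, that is, the first ancestor listed."""
    return bool(ancestors) and ancestors[0] is element


def ancestor_of(element, ancestors):
    """True if element appears anywhere in the ancestor list."""
    return any(candidate is element for candidate in ancestors)


website, group = Node("website"), Node("group")
ancestors = [group, website]    # nearest ancestor first

print(parent_of(group, ancestors))      # True
print(parent_of(website, ancestors))    # False
print(ancestor_of(website, ancestors))  # True
```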

5.7: Page Content Example: Internet College

Because of the many mechanisms for delivery of HTML output, page content will be different for each site. In order to make this discussion somewhat more concrete, this thesis provides a particular application for an example site. The classic bookstore example is too rigid to show the flexibility of this system. Instead, this thesis discusses a college web site, including a simulated course registration system, at the mythical Internet College. The presentation and structure of the site are built in WSDL. In general, the content of the web site is not supplied, unless it is necessary to prove a point. (See The Source for the Example Site.)

Unlike most toy sites, this one contains a few of the quirks encountered when building a real site. The lifetimes of pages in a real site are a continuum ranging from static pages, that only change if the entire site is redesigned, to dynamic pages, that are different on every request. Unlike many example sites, this one contains a wide range of page lifetimes.

The disclaimer page is static -- it probably will not change in the lifetime of the site. At the other end of the scale, the pages relating to billing and the student schedules must be dynamically generated for each request. The calendar and schedule pages change on a regular schedule, once per semester or per year. The faculty pages change infrequently, but not on a fixed schedule. It is easier, when building an example site, to focus on one of these at the expense of all of the others. Real web sites tend to be more complex and interesting.

This example should show how the page content may need to be modeled in an arbitrarily complex way. More importantly, using this model, we can separate this portion of the problem, which is the part most specific to each site, from the presentation and site structure issues. Although each is important, it is the content that determines the difference between one web site and the next.

Chapter 6: The Evolution of WSDL

The initial idea for WSDL predated my introduction to XML. The language did not appear fully formed, ready to be used for web site development. It was actually the next stage in a series of languages developed for similar purposes.

6.1: And So It Begins

The company I work for provides financial content through a financial web site. The company also develops customized financial web sites for external customers. I was requested to help with a problem where multiple clients wanted sites that were exactly the same as ours, with some minor differences. The initial attempts to solve this problem involved extensive use of ASP and JavaScript tricks.

This solution was fragile, and we predicted it would not support more than three to five sites. I suggested a system that would allow us to compile a web site and reserve the run-time behavior for items that changed at run time. It was decided that this sounded too risky to apply company resources to.

A couple of years, and a half-dozen projects, later, a project came along that required an extremely ambitious delivery schedule. We realized that we would have just about enough time to complete the major look and feel components of the site by the deadline. But there would not be any time or resources available for rework if the client changed the requirements. Clients always change the requirements. This time, the web site compiler idea was approved, because we knew the project would not succeed without it.

The first version of the template generator, as it came to be called, took about two days to complete and could generate 70 web page templates in under three seconds. The system used a Perl script and an XML-based configuration file to turn a couple of templates containing special markers into 70 pages of HTML that would wrap our content. Within a week, an HTML developer was not only making changes to the templates, but also editing the configuration file and making minor tweaks to the Perl code.

By the end of the project, approximately 20 skeleton files, as we now called the input templates, were being used to create almost 200 output templates that would be combined with run-time data to generate thousands of result pages. The template generator was definitely a success. However, this version suffered from some major drawbacks. Making some structural changes required changes to the Perl code, resulting in less flexibility than we wanted. Much of the Perl code was written for this particular client and could not easily be reused. The design of the configuration file assumed we could add attributes as needed to some of the major elements. This made validation a difficult task. Last of all, the hand-written XML parser was not as flexible as some of the ones available on the Internet.

6.2: The Next Generation

This version of the template generator needed changes to be useful on any other project. The company handed the system to a new programmer, who spent several weeks enhancing and simplifying the design. She and I spent quite a bit of time exploring the initial design and discussing what I saw as the new direction to go. Her version incorporated some of my ideas and a lot of novel work to create the next generation of the template generator. The results of her efforts were used to generate the next version of our company web site. She was also granted permission to use this system as the topic of her master's thesis[6].

Later, I reworked the template generator to deal with a set of similar web sites that only differed in the colors and defined structural changes. This system could create any number of web sites that followed one of four basic designs.

6.3: A New Beginning

Each of the major versions of the template generator had different strengths and weaknesses, but there were several things they had in common. Their major strengths were the XML-based configuration files and the skeleton files that gave pages their basic form. The major weakness was the fact that each site needed its own Perl code and configuration file format. In order to solve this problem, WSDL would need to be designed more flexibly from the beginning.

We began by researching web site design and implementation. We found that quite a bit had been written about web site design from an information architecture[7] and structure[4] standpoint. The books we found on the subject discussed the things a web site designer needs to keep up with and what decisions to consider. But the implementation was assumed to follow the normal approach of building individual pages with some included components for reusability.

Next, we reviewed the earlier attempts I had been part of. We saw that most of the shortcomings could be traced to attempting to solve the current situation, without looking at the real, underlying problem. With that in mind, we began by enumerating all of the components WSDL would need to describe a web site. The basics of the current structure including the website element, siteinfo element, and most of the navigational elements fell into place early. The common pieces that applied to the whole web site went into the siteinfo element. Everything else went in the website element.

We experimented by writing small sites in WSDL, to see how they would look. In general, they seemed fairly reasonable. One early design decision involved non-HTML files. We wanted to be able to list the non-page files in WSDL as well. We believed this might be particularly useful during site reorganizations.

Another important design decision was the ability to nest website elements to support the concept of sub-sites. Some complicated web sites use the concept of sub-sites to enhance the user experience. The user is not confronted with a massive site containing a large number of pages. Instead, the user sees a set of sub-sites all accessible from one umbrella web site. In general, all of the sites have major presentation elements in common. However, each will have some distinctive feature that allows the user to recognize which site he is visiting.

6.4: Reality Check

The first major reality check came when we built a program (MakeWSDL.pl) to walk an existing web site and write out its description in WSDL. We decided to start with my own personal web site. This site is simple, although not particularly well designed. The idea was to see what a simple web site description would look like. We expected the structure to look bad, but the actual output was a surprise. It was obvious that the navigable content of the site must be separated from the other files. It was almost impossible to find the structure among the other files. The entire navigational structure, one of the fundamental reasons for this language, was hidden.

To fix this problem, the content of the website element was modified to contain a list of resources and the actual navigation. This decision also changed the location for the stylesheet elements. Before that time, they had been part of the siteinfo. But that did not really fit. Moving the stylesheet elements to the resources made much more sense.

During this reorganization, we began thinking about the way we encoded people's names. We realized that the original approach had been simplistic. The original elements that made up a person's name were honorific, given, and family. We realized that many names would not fit this form. This approach could not deal with people with only one name. It also did not deal with middle names, or numbers, or initials. We could add more elements to deal with middle names, initials, nicknames, and more, but this seemed inelegant. Finally, we realized that a better approach was to take all of the name portions of a person's name and make them name elements with a class that further defines the role of that portion of the name. We left the decorations honorific and degree separate because we did not actually consider them to be part of the name.

Another limitation that became obvious during the reorganization was a loss in flexibility from previous designs. The earliest versions of the template generator allowed the addition of site-specific attributes to the page element. We realized that by defining a DTD for WSDL, we had made this feature impossible in its earlier implementation. We did understand that there is no way WSDL can define all of the possible page attributes that everyone will ever need. Thus, the prop element was born. This element allows us to associate an unlimited number of named properties with a page.
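A minimal sketch of how named properties might be read from a page follows. The page, the property names, and the helper page_props are hypothetical; only the prop element and its name and value attributes come from the language definition.

```python
import xml.etree.ElementTree as ET

# Hypothetical page carrying two site-specific properties.
PAGE = """
<page id="faq" title="Frequently Asked Questions">
  <prop name="sidebar" value="help"/>
  <prop name="reviewed" value="1999-11-02"/>
</page>
"""

def page_props(page):
    """Collect the prop children of a page into a name -> value mapping."""
    return {p.get("name"): p.get("value") for p in page.findall("prop")}

props = page_props(ET.fromstring(PAGE))
print(props["sidebar"])
```

Since the property names are attribute values rather than attribute names, a DTD can validate the prop element once without knowing in advance which properties a site will define.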

6.5: Fine-tuning the Design

With some testing, the need for the set element under resources became obvious. There needed to be some way to group the resources. Shortly after that decision, fpages and subpages became part of the content for resources and sets. Some sites, including mine, have pages devoted to links leading to other web sites. These pages seem naturally implemented with sets of fpages.

As we began to develop the output logic, the problem of presentation descriptions became much more obvious. In earlier projects, we had used skeleton files that contain raw HTML with embedded macro commands. We had originally intended to use a high-level presentation language instead. Both approaches have advantages and disadvantages. The skeleton approach is much easier to implement, but it is much easier to abuse. The presentation language approach is easier to verify, but it requires more development and reduces the control of the developer.

On the advice of a wise programmer I know, "When attempting to choose between two good implementations, try to choose both," we implemented both systems. The PLL approach works very well for well-structured presentations. If the presentation is too complicated, the skeleton approach can always be used. Implementing this feature required the addition of the layout element.

We originally had required the src and dest attributes on all file elements. In working with a real web site, we discovered that this led to a lot of duplication. Allowing the file elements to inherit these attributes from their enclosing set element resulted in a much cleaner design. A side effect of this decision was the need to add src and dest attributes to the website element in order to reduce redundancy at lower levels.
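The inheritance idea can be sketched in Python as a walk that carries the nearest enclosing src and dest values down to each file. This is a simplification under stated assumptions: an inherited value is treated as a plain override, no attempt is made to combine directories with file names, and the sample paths are invented. The actual rules in MakeWebsite.pl are more involved.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment; all paths are invented for illustration.
RESOURCES = """
<website src="site" dest="htdocs">
  <resources>
    <set src="site/images" dest="htdocs/images">
      <file href="/images/logo.gif"/>
      <file href="/images/map.gif" dest="htdocs/maps"/>
    </set>
    <file href="/robots.txt"/>
  </resources>
</website>
"""

def resolve_files(element, src=None, dest=None):
    """Yield (href, src, dest) for each file, inheriting src and dest
    from the nearest enclosing element that sets them (assumed rule)."""
    src = element.get("src", src)
    dest = element.get("dest", dest)
    if element.tag == "file":
        yield element.get("href"), src, dest
        return
    for child in element:
        yield from resolve_files(child, src, dest)

for href, src, dest in resolve_files(ET.fromstring(RESOURCES)):
    print(href, src, dest)
```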

The current system for handling the inheritance of directories seems more complicated than necessary. Initial testing generated several surprising results. Other than a small amount of cleanup that solved some of the surprises, no redesign of this functionality has been attempted.

Another surprise that happened during this stage of the implementation was the realization that the stylesheet element needed src and dest attributes as well. Since the design had been moving toward a system allowing the entire site to be built through generation of pages and copying of non-page files, it was necessary to define where the style sheet would come from and go to.

6.6: The WSDL System

The completed system consists of several programs implemented in Perl. Perl was chosen for this task because of its extensive support for text manipulation, as well as its rapid prototyping capabilities. Libraries for parsing and manipulating XML in Perl are also available.

6.6.1: The WSDL Processor

The initial version of the WSDL processor is named MakeWebsite.pl. The program uses the XML::Parser module to convert an XML file into a memory object called a DataDoc. The program then walks this DataDoc object and creates the appropriate output files for the web site under construction.

This object was designed specifically to simplify access to attributes and still give good performance on access to content. The ability to find an element's ancestors or siblings was not a design goal. As a result, the data structure is quite a bit simpler and lighter than the equivalent Document Object Model (DOM) structure.
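The trade-off can be illustrated with a Python sketch that builds a comparably light structure using the standard Expat binding. The layout shown here (a dictionary per element with tag, attrs, and content, and no parent or sibling links) is our assumption for illustration; it is not the actual DataDoc layout.

```python
import xml.parsers.expat

def parse_light(xml_text):
    """Build nested dicts with direct attribute access and ordered content.
    No parent or sibling links are kept (assumed, simplified layout)."""
    root = {"tag": None, "attrs": {}, "content": []}
    stack = [root]

    def start(tag, attrs):
        node = {"tag": tag, "attrs": attrs, "content": []}
        stack[-1]["content"].append(node)
        stack.append(node)

    def end(tag):
        stack.pop()

    def chars(data):
        # Whitespace-only text between elements is dropped.
        if data.strip():
            stack[-1]["content"].append(data)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(xml_text, True)
    return root["content"][0]

doc = parse_light('<page id="home" title="Home"><prop name="a" value="1"/></page>')
print(doc["attrs"]["title"])
```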

6.6.2: Compiling the Web Site

Once a WSDL description of a web site has been created, the WSDL processor is invoked on the WSDL file. This program parses the WSDL file, uses the presentation files and content, along with any external files, and creates a full copy of the web site in the target directory.

In addition to the name of the input file, MakeWebsite.pl has five command line options:

-v: generate verbose output.
-n: nondestructive, do not copy files or generate output.
-d dir: use dir as the destination directory.
-s dir: use dir as the source directory for all content.
-i indexfile: use indexfile as the name of the index file for a directory. Defaults to index.html, if none is given.
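For readers more familiar with Python than Perl, the same command line interface could be declared roughly as follows. This is only an analog of the options listed above; the destination names and the input file name college.wsdl are hypothetical, and the code is not part of the WSDL tools.

```python
import argparse

def build_parser():
    """Declare the five options and the input file (Python analog, not the Perl source)."""
    parser = argparse.ArgumentParser(prog="MakeWebsite.pl")
    parser.add_argument("-v", action="store_true", dest="verbose",
                        help="generate verbose output")
    parser.add_argument("-n", action="store_true", dest="nondestructive",
                        help="do not copy files or generate output")
    parser.add_argument("-d", metavar="dir", dest="dest_dir",
                        help="destination directory")
    parser.add_argument("-s", metavar="dir", dest="source_dir",
                        help="source directory for all content")
    parser.add_argument("-i", metavar="indexfile", dest="index_file",
                        default="index.html",
                        help="name of the index file for a directory")
    parser.add_argument("wsdl_file", help="the WSDL description to process")
    return parser

# A dry run into /tmp/site using a hypothetical description file.
args = build_parser().parse_args(["-n", "-d", "/tmp/site", "college.wsdl"])
print(args.nondestructive, args.dest_dir, args.index_file, args.wsdl_file)
```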

6.6.3: Other WSDL Tools

In addition to MakeWebsite.pl, there are several other scripts developed as part of this research.

The program MakeWSDL.pl takes the URL of a web site on the command line and prints a starting WSDL file for that site to the standard output. Because MakeWSDL.pl cannot know about the actual structure of the site, except through its links, the program makes guesses about how to group the various pages and other content. The program also extracts style sheet information and any contact information it can find. The output from this program is not a usable WSDL file, because it lacks presentation information and much of the site's actual structure. The program's purpose is to automate some of the tedious parts of creating a new WSDL file.

The program TestPLL.pl validates the PLL files that are passed as arguments. Called with no arguments, it runs a regression test on the PLL validation code. This program serves as a quick check on any PLL presentation files needed for a project.

There is an equivalent validation program for WSDL files called WSDLvalid.pl. Called with file names on the command line, it validates each of the WSDL files. With no arguments, it also performs a regression test on the WSDL validation code.

Some of the benefit of using WSDL comes from the ability to analyze the WSDL description of a web site. The program WSDLmetrics.pl provides the data for this analysis. This program reads a WSDL description and generates the metrics described in the Payoff section.
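To give a feel for this kind of analysis, the following Python sketch counts elements by type and measures nesting depth for a small, invented description. It is not WSDLmetrics.pl; in particular, counting the root element as depth 1 is our assumption, and the real tool may measure nesting differently.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical description, invented for illustration.
SITE = """
<website>
  <navigation>
    <group title="About">
      <page title="Staff"/>
      <page title="History"/>
    </group>
    <page title="Home"/>
  </navigation>
</website>
"""

def summarize(root):
    """Return per-tag element counts and the maximum nesting depth."""
    counts = Counter(el.tag for el in root.iter())

    def depth(el):
        # The root counts as depth 1 (an assumption of this sketch).
        return 1 + max((depth(child) for child in el), default=0)

    return counts, depth(root)

counts, max_depth = summarize(ET.fromstring(SITE))
print(counts["page"], counts["group"], max_depth)
```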

These programs are available at http://www.anomaly.org/wade/wsdl/, along with the source to the example site.

6.7: Performance

Although speed of web site compilation was not a major focus in the design, it is an issue worth measuring.

6.7.1: The Inet College Example Site

The example site is a small, relatively simple site. The code routines used for the site navigation are of medium complexity. The test for faculty web sites adds some time to the generation of the site. The WSDLmetrics.pl program returned the following results.

WSDL Summary:
------------
 total navigation elements:   34
  group elements:             4
  page elements:              17
  pageref elements:           0
  fpage elements:             0
  subpage elements:           0
  website elements:           0
 max elements per navigation: 3
 max elements per group:      8

 layout elements:             2
 header elements:             1
 footer elements:             1
 code elements:               7
 text elements:               1
 stylesheet elements:         1

 max depth of nesting:        5

MakeWebsite.pl created this site in approximately 3 seconds on a 200 MHz Pentium Pro running Linux.

6.7.2: The University of Houston CS Web Site

As a completely different kind of example, a WSDL file was created for the University of Houston Computer Science web site using MakeWSDL.pl. This description was modified to use a minimal presentation file and no content, in order to measure raw speed of WSDL processing. The WSDLmetrics.pl program returned the following results.

WSDL Summary:
------------
 total navigation elements:   3702
  group elements:             79
  page elements:              796
  pageref elements:           1246
  fpage elements:             157
  subpage elements:           139
  website elements:           0
 max elements per navigation: 389
 max elements per group:      516

 layout elements:             1
 header elements:             0
 footer elements:             0
 code elements:               0
 text elements:               0
 stylesheet elements:         9

 max depth of nesting:        5

MakeWebsite.pl created this site in approximately 10 seconds on a 200 MHz Pentium Pro running Linux.

6.8: Future Directions

The WSDL system is still a work in progress. During the development of this version, several enhancements or areas for further research presented themselves. Some of these ideas were not explored because the full implications of the features are not yet apparent. Others were put aside in the interest of actually completing a working system.

One idea for possible enhancement of the system involves more code classes. The current set of begin, end, and routine handle the current system's requirements nicely. However, code that is executed at the beginning and end of each page could be a useful enhancement. Extending that idea a little further, code that executes at the beginning and end of each group and website could also be useful.

Extending this idea in a different direction, we could define classes of pages so that only some of the page begin and end code applies to certain pages. Obviously, there is room for quite a bit of experimentation in this area.

The ability to add prop elements to groups and websites opens up many new possibilities. However, issues of semantics of these features are still open to question. There's also the question of accessibility through macro commands and inside code routines.

Another possible area for further research is the ability to define multiple output formats for the PLL presentation system. Although the skeleton-based presentation system can be used for any form of output, the WSDL processor only generates HTML from PLL presentation files. The ability to define multiple formats or the ability to specify the conversion from PLL into an output format could be very useful.

The ability to nest macro commands in the parameters for other macro commands could be very powerful. For instance, the value of a page property could be passed as a parameter to a code routine. Another possibility is using a page property to select which contact to use for email questions. Unfortunately, the parsing for arbitrary levels of this nesting cannot be done with regular expressions. An actual context-free grammar would be required.
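The difference can be made concrete with a small recursive-descent parser. The brace syntax {name:parameter} used here is invented purely for illustration and is not the macro syntax of the WSDL system; the point is only that arbitrary nesting is handled naturally by a parser that calls itself.

```python
def parse(text, pos=0, nested=False):
    """Parse literal text and {name:param} macros; params may contain macros.
    The brace syntax is hypothetical. Returns (nodes, next_position)."""
    nodes, literal = [], []
    while pos < len(text):
        ch = text[pos]
        if ch == "{":
            if literal:
                nodes.append("".join(literal))
                literal = []
            macro, pos = parse_macro(text, pos + 1)
            nodes.append(macro)
        elif ch == "}" and nested:
            break  # the enclosing macro consumes this brace
        else:
            literal.append(ch)
            pos += 1
    if literal:
        nodes.append("".join(literal))
    return nodes, pos

def parse_macro(text, pos):
    """Parse the remainder of a macro after its opening brace."""
    start = pos
    while pos < len(text) and text[pos] not in ":}":
        pos += 1
    name = text[start:pos]
    params = []
    if pos < len(text) and text[pos] == ":":
        params, pos = parse(text, pos + 1, nested=True)
    if pos >= len(text) or text[pos] != "}":
        raise ValueError("unterminated macro: " + name)
    return (name, params), pos + 1

# A page property selects which contact to use, as in the example above.
tree, _ = parse("mail {contact:{prop:owner}} now")
print(tree)
```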

Another direction for future research involves more tools to support WSDL development. A WSDL-specific editor would simplify construction of WSDL files. Various validation, analysis, and optimization tools are possible that increase the usefulness of this higher-level description of a web site. Lastly, the command-line tools could be enhanced to friendlier, graphical versions, reducing the learning curve.

Chapter 7: Conclusions

The purpose of the WSDL system is to reduce maintenance costs and inconsistencies in large web sites. The system defines two high-level languages that describe a web site in enough detail that the WSDL processor can generate the web site. This approach is very different from other systems that have been implemented for the creation of web-based content.

Testing with previous versions of this system showed a definite increase in developer productivity during initial development. In addition, maintenance costs were reduced dramatically. The current WSDL design includes all of the main features of the earlier versions. In addition, the WSDL system also addresses some of the earlier shortcomings, such as the ability to have separate code bases for different clients and automatic verification of the WSDL files.

Appendix A: The WSDL Element Reference

This section lists and describes all of the elements and attributes in the WSDL language.

A.1: body

The body element defines a set of attributes for the body element of any HTML page that references it.

A.1.1: body content

The body element has no content.

A.1.2: body attributes

The body element has one required attribute, described below, as well as all of the attributes of the HTML 4.0 body tag:

id: The required id attribute supplies a unique name for this body element.

A.2: choose

The choose element provides the ability to choose among multiple options. This feat is accomplished with the help of the when and otherwise elements.

A.2.1: choose content

The choose element contains one or more when elements followed by an optional otherwise element.

A.2.2: choose attributes

The choose element has no attributes.
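The selection rule can be sketched in Python as follows. Two assumptions are made: that each when element carries a test attribute like the one on the if element, and that a test is a bare flag name looked up in a set. The real expression language is described under Conditional Elements and may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical choose block; the flag names are invented.
CHOOSE = """
<choose>
  <when test="client_a"><page title="Client A Home"/></when>
  <when test="client_b"><page title="Client B Home"/></when>
  <otherwise><page title="Generic Home"/></otherwise>
</choose>
"""

def select_branch(choose, is_true):
    """Return the first when whose test holds, else the otherwise element
    (or None if there is no otherwise)."""
    for when in choose.findall("when"):
        if is_true(when.get("test")):
            return when
    return choose.find("otherwise")

choose = ET.fromstring(CHOOSE)
active = {"client_b"}  # assumed: a test is true when its flag is in this set
branch = select_branch(choose, lambda test: test in active)
print(branch.find("page").get("title"))
```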

A.3: code

The code element contains sections of scripting code that are evaluated at compile time to generate portions of output pages. The obvious example would be code used to generate navigable material that is different for each page.

At present, the only code language supported is Perl. The Perl code is evaluated to create a subroutine reference that is called when the appropriate code macro is called in the presentation file. The parameters for a code reference are expected to be a list of name=value pairs.

A.3.1: code content

The code element contains the source code that is evaluated whenever this code reference is called. Since the content of the code element often contains characters that are not valid in character content, it is usually contained in a CDATA section. This element must be empty if the include attribute is specified.

A.3.2: code attributes

The code element has one required attribute and two optional attributes:

id: The required id attribute supplies a unique name for this code element.
class: The class defines which of three types of code reference this element contains. The default class is routine, which specifies a section of code to be treated as a single function. The begin class specifies code that should be evaluated before any code is executed. This may be used for initializing global variables and declaring utility functions. The end class specifies code that should be evaluated after all code has been executed.
include: The include attribute supplies a filename from which to read the content of the code element. If this attribute exists, the code element must contain no content. For more information, see Element Inclusion.

A.4: contact

The contact element contains information about a person or organization associated with the web site.

A.4.1: contact content

The content of the contact element comes in one of two forms. The simplest is an optional descr element followed by a single institution element. The more complex form is an optional descr element followed by a set of elements that describe a person. This set consists of an optional honorific element, followed by one or more name elements, and zero or more degree elements.
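The two forms can be written as a single content model, which the following Python sketch checks with a regular expression over the sequence of child element names. The helper valid_contact is hypothetical and only mirrors the prose above; the WSDL validator itself works from the DTD.

```python
import re
import xml.etree.ElementTree as ET

# descr? followed by either one institution,
# or honorific? name+ degree* (taken from the description above).
CONTACT_MODEL = re.compile(
    r"(descr )?(institution |(honorific )?(name )+(degree )*)"
)

def valid_contact(contact):
    """Check the order and number of a contact's child elements."""
    tags = "".join(child.tag + " " for child in contact)
    return CONTACT_MODEL.fullmatch(tags) is not None

org = ET.fromstring('<contact id="u"><institution>Example University</institution></contact>')
person = ET.fromstring(
    '<contact id="p"><honorific>Dr.</honorific>'
    '<name class="given">Jane</name><name class="family">Doe</name>'
    '<degree>Ph.D.</degree></contact>'
)
mixed = ET.fromstring('<contact id="m"><institution>X</institution><name>Y</name></contact>')
print(valid_contact(org), valid_contact(person), valid_contact(mixed))
```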

A.4.2: contact attributes

The contact element has one required attribute and two optional attributes:

id: The required id attribute supplies a unique name for this contact element.
href: The href attribute provides a URL link that is associated with this contact. This would often be a home page for the person or organization.
email: The email attribute provides an email address associated with this contact.

A.5: copyright

The copyright element specifies copyright information for use on the web site.

A.5.1: copyright content

The copyright element contains raw text without markup. This content is expected to be the terms of the copyright, although there is no way to validate that.

A.5.2: copyright attributes

The copyright element has one required attribute and two optional attributes:

id: The required id attribute supplies a unique name for this copyright element.
year: The year or range of years for which this copyright applies.
owner: The owner attribute contains a reference to a contact element that supplies the copyright owner's name.

A.6: degree

The degree element contains the degrees applied to the contact's name.

A.6.1: degree content

The degree element contains raw character data without markup.

A.6.2: degree attributes

The degree element has no attributes.

A.7: descr

The descr element contains a small amount of text that is a description of the current contact. The descr is useful for adding small amounts of explanatory text to a contact, above and beyond the contact's name and contact information.

A.7.1: descr content

The content of the descr element can be any combination of text and markup. If the markup is not defined by the WSDL markup language, it is usually contained in a CDATA section.

A.7.2: descr attributes

The descr element has no attributes.

A.8: directory

The directory element defines a relative or absolute directory in the web site.

A.8.1: directory content

The directory element has no content.

A.8.2: directory attributes

The directory element has two required and one optional attribute:

id: The required id attribute supplies a unique name for this directory element.
name: The required name attribute supplies the actual directory name for this directory element.
class: The class attribute supplies a broad category for the directory to allow grouping for various validation and reporting features. Suggested classes might include image, database, stylesheet, include, applet, and script.

A.9: file

The file element designates a non-HTML file linked on the system. This element describes files like downloadables, PDF files, images, etc. Any resource that is to be managed in this system but is not created by the system is described by a file element.

A.9.1: file content

The file element has no content.

A.9.2: file attributes

The file element has one required and six optional attributes:

id

The id attribute supplies a unique identifier for this file element.

src

The src attribute gives the location in the source file system where this file is located.

dest

The dest attribute lists the location where this file is to be written.

href

The required href attribute specifies the URL for this file on the completed site.

class

The class attribute allows partitioning the possible files into broad groups. These may be used for further processing or reporting. The defined groups are:

applet
archive
audio
binary
document
image
plugin
script
source
text
video
other

title

The title attribute specifies a title to be used for the link that points to this file.

type

The type attribute allows us to associate a mime-type with this file.

A.10: footer

The footer element specifies the content to be used as a footer on pages in the web site. The footer element can be thought of as a special case of the text element.

A.10.1: footer content

The content of the footer element can be any combination of text and markup. If the markup is not defined by the WSDL markup language, it is usually contained in a CDATA section. This element must be empty if the include attribute is specified.

A.10.2: footer attributes

The footer element has one required attribute and one optional attribute:

id: The required id attribute supplies a unique name for this footer element.
include: The include attribute supplies a filename from which to read the content of the footer element. If this attribute exists, the footer element must contain no content. For more information, see Element Inclusion.

A.11: fpage

A fpage element references a foreign page. A foreign page is either a page on another web site, or a page on the current web site that is not generated by the WSDL processor.

A.11.1: fpage content

The fpage element has no content.

A.11.2: fpage attributes

The fpage element has two required and two optional attributes:

title: The required title attribute specifies a title to be used for the link that points to this fpage.
href: The required href attribute supplies the URL of the page that the fpage reference points into.
fragment: The fragment attribute contains the name of the page fragment if this element references part of a foreign page instead of the whole page. The page fragment is the part of the URL after the '#'.
id: The id attribute supplies a unique identifier for this fpage element.

A.12: group

The group element collects a set of pages together into one logical navigation piece. This is how sections of a site would normally be defined.

A.12.1: group content

The group element can contain the conditional elements, if and choose, plus zero or more of the navigational elements listed below:

group
page
pageref
subpage
fpage
file
website

This element must be empty if the include attribute is specified. For more information, see Element Inclusion.

A.12.2: group attributes

The group element has no required attributes and several optional attributes:

id: The id attribute supplies a unique identifier by which this group can be referenced.
include: The include attribute supplies a filename from which to read the content of the group. If this attribute exists, the group element must contain no content. For more information, see Element Inclusion.
dest: The dest attribute lists the directory into which the pages in this group are written.
root: The root attribute defines the root of the directory tree for this group.
main: The main attribute specifies the id of the page that is the main page for this group.
href: The href attribute specifies the URL for this group. If not supplied, the URL for the page specified by the main attribute is used.
title: The title attribute specifies a title to be used for the link that points to this group. If not specified, the title from the page referenced by main is used. If neither title nor main is supplied, the title of the first navigational element in the group is used.
keywords: The keywords attribute lists the keywords to be added to each page of this group that does not have keywords of its own.
description: The description attribute contains a short description of the page, sometimes used by search engines. This description may be applied to any page in this group that does not have a description of its own.
layoutref: The layoutref attribute contains a reference to the default layout for pages in the group.
bodyref: The bodyref attribute contains a reference to the default body attributes for pages in the group.
copyref: The copyref attribute contains a reference to the default copyright for the group.
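The fallback order for a group's title can be sketched in Python. The helper group_title and the sample groups are hypothetical. The sketch also assumes that the page named by main lies inside the group, and that a nested group in first position supplies its own resolved title.

```python
import xml.etree.ElementTree as ET

NAV_TAGS = {"group", "page", "pageref", "subpage", "fpage", "file", "website"}

def group_title(group):
    """Resolve a group's link title: explicit title, then the main page's
    title, then the title of the first navigational element."""
    title = group.get("title")
    if title:
        return title
    main = group.get("main")
    if main:
        # Assumed: the main page is found somewhere inside this group.
        for page in group.iter("page"):
            if page.get("id") == main:
                return page.get("title")
    for child in group:
        if child.tag in NAV_TAGS:
            return group_title(child) if child.tag == "group" else child.get("title")
    return None

explicit = ET.fromstring('<group title="Courses"><page id="a" title="Catalog"/></group>')
by_main = ET.fromstring(
    '<group main="b"><page id="a" title="Catalog"/><page id="b" title="Course Home"/></group>'
)
by_first = ET.fromstring('<group><page id="a" title="Catalog"/><page id="b" title="Course Home"/></group>')
print(group_title(explicit), group_title(by_main), group_title(by_first))
```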

A.13: header

The header element specifies the content to be used as a header on pages in the web site. The header element can be thought of as a special case of the text element.

A.13.1: header content

The content of the header element can be any combination of text and markup. If the markup is not defined by the WSDL markup language, it is usually contained in a CDATA section. This element must be empty if the include attribute is specified.

A.13.2: header attributes

The header element has one required attribute and one optional attribute:

id: The required id attribute supplies a unique name for this header element.
include: The include attribute supplies a filename from which to read the content of the header element. If this attribute exists, the header element must contain no content. For more information, see Element Inclusion.

A.14: honorific

The honorific element contains part of the contact name (e.g., Mr., Ms., Dr.).

A.14.1: honorific content

The honorific element contains raw text without markup.

A.14.2: honorific attributes

The honorific element has no attributes.

A.15: if

The if element provides simple if-then conditional functionality. If the test evaluates to true, include the content of the element. Otherwise, discard the content of this element.

A.15.1: if content

The if element can contain any text or markup. The only requirement on this data is that it remain valid WSDL if the content of the if element replaced the element itself.

A.15.2: if attributes

The if element has one required attribute.

test: The required test attribute contains a boolean expression that determines whether or not the content of the if element is used. For more information, see Conditional Elements.

A.16: institution

The institution element contains the name of an institution used as part of a contact.

A.16.1: institution content

The institution element contains raw text without markup.

A.16.2: institution attributes

The institution element has no attributes.

A.17: layout

The layout element gives a name and format to a presentation file used to format HTML pages.

A.17.1: layout content

The layout element has no content.

A.17.2: layout attributes

The layout element has two required attributes and one optional attribute:

id: The required id attribute supplies a unique name for this layout element.
file: The required file attribute specifies the file containing the presentation information.
class: The class attribute tells the presentation file format. Currently, two presentation formats are supported: pll and skeleton. See Presentation Files for more information.

A.18: name

The name element contains one of the name parts of the contact name.

A.18.1: name content

The name element contains raw text without markup.

A.18.2: name attributes

The name element has one optional attribute:

class: The class attribute clarifies which part of the name of a contact this element refers to. The class may have one of the following values: initial, given, family, middle, number, or nickname.

A.19: navigation

The navigation element serves as a container for all of the navigable items on the web site. These items include anything that a user might navigate to in the course of viewing a site.

A.19.1: navigation content

The navigation element can contain the conditional elements, if and choose, plus zero or more of the navigational elements listed below:

group
page
pageref
subpage
fpage
file
website

This element must be empty if the include attribute is specified. For more information, see Element Inclusion.

A.19.2: navigation attributes

The navigation element has two optional attributes, described below.

id: The id attribute supplies a unique identifier by which this navigation can be referenced.
include: The include attribute supplies a filename from which to read the content of the navigation. If this attribute exists, the navigation element must contain no content. For more information, see Element Inclusion.

A.20: otherwise

The otherwise element provides the default behavior of the choose element. If none of the when tests evaluate to true, use the content of this element. If any of the when tests are true, discard the content of this element.

A.20.1: otherwise content

The otherwise element can contain any text or markup. The only requirement on this data is that it remain valid WSDL if the content of the otherwise element replaced the element itself.

A.20.2: otherwise attributes

The otherwise element has no attributes.

A.21: page

The page element describes an individual page in the web site. It is used both to define meta-data about the page and also locate the page within the navigation. The page element should only be used to describe pages that are created with the WSDL tool. If a web page is referenced in the navigation, but is not created through WSDL, use the fpage element instead.

A.21.1: page content

The page element contains zero or more prop elements and optionally any of the conditional elements (if and choose).

A.21.2: page attributes

The page element has one required attribute and several optional attributes:

title: The required title attribute specifies a title to be used for the link that points to this page.
src: The src attribute lists the directory in which the source file for this page can be found.
dest: The dest attribute lists the file into which the completed page is written.
href: The href attribute specifies the URL for this page after the site is built.
id: The id attribute supplies a unique identifier by which this page can be referenced.
keywords: The keywords attribute lists the keywords to be added to this page.
description: The description attribute contains a short description of the page, sometimes used by search engines.
content: The content attribute contains the name of the file containing the main content of this page.
layoutref: The layoutref attribute contains a reference to the layout for this page.
bodyref: The bodyref attribute contains a reference to the body attributes for this page.
styleref: The styleref attribute contains references to any stylesheets for this page.
copyref: The copyref attribute contains a reference to the copyright for this page.

A.22: pageref

The pageref element is a reference to a page elsewhere in the web site. This element is needed because many sites are not trees; they are actually graphs. The pageref element allows a page to be referenced from multiple places in a web site.

A.22.1: pageref content

The pageref element has no content.

A.22.2: pageref attributes

The pageref element has one required attribute and two optional attributes:

ref: The required ref attribute contains the unique identifier of the page to be referenced.
fragment: The fragment attribute contains the name of the page fragment if this reference points into a page instead of to the whole page. The page fragment is the part of the URL after the '#'.
title: The title attribute specifies a title to be used for the link that points to this pageref. If no title is supplied, the title from the referenced page is used.
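Resolving a pageref can be sketched in Python as a lookup by id followed by the title fallback described above. The sample navigation, the URL, and the helper pageref_link are hypothetical, and the sketch assumes the referenced page carries an explicit href.

```python
import xml.etree.ElementTree as ET

# Hypothetical navigation in which one page is reachable from two places.
NAV = """
<navigation>
  <group title="Help">
    <page id="faq" title="FAQ" href="/help/faq.html"/>
  </group>
  <group title="Billing">
    <pageref ref="faq" fragment="billing"/>
    <pageref ref="faq" title="All Questions"/>
  </group>
</navigation>
"""

def pageref_link(pageref, pages_by_id):
    """Return (url, title) for a pageref, appending any fragment after '#'
    and falling back to the referenced page's title."""
    page = pages_by_id[pageref.get("ref")]
    url = page.get("href")  # assumed to be present on the referenced page
    fragment = pageref.get("fragment")
    if fragment:
        url += "#" + fragment
    return url, pageref.get("title") or page.get("title")

nav = ET.fromstring(NAV)
pages_by_id = {p.get("id"): p for p in nav.iter("page") if p.get("id")}
links = [pageref_link(ref, pages_by_id) for ref in nav.iter("pageref")]
print(links)
```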

A.23: prop

The prop element is a project-specific property attached to the current page. For more information, see Page Properties.

A.23.1: prop content

The prop element has no content.

A.23.2: prop attributes

The prop element has two required attributes:

name: The required name attribute supplies a name to be used when this property must be referenced.
value: The required value attribute contains the actual value of this property.

A.24: resources

The resources element is a container for all of the site resources that do not necessarily participate in site navigation. Examples include applets, images, and downloadable files. The design goal for this element is to allow the one WSDL file to contain all of the information about the web site whether it will be generated by the WSDL processor or not.

A.24.1: resources content

The resources element can contain the conditional elements, if and choose, plus zero or more of the resource elements listed below:

stylesheet
set
page
fpage
subpage
file

This element must be empty if the include attribute is specified. For more information, see Element Inclusion.

A.24.2: resources attributes

The resources element has one optional attribute:

include: The include attribute supplies a filename from which to read the content of the resources. If this attribute exists, the resources element must contain no content. For more information, see Element Inclusion.

A.25: server

The server element defines a logical name for a server or machine on the Web.

A.25.1: server content

The server element has no content.

A.25.2: server attributes

The server element has two required and one optional attribute:

id: The required id attribute supplies a unique name for this server element.
name: The required name attribute supplies the actual server name for this server element.
class: The class attribute supplies a broad category for the server to allow grouping for various validation and reporting features. Suggested classes might include: main, image, database, search, and ad.

A.26: set

The set element groups several resources into a single logical unit. Often this grouping is used to apply common attributes to several resources at once. Usually, the attribute that is applied is the root directory.

A.26.1: set content

The set element can contain the conditional elements, if and choose, plus zero or more of the elements listed below:

stylesheet
set
page
fpage
subpage
file

This element must be empty if the include attribute is specified. For more information, see Element Inclusion.

A.26.2: set attributes

The set element has no required attributes and five optional attributes:

id: The id attribute supplies a unique identifier by which this set can be referenced.
include: The include attribute supplies a filename from which to read the content of the set. If this attribute exists, the set element must contain no content. For more information, see Element Inclusion.
src: The src attribute lists the directory in which source files for this set can be found.
dest: The dest attribute lists the directory into which the files from this set are written.
root: The root attribute defines the root of the directory tree for this set.
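
The following sketch shows how a set might apply a common root and common directories to a group of resources. All ids, titles, and paths are assumptions chosen for illustration.

```xml
<!-- Hypothetical example: ids, titles, and directory names are invented. -->
<set id="press" src="src/press" dest="out/press" root="/press">
  <stylesheet id="press-css" type="text/css" href="press.css"/>
  <page id="press-2000" title="Press Releases 2000" href="2000.html"/>
  <page id="press-1999" title="Press Releases 1999" href="1999.html"/>
</set>
```

Here the three attributes are stated once on the set rather than on each resource it contains.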

A.27: siteinfo

The siteinfo element contains meta-data that is used in the description of a web site. This includes all of the common elements and information about the site.

A.27.1: siteinfo content

The siteinfo element serves as a container for various pieces of common data for a website. In addition to the conditional elements, if and choose, a siteinfo element can contain zero or more of the following elements:

server
directory
contact
copyright
text
header
footer
layout
body

After evaluation of any conditional elements, the siteinfo should contain no elements except those on the above list. This element must be empty if the include attribute is specified. For more information, see Element Inclusion.

A.27.2: siteinfo attributes

The siteinfo element has three optional attributes:

id: The id attribute supplies a unique identifier for this siteinfo element.
include: The include attribute supplies a filename from which to read the content of the siteinfo. If this attribute exists, the siteinfo element must contain no content. For more information, see Element Inclusion.
ref: The ref attribute references another siteinfo element to be used in this website. If this attribute exists, the siteinfo element must contain no content.

It is illegal for a siteinfo element to contain both an include and a ref. It is also illegal to have either an include or a ref and to contain content.
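
As a sketch, a siteinfo element with inline content might look like this. The ids, host name, and boilerplate text are hypothetical.

```xml
<!-- Hypothetical example: ids, host name, and text are invented. -->
<siteinfo id="common">
  <server id="main" name="www.example.com" class="main"/>
  <text id="disclaimer" class="disclaimer">Quotes are delayed at least 20 minutes.</text>
</siteinfo>
```

Under the rules above, the alternatives are an empty element with only include (for example, <siteinfo include="siteinfo.xml"/>, with an assumed file name) or an empty element with only ref (for example, <siteinfo ref="common"/>), never a combination.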

A.28: stylesheet

The stylesheet element defines style information for use on the web site.

A.28.1: stylesheet content

The content of the stylesheet element is either empty or the internal stylesheet information to be applied to pages on the web site. Remember to place this data in a CDATA section if it contains markup not defined by WSDL.

A.28.2: stylesheet attributes

The stylesheet element has two required attributes and three optional attributes:

id: The required id attribute supplies a unique name for this stylesheet element.
type: The required type attribute allows us to associate a mime-type with this stylesheet.
href: The href attribute specifies the URL to use when referencing an external stylesheet.
src: The src attribute gives the location in the source file system where this stylesheet is located. This attribute is only useful if the href attribute is also supplied.
dest: The dest attribute lists the location where this stylesheet is to be written. This attribute is only useful if the href attribute is also supplied.
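
The two uses of stylesheet, external and internal, can be sketched as follows. The ids, paths, and style rules are assumptions for illustration; the surrounding resources element is included only to give the fragment a single root.

```xml
<!-- Hypothetical example: ids, paths, and CSS rules are invented. -->
<resources>
  <!-- External stylesheet: referenced by URL, copied from src to dest. -->
  <stylesheet id="site-css" type="text/css"
              href="/css/site.css" src="css/site.css" dest="css/site.css"/>
  <!-- Internal stylesheet: the style data is the element content. -->
  <stylesheet id="inline-css" type="text/css"><![CDATA[
    h1 { color: navy; }
  ]]></stylesheet>
</resources>
```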

A.29: subpage

The subpage element serves as a placeholder for a navigational item that points into a page, not to the page as a whole. This distinction is important to prevent the attempted generation of subpages by the WSDL processor.

A.29.1: subpage content

The subpage element has no content.

A.29.2: subpage attributes

The subpage element has two required and two optional attributes:

title: The required title attribute specifies a title to be used for the link that points to this subpage.
href: The href attribute supplies the URL of the page that the subpage reference points into. This may be empty if the subpage reference is internal to the current page.
fragment: The required fragment attribute contains the name of the page fragment. The page fragment is the part of the URL after the '#'.
id: The id attribute supplies a unique identifier for this subpage element.
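
For illustration, a subpage that points into a generated page might be declared as in the sketch below. The ids, titles, and file names are hypothetical.

```xml
<!-- Hypothetical example: ids, titles, and file names are invented. -->
<navigation>
  <page id="faq" title="Frequently Asked Questions" href="faq.html"/>
  <subpage id="faq-billing" title="Billing Questions"
           href="faq.html" fragment="billing"/>
</navigation>
```

Given the description of fragment above, the link for the subpage would be expected to end in faq.html#billing, while only faq.html itself is generated.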

A.30: text

The text element is used to designate boilerplate text that appears on pages in the web site.

A.30.1: text content

The content of the text element can be any combination of text and markup. If the markup is not defined by the WSDL markup language, it is usually contained in a CDATA section. This element must be empty if the include attribute is specified.

A.30.2: text attributes

The text element has one required attribute and two optional attributes:

id: The required id attribute supplies a unique name for this text element.
class: The class attribute supplies a broad category for the text element to allow grouping for various validation and reporting features. Suggested classes might include: disclaimer, message, warning, etc.
include: The include attribute supplies a filename from which to read the content of the text element. If this attribute exists, the text element must contain no content. For more information, see Element Inclusion.
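
A sketch of both forms of the text element follows: one with inline content protected by a CDATA section, and one that reads its content from a file. The ids, wording, and file name are assumptions; the siteinfo wrapper only gives the fragment a single root.

```xml
<!-- Hypothetical example: ids, wording, and file name are invented. -->
<siteinfo>
  <text id="disclaimer" class="disclaimer"><![CDATA[
    <p><em>Quotes are delayed at least 20 minutes.</em></p>
  ]]></text>
  <text id="legal" class="message" include="legal.txt"/>
</siteinfo>
```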

A.31: website

The website element serves as a container for all elements in a web site description; it describes the entire web site. This element can also be used to describe a sub-site within another web site.

A.31.1: website content

A website element contains an optional siteinfo followed by one or more navigation elements, an optional resources element, and zero or more code elements. This element must be empty if the include attribute is specified. For more information, see Element Inclusion.

A.31.2: website attributes

The website element has two required attributes and a large number of optional attributes:

id: The required id attribute supplies a unique identifier by which this website can be referenced.
include: The include attribute supplies a filename from which to read the content of the website. If this attribute exists, the website element must contain no content. For more information, see Element Inclusion.
src: The src attribute lists the base directory from which to obtain any source files for the creation of the website.
dest: The dest attribute lists the directory into which the completed website is written.
root: The root attribute defines the root of the directory tree for this website.
main: The required main attribute specifies the id of the page which is the main page for this website.
title: The title attribute specifies a title to be used for the link that points to this website. If not specified, the title from the page referenced by main is used. This attribute is most useful on website elements which designate subsites in a main website.
keywords: The keywords attribute lists the keywords to be added to each page of the site that does not have keywords of its own.
description: The description attribute contains a short description of the page, sometimes used by search engines. This description may be applied to any page which does not have a description of its own.
layoutref: The layoutref attribute contains a reference to the default layout for pages in the website.
bodyref: The bodyref attribute contains a reference to the default body attributes for pages in the website.
styleref: The styleref attribute contains references to the default stylesheets for pages in the website.
linktype: The linktype attribute defines the format of links between pages on the website. This attribute takes one of two values: absolute and relative. A value of absolute makes all links absolute URLs. To make all links relative URLs, use a value of relative. If no value is specified, the default is relative.
copyref: The copyref attribute contains a reference to the default copyright for the website.
layout: The layout attribute designates the default layout for the pages in this website.
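
Putting the content model and attributes together, a very small website description might look like the sketch below. Every id, title, and path is hypothetical, and the choice of attributes is only one plausible combination.

```xml
<!-- Hypothetical example: ids, titles, host name, and paths are invented. -->
<website id="example" main="home" root="/" src="src" dest="out"
         linktype="relative" keywords="example, demo">
  <siteinfo>
    <server id="main-server" name="www.example.com" class="main"/>
  </siteinfo>
  <navigation>
    <page id="home" title="Home" href="index.html"/>
    <page id="about" title="About Us" href="about.html"/>
  </navigation>
</website>
```

The main attribute names the id of the home page, so the page with id="home" must exist somewhere in the description.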

A.32: when

The when element provides the tests for a choose element. If the test evaluates to true, include the content of the element. Otherwise, discard the content of this element.

A.32.1: when content

The when element can contain any text or markup. The only requirement on this data is that it remain valid WSDL if the content of the when element replaced the element itself.

A.32.2: when attributes

The when element has one required attribute.

test: The required test attribute contains a boolean expression that determines whether the content of the when element is used. For more information, see Conditional Elements.

Appendix B: The PLL Element Reference

This section lists and describes all of the elements and attributes in the Page Layout Language (PLL). The purpose of PLL is to describe the presentation of an HTML page without getting bogged down in the actual coding details. The elements of the language are, therefore, exclusively related to page presentation. There is no direct support for images, fonts, colors, or other implementation details.

B.1: body

The body element contains the main presentation description of the output page.

B.1.1: body content

The body element has two forms. The first form is a list of one or more row elements. The second form consists of a list of one or more of the following elements:

content
header
footer
text
code

B.1.2: body attributes

The body element has two optional attributes. These attributes are required if the body content consists of row elements, otherwise they are disallowed.

cols: The cols attribute supplies the number of columns expected in each row in the page.
rows: The rows attribute supplies the number of rows expected in the page.
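
As a sketch of the row form, the body below declares a two-column, two-row grid using the row and cell elements described in the following sections. The reference nav_bar is a hypothetical code id, and the widths are arbitrary.

```xml
<!-- Hypothetical example: nav_bar and the width values are invented. -->
<body cols="2" rows="2">
  <row>
    <cell colspan="2"><header/></cell>
  </row>
  <row>
    <cell width="25%" valign="top"><code ref="nav_bar"/></cell>
    <cell valign="top"><content/></cell>
  </row>
</body>
```

In the second form, the cols and rows attributes are omitted and the body holds content, header, footer, text, and code elements directly.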

B.2: cell

The cell element describes an individual piece of content in a row.

B.2.1: cell content

The cell element contains one or more of the following elements:

header
footer
text
code
content

B.2.2: cell attributes

The cell element has six optional attributes:

align: The align attribute specifies the alignment to apply to the cell presentation element. Its values are the same as the equivalent attribute in HTML's td: left, center, and right.
class: The class attribute specifies a stylesheet class to apply to the cell presentation element.
colspan: The colspan attribute allows a cell to occupy two or more columns in the presentation.
rowspan: The rowspan attribute allows a cell to occupy two or more rows in the presentation.
valign: The valign attribute specifies the vertical alignment to apply to the cell presentation element. Its values are the same as the equivalent attribute in HTML's td: top, center, bottom, and baseline.
width: The width attribute supplies a pixel or percentage width for the underlying table cell.

B.3: code

The code element references a code defined in the WSDL file for this web site.

B.3.1: code content

The code element has no content.

B.3.2: code attributes

The code element only has one optional attribute:

ref: The ref attribute contains the id of a code element defined in the WSDL file. If no ref attribute is supplied, the first code defined in the WSDL file is used.

B.4: content

The content element specifies the place in the page presentation where the content for this page is to be placed. Optionally, this element can specify where the content of a file should be included.

B.4.1: content content

The content element has no content.

B.4.2: content attributes

The content element only has one optional attribute:

file: The file attribute specifies the name of a file to read and insert in place of this element. This allows multiple pieces of content to be blended together into a single page.

B.5: footer

The footer element references a footer defined in the WSDL file for this web site.

B.5.1: footer content

The footer element has no content.

B.5.2: footer attributes

The footer element only has one optional attribute:

ref: The ref attribute contains the id of a footer element defined in the WSDL file. If no ref attribute is supplied, the first footer defined in the WSDL file is used.

B.6: gutter

In the presentation of some pages, it is useful to add space between columns of information. The gutter presentation element gives this capability.

B.6.1: gutter content

The gutter element has no content.

B.6.2: gutter attributes

The gutter element has three optional attributes:

width: The width attribute supplies a pixel or percentage width for the underlying table cell.
rowspan: The rowspan attribute allows a gutter to occupy two or more rows in the presentation.
class: The class attribute specifies a stylesheet class to apply to the gutter presentation element.
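
A gutter is placed between the cells it separates, as in this sketch of a single row. The widths and the nav_bar code id are assumptions for illustration.

```xml
<!-- Hypothetical example: widths and the nav_bar id are invented. -->
<row>
  <cell width="20%"><code ref="nav_bar"/></cell>
  <gutter width="10"/>
  <cell><content/></cell>
</row>
```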

B.7: head

The head element describes the meta-information that normally goes in the head of the HTML document. When the WSDL processor evaluates the head element, it automatically supplies the title element, meta tags for keywords and description, and the link tag for the stylesheet based on the page description.

B.7.1: head content

The head element can contain zero or more text or code elements.

B.7.2: head attributes

The head element has no attributes.

B.8: header

The header element references a header defined in the WSDL file for this web site.

B.8.1: header content

The header element has no content.

B.8.2: header attributes

The header element only has one optional attribute:

ref: The ref attribute contains the id of a header element defined in the WSDL file. If no ref attribute is supplied, the first header defined in the WSDL file is used.

B.9: layout

All PLL files must have a layout element as the root element. This element contains all of the rest of the elements in the presentation description.

B.9.1: layout content

The layout element can contain an optional prelayout element, an optional head element, and a required body element.

B.9.2: layout attributes

The layout element has one optional attribute:

class: The class attribute specifies a particular form of output. Currently, only one form is supported: html4. This attribute will eventually be used to tailor the output produced by the WSDL processor.
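
A complete, if minimal, PLL file might look like the sketch below. It uses the simple (non-row) form of body and relies on the default header and footer. The DOCTYPE shown in prelayout is the standard HTML 4.01 Transitional declaration; whether a given site wants that declaration is an assumption.

```xml
<!-- Hypothetical example of a minimal PLL file. -->
<layout class="html4">
  <prelayout><![CDATA[<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
    "http://www.w3.org/TR/html4/loose.dtd">]]></prelayout>
  <head/>
  <body>
    <header/>
    <content/>
    <footer/>
  </body>
</layout>
```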

B.10: prelayout

Some variations on output format require special code to appear before the root element of the markup. The prelayout element allows the specification of this information. Information that may go here includes the language declaration for ASP or the DOCTYPE declaration for HTML.

B.10.1: prelayout content

The prelayout element contains the text that should occur before the root element of the output. Since the content of the prelayout element often contains characters that are not valid in character content, it is usually contained in a CDATA section. This element must be empty if the ref attribute is specified.

B.10.2: prelayout attributes

The prelayout element has one optional attribute:

ref: The ref attribute supplies a reference to a text element from the WSDL file. The content of that element is placed before the root element in the output. If this attribute exists, the prelayout element must contain no content.

B.11: row

The row element specifies the vertical positioning of pieces of the output page. All of the elements in a given row appear side-by-side in the output.

B.11.1: row content

The row element can contain zero or more cell or gutter elements.

B.11.2: row attributes

The row element has one optional attribute:

class: The class attribute specifies a stylesheet class to apply to the row presentation element.

B.12: text

The text element references a text defined in the WSDL file for this web site.

B.12.1: text content

The text element has no content.

B.12.2: text attributes

The text element only has one optional attribute:

ref: The ref attribute contains the id of a text element defined in the WSDL file. If no ref attribute is supplied, the first text defined in the WSDL file is used.

Appendix C: The WSDL Macro Reference

This section lists and describes all of the macro commands that are available in the WSDL system. These macro commands may be applied in many different contexts. Macro commands may be used in attributes of WSDL elements to retrieve data from other portions of the WSDL document. For example, if a developer wanted all of the images on a web site to be referenced from a consistent directory structure, he could build a directory element called images containing this information. Any file elements defined for images can now use a value like {{directory[images]}}/my_image.png for the href attribute. The image directory structure of the entire system can now be changed relatively painlessly.

Macro commands are also useful for generating text inside text elements, presentation files, and content. This could be used for direct references to images as in the example above. Macro commands can also be used for inserting boilerplate text like disclaimers, headers, and footers. A more powerful use of macros involves code routines. This allows WSDL to execute arbitrary pieces of Perl code to construct text to place in the output. Using code routines, relatively complex navigation is quite easy. More importantly, if structural changes are made in the WSDL file, this code can regenerate the new navigation automatically.

C.1: Builtin Macros

Builtin macros are called by placing the name of the macro between double curly braces at the point where they should be evaluated (e.g., {{root}}). If the macro has parameters, they are placed in parentheses after the macro name, inside the double curly braces (e.g., {{timestamp(gmt)}}).

body: Return the currently defined body attributes in a format suitable for adding to an element.
cmdline: Return the value of the command line parameter specified by the supplied argument.
content: Return the content defined for this page.
description: Return the appropriate meta tag to add a description to the current page.
dest: Returns the value of the dest attribute on the outermost website.
footer: Content of the first footer element defined in the WSDL file. Often useful as a default if none is set by the current page.
full_url: Return the fully-qualified URL for the current page.
header: Content of the first header element defined in the WSDL file. Often useful as a default if none is set by the current page.
keywords: Return the appropriate meta tag to add keywords to the current page.
root: Returns the value of the root attribute on the outermost website.
source: Returns the value of the src attribute on the outermost website.
stylesheet: Return the appropriate link or style code to include this page's stylesheet in the HTML.
timestamp(): Return the current timestamp in Perl localtime format. If the parameter gmt is supplied, the time in question will be GMT.
url_path: Return the URL for the current page, minus the server name and protocol.
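
As a sketch of how builtin macros might appear in boilerplate text, the WSDL text element below combines two of the macros listed above. The id and the wording are hypothetical.

```xml
<!-- Hypothetical example: the id and wording are invented. -->
<text id="build-stamp" class="message">Generated {{timestamp(gmt)}} for {{full_url}}</text>
```

When this text is placed on a page, each macro would be replaced by its value for that page, so the same text element yields a different URL on every page.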

C.2: Code Element Macros

Code element macros are called using the form {{code[routine_name]( args )}}. The routine_name is the id specified for one of the code elements in the WSDL file. The args must be supplied as whitespace-separated name=value pairs, e.g., {{code[nav_bar]( color=blue selected=white )}}. These arguments are passed as a hash to the code routine when it is called.

C.3: Elements and Attributes

Attributes of the current element can be referenced using {{@attr}}, where attr is the name of the attribute, e.g., {{@title}}. A specific element in the WSDL file can be referenced by its id using {{type[id]}}, where type is the type of the element and id is its unique id, e.g., {{page[main]}}. These two can be combined with the syntax {{type[id]/@attr}}, e.g., {{page[main]/@href}}.

C.4: Page Properties

Page properties can be accessed using the syntax {{prop[name]}}. The value of the property in this page with the supplied name is returned.
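
The sketch below shows one way a page property might be defined and then referenced. The property name section, the ids, and the wording are all assumptions for illustration.

```xml
<!-- Hypothetical example: ids, titles, and the "section" property are invented. -->
<website id="example" main="home" root="/">
  <siteinfo>
    <text id="section-banner">You are in the {{prop[section]}} section.</text>
  </siteinfo>
  <navigation>
    <page id="home" title="Home" href="index.html">
      <prop name="section" value="Welcome"/>
    </page>
  </navigation>
</website>
```

If the section-banner text were placed on the home page, {{prop[section]}} would be expected to yield Welcome, since the property is looked up in the page being generated.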

Appendix D: The DataDoc Object Reference

I created the ancestor of the DataDoc model before I knew that the Document Object Model (DOM) existed. When I discovered the standard DOM interface, I found that it was somewhat heavy-weight for my application. The DOM supports a large number of navigational and manipulation functions that are not required in this application. The data structure for the DOM is also complicated by the need to support this powerful navigation mechanism.

Additionally, much of the information in WSDL is stored in attributes of the individual elements. Although DOM does support attributes, gaining access to those attributes is a little awkward. This would have been a disadvantage in working with WSDL.

The DataDoc model consists of several classes:

XML::Data::DataDoc: The DataDoc class is an abstraction for an XML entity. It contains all of the elements and information from the XML file.
XML::Data::DataDoc::element: The element class abstracts all elements of the XML document.
XML::Data::DataDoc::text: The text class abstracts all character data of the XML document.
XML::Data::DataDoc::comment: The comment class abstracts any comment from the XML document.
XML::Data::DataDoc::pi: The pi class abstracts any processing instruction from the XML document.
XML::Data::DataDoc::cdata: The cdata class abstracts CDATA sections from the XML document.

D.1: The DataDoc Class

The DataDoc object encapsulates the entire XML document. This object supports the following member functions:

Content: This member function returns all of the content of the XML document.
Encoding: This member function returns (or sets) the encoding of the XML document.
Root: This member function returns the root element of the XML document.
Standalone: This member function returns (or sets) the standalone attribute of the XML document.
Version: This member function returns (or sets) the version of the XML document.

The following member functions should not be needed in any code routine used in WSDL. They are provided here for completeness.

new: This is the constructor for DataDoc objects.
AddContent: This member function adds a new object to the contents of the DataDoc.
Print: This member function prints the DataDoc to the standard out as a valid XML document.
MakeElement: This member function creates an element object with the given type.
MakeText: This member function creates a text object containing the given text.
MakeComment: This member function creates a comment object containing the given text.
MakePI: This member function creates a pi object with the given target and text.
MakeCdata: This member function creates a cdata object containing the given text.

D.2: The XML::Data::DataDoc::element Class

Any element data is stored in objects of type element. This includes the $curr object passed to the code routine. An element contains all of the attributes and content of the XML element it was read from.

The element object interface includes the following member functions:

AttribNames: The AttribNames member function returns a list of all of the attribute names used on this element. The order of the returned attribute names is not defined.
Attrib: The Attrib member function gets the value of an attribute if the name of the attribute is the only parameter. If more than one parameter is passed to Attrib, the parameters are treated as name, value pairs and the appropriate attributes are set.
HasAttrib: The HasAttrib member function returns true if the element has the attribute specified by the parameter.
DelAttrib: The DelAttrib member function deletes the attribute specified by the parameter.
Content: When called with no arguments, it returns a list of all of the items contained in the element. Elements are returned as element objects, text is returned as text objects, and CDATA sections are returned as cdata objects.
ElementContent: The ElementContent member function returns a list of all elements that are contained by the object.
IsEmpty: The IsEmpty member function returns true if the element has no content.
TextContent: The TextContent member function returns all of the text and CDATA sections included in the element as one string.
Type: The Type member function returns the type of this element as a string.
Print: The Print member function prints the current element object and all of its contents to the standard output as an XML element.

The following member functions should not be needed in any code routine used in WSDL. They are provided here for completeness.

new: This is the constructor for element objects.
AddContent: This member function adds a new object to the contents of the element object.
DelContent: This member function removes from the content of the element object all content items specified by the string argument. The form of this argument is identical to that of the Content member function described below.
DelContentItem: This member function removes from the content of the element object the content item whose reference is passed as an argument.
ReplaceContent: This member function replaces the content item referenced by the first argument with all of the items in the following arguments.

D.2.1: The Content Member Function

The most general of the content member functions is Content. When called with no arguments, it returns a list of all of the items contained in the element. Elements are returned as element objects, text is returned as text objects, and CDATA sections are returned as cdata objects.

The Content member function can be called with various parameters to restrict the portions of the content returned. The different arguments are listed below:

tagname: If Content is called with a tag name as a string, it returns a list of the child elements of that type.
'*': If the string '*' is passed to Content, the function returns a list containing all of the child elements contained in the element, just like ElementContent.
'#text': If passed the string '#text', Content returns a list of all the text pieces contained in the element.
'#comment': If passed the string '#comment', Content returns a list of all the comments contained in the element.
'#pi': If passed the string '#pi', Content returns a list of all processing instructions contained in the element.
'#cdata': If passed the string '#cdata', Content returns a list of all the CDATA sections contained in the element.

D.3: The XML::Data::DataDoc::text Class

The text object is a thin wrapper over the actual text in an element's content. It supports three useful member functions:

Content: The Content member function returns the string that this object contains.
Print: The Print member function prints the contents of the current text object to the standard output as legal XML text.
Type: The Type member function always returns the string '#text'. This is useful for distinguishing text objects from element objects.

The following member function should not be needed in any code routine used in WSDL. It is provided here for completeness.

new: This is the constructor for text objects.

D.4: The XML::Data::DataDoc::comment Class

The comment object is a thin wrapper over an XML comment. It supports three useful member functions:

Content: The Content member function returns the string that this object contains.
Print: The Print member function prints the current object to the standard output as a legal XML comment.
Type: The Type member function always returns the string '#comment'. This is useful for distinguishing comment objects from element objects.

The following member function should not be needed in any code routine used in WSDL. It is provided here for completeness.

new: This is the constructor for comment objects.

D.5: The XML::Data::DataDoc::pi Class

The pi object is a thin wrapper over an XML processing instruction. The processing instruction target is accessible using the syntax $pi->{target} for a pi object stored in $pi. From the same object, the processing instruction data is accessible through $pi->{data}. It supports two useful member functions:

Print: The Print member function prints the current object to the standard output as a legal XML processing instruction.
Type: The Type member function always returns the string '#pi'. This is useful for distinguishing pi objects from element objects.

The following member function should not be needed in any code routine used in WSDL. It is provided here for completeness.

new: This is the constructor for pi objects.

D.6: The XML::Data::DataDoc::cdata Class

The cdata object is a thin wrapper over the actual text of a CDATA section. It supports three useful member functions:

Content: The Content member function returns the string that this object contains.
Print: The Print member function prints the current object to the standard output as a legal XML CDATA section.
Type: The Type member function always returns the string '#cdata'. This is useful for distinguishing cdata objects from element objects.

The following member functions should not be needed in any code routine used in WSDL. They are provided here for completeness.

new: This is the constructor for cdata objects.
AddContent: This member function adds a piece of text to the contents of the cdata object.

Appendix E: The WSDL DTD

<!--
     DTD for the Web Site Description Language
     Version: 0.7
     Author: G. Wade Johnson
     Copyright 2000, by G. Wade Johnson
       Released under the Perl Artistic License.
-->
<!-- Entities: attribute values -->
<!ENTITY  % fileref   "CDATA">
<!ENTITY  % url       "CDATA">
<!ENTITY  % rngnumber "CDATA">
<!ENTITY  % email     "CDATA">
<!ENTITY  % mimetype  "CDATA">
<!ENTITY  % linktypes "(absolute|relative)">
<!ENTITY  % fileclasses   "(other|applet|archive|audio|binary|
                            document|image|plugin|script|source|
                            text|video)">
<!ENTITY  % nameclasses   "(initial|given|family|middle|number|
                            nickname)">
<!ENTITY  % loclasses     "(pll|skeleton)">
<!ENTITY  % codeclasses   "(routine|begin|end)">
<!ENTITY  % boolexpr  "CDATA">

<!-- Entities: attributes -->
<!ENTITY  % root     "root      %url;      #REQUIRED">
<!ENTITY  % copyref  "copyref   IDREF      #IMPLIED">
<!ENTITY  % styleref "styleref  IDREFS     #IMPLIED">
<!ENTITY  % include  "include   %fileref;  #IMPLIED">
<!ENTITY  % codeclass "class    %codeclasses;  'routine'">
<!ENTITY  % layoutref "layoutref IDREF     #IMPLIED">
<!ENTITY  % bodyref   "bodyref   IDREF     #IMPLIED">
<!ENTITY  % dest      "dest     %fileref;  #IMPLIED">
<!ENTITY  % src       "src      %fileref;  #IMPLIED">

<!-- Entities: content values -->
<!ENTITY  % rawtext           "#PCDATA">
<!ENTITY  % styledtext        "#PCDATA">
<!ENTITY  % source.code       "#PCDATA">

<!ENTITY  % person            "honorific?, name+, degree*">
<!ENTITY  % contact.content   "descr?,((%person;)|institution)">

<!ENTITY  % nav.item          "group|page|pageref|subpage|fpage|
                               file|website">

<!ENTITY  % cond              "(if|choose)">

<!ENTITY  % siteinfo.content
     "(server|directory|contact|copyright|text|header|footer|
       layout|body|%cond;)*">

<!ENTITY  % page.content      "(%cond;|prop)*">

<!ENTITY  % stylesheet.content   "#PCDATA">

<!-- Element definitions -->

<!-- website: describes an entire site. -->
<!ELEMENT  website  (siteinfo?,navigation+,resources?,code*)>
<!ATTLIST  website
              id        ID          #REQUIRED
              %include;
              %src;
              %dest;
              %root;
              main      IDREF       #REQUIRED
              title     CDATA       #IMPLIED
              keywords  CDATA       #IMPLIED
              description CDATA     #IMPLIED
              %styleref;
              linktype  %linktypes; "relative"
              %layoutref;
              %bodyref;
              %copyref;>

<!-- navigation: describes the navigation for a website -->
<!ELEMENT navigation  ((%nav.item;|%cond;)*)>
<!ATTLIST navigation
              id        ID         #IMPLIED
              %include;>

<!-- resources: list of non-navigational items on a website -->
<!ELEMENT resources  ((stylesheet|set|page|file|fpage|subpage|
                       %cond;)*)>
<!ATTLIST resources
              %include;>

<!-- set: a grouping of resources that share common
     characteristics, for instance a directory
-->
<!ELEMENT set  ((stylesheet|set|page|file|fpage|subpage|
                 %cond;)*)>
<!ATTLIST set
              id        ID         #IMPLIED
              %include;
              %dest;
              %src;
              root      %url;      #IMPLIED>

<!-- siteinfo: meta-data for the site -->
<!ELEMENT  siteinfo  %siteinfo.content;>
<!ATTLIST  siteinfo
              id        ID         #IMPLIED
              %include;	
              ref       IDREF      #IMPLIED>

<!-- group: a group of pages. equivalent to a section in many
     sites. -->
<!ELEMENT  group   ((%nav.item;|%cond;)+)>
<!ATTLIST  group   
              id        ID         #IMPLIED
              %include;	
              %dest;
              root      %url;      #IMPLIED
              main      IDREF      #IMPLIED
              href      %url;      #IMPLIED
              title     CDATA      #IMPLIED
              keywords  CDATA      #IMPLIED
              description CDATA    #IMPLIED
              %layoutref;
              %bodyref;
              %styleref;
              %copyref;>

<!-- page: a page to be built by this system. -->
<!ELEMENT  page  %page.content;>
<!ATTLIST  page
              title     CDATA      #REQUIRED
              %src;
              %dest;
              href      %url;      #IMPLIED
              id        ID         #IMPLIED
              keywords  CDATA      #IMPLIED
              description CDATA    #IMPLIED
              content   %fileref;  #IMPLIED
              %layoutref;
              %bodyref;
              %styleref;
              %copyref;>

<!-- prop: a property of the page. These properties contain
     small pieces of information that may be used in the
     construction of a page.
-->
<!ELEMENT  prop  EMPTY>
<!ATTLIST  prop
              name      NMTOKEN    #REQUIRED
              value     CDATA      #REQUIRED>

<!-- pageref: a reference to a page elsewhere in the web site.
     This element is needed because many sites are not trees,
     they are actually graphs.
-->
<!ELEMENT  pageref  EMPTY>
<!ATTLIST  pageref
              ref       IDREF      #REQUIRED
              fragment  NMTOKEN    #IMPLIED
              title     CDATA      #IMPLIED>

<!-- subpage: a reference to a location/fragment of another
     page.
-->
<!ELEMENT  subpage  EMPTY>
<!ATTLIST  subpage
              title     CDATA      #REQUIRED
              href      %url;      #IMPLIED
              fragment  NMTOKEN    #REQUIRED
              id        ID         #IMPLIED>

<!-- fpage: a "foreign page" which includes offsite links as
     well as local links to pages not built with this system.
-->
<!ELEMENT  fpage  EMPTY>
<!ATTLIST  fpage
              title     CDATA      #REQUIRED
              href      %url;      #REQUIRED
              fragment  NMTOKEN    #IMPLIED
              id        ID         #IMPLIED>

<!-- file: a non-html file linked on the system. This item
     describes files like downloadables, PDF files, images, etc.
     Anything that we may wish to manage in this system but is
     not created by the system.
-->
<!ELEMENT  file  EMPTY>
<!ATTLIST  file
              id        ID         #IMPLIED
              title     CDATA      #IMPLIED
              %src;
              %dest;
              href      %url;      #REQUIRED
              class  %fileclasses; #IMPLIED
              type      %mimetype; #IMPLIED
              nav       (yes|no)   "yes">


<!-- server: define a logical name for a server/machine on the
     web. -->
<!ELEMENT  server   EMPTY>
<!ATTLIST  server
              id        ID         #REQUIRED
              name      NMTOKEN    #REQUIRED
              class     CDATA      #IMPLIED>
<!-- some server classes: main, image, database, search, ad -->

<!-- directory: define a relative or absolute directory in the
     site. -->
<!ELEMENT  directory  EMPTY>
<!ATTLIST  directory
              id        ID         #REQUIRED
              name      %fileref;  #REQUIRED
              class     CDATA      #IMPLIED>
  <!-- some directory classes: image, database, stylesheet,
       include, applet, script
  -->

<!-- contact: information about a person or organization
     associated with the site.
-->
<!ELEMENT   contact   (%contact.content;)>
<!ATTLIST   contact
               id       ID         #REQUIRED
               href     %url;      #IMPLIED
               email    %email;    #IMPLIED>

<!-- descr: description of a contact -->
<!ELEMENT   descr       %styledtext;>

<!-- portions of a name -->
<!ELEMENT honorific    %rawtext;>
<!ELEMENT name         %rawtext;>
<!ATTLIST name
               class   %nameclasses; #IMPLIED>

<!ELEMENT degree       %rawtext;>
<!ELEMENT institution  %rawtext;>

<!-- copyright: copyright information for a site or page -->
<!ELEMENT   copyright  %rawtext;>
<!ATTLIST   copyright
               id       ID          #REQUIRED
               year     %rngnumber; #IMPLIED
               owner    IDREF       #IMPLIED>

<!-- text: definition of boilerplate text that may be used on
     the site. -->
<!ELEMENT   text       %styledtext;>
<!ATTLIST   text
               id       ID         #REQUIRED
               class    CDATA      #IMPLIED
               %include;>
  <!-- some text classes: disclaimer, message, -->
  <!-- if "include" is specified, read from there and ignore
       content. Should having content with src specified be an
       error?? -->

<!-- header: definition of header text that may be used on the
     site. -->
<!ELEMENT   header     %styledtext;>
<!ATTLIST   header
               id       ID         #REQUIRED
               %include;>
<!-- if "include" is specified, read from there and ignore
     content. Should having content with src specified be an
     error??
-->

<!-- footer: definition of footer text that may be used on the
     site. -->
<!ELEMENT   footer     %styledtext;>
<!ATTLIST   footer
               id       ID         #REQUIRED
               %include;>
<!-- if "include" is specified, read from there and ignore
     content. Should having content with src specified be an
     error??
-->

<!-- stylesheet: definition of style information for a site.
 -->
<!ELEMENT   stylesheet    %stylesheet.content;>
<!ATTLIST   stylesheet
               id       ID         #REQUIRED
               type     %mimetype; #REQUIRED
               %src;
               %dest;
               href     %url;      #IMPLIED>
<!-- if "href" specified, build a link. if content, build a
     style element and include inline. (What do we do about both?)
-->

<!-- layout: associates a name with a layout file. This file
     describes the layout of particular web pages. -->
<!ELEMENT   layout        EMPTY>
<!ATTLIST   layout
               id       ID          #REQUIRED
               file     %fileref;   #REQUIRED
               class    %loclasses; "pll">

<!-- code: container for code to be used in the construction of
     the website.
-->
<!ELEMENT   code    %source.code;>
<!ATTLIST   code
               id      ID         #REQUIRED
               %codeclass;
               %include;>

<!-- cover the full list of attributes from the HTML body tag
-->
<!ELEMENT   body    EMPTY>
<!ATTLIST   body
               id      ID         #REQUIRED
               class      CDATA   #IMPLIED
               style      CDATA   #IMPLIED
               title      CDATA   #IMPLIED
               lang       NMTOKEN #IMPLIED
               dir      (ltr|rtl) #IMPLIED
               onclick    CDATA   #IMPLIED
               ondblclick  CDATA  #IMPLIED
               onmousedown CDATA  #IMPLIED
               onmouseup   CDATA  #IMPLIED
               onmouseover CDATA  #IMPLIED
               onmousemove CDATA  #IMPLIED
               onmouseout  CDATA  #IMPLIED
               onkeypress  CDATA  #IMPLIED
               onkeydown   CDATA  #IMPLIED
               onkeyup    CDATA   #IMPLIED
               onload     CDATA   #IMPLIED
               onunload   CDATA   #IMPLIED
               background CDATA   #IMPLIED
               bgcolor    CDATA   #IMPLIED
               text       CDATA   #IMPLIED
               link       CDATA   #IMPLIED
               vlink      CDATA   #IMPLIED
               alink      CDATA   #IMPLIED>
               

<!-- Conditionals
       the names and basic functionality are copied from XSL,
       but the tests are considerably simpler.
-->
<!-- if: provides simple if-then functionality. If the test
     evaluates to true include the content of the element.
     Otherwise, discard the content of this element.
-->
<!ELEMENT   if            ANY>
<!ATTLIST   if
                test    %boolexpr; #REQUIRED>

<!-- choose: provides the ability to choose among multiple
     options. This feat is accomplished with the help of the
     when and otherwise elements.
-->
<!ELEMENT   choose   (when+,otherwise?)>

<!-- when: provides the tests for a choose element. If the test
     evaluates to true include the content of the element.
     Otherwise, prune this subtree.
-->
<!ELEMENT   when          ANY>
<!ATTLIST   when
                test    %boolexpr; #REQUIRED>

<!-- otherwise: provides the default behavior of the choose
     element. If none of the when tests evaluate to true, use
     the content of this element. Otherwise, discard the content
     of this element.
-->
<!ELEMENT   otherwise     ANY>

Appendix F: The PLL DTD

<!--
     DTD for the Page Layout Language
     Version: 0.7.1
     Author: G. Wade Johnson
     Copyright 2000, by G. Wade Johnson
       Released under the Perl Artistic License.
-->
<!-- Entities: attribute values -->
<!ENTITY  % fileref     "CDATA">
<!ENTITY  % number      "CDATA">
<!ENTITY  % length      "CDATA">
<!ENTITY  % styleclass  "CDATA">
<!ENTITY  % valignvals  "(top|center|bottom|baseline)">
<!ENTITY  % alignvals   "(left|center|right)">

<!-- Entities: content values -->
<!ENTITY  % rawtext           "#PCDATA">

<!-- Element definitions -->

<!ELEMENT layout   (prelayout?, head?, body)>
<!ATTLIST layout
              class    (html4)    "html4">

<!ELEMENT prelayout   (%rawtext;)>
<!ATTLIST prelayout
              ref      NMTOKEN    #IMPLIED> <!-- ref to a text element -->

<!ELEMENT head     ((text|code)*)>
<!ATTLIST head     >

<!ELEMENT body     ((row+)|(content|header|footer|text|code)+)>
<!ATTLIST body
              cols     %number;   #IMPLIED
              rows     %number;   #IMPLIED>

<!ELEMENT row      ((cell|gutter)*)>
<!ATTLIST row 
              class  %styleclass; #IMPLIED>

<!ELEMENT cell     (header|footer|text|code|content)+>
<!ATTLIST cell
              width    %length;   #IMPLIED
              colspan  %number;   #IMPLIED
              rowspan  %number;   #IMPLIED
              valign %valignvals; #IMPLIED
              align   %alignvals; #IMPLIED
              class  %styleclass; #IMPLIED>

<!ELEMENT gutter   EMPTY>
<!ATTLIST gutter
              width    %number;   #IMPLIED
              rowspan  %number;   #IMPLIED
              class  %styleclass; #IMPLIED>

<!ELEMENT header   EMPTY>
<!ATTLIST header
              ref      NMTOKEN    #IMPLIED>

<!ELEMENT footer   EMPTY>
<!ATTLIST footer
              ref      NMTOKEN    #IMPLIED>

<!ELEMENT code     EMPTY>
<!ATTLIST code
              ref      NMTOKEN    #IMPLIED>

<!ELEMENT text     EMPTY>
<!ATTLIST text
              ref      NMTOKEN    #IMPLIED>


<!ELEMENT content   EMPTY>
<!ATTLIST content
              file     %fileref;  #IMPLIED>

Appendix G: The Source for the Example Site

This appendix reproduces part of the source for the Internet College example site. Only the portion of the source that relates to WSDL is shown; most of the support files are not included.

G.1: WSDL Description

This is the complete WSDL description of the Internet College web site. This example shows some of the more interesting features of WSDL. The first such feature is the conditional compilation based on the PLL command line argument. If this argument exists and is non-zero, the web site is built using the PLL-based presentation. Otherwise, the web site is built using the skeleton-based presentation.
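
The selection between the two presentations can be pictured with a short Perl sketch. The name=value argument syntax and the helpers parse_cmdline and select_layout_class are assumptions made for illustration; only the test of $CmdLine{PLL} is taken from the description itself.

```perl
use strict;
use warnings;

# Hypothetical helper: turn arguments of the assumed form name=value
# into a hash resembling the %CmdLine hash tested in the description.
sub parse_cmdline
 {
  my %cmdline;
  foreach my $arg (@_)
   {
    my ($name, $value) = split /=/, $arg, 2;
    $cmdline{$name} = defined $value ? $value : 1;
   }
  return %cmdline;
 }

# Mirror of <when test="$CmdLine{PLL}"> ... <otherwise>: a missing,
# empty, or zero value selects the skeleton-based presentation.
sub select_layout_class
 {
  my %cmdline = @_;
  return $cmdline{PLL} ? 'pll' : 'skeleton';
 }

my %CmdLine = parse_cmdline( @ARGV );
print "layout class: ", select_layout_class( %CmdLine ), "\n";
```

Under these assumptions, running the sketch with the argument PLL=1 would select the "pll" layouts, and running it with no arguments would select the "skeleton" layouts.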

This web site uses a standard header and footer, with a navigation bar on the left of each page that appears to be dynamic. In actuality, the navigation is static HTML: each page has a similar navigation bar, and the left_nav code routine generates the appropriate HTML for each page. The nav_utils code element contains utility code needed to build the navigation bar.

In the resources element, there is a pair of PDF files that are intended to be referenced from the Registration and Billing Welcome page. These files are not part of the main structure of the site, but they are still maintained as part of the WSDL description.

The faculty pages are a good example of something that is awkward in most server-side scripting systems. The design constraint on the faculty pages is a standard page presentation with consistent information. If the faculty member has a college home page, the faculty page should link to it. The interesting part of this design is that most of the page content is retrieved from a database at the time the web site is compiled. This data does not change very rapidly, so retrieving it at the time of each request would waste resources. In addition, we check a mounted directory in the file system to see if the faculty member has set up a home page.

The faculty pages are generated by the faculty_content text element containing several code routine references. The related code elements are database and enddatabase, for maintaining access to the faculty database, and stats and cv for generating the appropriate content for the pages.
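
The FacultyDatabase module itself is not reproduced in this appendix. The following stand-in is only a sketch of the interface that the code routines appear to rely on: the three function names and the record fields are taken from the listing below, while the in-memory storage and the placeholder values are assumptions.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the FacultyDatabase module. Records are
# keyed by the homedir property of each faculty page.
my %faculty;

sub open_faculty_database
 {
  # Placeholder record; a real implementation would query a database.
  %faculty = (
    brown => {
      fullname => 'Dr. G. Brown',
      position => 'placeholder position',
      office   => 'placeholder office',
      phone    => 'placeholder phone',
      email    => 'placeholder email',
      website  => '',    # empty: fall back to the homes server
      bio      => '',    # empty: no biography on file
      courses  => [],    # no courses scheduled
    },
  );
  return 1;
 }

# Return the record for one faculty member. Unknown names yield an
# empty record so that callers can still dereference {courses}.
sub get_faculty_data
 {
  my $homedir = shift;
  return $faculty{$homedir} || { courses => [] };
 }

sub close_faculty_database
 {
  %faculty = ();
  return 1;
 }

open_faculty_database() or die "Unable to access faculty data.\n";
my $rec = get_faculty_data( 'brown' );
print "$rec->{fullname}\n";
close_faculty_database() or die "Unable to close faculty data.\n";
```

Returning a true value from the open and close functions matches the "or die" idiom used by the database and enddatabase code elements.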

The Registration and Billing system consists of two CGI scripts, schedule.cgi and bill.cgi. These scripts are not built by the WSDL processor. However, the design calls for those scripts to wrap their output in templates that are generated from WSDL. This allows the CGI scripts to maintain a consistent look with the rest of the site.
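
As an illustration of that wrapping step, a CGI script might insert its output into a generated template roughly as follows. This is a minimal sketch: the contents of cgi_template.html are not reproduced here, so the marker comment and the helpers wrap_in_template and read_template are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical marker; the actual contents of cgi_template.html are
# not reproduced in this appendix.
my $MARKER = '<!-- cgi-output -->';

# Insert the script's own output into a template generated by the
# WSDL processor, so the result shares the site's look and feel.
sub wrap_in_template
 {
  my ($template, $body) = @_;
  my $pos = index( $template, $MARKER );
  return $body if $pos < 0;          # no marker: emit the bare output
  substr( $template, $pos, length( $MARKER ), $body );
  return $template;
 }

# Read a whole template file, for example schedule_tmpl.html.
sub read_template
 {
  my $file = shift;
  open( my $fh, '<', $file ) or die "Unable to read $file: $!\n";
  local $/;                          # slurp mode
  my $text = <$fh>;
  close( $fh );
  return $text;
 }

my $page = wrap_in_template( "<body>$MARKER</body>",
                             '<p>Schedule rows</p>' );
print "Content-type: text/html\n\n", $page, "\n";
```

Because the template is built when the site is compiled, the script only pays for one file read and one string replacement at request time.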

<!-- Internet College Website -->
<website id="wwwic" main="front" root="{{server[icmain]}}/"
         layoutref="standard" bodyref="defbody" src="icsrc"
         styleref="stylesheet1" linktype="absolute">
  <siteinfo>
    <contact email="webmaster@inet.edu" id="webmaster">
      <name>Webmaster</name>
    </contact>

    <choose>
      <when test="$CmdLine{TestSite}">
        <server id="icmain" name="{{cmdline(TestSite)}}"/>
        <server id="homes"  name="{{cmdline(TestSite)}}/faculty"/>
      </when>
      <otherwise>
        <server id="icmain" name="http://www.inet.edu"/>
        <server id="homes"  name="http://homes.inet.edu"/>
      </otherwise>
    </choose>
    <choose>
      <when test="$CmdLine{PLL}">
        <layout id="standard" file="icsrc/layout/standard.pll"
                class="pll"/>
        <layout id="fac" file="icsrc/layout/faculty.pll"
                class="pll"/>
      </when>
      <otherwise>
        <layout id="standard" file="icsrc/layout/standard.html"
                class="skeleton"/>
        <layout id="fac" file="icsrc/layout/faculty.html"
                class="skeleton"/>
      </otherwise>
    </choose>

    <body id="defbody" bgcolor="ivory"/>

    <footer id="def_footer"><![CDATA[<p>
<hr size="1" noshade>
<address> 
Email: <a href="mailto:{{contact[webmaster]/@email}}">
{{contact[webmaster]}}</a>.
</address></p>
<a href="{{page[disclaimer]/@href}}">Disclaimer and Legal
Information</a>
<h5> 
&copy; 2000 Internet University.<br>
All rights reserved.<br>
{{full_url}}
</h5> 
]]></footer>

    <header id="def_header"><![CDATA[
    <img src="{{root}}images/banner.png" alt="Inet College"
         width="600" height="80">
    <h1>{{@title}}</h1>
]]></header>
    <text id="faculty_content"><![CDATA[
<table border="0" cellpadding="0" cellspacing="0">
  <tr><td valign="top" width="100">
        <img src="{{root}}images/faculty/{{prop[homedir]}}.png"
             alt="id picture" height="150" width="100">
      </td>
      <td valign="top" align="left">{{code[stats]()}}</td>
  </tr>
  <tr><td colspan="2">{{code[cv]()}}</td></tr>
</table>
]]></text>
  </siteinfo>

<!-- Start of Site Navigation -->
  <navigation>
    <page href="{{root}}index.html" id="front" dest="index.html"
          title="Internet College Home" content="home.html"/>

    <page href="{{root}}disclaimer.html"
          title="Internet College Disclaimer" id="disclaimer"
          dest="disclaimer.html" content="disclaimer.html"/>

    <group main="front" id="mainsections">
      <group title="General Information" main="geninfo"
             root="{{root}}info/" dest="info/">
        <page id="geninfo" title="Information" content="geninfo.html"
              href="index.html" dest="index.html"/>
        <page id="directions" title="Directions" content="directions.html"
              href="directions.html"/>
        <page id="cal" dest="calendar.html" href="calendar.html"
              title="Inet College Calendar" content="cal2000-1.html"/>
      </group>

      <group title="Faculty" main="stafflist" dest="faculty/"
             root="{{root}}faculty/" layoutref="fac">
        <page id="stafflist" title="Our Faculty" href="Faculty.html"
              layoutref="standard" content="faculty.html"/>
        <page id="fac1" title="Dr. G. Brown"
              dest="{{prop[homedir]}}.html"
              href="{{prop[homedir]}}.html">
          <prop name="homedir" value="brown"/>
        </page>
        <page id="fac2" title="Dr. J. Bashir"
              dest="{{prop[homedir]}}.html"
              href="{{prop[homedir]}}.html">
          <prop name="homedir" value="bashir"/>
        </page>
        <page id="fac3" title="Dr. B. Crusher"
              dest="{{prop[homedir]}}.html"
              href="{{prop[homedir]}}.html">
          <prop name="homedir" value="crusher"/>
        </page>
        <page id="fac4" title="Dr. S. Franklin"
              dest="{{prop[homedir]}}.html"
              href="{{prop[homedir]}}.html">
          <prop name="homedir" value="sfranklin"/>
        </page>
        <page id="fac5" title="Dr. D. Sculley"
              dest="{{prop[homedir]}}.html"
              href="{{prop[homedir]}}.html">
          <prop name="homedir" value="sculley"/>
        </page>
        <page id="fac6" title="Professor C. Xavier"
              dest="{{prop[homedir]}}.html"
              href="{{prop[homedir]}}.html">
          <prop name="homedir" value="profx"/>
        </page>
        <page id="fac7" title="Dr. L. Zimmerman"
              dest="{{prop[homedir]}}.html"
              href="{{prop[homedir]}}.html">
          <prop name="homedir" value="zimmer"/>
        </page>
      </group>

      <group title="Registration and Billing" main="billinfo"
             root="{{root}}billing/" dest="billing/">
        <page id="billinfo" title="Welcome" href="index.html"
              content="billinfo.html"/>

    <!-- Create a template used by schedule.cgi for look&feel -->
        <page title="Your Schedule" id="yrsched"
              dest="schedule_tmpl.html" href="schedule.cgi"
              content="cgi_template.html"/>

    <!-- Create a template used by bill.cgi for look&feel -->
        <page title="Your Bill" id="yrbill"
              dest="bill_tmpl.html" href="bill.cgi"
              content="cgi_template.html"/>
        <page title="Financial Aid" id="finaid" href="finaid.html"
              content="financial.html"/>
      </group>
    </group>
  </navigation>

  <resources>
    <stylesheet href="{{root}}inet.css" id="stylesheet1" src="inet.css"
                type="text/css"/>

    <set id="schedules" root="{{root}}info/" dest="info/">
      <file title="Class schedule, Fall 2000" href="fall2000.pdf"
            src="schedules/fall2000.pdf" class="document"/>
      <file title="Class schedule, Spring 2001" href="spring2001.pdf"
            src="schedules/spring2001.pdf" class="document"/>
    </set>
    <set id="imageset" src="images/" dest="images/">
      <file title="Building" class="image" href="s18.gif" dest="s18.gif"/>
      <file title="Inet College" class="image" href="banner.png"
            dest="banner.png"/>
    </set>
    <set id="billscripts" src="billing/" dest="billing/">
      <file class="script" href="schedule.cgi"/>
      <file class="script" href="bill.cgi"/>
    </set>
    <set id="facultyphotos" root="{{root}}images/faculty/"
         src="images/faculty/" dest="images/faculty/">
      <file class="image" href="brown.png" dest="brown.png"/>
      <file class="image" href="bashir.png" dest="bashir.png"/>
      <file class="image" href="crusher.png" dest="crusher.png"/>
      <file class="image" href="sfranklin.png" dest="sfranklin.png"/>
      <file class="image" href="sculley.png" dest="sculley.png"/>
      <file class="image" href="profx.png" dest="profx.png"/>
      <file class="image" href="zimmer.png" dest="zimmer.png"/>
    </set>
  </resources>

<!--
   Set up interface to the Faculty database.
-->
  <code id="database" class="begin"><![CDATA[
    use lib 'icsrc';
    use FacultyDatabase;

    open_faculty_database() or die "Unable to access faculty data.\n";
]]></code>

<!--
   Shut down interface to the Faculty database.
-->
  <code id="enddatabase" class="end"><![CDATA[
    close_faculty_database() or die "Unable to close faculty data.\n";
]]></code>

  <code id="lwp" class="begin"><![CDATA[
    use LWP::Simple;
]]></code>
  

<!--
   These utility functions support the left-hand navigation functionality.
-->
  <code id="nav_utils" class="begin"><![CDATA[
    sub  leftnav_expand
     {
      my $curr      = shift;
      my $ancestors = shift;
      my $top       = shift;
      my $indent    = shift || "";
      my $output    = "";

      foreach my $c ($top->Content( '*' ))
       {
        my $p = $c;
        my $a = $ancestors;
        $p = $wsdl->byID( $c->Attrib('ref') )  if 'pageref' eq $c->Type();
        if('group' eq $c->Type())
         {
          $p = $wsdl->byID( $c->Attrib('main') );
          $a = [ $c, @{$ancestors} ];
          $output .= nav_row( 0, $p, $c->Attrib('title'), $a, $indent );
         }
        else
         {
          $output .= nav_row( $curr, $p, $c->Attrib('title'), $a,
                              $indent );
         }
         
        if('group' eq $c->Type() and descendent_of( $c, $curr ))
         {
          $output .= leftnav_expand( $curr, [ $c, @{$ancestors}],
                                     $c, $indent . "&nbsp;" );
         }
       }

      $output;
     }

    sub   nav_row
     {
      my $curr      = shift;
      my $p         = shift;
      my $title     = shift || $p->Attrib('title');
      my $ancestors = shift;
      my $indent    = shift;
      my $output    = "";

      local $Context[0]->{prop};  # expand properties in navbar
      set_page_properties( $p );
      
      my $href = make_href( $p, @{$ancestors} );
      my ($rstyle, $tstyle) = ("navrow", "navtext");

      ($rstyle, $tstyle) = ("currnavrow", "currnavtext")
                                                    if($p == $curr);

      $output .= qq{<tr><td class="$rstyle">$indent};
      $output .= qq{<a href="$href" class="$tstyle">$title</a>};
      $output .= qq{</td></tr>\n};

      $output;
     }
]]></code>

<!--
   Build left-hand navigation.
-->
  <code id="left_nav"><![CDATA[
    my $output = qq{<table border="0"};
    $output   .= qq{ width="$args{width}"}
                                      if exists $args{width};
    $output   .= qq{ cellpadding="$args{cellpadding}"}
                                      if exists $args{cellpadding};
    $output   .= qq{ cellspacing="$args{cellspacing}"}
                                      if exists $args{cellspacing};
    $output   .= qq{>\n};

    my $top = ($website->Content('navigation'))[0];
    $top = $wsdl->byID( $args{top} ) if exists $args{top};

    my $home = $wsdl->byID( $website->Attrib('main') );
    $output .= nav_row( $curr, $home, $home->Attrib('title'),
                        $ancestors, '' );
    $output .= leftnav_expand( $curr, $ancestors, $top );

    $output .= qq{</table>\n};
    
    $output;
]]></code>

<!--
   Retrieve the faculty statistics and display statistics.
-->
  <code id="stats"><![CDATA[
    my $homedir = $Context[0]->{prop}->{homedir};
    my $rec     = get_faculty_data( $homedir );
    my $output  = '';

    $output .= "$rec->{fullname}<br>\n";
    $output .= "$rec->{position}<br>\n<hr size='1' noshade>\n";
    $output .= "Office: $rec->{office}<br>\n";
    $output .= "Phone: $rec->{phone}<br>\n";
    $output .= "email: $rec->{email}<br>\n";
    my $url = $rec->{website} ||
         resolve_macros( "{{server[homes]}}/~$homedir/index.html" );
    
    $output .= qq{<a href="$url">Home Page</a>} if get( $url );

    $output;
]]></code>

<!--
   Retrieve the faculty biographical information and display it.
-->
  <code id="cv"><![CDATA[
    my $rec = get_faculty_data( $Context[0]->{prop}->{homedir} );
    my $output = '';

    if($rec->{bio})
     {
      $output .= "Current Biographical Information:<br>\n";
      $output .= $rec->{bio};
     }
    else
     {
      $output .= "No biographical information available.<br>\n";
     }
    
    if(@{$rec->{courses}})
     {
      $output .= "<p>Courses:</p>\n";
      $output .= "<ul>\n";
      foreach my $c (@{$rec->{courses}})
       {
        $output .= "  <li>$c</li>\n";
       }
      $output .= "</ul>\n";
     }
    else
     {
      $output .= "<br>Not scheduled for any classes at this time.<br>\n";
     }

    $output;
]]></code>
</website>

G.2: Skeleton-based Presentation Files

The following skeleton-based presentation files provide the basic structure of the pages on the Internet College example web site.

G.2.1: Standard Presentation File

This is the presentation file for all of the pages on the web site except the faculty pages. It includes the standard header and footer, as well as the left hand navigation used throughout the site.

<html>
<head>
  <title>{{@title}}</title>
  {{stylesheet[stylesheet1]}}
</head>
<body {{body}}>
<table border="0" cellspacing="0" cellpadding="0" width="100%">
  <tr>
    <td colspan="2">{{header}}</td>
  </tr>
  <tr><td class="navbg" width="120" valign="top">
       {{code[left_nav](width=100% cellspacing=0 top=mainsections)}}
      </td>
      <td width="660" valign="top">{{content}}</td>
  </tr>
  <tr>
    <td colspan="2">{{footer}}</td>
  </tr>
</table>
</body>
</html>
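
The macro references in this skeleton are replaced when the site is compiled. The following Perl sketch shows a much simplified form of that substitution. It is not the processor's own code: the function expand_skeleton and its hash of values are assumptions, and only the plain {{name}} and {{@attribute}} forms are handled.

```perl
use strict;
use warnings;

# Simplified illustration of skeleton expansion. Only {{name}} and
# {{@attribute}} are handled; the element[id] and code[...]() forms
# used in the skeleton are left untouched.
sub expand_skeleton
 {
  my ($skeleton, $values) = @_;
  $skeleton =~ s/(\{\{(\@?\w+)\}\})/exists $values->{$2} ? $values->{$2} : $1/ge;
  return $skeleton;
 }

my %values = (
  '@title' => 'Our Faculty',
  content  => '<p>Page content goes here.</p>',
);

print expand_skeleton( "<title>{{\@title}}</title>\n", \%values );
print expand_skeleton( "<td>{{content}}</td>\n", \%values );
```

Macros with no entry in the hash are left in place, which makes unresolved references easy to spot in the generated page.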

G.2.2: Faculty Page Presentation File

This is the presentation file for the faculty pages. It includes the standard header and footer, as well as the left hand navigation used throughout the site. In addition, the content for the page is generated from the faculty_content text element.

<html>
<head>
  <title>{{@title}}</title>
  {{stylesheet[stylesheet1]}}
</head>
<body {{body}}>
<table border="0" cellspacing="0" cellpadding="0" width="100%">
  <tr>
    <td colspan="2">{{header}}</td>
  </tr>
  <tr><td class="navbg" width="120" valign="top">
       {{code[left_nav](width=100% cellspacing=0 top=mainsections)}}
      </td>
      <td width="660" valign="top">{{text[faculty_content]}}</td>
  </tr>
  <tr>
    <td colspan="2">{{footer}}</td>
  </tr>
</table>
</body>
</html>

G.3: PLL Presentation Files

The following PLL presentation files provide the basic structure of the pages on the Internet College example web site.

G.3.1: Standard Presentation File

This is the PLL file for all of the pages on the web site except the faculty pages. It includes the standard header and footer, as well as the left hand navigation used throughout the site.

<layout>
  <head/>
  <body cols="2" rows="3">
    <row><cell colspan="2"><header/></cell></row>
    <row><cell class="navbg" width="120" valign="top">
          <code ref="left_nav(width=100% cellspacing=0 top=mainsections)"/>
         </cell>
         <cell width="660" valign="top"><content/></cell></row>
    <row><cell colspan="2"><footer/></cell></row>
  </body>
</layout>
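
Comparing this file with the skeleton in G.2.1 suggests that each row corresponds to a table row and each cell to a table cell. The Perl sketch below illustrates that correspondence only. The function rows_to_table and its input structure are assumptions, and gutter elements and XML parsing are left out.

```perl
use strict;
use warnings;

# Render rows of cells as an HTML table. Each row is an array of
# cells; each cell is a hash with an optional 'attrs' hash and a
# 'content' string (an assumed structure, not the processor's own).
sub rows_to_table
 {
  my @rows = @_;
  my $html = qq{<table border="0" cellspacing="0" cellpadding="0">\n};

  foreach my $row (@rows)
   {
    $html .= "  <tr>\n";
    foreach my $cell (@{$row})
     {
      my $attrs     = $cell->{attrs} || {};
      my $attr_text = '';
      foreach my $name (sort keys %{$attrs})
       {
        $attr_text .= qq{ $name="$attrs->{$name}"};
       }
      $html .= "    <td$attr_text>$cell->{content}</td>\n";
     }
    $html .= "  </tr>\n";
   }

  return $html . "</table>\n";
 }

print rows_to_table(
  [ { attrs => { colspan => 2 }, content => '{{header}}' } ],
  [ { attrs => { class => 'navbg', width => 120 }, content => 'nav' },
    { attrs => { width => 660 }, content => '{{content}}' } ],
  [ { attrs => { colspan => 2 }, content => '{{footer}}' } ],
);
```

Attributes are written in sorted order so that the generated HTML is the same from one build to the next.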

G.3.2: Faculty Page Presentation File

This is the PLL file for the faculty pages. It includes the standard header and footer, as well as the left hand navigation used throughout the site. In addition, the content for the page is generated from the faculty_content text element.

<layout>
  <head/>
  <body cols="2" rows="3">
    <row><cell colspan="2"><header/></cell></row>
    <row><cell class="navbg" width="120" valign="top">
          <code ref="left_nav(width=100% cellspacing=0 top=mainsections)"/>
         </cell>
         <cell width="660" valign="top"><text ref="faculty_content"/>
         </cell>
    </row>
    <row><cell colspan="2"><footer/></cell></row>
  </body>
</layout>

Glossary

Active Server Page (ASP): Active Server Page is the name given by Microsoft to their brand of server-side scripting. ASP pages consist of normal HTML with embedded instructions that allow for including boilerplate text and arbitrary scripting capabilities. Support for both VBScript and JScript is built in. ASP pages are interpreted at the time they are referenced[36].
attribute: An attribute is a name/value pair associated with an XML element. Attributes are often used to attach metadata to text in an XML document.
Cascading Style Sheets (CSS): A W3C Recommendation for defining the presentation characteristics of HTML and XML. The approach is to specify the presentation aspects of an HTML page in a separate document, called a style sheet. There are currently two levels of CSS, CSS1[26] and CSS2[27].
CGI Script: An external program, called by a web server through the CGI interface, that is written in a scripting language rather than compiled. A large fraction of the programs called through CGI were written this way[29].
See also: Common Gateway Interface
Cocoon: Cocoon is the name of a project by the Apache XML Project to separate document content, style, and logic.
Common Gateway Interface (CGI): An early common interface standard allowing web servers to communicate with external programs launched by the web server to service a particular HTTP request[29].
DocBook: DocBook is a long-standing document markup language based on SGML. It has features for creating books, articles, sets of books, UNIX man pages, and many other forms of documents. Much of the current work on DocBook is focused on converting it to an XML application[11].
Document Object Model (DOM): In an effort to standardize access to XML elements from scripting languages, the W3C defined the Document Object Model as an access model for XML documents[14].
Document Type Definition (DTD): A Document Type Definition is a formal description of a particular SGML/XML-based markup language.
element: The basic unit of an XML document is an element. An element consists of start and end tags, optional attributes, and optional contents.
empty element tag: If an element has no content, it can be abbreviated into an empty element tag. An empty element tag begins with the character <, followed by the element name and any optional attributes, and ends with the characters />.
end tag: A special character string that delimits the end of an element. An end tag begins with the characters </, followed by the element name, and ends with the character >.
Extensible HyperText Markup Language (XHTML): The latest version of HTML recommended by the W3C is a reformulation of HTML as an XML application. XHTML, as it has been called, has a much more rigid structure and definition. It is hoped that this variant will make the writing of tools to deal with XHTML easier and less prone to errors.
Extensible Markup Language (XML): XML is a simplification of SGML aimed specifically at ease of implementation and potential uses on the Web.
Extensible Stylesheet Language (XSL): XSL is the newest W3C recommendation for providing the presentation information to complement the structure defined in an XML document. XSL uses an XML-based syntax and supports transformations of the XML document as well as presentation.
FrontPage: FrontPage is a WYSIWYG HTML-authoring tool distributed by Microsoft.
GedML: GedML is a format for genealogical information developed by Michael H. Kay, based on the GEDCOM format[10].
general entity: A general entity is a mechanism in XML and SGML for including arbitrary text and markup from another resource or file into the current document.
HyperText Markup Language (HTML): HTML is the current markup system used for publishing data on the World Wide Web. HTML is based on SGML syntax, but is not extensible. It was originally created to define the structure of documentation for experiments and hardware at CERN[15].
Intelligent Agent: The term Intelligent Agent refers to an application that searches the Web on behalf of a user and generates specialized pages reporting information of interest to the user. This was one of the great failures of the current Web due to the difficulty of extracting context from the data displayed on the average Web page.
Interface Definition Language (IDL): An Interface Definition Language is a method for describing the interface between two processes. The IDL normally describes the interface in a programming language-independent fashion. A separate compiler generates the appropriate skeleton code to be used for the actual implementation of the interfaces.
markup: In complicated documents, some of the information is not captured by the text of the document. This may be context or style information or something else entirely. Markup is a method for adding this extra information to text, without resorting to binary formatting.
MathML: MathML is an XML application intended to be used as an extension to HTML. MathML supplies tags to describe mathematical and scientific content, such as equations, on the web[40].
metadata: Structured data that describes or expands a resource is called metadata. Metadata is sometimes referred to as data about the data. It is often used to provide context for the data.
metric: A metric is something measurable or quantifiable about a system.
Page Layout Language (PLL): The PLL is a high-level markup language designed to help structure the individual pieces of a set of HTML pages. The language describes components of the pages at a high level to abstract the presentation knowledge from the pages themselves.
Perl: Perl is a scripting language created by Larry Wall. It contains extensive support for text manipulation and process control. The latest versions of the language have added object-oriented features.
Precision Graphics Markup Language (PGML): PGML is a vector graphics format based on the PDF imaging model. It was submitted to W3C by Adobe, IBM, Netscape, and Sun Microsystems[10].
Real Estate Listing Markup Language (RELML): RELML is an XML application based on the MLS standard from the real estate industry. Its purpose is to simplify placing MLS information on the Web[10].
Remote Procedure Call (RPC): RPC is a mechanism for communicating between processes or machines. One difference between RPC and most other methods of inter-process communication is that it looks like an ordinary procedure call. Special purpose proxy code converts the parameters into a message that can pass outside the current address space to the other process. Similar code converts the returned data back into a form that can be read directly in the current address space.
Serialization: Serialization is the process by which an object is converted to a format which can be stored somewhere besides memory, often on a hard disk. This facility is usually used to allow objects to persist between invocations of a program.
Server Side Includes: The Server Side Include technology was originally conceived as a way to make HTML easier to maintain and to provide a simpler method for generating dynamic content. One of its features was support for the inclusion of other files into an HTML page before it is sent to the client. This provided a good solution for the boilerplate text problem[30].
Server Side Scripting: The term Server Side Scripting covers all of the dynamic content systems that rely on code embedded in the HTML content and interpreted by the web server.
skeleton: A skeleton file is an HTML file containing special macro commands that serves as a template for an output page in the WSDL system.
Standard Generalized Markup Language (SGML): SGML is a system originated at IBM for describing the structure of a document. Previous systems had supported direct in-line formatting commands, but no structure[25].
start tag: A start tag is a special character string that delimits the beginning of an element. A start tag begins with the character <, followed by the element name and optional attributes, and ends with the character >.
style sheet: A style sheet is a description of formatting information to be applied to a document. A style sheet normally supplies information about fonts, colors, and typographical effects. Some style sheets even support extensive rearrangement of the base document.
Synchronized Multimedia Integration Language (SMIL): The SMIL language defines combinations of audio, video, text, and graphics as a single real-time multimedia presentation. The language provides features to allow an author to choreograph the various pieces to build a full-featured presentation[28].
tag: The term tag is used to denote the special text which starts and ends an XML element.
The Channel Definition Format (CDF): CDF allows a web publisher to specify frequently updated sets of data (called channels) that appropriately configured client software can retrieve automatically[33]. This was one of the architectures designed for the push technology of the late nineties.
The Chemical Markup Language (CML): CML is a markup language designed for use in documentation pertaining to the molecular sciences. As with any field, there were several systems for documenting information in these fields. CML was developed to unify these approaches. Although based on SGML, CML remains XML compatible[41].
Unicode: Unicode is a standard character encoding format maintained by the Unicode Consortium[34] to support all of the major languages in use in the world.
Vector Markup Language (VML): VML is an XML application for describing vector graphics submitted to W3C by Autodesk, Hewlett-Packard, Microsoft, and Visio. VML has particular support for CSS[10].
Web Interface Definition Language (WIDL): WIDL is an XML application that allows programmers to specify an API for dealing with Web pages as if they were library routines.
Web Site Description Language (WSDL): The WSDL is a language which attempts to describe an entire web site at a high level, but with enough detail so that a program can generate the web site.
What you see is what you get (WYSIWYG): WYSIWYG is a term used to describe editors, word-processors, and similar tools that display presentation changes and allow the user to manipulate them graphically. This has become one camp in a holy war between those who believe that presentation is an integral part of the document and those who believe that the presentation and structure of a document should be separate.
Wireless Application Protocol (WAP): A protocol defined by Unwired Planet, Nokia, Ericsson, and Motorola to allow smart cell-phones to deal with a subset of the Web. The WAP standard defines the Wireless Markup Language (WML) to replace HTML, WMLScript as a client-side scripting system, and Wireless Bitmaps (WBMP) as a graphics format. Together these specifications should allow Web-like applications on cell phones[44].
World Wide Web Consortium (W3C): The World Wide Web Consortium oversees the development of technologies and protocols associated with the Web[38].
XML Schema: There are several groups currently working on methods to specify new XML applications in an XML-like format instead of the current DTD format. In the process, these specifications cover a richer set of data types than just text. The current W3C effort is called XML Schema.
XPath: XPath is a specification for describing and locating elements and attributes in an XML document based on the natural hierarchy of the elements in the document. Its original intent was to unify the efforts of different standards that needed to reference portions of an XML document.
XSL Transformations (XSLT): XSLT is the portion of the W3C stylesheet specification relating to the transformation of one XML document into another XML document.
See also: Extensible Stylesheet Language
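
Several of the XML terms defined above (element, start tag, end tag, empty element tag, attribute, and XPath-style location) can be seen together in a short sketch. The example below is only an illustration, assuming Python's standard xml.etree.ElementTree module, which supports a limited subset of XPath. The markup, element names, and attribute names are hypothetical and are not taken from WSDL or PLL.

```python
import xml.etree.ElementTree as ET

# Hypothetical document invented for illustration; not WSDL or PLL markup.
# <site ...> and <page ...> are start tags, </page> and </site> are end tags,
# and <page id="empty"/> is an empty element tag.
DOC = '<site name="example"><page id="home">Welcome</page><page id="empty"/></site>'

root = ET.fromstring(DOC)

# Attributes are written inside the start tag (or the empty element tag).
site_name = root.get("name")

# Locate elements by their position in the element hierarchy (XPath subset).
page_ids = [page.get("id") for page in root.findall("./page")]

# Select one element by attribute value, then inspect its content.
home_page = root.find("./page[@id='home']")
empty_page = root.find("./page[@id='empty']")

# An element written as an empty element tag has neither text nor children.
empty_has_content = empty_page.text is not None or len(empty_page) > 0

print(site_name, page_ids, home_page.text, empty_has_content)
```

The same queries could be written against a DOM implementation; ElementTree is used here only because it keeps the sketch short.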

Sources Consulted

[1] Boumphrey, Frank et al. XML Applications. Wrox Press Ltd., 1998.

[2] Brown, William J. et al. AntiPatterns: Refactoring Software, Architectures, and Projects in Crisis. John Wiley & Sons, Inc., 1998.

[3] Connolly, Dan, ed. XML: Principles, Tools, and Techniques. World Wide Web Journal. O'Reilly & Associates, Winter 1997.

[4] Fleming, Jennifer. Web Navigation: Designing the User Experience. O'Reilly & Associates, 1998.

[5] Harold, Elliotte Rusty. XML Bible. IDG Books Worldwide, 1998.

[6] Liu, Bowen. "Object-oriented Templates for WWW Development and Management: A Model and Its Implementation," Master of Science thesis, University of Houston, December 1999.

[7] Rosenfeld, Louis, and Peter Morville. Information Architecture for the World Wide Web. O'Reilly & Associates, 1998.

[8] Stein, Lincoln, and Doug MacEachern. Writing Apache Modules with Perl and C. O'Reilly & Associates, 1999.

[9] St. Laurent, Simon. XML Elements of Style. McGraw-Hill, 2000.

[10] St. Laurent, Simon, and Ethan Cerami. Building XML Applications. McGraw-Hill, 1999.

[11] Walsh, Norman, and Leonard Muellner. DocBook: The Definitive Guide. O'Reilly & Associates, 1999.

[12] The Apache Software Foundation. Cocoon. http://xml.apache.org/cocoon/index.html, August, 2000.

[13] The Apache Software Foundation. Cocoon: A Publishing Infrastructure. http://xml.apache.org/cocoon/infrastructure.html, August, 2000.

[14] Le Hégaret, Philippe. Document Object Model (DOM). http://www.w3.org/DOM/, August 11, 2000.

[15] Berners-Lee, Tim. The original proposal of the WWW, HTMLized. http://www.w3.org/History/1989/proposal.html, May, 1990.

[16] Bray, Tim, Jean Paoli, and C. M. Sperberg-McQueen, ed. Extensible Markup Language (XML) 1.0. http://www.w3.org/TR/1998/REC-xml-19980210, February 10, 1998.

[17] Bos, Bert. Web Style Sheets. http://www.w3.org/Style/, November 27, 1999.

[18] Carter, Josh. gxml2html: generic XML to HTML conversion tool. http://multipart-mixed.com/xml/, November 14, 1999.

[19] Clark, James, ed. XSL Transformations (XSLT). http://www.w3.org/TR/xslt, November 16, 1999.

[20] Clark, James, and Steve DeRose, ed. XML Path Language (XPath). http://www.w3.org/TR/xpath, November 16, 1999.

[21] Connolly, Dan. Extensible Markup Language (XML). http://www.w3.org/XML/, November 26, 1999.

[22] Connolly, Dan. The XML Revolution. http://helix.nature.com/webmatters/xml/xml.html, October 1, 1998.

[23] Cover, Robin. WAP Wireless Markup Language Specification (WML). http://www.oasis-open.org/cover/wap-wml.html, November 23, 1999.

[24] Deach, Stephen, ed. Extensible Stylesheet Language (XSL). http://www.w3.org/TR/WD-xsl, April 21, 1999.

[25] Goldfarb, Charles F. The Roots of SGML -- A Personal Recollection. http://www.sgmlsource.com/history/roots.htm, October 11, 1997.

[26] Lie, Håkon Wium, and Bert Bos. Cascading Style Sheets, level 1. http://www.w3.org/TR/REC-CSS1, January 11, 1999.

[27] Bos, Bert et al. Cascading Style Sheets, Level 2. http://www.w3.org/TR/REC-CSS2/, May 12, 1998.

[28] Michel, Thierry. Synchronized Multimedia. http://www.w3.org/AudioVideo/, May 3, 2000.

[29] NCSA. Common Gateway Interface. http://hoohoo.ncsa.uiuc.edu/cgi/intro.html, December 6, 1995.

[30] NCSA HTTPd Development Team. NCSA HTTPd Tutorial: Server Side Includes (SSI). http://hoohoo.ncsa.uiuc.edu/docs/tutorials/includes.html, September 28, 1995.

[31] Raggett, Dave, and Ian Jacobs. HTML Home Page. http://www.w3.org/MarkUp/, November 28, 1999.

[32] Sperberg-McQueen, C. M., and Lou Burnard. A Gentle Introduction to SGML. http://www.uic.edu/orgs/tei/sgml/teip3sg/index.html, March 5, 1996.

[33] Tauber, James. Web, Internet, Networks at SCHEMA.NET. http://www.schema.net/web/#cdf, June 25, 2000.

[34] The Unicode Consortium. Unicode Home Page. http://www.unicode.org/, September 29, 2000.

[35] UserLand Software, Inc. XML-RPC Home Page. http://www.xmlrpc.com/, June 02, 2000.

[36] Webb, Martin. irt.org Knowledge Base: Q5800 What is ASP?. http://developer.irt.org/script/5800.htm, June 3, 2000.

[37] West, Mark. The Server Side Includes Tutorial. http://www.carleton.ca/~dmcfet/html/ssi1.html#yeah, February 19, 1995.

[38] The World Wide Web Consortium. The World Wide Web Consortium. http://www.w3.org/, November 24, 1999.

[39] The World Wide Web Consortium. HyperText Mark-up Language. http://www.w3.org/History/19921103-hypertext/hypertext/WWW/MarkUp/MarkUp.html, November 3, 1992.

[40] The World Wide Web Consortium. W3C's Math Home Page. http://www.w3.org/Math/, June 2, 2000.

[41] Zara, Steve. XML-CML.ORG - The Site for Chemical Markup Language. http://www.xml-cml.org/, August 31, 2000.

[42] Andrivet, Sebastien. "A Simple XML Parser," C/C++ Users Journal 17, no. 7 (July 1999): 22-32.

[43] Hamstra, Dirk. "XML and CORBA," Dr. Dobb's Journal, no. 305 (November 1999): 98-100.

[44] Mann, Steve. "The Wireless Application Protocol," Dr. Dobb's Journal, no. 304 (October 1999): 56-66.

[45] Monson, Lynn. "The WIDL Specification," Dr. Dobb's Journal, no. 291 (November 1998): 92-96.

[46] Sintes, Tony. "XML and Software Configuration," Dr. Dobb's Journal, no. 314 (July 2000): 56-62.

[47] Goldman, Roy, Jason McHugh, and Jennifer Widom. "Lore: A Database Management System for XML," Dr. Dobb's Journal, no. 311 (April 2000): 76-80.