This site will look much better in a browser that supports web standards, but is accessible to any browser or Internet device.

Anomaly ~ G. Wade Johnson Anomaly Home G. Wade Home

March 25, 2006

Domain Specific Languages, a Renewed Interest

I've seen quite a bit of interest in Domain Specific Languages (DSLs) on the Internet lately. Some good examples include Martin Fowler's exploration of the subject:

* MF Bliki: DomainSpecificLanguage
* Language Workbenches: The Killer-App for Domain Specific Languages?

He does point out that this is not a new idea. He uses Unix as an example of a system that uses a large number of DSLs. The subject has gotten enough interest to reach the point where people are discussing when it is a good approach to apply (Artima Developer Spotlight Forum - How and When to Develop Domain-Specific Languages?). Others are beginning to apply the term DSL to extend their area of interest (Agile Development with Domain Specific Languages).

So, from this we can guess that interest in DSLs is on the rise. As Fowler pointed out, Unix has been a nexus for the creation of DSLs, including:

* make
* regular expressions
* awk
* sed
* yacc
* lex
* dot

and many more.

Recently, I have seen the suggestion that extending or modifying a general purpose language is a powerful way to build a useful DSL. To some extent, this is also well-trodden ground. The classic example of this approach was implemented using preprocessors to provide facilities not present in the original language, both Ratfor (for structured programming in FORTRAN) and cfront (for object oriented programming in C) used this approach.

A recent article, Creating DSLs with Ruby, discusses how the features of the Ruby language make it well suited to building DSLs without a separate preprocessing step. The Ruby language apparently supports features that supports creating simple DSL syntax that is still legal Ruby code.

This is a very powerful technique that is not very easy to do with most languages. Amusingly enough, this technique is also not very new. In fact, there was a general purpose programming language that was designed around the concept of writing a domain language: Forth. If I remember correctly, Charles Moore once described programming in Forth as writing a vocabulary for the problem domain, defining all of the words necessary to describe the answer, and then writing down the answer.

The Forth language is different than most programming languages you might have encountered because it has almost no syntax. What looks like syntax is actually a powerful technique for simple parsing, combined with the ability to execute code at compile time. This allows for extending the capabilities of the language in a very powerful way. One interesting effect of this feature is that many good Forth programmers naturally gravitate toward the DSL approach when solving problems above a certain level of complexity. We firmly believe that some problems are best served by a language, not an arcane set of configuration options.

Forth does give us an important insight into the problems with DSLs, as well. There is is a well-known joke among Forth programmers:

If you've seen one Forth...you've seen one Forth.

Unlike more traditional programming, Forth programs are built by extending the language. A new programmer trying to learn a Forth system needs to learn the new dialect including the extensions used in this environment. This is not technically much different than learning all of the libraries and utility classes used in a more traditional system, but there is a conceptual difference. In Forth (or a DSL-based system), there is no syntactic difference between the extensions and the base language. The language itself can be extended without an obvious cue in the code to say when you have changed languages. This means that a new programmer may not recognize a new piece to learn as readily as when seeing an obvious library call.

This becomes a very important tradeoff: Which is more important, ease of learning for new programmers or power for advanced users? A well-designed DSL gives the advanced user a succinct notation to use to express hie or her requirements concisely and precisely. This is the appeal of the DSL. The downside is that this represents a new set of knowledge for each programmer relating to troubleshooting and debugging. It also requires more care in the design to develop a consistent and usable notation.

As usual, the tradeoff is the key. We need to be able to decide if the benefits outweigh the disadvantages and build the system accordingly. I wish there was a magic formula that could be applied to tell how or when a DSL would improve a system. Unfortunately, I have not seen a sign of such a formula yet.

Posted by GWade at 11:40 PM. Email comments | Comments (0)

August 10, 2004

Review of The Little Schemer

The Little Schemer
Daniel P. Friedman and Matthias Felleisen
The MIT Press, 1996.

One of my wife's friends recommended this book for learning Scheme. He's a big proponent of Scheme and has even done research into teaching Scheme to kids. He is quite knowledgeable in his field. On the other hand, I have never written a line of Scheme; although I did some coursework with LISP during my master's degree. Although I don't normally choose to work in LISP(-based) languages, I can appreciate some of their power.

I realized that this book was going to require a bit of suspension of disbelief in the preface, where I found this gem:

Most collections of data, and hence most programs, are recursive.

I agree that there are many useful collections of data that are recursive. I would even agree that many programs apply recursion. But I find the assertion that most are recursive a little strong. In fact, the only way I could see this is if the language the writers were working in treated almost everything as recursion. And, of course, this is the case.

The other real problem I had with the book is the style. The book is written as a series of questions and answers that lead you to the conclusions that they wish. Some of these question and answer sessions became quite strained; such as trying to explain how a complicated function works. In other spots, the authors asked a question that there is no way the reader could have begun to answer. The authors would then respond with something like:

You were not expected to be able to do this yet, because you are missing some of the ingredients.

I found this style very jarrng. I've learned a dozen or so languages from books (including LISP), and I've never had this much trouble understanding a computer language. The style may work for someone else, but it did nothing for me.

From the comments above, you might decide that I have nothing good to say about the book. That's actually not the case. I found the Five Rules and the Ten Commandments to be very effective ideas. The Five Rules basically define some of the most primitive operations in the language. The Ten Commandments state some important best practices.

I was also surprised at times by really sweet pieces of code and understanding that would come after some prolonged pseudo-Socratic Q&A sessions. Some of the insights and commandments are well worth the read. But, overall I found it difficult going.

Since it is considered one of the classics for learning Scheme, you should probably consider it if you are needing to learn Scheme or LISP. If all you've done is code in C or Java, it might be worth reading to introduce yourself to another way of looking at problems. But, I find it very hard to recommend this book to anyone else.

Posted by GWade at 10:19 PM. Email comments | Comments (0)

July 06, 2004

Review of Compiler Design in C

Compiler Design in C
Allen I. Holub
Prentice Hall, 1990

I decided to take a break from the relatively new books I've been reviewing and hit a real classic.

Over a decade ago, I saw Compiler Design in C when I was interested in little languages. A quick look through the book convinced me that it might be worth the price. I am glad I took the chance. This book describes the whole process of compiling from a programmer's point of view. It is light on theory and heavy on demonstration. The book gave an address where you could order the source code. (This was pre-Web.) All of the source was in the book and could be typed in if you had more time than money.

Holub does a wonderful job of explaining and demonstrating how a compiler works. He also implements alternate versions of the classic tools lex and yacc with different tradeoffs and characteristics. This contrast allows you to really begin to understand how these tools work and how much help they supply.

The coolest part for me was the Visible Parser mode. Compilers built with this mode displayed a multi-pane user interface that allowed you to watch a parse as it happened. This mode serves as an interactive debugger for understanding what your parser is doing. This quickly made me move from vaguely knowing how a parser works to really understanding the process.

Many years later, I took a basic compilers course in computer science and the theory connected quite well with what I learned from this book. Although the Dragon Book covers the theory quite well, I wouldn't consider it as fun to read. More importantly, nothing in the class I took was nearly as effective as the Visible Parser in helping me to understand the rules and conflicts that could arise.

Although this book is quite old, I would recommend it very highly for anyone who wants to understand how parsers work, in general. Even if you've read the Dragon Book cover to cover and can build FAs in your sleep, this book will probably still surprise you with some fundamentally useful information.

The book appears to be out of print, but there are still copies lurking around. If you stumble across one, grab it.

Posted by GWade at 10:29 PM. Email comments | Comments (0)

May 26, 2004

The Fallacy of the One, True Way of Programming

What is it about programmers that cause them to latch onto one approach and try to apply it to every problem? Maybe it's a characteristic of people in general. Each new paradigm, methodology, or even language seems to grab hold of many programmers and make them forget anything they ever knew. They seem to develop an overwhelming need to re-implement and replace all that came before the new tool.

This isn't always a bad idea. Done deliberately, this technique can help you evaluate a new methodology or language by comparing the solution of a known problem with a previous solution of that problem. Unfortunately, many of us seem to assume that the new way is better and apply the new approach despite any evidence that it does not solve the problem more effectively.

Over time, I've come to the conclusion that there is no One, True Way of Programming. This is not my own insight. I have read this and heard this from numerous people with more experience. Earlier in my career, I did not believe them. I didn't disagree, I just didn't believe. Through experience with multiple paradigms, languages, and methodologies, I've finally come to understand what those senior people were trying to say.

Multiple paradigms are different tools in my bag of tricks. Each language is another tool that I can apply to a problem. Different methodologies give me different approaches to solving a problem. But, in the end, no single set of these tools can help me solve all problems. Each combination has strengths and weaknesses. One of the most important skills we can develop is the ability to recognize the right tool for the job.

Many years ago, I saw this point made in a book. (One day I'll remember which book.) The author pointed out that very different techniques and processes are needed when you are building a skyscraper and when you are building a doghouse. The author pointed out that the techniques that would be used to architect a skyscraper are not cost-effective when building a doghouse. I think I would now go a little further and suggest that the techniques are not appropriate, and that cost is one easy measure of why they are not appropriate.

Posted by GWade at 10:38 PM. Email comments | Comments (0)

May 01, 2004

Idiomatic Programming

Like any natural language, most programming languages support idioms. According to the American Heritage Dictionary of the English Language, an idiom is

A speech form or an expression of a given language that is peculiar to itself grammatically or cannot be understood from the individual meanings of its elements, as in keep tabs on.

Programming idioms can, of course, be understood functionally from its individual elements. The interesting thing about most programming idioms is what they say to the programmers who will read the code later. A programming idiom can reveal many things about the author of the code. To some extent, the idioms you choose are a function of your understanding of the language, what you may have been taught in school, and the other languages you have programmed in.

I made a comment on this subject earlier in Textual Analysis and Idiomatic Perl on Perl Monks. In that article, I pointed out that idioms are cultural, just like those in natural languages. Also like in natural languages, if you only have experience in one culture and language, the idioms of another appear unnatural or wrong. When learning a language, you learn the syntax and obvious semantics. You have to live with the language for a long while to learn the idioms.

The result of this is that the idioms of a language are foreign to new speakers. As they continue to use the language, simple idioms become part of their vocabulary. At first, they will misuse or over-use these idioms. Over time, they develop an appreciation for the subtleties of the expression. Eventually, those idioms are only used when necessary. With more exposure, more idioms become assimilated into your vocabulary. Soon, you learn to speak/program the language relatively fluently. Also, like natural languages, you may always retain a bit of an accent, based on your earlier languages.

This explanation seems to explain the course I've seen many programmers, as well as myself, take with a new programming language.

  1. You learn the basics.
  2. You begin to recognize and fight against the idioms.
  3. You slowly recognize the sense of simple idioms and try to use them.
  4. You over-use the idioms you know.
  5. You develop a reasonable understanding of those idioms and begin using them correctly.
  6. You learn new idioms and go back to 4.
  7. Eventually, you program with the language instead of fighting it.

The simpler the language, the quicker you get to the end of this cycle. But, a simpler language is not as expressive. So, you find yourself doing more work than should be necessary. A more expressive language takes longer to learn, but it's easier to pass on subtle shades of meaning to the maintenance programmer (possibly yourself).

This is not to say that you must use advanced idioms to be effective in a language. Sometimes, a simple expression is all that is warranted. But, complex ideas and complex problems often have subtle solutions. Subtle shades of expression can help the reader of your code recognize when they need to pay attention. In the example from my Perl Monks note, I talk about variations in intent expressed by different conditional expressions in Perl.

Idiom can also serve as a warning. An unusual or advanced idiom should tell the reader that he is entering dangerous territory. This is a message that it's time to pay close attention. A friend of mine is fond of referring to advanced code with the phrase Here there be dragons like on old maps. A properly placed idiom can do that more effectively than a comment. More importantly, if a good programmer runs into an new or unusual idiom, he should know to tread carefully and pay attention.

I think that idiomatic programming is a lot like becoming fluent in a natural language. You may be able to communicate with someone from Mexico using your high school Spanish. But going beyond simple, straight-forward communication will probably take a lot of effort. In some cases, that's enough. You can be mostly right when asking where is a restroom or how to find a taxi. But if you needed to argue a business or legal case or explain a medical condition, I'm sure you would want every possible nuance of the language at your disposal.

Programming is one of the most complex things that humans do. You have to be able to take complex concepts and translate them in such a way that they communicate to two very different audiences: an incredibly literal device and an intelligent person who may need to make subtle changes later. Explaining the instructions correctly to the computer is hard. Communicating your intent to the other programmer is even worse. If that programmer misunderstands your intent, he might ruin all of your carefully thought-out work. Well-chosen idioms can help in communicating that intent.

Posted by GWade at 11:45 PM. Email comments | Comments (0)