The "Great Debate" is a very emotional exchange that has been running
continuously since the late '70s. On some newsgroup somewhere, you will
find a thread discussing this topic, although you will have the best
luck looking in the comp.lang.asm.x86, comp.lang.c, alt.assembly, or
comp.lang.c++ Usenet newsgroups. Of course, almost anything you read in
these newsgroups is rubbish, no matter which side of the argument the
author is championing. Because this debate has been raging for (what
seems like) forever, there is clearly no easy answer, especially not
one that someone can make up off the top of their head (which describes
about 99.9% of all postings to a Usenet newsgroup). This page contains
a series of essays that discuss the advances in compilers and machine
architectures in an attempt to answer the question "Will Compilers Ever
Produce Better Code than a Human?"
Although I intend to write a large number of these essays, I encourage
others, even those with opposing viewpoints, to contribute to this
exchange. If you would like to contribute a "well thought out,
non-emotional essay" to this series, please send your contribution
(HTML is best, ASCII text is second best) to debate@webster.ucr.edu.
Often, someone will try to "prove" to me that compilers produce really
good code by taking some HLL sequence of statements and showing me the
outstanding assembly sequence that the compiler produces. Folks, there
is only one way to prove an "always" condition by example: enumerate
all possibilities and show that the condition holds for every one of
them. Since there are (for all practical purposes) an infinite number
of possible programs one can write, you will not be able to prove a
compiler's worthiness by example.
Note that the general question is not "Can a compiler produce a code sequence that is as good as (or better than) a human would?"
I consider myself to be an expert assembly language programmer.
However, I am not ashamed to admit that I've learned some assembly
language tricks by studying the output of various compilers. Just
because I wrote some assembly sequence (without looking at some
compiler's output) and you found a compiler that bests me doesn't make
the compiler better than me. If you feed the same input to a compiler
twice, you will always (assuming a deterministic program) get the same
output. Give the same problem to an assembly language programmer twice
and you're likely to get two different solutions. One will probably be
better than the other. Have compilers ever beaten me? Yes. They do it
all the time. But on other code sequences, I beat the compiler. If I
really apply myself, I can beat the compiler every time.
Another problem with the "Proof by Example" myth is the fact that
pro-compiler types will often use the output of several different
compilers to boost their arguments. That is, given three or four
different algorithms/code sequences, they may run the code through
several different compilers and pick the best output. The problem with
this approach is that no single compiler implements everything in the
best possible fashion. Some compilers will excel in one area and
totally suck in another. If the output of three compilers is
complementary (i.e., they each excel where the other two fail), it is
possible to pick the best results from the different compilers and
present them as though they were outputs from a single compiler. This
tends to hide the fact that compilers often fail miserably at some
things that a human would handle automatically.
The argument for this policy is simply "Well, if existing compilers can
do all these good things separately, surely we can merge the best of
these compilers into a single product and have something really great."
This line of reasoning fails for three reasons:
(1) Some optimizations are mutually exclusive. That is, if you perform
one type of optimization you cannot perform some other type of
optimization on the code. If the "best" example from one compiler uses
an optimization technique that is mutually exclusive with the "best"
example from a different compiler on a different problem, it may not be
possible to merge those two techniques into the same product.
(2) Don't forget that most compilers are commercial products. The
quality of the optimizer is often a trade secret and other vendors may
not be able to directly clone an optimization technique.
(3) Even if two optimizations are not mutually exclusive, putting both
of them into the same compiler could make the compiler difficult to
maintain or severely impact its performance.
Software engineers have been promising for 20 years now that compilers
would merge all known techniques into a single product and that we'd
have really great compilers someday soon. Compilers have gotten better,
but they're still a long way from perfect.
Note that it is possible to disprove a theory with a single example.
Therefore, if you want to claim that compilers can always produce
better code than humans, all I've got to provide is one example to the
contrary. Proving that compilers, on average, produce better code than
an expert assembly language programmer is far more difficult.
Perhaps the best indication of how well compilers will perform in the
future lies in the past. By looking at how much compilers have improved
their code generation capabilities over the past several decades, we
can anticipate how much better they will get over the next decade.
In the late '70s and early '80s there was a flurry of activity in the
production of optimizing compilers. The results were quite impressive.
In ten years, the performance of the code that compilers produced for a
given language doubled, tripled, or improved by an even greater
percentage. Coincidentally, it was during this time that "Software
Engineering" came of age and people began to move away from assembly
language because compilers promised high performance with less work.
Unfortunately, compilers in the late '80s and early '90s failed to
produce the dramatic improvements seen in the late '70s and early '80s.
Indeed, most major performance improvements during this period came
from architectural improvements to the CPU rather than from any great
advance in compiler technology. Whereas performance gains in the
100-500% range were common with the first wave of microprocessor
compilers, the improvements dropped well below 100% in the second wave
of products (late '80s and early '90s). Today, compiler writers are
scratching in the dirt to get gains of 15-30%. Computer architects
aren't doing much better. Compiler writing is a fairly mature science
at this point. It is very unlikely (short of someone proving that P=NP)
that we will ever again see impressive gains in compiler technology
with respect to raw performance improvement.
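To make these percentages concrete, assume (as the phrase "doubled, tripled" suggests) that a gain of p percent means the new code runs 1 + p/100 times as fast as the old code. Under that assumption, with S as a symbol introduced here for the speedup factor:

```latex
S = 1 + \frac{p}{100}
\qquad\Longrightarrow\qquad
\begin{aligned}
p &= 100        &&\Rightarrow\; S = 2.0 \\
p &= 500        &&\Rightarrow\; S = 6.0 \\
p &= 15\text{--}30 &&\Rightarrow\; S = 1.15\text{--}1.30
\end{aligned}
```

Read this way, the first wave of compilers made code two to six times faster, while today's gains amount to well under a factor of 1.5.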
Therefore, extrapolating the past performance of compiler writers to
predict how much faster compiler-generated code will run ten years from
now is very dangerous. Unless there is a radical shift in computer
architectures that favors HLLs at the expense of assembly language, it
is unlikely that the performance gap between good HLL programs and good
assembly language programs will become much narrower. Indeed, the only
real thing left to do is to consolidate as many optimizations as
possible into a single compiler (we are a long way from this today).
This will probably improve performance by another 50% on average.
As I mentioned in the previous section, most of the big performance
gains over the past 20 years have been due to architectural
improvements, not to compiler improvements. The mere fact that we've
gone from a 5 MHz 8088 to a 200 MHz Pentium Pro in a high-end PC in 15
years has a lot more to do with the speed of software today than the
quality of compilers does. While certain technologies, such as RISC,
have closed the gap between hand-written and compiler-generated machine
code, the performance boost provided by compilers pales in comparison
to that provided by newer hardware.
Another problem with contributors to "The Great Debate" is the limited
exposure many of them have. If you get involved in a thread arguing the
relative merits of assembly language vs. C, you will often find that
the pro-HLL types leading the charge are UNIX programmers. Now, I don't
want to pigeonhole all UNIX programmers, but the ones I've seen making
the argument against assembly language have very little experience
outside the UNIX (or mainframe) O/S arena. I think that one could make a
very good case that assembly language is a bad thing to use under UNIX.
Does that mean assembly language isn't useful elsewhere? Gee, some
programmers wearing UNIX blinders sure seem to think so.
Before you start coming up with reasons why assembly language is not a
practical tool, make sure you state the domain in which you operate.
Claims such as "Code doesn't really need to get any faster" or "We
don't need to worry about saving memory" are fine arguments when you're
working on a 500 MHz DEC Alpha with 1 GByte of main memory installed.
Are the claims
you're making for your environment going to apply to the engineer
trying to convince a Barbie doll that it should talk using a $0.50
microcomputer system? Keep in mind, it's the C/C++ (and other HLL)
programmers arguing that you should never have to use assembly. The
assembly programmers never (okay, rarely) argue that you should always
use assembly[1].
It is very difficult to defend a term like "never". It is very easy to
defend a term like "sometimes" or "occasionally." Just because you've
never been forced to use assembly language in order to achieve some
goal doesn't mean it is always possible to avoid assembly. Be careful
about those blinders you're wearing when arguing against assembly.
Okay, it seems like a stupid question. Obviously any code written in
assembly language is going to have a difficult time running on a
different processor (it may not even run efficiently on a different
member of the processor family for which it was originally written).
Worse still, you will have to learn several different assembly
languages in order to move your code among processors. While learning
a second or third assembly language is much easier than learning your
first, learning all the idiosyncrasies that you must know to write fast
code still requires quite a bit of work. So it seems that porting code
involving assembly language is not a brilliant idea.
On the other hand, software engineering researchers typically point out
that coding represents only about 30% of the software development
effort. Even if your program were written in 100% pure assembly
language, one would expect that it would require no more than 40% of
the original effort to completely port the code to a new processor (the
extra 10% provides some time to handle bugs introduced by typos, etc.).
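The 40% figure follows directly from the two numbers above. Writing E for the total original development effort (a symbol introduced here only for illustration), a complete recode costs the coding share plus the stated allowance for new bugs:

```latex
E_{\text{port}} \;\le\; \underbrace{0.30\,E}_{\text{recode everything}}
\;+\; \underbrace{0.10\,E}_{\text{fix new typos and bugs}}
\;=\; 0.40\,E
```

The implicit assumption is that the remaining 70% of the effort (design and the other non-coding work) does not have to be repeated for the port.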
Perhaps you're thinking 40% is pretty bad. Keep in mind, however, that
porting C/C++ code doesn't take zero effort; particularly if you switch
operating systems while porting your code. If you're the careful type
who constantly reviews your code to ensure it's portable, you're
simply paying this price during initial development rather than during
a porting phase (and there is a cost to carefully writing portable
code). I am not trying to say that it is as easy to port assembly code
as it is to port C/C++ code, I'm only saying that the difference isn't
as great as it seems. This is especially true when porting code between
operating systems that have different APIs (e.g., porting between
flavors of UNIX is easy; now try UNIX -> Windows -> Macintosh
-> OS/400 -> MVS -> etc).
Is assembly language easier to read than HLL code? Being an expert
assembly language programmer and a fairly accomplished C programmer, I
find my own assembly language programs only slightly more difficult to
read than my own C programs. On the other hand, I generally take great
pains to structure my source code so that it is fairly easy to read
(take a look at my code on this web site). I will say this: I've seen
some assembly code out there that is absolutely unreadable. Of course,
I've also seen my share of C/C++ code that looks like an explosion in
an alphabet soup factory.
Of course, only the person doing
the reading can really make this judgement call. Obviously, if you know
assembly but don't know C/C++, you'll find assembly is easier to read.
The reverse is also true. I happen to know both really well and I find
a well-written C/C++ program a little easier to read than an assembly
language program. Poorly written examples in both languages are so bad
they are incomparable. Once a program is unreadable, it is difficult to
determine how unreadable it is.
Quick quiz: What does the following C statement do and how long did it take you to figure this out?
*(++s) && *(++s) && *(++s) && *(++s);
Most people (who know 80x86 assembly) would find the corresponding 8086 code much more precise and readable:
    mov bx, s
    mov al, 0
    inc bx
    cmp al, [bx]
    jz  Done
    inc bx
    cmp al, [bx]
    jz  Done
    inc bx
    cmp al, [bx]
    jz  Done
    inc bx
Done:
    mov s, bx
This notion exists because people tend to save assembly language
programming for the very time-critical (and often complex) components
of their programs. Obviously, if you've spent a lot of time and effort
arranging the instructions in a certain sequence to ensure the pipeline
never stalls, and then you discover that you need to modify the
computation, the changes will require a lot of work since you will have
to reschedule each of the instructions.
Of course, it never occurs to people that similar low-level
optimizations in HLL programs are very difficult to maintain as well.
Consider the well-written (from a performance point of view) Berkeley
string routines. These routines need to be completely redone if you
move from a 32-bit processor to a 16-bit or 64-bit processor.
As a general rule, any code that is optimized is difficult to maintain.
This has led to the proverb "Premature optimization is the root of all
evil." People perceive that it is difficult to maintain assembly code
mainly because the assembly code they've had to deal with is generally
optimized code.
What if we don't go in and squeeze every unnecessary cycle out of a
section of assembly code? Will the code be easier to maintain? Sure.
For the same reason non-optimal C code is easy (?) to maintain.
Of course, one of the primary reasons for using assembly language is to
reduce the use of system resources (i.e., to optimize one's program).
Therefore, when using assembly language in place of a HLL, you're
typically going to be dealing with hard-to-maintain code. Don't forget
one thing, however: had you chosen to continue using a HLL rather than
dropping down into assembly language, the optimization that would have
been necessary in the HLL would have produced hard-to-maintain HLL
code. Keep in mind, optimization is the root of the problem, not simply
the choice of assembly language.
Is assembly language easier to write than HLL code? Sometimes. There
are certain algorithms that, believe it or not, are easier to
understand and implement at a very low level. Bit manipulation is one
area where this is true. Also see the section on floating point
arithmetic later in this document for more details.
I personally don't know, not having really learned assembly language on
a RISC chip. I have certainly heard of individuals who have written
some butt-kicking code in assembly on a RISC, but this is generally
third-hand knowledge. I do know this, though. One of the design
principles behind the original RISC design was to study the
instructions a typical compiler would use and throw out all the other
instructions in a typical CISC instruction set. This suggests that an
assembly language programmer has less to work with on RISC chips than
on CISC machines. Nevertheless, I will not comment on this subject
since I don't have any first-hand experience. I invite those who have
mastered RISC assembly to write a guest essay for this series.
That depends entirely on the algorithm. Generally, algorithms will fall into one of four categories:
1) Horrible solution in assembly, horrible solution in some HLL.
2) Horrible solution in assembly, elegant solution in some HLL.
3) Elegant solution in assembly, horrible solution in most HLLs.
4) Elegant solution in assembly, elegant solution in some HLL.
Show me your algorithm, I'll tell you which category I think it belongs in.
Although the number of people who know assembly language increases
daily (faster than programmers are dying or forgetting assembly
language), the number of people who know a given HLL is generally
increasing much faster. While this says something bad about assembly
language, what it has to do with the question "Will Compilers Ever
Produce Better Code than a Human?" is an interesting question in its
own right.
Probably not. This is one big advantage compilers have. If you get a
new compiler for a later chip in a CPU family, all you've got to do is
recompile your code to take advantage of the new architecture. On the
other hand, your hand-written assembly code will need some manual
changes to take advantage of architectural changes in the CPU. This
fact alone has driven many to condemn writing code in assembly. After
all, today's super-fast program may run like a dog on tomorrow's
architecture. This argument, however, depends upon two fallacies:
(1) Tomorrow's compilers will also take advantage of these architectural features.
(2) The assembly language program used architectural features on
today's chips that cause performance losses on tomorrow's chip.
Historically, compilers for the x86 architecture have lagged
architectural advances by one or two generations. For example, about
the time the Pentium Pro arrived, we were starting to see true 80486
optimizations in compilers. True, many compilers claim to support
"Pentium" optimizations. However, such compilers do very little for real
programs. Given past support from compiler vendors, coupled with the
fact that the trend is to handle really tedious (e.g., instruction
scheduling) optimizations directly in the hardware, I personally feel
that worrying about a specific member of a CPU family will become a
moot point.
Those claiming that hand written assembly language is inferior because
the next member of a CPU family will render the code obsolete are
missing the whole point of assembly optimization. Except in extreme
cases, assembly language programmers rarely optimize at the level of
counting cycles or scheduling instructions (as the pro-compiler crowd
points out, this is really too tedious a task for human beings).
Assembly language programmers typically achieve their performance gains
by using "medium-level" optimizations that are CPU-family dependent but
usually independent of the specific CPU. This is such an
important concept that I will devote a completely separate essay in
this series to this subject.
Further essays in this series will address the question "Is there a
true need to use assembly language?" The claim "Compilers can generate
code that is just as good as humans" is one (albeit incorrect) negative
answer to this question. In the following essays I will attempt to
answer this question in the positive sense.