Abstractions, Their Algorithms, and Their Compilers



By Alfred Aho, Jeffrey Ullman



Communications of the ACM,
February 2022,
Vol. 65 No. 2, Pages 76-91


DOI: 10.1145/3490685



Computational thinking, which centers around devising abstractions for problems so they can be solved with computational steps and efficient algorithms, is a concept that serves not only computer science (CS) but progressively more of science and everyday life. After Jeannette Wing published her influential paper on computational thinking,50 the discussion of computational thinking has greatly expanded in scale to include topics such as modeling natural processes as information processing.18

At the heart of computational thinking is abstraction, the primary subject of this essay. While abstractions have always been an essential part of computer science, the modern emphasis on computational thinking highlights their importance when teaching CS to a broad audience.

Abstractions play an important role in shaping virtually every field of science. However, in computer science, abstractions are not tethered to a physical reality, so we find useful abstractions pervading the field. For example, we shall encounter in Section 4 a vital abstraction from physics: quantum mechanics. There is a derived computing abstraction called quantum circuits, which starts with the physical ideas but has led to programming languages for its simulation and to theoretical algorithms that exploit its unique capabilities and that may one day be implemented on a large-scale machine.

Every abstraction in computer science consists of the following:

  1. A data model, which is one or more types of data and possibly relationships among them. For instance, an undirected graph could be described as consisting of a set of nodes and a set of edges; each edge is a set of two nodes.
  2. A way of manipulating that data—that is, a programming language of some sort. This might be a conventional programming language, or it might be as simple as a few specific operations. This language always has a formal semantics—a specification for how programs affect the data.

Notice it is the second part that distinguishes abstractions in computer science from abstractions in other fields.

Each abstraction thus allows us to design algorithms to manipulate data in certain specific ways. We want to design “good” abstractions, where the goodness of an abstraction is multidimensional. The ease with which an abstraction can be used to design solutions is one important metric. For example, we shall discuss in Section 3.1 how the relational model led to the proliferation in the use of databases. There are other performance metrics, such as the running time, on serial or parallel machines, of the resulting algorithms. Likewise, we favor abstractions that are easily implemented and that make it easy to create solutions to important problems. Finally, some abstractions offer a simple way to measure the efficiency of an algorithm (as we can find “big-oh” estimates of the running time of programs in a conventional programming language), while other abstractions require that we specify an implementation at a lower level before we can discuss algorithm efficiency, even approximately.

1.1. Compilation. Some abstractions are at a sufficiently high level that they do not offer meaningful performance metrics. Thus, the operations of a high-level abstraction may need to be implemented at a lower level—a level closer to what we think of as program execution. Indeed, there may be several levels of abstraction at levels that are progressively closer to the machine itself. As suggested in Figure 1, the operations of a high-level abstraction (Abstraction 1) may be implemented by a lower-level abstraction (Abstraction 2), which in turn may be implemented by still-lower-level abstractions (not shown). There are interesting hierarchies of abstraction that take us from high-level programs to machine instructions, to physical hardware, to logical gates, to transistors, and finally to electrons. We shall, however, concentrate only on the higher levels.



Figure 1. Layers of abstractions and algorithms.

An algorithm using the operations of Abstraction 1 is compiled into an algorithm in the lower-level Abstraction 2. In this paper, we shall use the term compiler in a very general sense—not just the conventional compiler of programming languages as was the focus of the “Dragon book”9 but any algorithm for turning programs of one abstraction into programs of a second, presumably lower-level abstraction. Thus, in some cases, the compilation process is straightforward, with each operation at the higher level replaced by one or more specific operations at the lower level. In other cases, especially when the compilation is from a conventional language (think C) to a machine-level language, the translation algorithm is very complex. In still other situations, such as when the high-level abstraction uses powerful algebraic operations such as linear algebra or relational algebra, optimization is critical, since naive compilations often result in algorithms that take orders of magnitude more time than those generated by optimizing compilations.

It may be that Abstraction 2 is sufficiently close to the machine that it has meaningful performance metrics. If so, Abstraction 1 can inherit those metrics to provide a notion of quality for algorithms written in Abstraction 1. However, we should remember that the high-level abstraction usually can be implemented in several different lower-level abstractions. Each of these abstractions may provide a radically different notion of running time or other metrics and thus imply different notions of algorithm goodness at the high level.

EXAMPLE 1.1. For an example that should be familiar to many readers, and which we shall discuss in detail in Section 2.1, Abstraction 1 could be regular expressions (REs), whose data model is sets of strings and whose “programming language” is the regular expressions that can be formed from the three operations: union, concatenation, and Kleene star. We can implement regular expressions by converting them to deterministic finite automata (DFA), which thus play the role of Abstraction 2. An algorithm in the RE abstraction is the RE itself. That algorithm can be “compiled” into a DFA, which in turn is “compiled” (that is, simulated) in some conventional programming language, which in turn is compiled into the language of the machine. Given a reasonable simulation, the DFA running time is O(1) per input symbol. Thus, the running time for the original RE is also O(1) per input symbol, even though that point is far from obvious given only the definition of REs.
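
To make the O(1)-per-symbol claim concrete, here is a minimal sketch (ours, not code from the paper) of the lower end of this chain: a DFA for the regular expression (a+b)*abb, hand-compiled into a transition table and simulated in a conventional programming language.

# A hand-built DFA for (a+b)*abb (our illustration).  One transition-table
# lookup per input symbol gives O(1) time per symbol, as claimed above.
DELTA = {
    0: {"a": 1, "b": 0},
    1: {"a": 1, "b": 2},
    2: {"a": 1, "b": 3},
    3: {"a": 1, "b": 0},
}
ACCEPTING = {3}

def dfa_accepts(s):
    state = 0
    for ch in s:                  # exactly one lookup per input symbol
        state = DELTA[state][ch]
    return state in ACCEPTING

assert dfa_accepts("aababb")      # ends in abb
assert not dfa_accepts("abab")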

1.2. A taxonomy for abstractions. We can identify at least four different kinds of abstractions that are distinguished by their intended purpose. In the discussions that form the body of this paper, we shall give examples of each and explore their interactions.

1.2.1. Fundamental abstractions. Like all abstractions, these consist of a data model and operations. These abstractions are often thought of as abstract data types31,46 as implemented in object-oriented programming languages. However, fundamental abstractions do not have a specific implementation for the operations or a specific data structure to represent their data. One can also liken these abstractions to interfaces as in Java,48 but unlike the interface, these abstractions have an intended meaning for their operations, not just a name for the operations.

There are actually two somewhat distinct purposes for studying fundamental abstractions. In some cases, they represent common operations that are worth studying in their own right and which have a variety of approaches to implementation. For instance, we discuss the dictionary (a set with operations insert, delete, and lookup) in Section 1.4. Other examples of this type include stacks, queues, priority queues, and a number of other abstractions that we have cataloged in Aho et al.7 and Aho and Ullman.10

Other abstractions are extensive enough to support large components of application programs. Familiar examples include trees and graphs of various kinds—for instance, directed, undirected, labeled, and unlabeled. We shall discuss one important example—flow graphs—in Section 2.3.

These abstractions have an extensive set of operations that can be combined in various ways. However, the operations are not Turing-complete (capable of describing any computation that can be performed by a Turing machine) by themselves. Rather, they are presumed to be embedded in a Turing-complete language, in which algorithms using this model are constructed. For example, in a graph abstraction, we might have an operation such as “find adjacent nodes.” Outside this abstraction, we might suppose that there is a programming language that allows iteration over all the adjacent nodes. The implementation of this operation and the representation of the graph itself are left unspecified, so we have no concrete notion of running time. We can draw an analogy between these abstractions and the typical class and its methods in an object-oriented programming language. The distinction is that the methods for a class have specific implementations in the underlying programming language. We could similarly regard things such as programming-language libraries or TeX packages as abstractions of this type.
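
A sketch (ours, with hypothetical names) of what such an embedding looks like: the abstraction supplies operations such as "find adjacent nodes," while the surrounding Turing-complete host language supplies the iteration. The adjacency-set representation shown is just one possible choice, hidden behind the interface.

# A graph abstraction whose operations are embedded in a host language
# (our illustration).  The representation behind adjacent() is deliberately
# left unspecified at the level of the abstraction; adjacency sets are one
# hypothetical implementation.
class UndirectedGraph:
    def __init__(self):
        self._adj = {}                      # node -> set of neighboring nodes

    def add_edge(self, u, v):               # an operation of the abstraction
        self._adj.setdefault(u, set()).add(v)
        self._adj.setdefault(v, set()).add(u)

    def adjacent(self, u):                  # "find adjacent nodes"
        return iter(self._adj.get(u, ()))

# The host language supplies the control flow the abstraction itself lacks:
g = UndirectedGraph()
g.add_edge(1, 2); g.add_edge(2, 3)
for v in g.adjacent(2):                     # iterate over the adjacent nodes
    print(v)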

1.2.2. Abstract implementations. These represent approaches to implementation, presumably implementation of one or more fundamental abstractions. These are not themselves Turing-complete languages, and they typically can be compiled into several different machine models—for example, serial or parallel machines, or models that assume main or secondary memory. Each of these machine models provides a notion of running time, which can be translated into a running time for the abstract implementation and then into running time for the supported fundamental abstraction. For example, we shall discuss the hash table as an implementation of the dictionary in Section 1.6. Many other data structures fall into this category, such as linked lists or trees of various types (see Sections 1.5 and 1.7.2). Also in this group are automata, such as deterministic or nondeterministic finite automata (see Sections 2.1.1 and 2.1.3) and shift-reduce parsers (see Section 2.2.1).

1.2.3. Declarative abstractions. One of the most important uses of abstractions is to foster a style of programming where you say what you want done, but not how to do it. Thus, we find many different abstractions that consist of a data model and a programming language that is at a higher level than that of conventional languages; often these languages are algebras of some sort. Examples are regular expressions, to be discussed in Section 2.1, and relational algebra, which we mention in Section 3. Context-free grammars (Section 2.2) are another example of this type of abstraction, although not strictly algebraic.

The special characteristic of these abstractions is that their compilation requires a serious degree of optimization. Unlike optimization of conventional languages, where you are happy to speed up running time on one machine by a factor of two, for these abstractions there can be orders of magnitude difference between the running times of good and bad implementations. Another characteristic is that the programming language for a declarative abstraction is not Turing-complete. Undecidability properties of any Turing-complete language would preclude the existence of an optimizer that deals effectively and generally with what a program wants to do without being told how to do it.

1.2.4. Computational abstractions. In contrast to the abstract implementations, these abstractions are close to a physically implemented machine. That is, no one would build a machine for the sole purpose of implementing an abstract implementation, but one normally implements a computational abstraction or something to which it is easily translated. Thus, computational abstractions offer meaningful performance metrics, even if they are not 100% accurate.

You are probably familiar with a number of these abstractions, since they include all the common programming languages as well as machine instruction sets. Other abstractions of this type are more theoretical, such as the random-access machine (RAM) model19 or the parallel-RAM (PRAM) model.20 Here, we shall talk in Section 1.7 of a model of a conventional machine that emphasizes the role of secondary storage. We shall also discuss abstractions for parallel computation: bulk-synchronous in Section 3.5 and MapReduce in Section 3.6.

While many computational abstractions relate to conventional computers, there are some exceptions. The Turing machine43 is one, and there are others that are not even Turing-complete, yet play an important role in computer science. For instance, following Claude Shannon’s MS thesis, Boolean circuits and Boolean algebra are among the earliest abstractions used in the developing science of computing. And the quantum circuit abstraction, which we discuss in Section 4, is among the newest.

1.3. Some taxonomic comments. We do not claim that this taxonomy is exhaustive or that it is the best possible taxonomy. The reader is invited to add to or rethink this structure. However, we do believe that this organization reflects four of the most important developments in the field of computer science. Fundamental abstractions are, in a sense, dictated to us by the requirements of applications. Abstract implementations, on the other hand, represent the response of computer scientists to support the fundamental abstractions, and they represent a lot of the earliest thinking about data structures. Declarative abstractions represent the tendency of computer science to make programming progressively easier by supporting notations for encapsulating large, complex operations as single steps. And computational abstractions represent the influence of changing hardware technology on how we compute.

1.4. An exploration of the abstraction space. To gain some perspective on the nature of abstraction chains and their relationships, we shall look at a common example of a fundamental abstraction: the dictionary. We then explore the implementation levels below that and their effect on running time evaluations.

EXAMPLE 1.2. The dictionary is a common example of an abstraction that has many alternative implementations and illustrates some of the issues that crop up as high-level abstractions are compiled down into lower-level ones. The data model for a dictionary consists of the following:

  1. A universal set U.
  2. A subset S of U. Initially, S is empty.

The “programming language” for the dictionary consists of straight-line sequences of the following three operations:

  1. insert(x) inserts element x of U into the set S if it is not already there; that is, S := S ∪ {x}.
  2. delete(x) removes x from S; that is, S := S – {x}.
  3. lookup(x) returns true if x is in S and returns false otherwise.

For instance, a dictionary can be used to describe the behavior of a symbol table in a compiler. U would be the set of possible identifiers of the programming language. As the compiler scans the program, S will be the set of identifiers that have defined meaning at each point in the program. However, for a symbol table, you need to attach data to each identifier—for example, its defined data type and the level of nested blocks in which it appears (so we can distinguish identifiers with the same name). When the compiler finds a declaration, it inserts the declared identifier into the set S. When it reaches the end of a procedure or function, it deletes the identifiers associated with that program block. When an identifier is used in the program, the compiler looks up that identifier and retrieves its type and other necessary information.
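
A minimal sketch (our illustration, with hypothetical names) of the three dictionary operations, extended so that each identifier can carry the data a symbol table needs. The underlying representation is deliberately left as a stand-in here; choosing it well is exactly the subject of the next sections.

# The dictionary abstraction, with an extra field for symbol-table data (ours).
class Dictionary:
    def __init__(self):
        self._s = {}                    # stand-in store; the real representation
                                        # is left unspecified at this level

    def insert(self, x, data=None):     # S := S ∪ {x}
        self._s[x] = data

    def delete(self, x):                # S := S – {x}
        self._s.pop(x, None)

    def lookup(self, x):                # true iff x is in S
        return x in self._s

    def info(self, x):                  # the attached data (type, nesting depth, ...)
        return self._s.get(x)

symtab = Dictionary()
symtab.insert("centigrade", ("float", 1))   # declaration at nesting depth 1
assert symtab.lookup("centigrade")          # a later use of the identifier
symtab.delete("centigrade")                 # end of the enclosing block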

Notice that the programming language for the dictionary is rather trivial, and it certainly does not have the power of a Turing machine. Moreover, there is no real notion of algorithm design, since the “programs” are simply a reflection of what some other process is doing—for instance, the symbol table operations mentioned in Example 1.2. Likewise, there is no real notion of running time, since it is unclear how long each operation takes. We could define each operation to take unit time, but since we cannot control the length of the “program,” there is no meaning to this running time.

1.5. Implementations of the dictionary. Many different abstract implementations could be used to implement a dictionary. Linked lists will serve, although they do not offer a good running time, even for the most efficient implementation of lists.

EXAMPLE 1.3. The linked list is an abstract implementation with which all should be familiar. Its data model consists of the following:

  1. Cells containing a value (member of some universal set U) and a link to another cell (possibly null).
  2. Headers, which are simply named links (possibly null) to a cell.

We shall assume the reader is familiar with the typical operations allowed, such as creating cells or headers, inserting and deleting cells from lists, and returning the data contained in a designated cell. You can implement a dictionary by making a linked list of all the elements in the set S. Compilation of the three dictionary operations into list operations is straightforward.

If we assume the linked list is implemented in the RAM model of a computer, we have a realistic notion of running time. We can assign one time unit to each of the basic operations on the cells of a list, as we know that on a RAM, each of these operations will take a constant amount of time, independent of the size of the set S. That observation lets us lift the RAM notion of running time to the linked list notion of running time and then to the level of the dictionary. Unfortunately, the news is not good. On average, we have to go at least halfway down the list, often all the way to the end, to implement any of the dictionary operations. Thus, the running time for a single dictionary operation is proportional to the size of the set S at the time.
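
A sketch (ours) of how the three dictionary operations compile onto linked-list operations; each one walks the list of cells, which is where the running time proportional to the size of S comes from.

# Example 1.3 as code (our illustration): the dictionary compiled onto a linked list.
class Cell:
    def __init__(self, value, link=None):
        self.value = value               # a member of the universal set U
        self.link = link                 # link to another cell, possibly None

class ListDictionary:
    def __init__(self):
        self.header = None               # named link to the first cell

    def lookup(self, x):
        c = self.header
        while c is not None:             # walk the cells: O(|S|) in the worst case
            if c.value == x:
                return True
            c = c.link
        return False

    def insert(self, x):
        if not self.lookup(x):           # insert only if x is not already present
            self.header = Cell(x, self.header)

    def delete(self, x):
        prev, c = None, self.header
        while c is not None:
            if c.value == x:
                if prev is None:
                    self.header = c.link
                else:
                    prev.link = c.link
                return
            prev, c = c, c.link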

Another well-understood class of abstractions which implements a dictionary uses search trees. When the algorithms for the three dictionary operations keep the tree balanced—for instance, AVL trees2 or red-black trees22—the running time for each operation is logarithmic in the size of the set S at the time of the operation. But the normally preferred abstraction for implementing the dictionary is the hash table,36 which we shall consider next.

1.6. The Hash abstraction. The data model for Hash consists of the following:

  1. A universal set U.
  2. A number of buckets B; the buckets are numbered from 0 to B – 1.
  3. A hash function h from U to {0,1,…,B – 1}. Each bucket b is a subset of those elements x of U such that h(x) = b.

The usual operations are to compute h(x)—where x is a member of U—and to insert, delete, or look up x in the bucket numbered h(x). For instance, the insert operation for the hash table will be denoted h-insert(x, b), where b = h(x). Programs for Hash consist of alternately computing some h(x) and then performing one of the three operations on x and the bucket h(x).

Compiling a dictionary program into a hash program is simple. For example, the dictionary operation insert(x) is translated into b := h(x); h-insert(x, b).
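
A sketch (ours, with hypothetical names) of this compilation, in which each bucket is itself a little dictionary, kept here as a simple list.

# The Hash abstraction with B buckets (our illustration).  insert(x) compiles
# into "b := h(x); h-insert(x, b)", and similarly for delete and lookup.
class HashDictionary:
    def __init__(self, B=1024):
        self.B = B
        self.bucket = [[] for _ in range(B)]

    def h(self, x):                     # hash function from U to {0, 1, ..., B-1}
        return hash(x) % self.B

    def insert(self, x):
        b = self.h(x)                   # b := h(x)
        if x not in self.bucket[b]:     # h-insert(x, b)
            self.bucket[b].append(x)

    def delete(self, x):
        b = self.h(x)                   # b := h(x)
        if x in self.bucket[b]:         # h-delete(x, b)
            self.bucket[b].remove(x)

    def lookup(self, x):
        return x in self.bucket[self.h(x)]   # h-lookup(x, h(x))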

Hash is not sufficiently close to the machine that we can use it directly to determine running time. One problem is that hashing is rather unusual, in that the worst-case performance, where all elements of the set S wind up in the same bucket, is wildly worse than the average case, when we average over all possible hash functions. To make things simple, we shall correctly assume that in the average case, almost all buckets contain close to the average number of elements—that is, S/B. But even after we agree to talk only of the average case, we still do not know how long each operation on an element and a bucket takes.

Essentially, each bucket is a little dictionary by itself, so we have to decide how to implement its operations. If the size of S remains on the order of B, we can use a linked list implementation of buckets and expect that each operation takes O(1) average time on a RAM or a real machine. If, however, S is much larger than B, then the average length of the list that represents a bucket is O(S/B). That is still better than O(S) per operation as would be the case for a linked list. However, when S is so large it cannot fit in main memory, the RAM model no longer applies, and we need to consider another computational abstraction entirely.

1.7. The secondary storage abstraction. As an alternative to the RAM computational abstraction, where any piece of data can be accessed in O(1) time, we can introduce locality of access at several levels. We shall discuss only an abstraction that has disk-based secondary memory, where large blocks of data (think 64KB) are moved as a whole between disk and the main memory. Data must be in main memory if it is to be read or written. The cost of moving a block between main and secondary memory is very large compared with the cost of the typical operations that would be done on the data itself when it is in main memory. Thus, it becomes reasonable to regard running time in this new model as simply the number of disk I/Os—that is, the number of times a block is moved from secondary to main memory or vice versa.a

In a secondary storage model of the underlying machine, the best way to implement a hash table is somewhat different from the preferred approach using a RAM model. In particular, each bucket will consist of one or more entire disk blocks. To take advantage of locality, we want the typical bucket to consist of as few disk blocks as possible, but we want to make those disk blocks as full as we can. Thus, suppose main memory is capable of holding M elements of the universal set, while disk blocks hold P such elements. Then we want B, the number of buckets, to be B = M/P, so we can hold one disk block for each bucket in main memory, and that block will likely be close to full.

As the size of the set S grows, we use a linked list of disk blocks to represent each bucket, only the first of which is in main memory. The three dictionary operations at worst require us to examine all the disk blocks in a single bucket. So, on average, we expect the number of disk I/Os per operation to be O(S/BP), since the elements of S will be divided approximately evenly among B buckets, and the elements of a bucket will be packed P to a disk block. Since B = M/P, the running time per operation is O(S/M).
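
For a feel of the magnitudes, here is a back-of-the-envelope check with hypothetical parameter values (ours, not the paper's).

# Hypothetical parameters, for illustration only.
M = 10**8                  # elements that fit in main memory
P = 10**4                  # elements per disk block
B = M // P                 # = 10,000 buckets, one main-memory block per bucket
S = 10**9                  # elements in the set being stored
io_per_operation = S / (B * P)   # = S/M = 10 disk I/Os per dictionary operation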

1.7.1. Database operations via hashing. Is this running time impressive? Not really. Had we used a linked list implementation but allowed the lists to grow randomly onto disk, we could have used M buckets, and the average bucket would be represented by a linked list of length S/M. Even if each cell on the list were in a different disk block, we would still need only O(S/M) disk I/Os per dictionary operation. However, there is an important class of operations from the world of database management where the disk block model yields important efficiencies. In these operations, of which hash-join25 is a major example, we have only to divide a set S into buckets, while neither deleting nor looking up elements.

EXAMPLE 1.4. Let us consider a simpler problem than hash-join, but one that has a similar solution in the disk-based abstraction. Suppose we are given a set S of pairs (x, y), and we want to aggregate on x by computing a table that for each x gives the sum of the associated y’s. We assume S is so large that it does not fit in main memory. As above, we shall use B = M/P buckets, but our hash function is only a function of x, not y. We then insert each member of S into the proper bucket. Initially, the block for each bucket that is in main memory is empty. When it fills up, we move it to disk and create a new, empty block in main memory for that bucket. The new block links to the block we just moved to disk, which in turn links to prior blocks for that bucket, if any. The number of disk I/Os required by the entire partition of S will be approximately the number of blocks needed to hold S—that is, S/P. That is the minimum possible number of disk I/Os needed to process a set of that size.

Then, we process the buckets one at a time. Assuming that the buckets themselves are small enough to fit in main memory—that is, S ≤ M²/P—we only need to move each disk block once into main memory. Since a bucket contains either all or none of the pairs (x, y) with a given x, we can sort each bucket in turn on the values of x and thus compute the sum of y's for each of the x's that hash to that bucket. Since in this model we only count disk I/Os, and in fact the time to sort a bucket is often less than the cost of moving its blocks into or out of main memory, the running time of this algorithm, which is O(S/P) disk I/Os, is realistic and, to within a small constant factor, the best imaginable.
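
A sketch (ours, not the authors' code) of the two-phase algorithm. Disk blocks are approximated by one file per bucket, and the per-bucket work uses an in-memory table rather than a sort, which has the same I/O cost; all file names and parameters are hypothetical.

# Example 1.4 in outline: sum y grouped by x for a set S too large for memory.
import os
from collections import defaultdict

def partition(pairs, B, workdir):
    """Phase 1: route each (x, y) pair to the bucket file chosen by h(x)."""
    os.makedirs(workdir, exist_ok=True)
    files = [open(os.path.join(workdir, "bucket%d.txt" % b), "w") for b in range(B)]
    for x, y in pairs:
        files[hash(x) % B].write("%s\t%s\n" % (x, y))   # h depends only on x
    for f in files:
        f.close()

def aggregate(B, workdir):
    """Phase 2: read one bucket at a time; each bucket is assumed to fit in memory."""
    sums = defaultdict(float)
    for b in range(B):
        with open(os.path.join(workdir, "bucket%d.txt" % b)) as f:
            for line in f:
                x, y = line.rstrip("\n").split("\t")
                sums[x] += float(y)                     # all pairs with this x are here
    return dict(sums)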

1.7.2. B-trees. Another interesting consequence of the secondary storage abstraction is that it provides an implementation of the dictionary abstraction that is generally more efficient than hashing. In a RAM model of computation, even a balanced search tree is not generally considered more efficient than a hash table for implementing the dictionary. But in the secondary storage model, the analog of a balanced search tree is really quite efficient. A B-tree13 is essentially a balanced search tree, where the nodes of the tree are entire disk blocks. Each leaf block is between half and entirely full of members of the set S being represented. Further, each interior node is at least half-full of values that separate the contents of the subtrees for its children and pointers to those children.

Since disk blocks are large, each interior node has many children, so much so that in practice even very large sets can be stored in a three-level B-tree. Moreover, at least the root, and perhaps the children of the root, can all be stored in main memory. The result is that, excluding occasional disk I/Os for rebalancing the tree, only two to four I/Os are needed for each insert or delete operation and only one to two I/Os for a lookup.
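
To see why three levels usually suffice, here is a rough fan-out calculation with hypothetical block and entry sizes (ours, not the paper's).

# Rough B-tree fan-out arithmetic with assumed sizes.
block_size = 64 * 1024             # 64KB disk blocks, as suggested in Section 1.7
entry_size = 16                    # assume a key plus a pointer takes about 16 bytes
fanout = block_size // entry_size  # about 4,096 children per interior node
keys_in_three_levels = fanout ** 3
print(keys_in_three_levels)        # about 6.9 * 10**10 keys under a single root block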


2. Abstractions for Compilers

Modern compilers refine the translation process into a composition of phases, where each phase translates one representation of the source program into another semantically equivalent representation, usually at a lower level of abstraction. The phases in a compiler typically include lexical analysis, syntax analysis, semantic analysis, intermediate code generation, code optimization, and target code generation. A symbol table that is shared by all the phases is used to collect and make available information about the various constructs in the source program. The first four phases are often called the front end of the compiler, and the last two the back end.

The story of advances in compiler implementation involves a number of important abstractions. We shall talk specifically about three such abstractions: regular expressions, context-free grammars, and flow graphs. The first two are declarative abstractions with interesting optimization stories. The third, while not declarative, also presents interesting implementation challenges.

2.1. Regular expressions and lexical analysis. Lexical analysis, the first phase of the compiler, reads the source program as a sequence of characters and maps it into a sequence of symbols, called tokens, which is passed on to the next phase, the syntax analyzer.

EXAMPLE 2.1. If the source program contained the statement

fahrenheit = centigrade * 1.8 + 32

the lexical analyzer might map this statement into the sequence of seven tokens:

<id, fahrenheit> <=> <id, centigrade> <*> <real, 1.8> <+> <int, 32>

Here, id is the token for any program variable, or identifier. The operators =, *, and + are tokens by themselves, and the two constants are turned into tokens real and int, respectively.b

One great advance in compiler construction was the creation of a lexical-analyzer generator—a program such as Lex30 that takes as input the description of the tokens and generates a program that breaks the source program into tokens and returns the sequence of tokens that corresponds to the source program. The abstraction that made Lex possible is the regular expression.26 As originally defined by Kleene, we can think of regular expressions as an abstraction whose data model is strings over some specific alphabet of characters—for example, ASCII—and whose programming language is an algebra with operators union, concatenation, and the Kleene star, or “any number of” operator. We assume the reader is familiar with these operators and their effect on sets of character strings. In practice, systems such as Lex, which use the regular-expression abstraction, employ a number of useful shorthands that make it easier to write regular expressions but do not change the sets of strings that can be defined in this abstraction.

EXAMPLE 2.2. The sets of strings that are legal identifiers in some programming languages might be defined as follows:

letter = [a-zA-Z]

digit = [0-9]

id = letter (letter+digit)*

In this shorthand, an expression like a-z stands for the union of the one-character strings whose ASCII codes lie between those of a and z. So the regular expression for letter is, in the original set of three operators:

a + b + … + z + A + B + … + Z

Digits are defined analogously, and then the set of strings for the token id is defined to be those strings consisting of any letter followed by any string of zero or more letters and/or digits.
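
The same shorthand appears in widely used regex libraries. A minimal sketch (ours) of the id pattern using Python's re module, where the union operator written + above is written | and the character classes are identical:

# The identifier pattern of Example 2.2 in Python's re syntax (our illustration).
import re

ID = re.compile(r"[a-zA-Z][a-zA-Z0-9]*")     # letter (letter + digit)*

assert ID.fullmatch("fahrenheit")
assert not ID.fullmatch("32degrees")         # an identifier may not start with a digit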

2.1.1. Before Lex: bibliographic search. It was well understood from theoretical studies that the regular expression abstraction could be compiled into one of several abstract implementations, such as deterministic or nondeterministic finite automata (NFA and DFA, respectively). However, there were still some techniques to be discovered when practical problems needed a solution. One interesting step was taken at Bell Laboratories in connection with the first attempt at automated search for relevant literature. They had on tape the titles of the entire Bell Labs library, and they had developed software to take a list of keywords and find the documents with those keywords. However, when given a long list of keywords, the search was slow because it made one pass over the tape for each keyword.

A big improvement was made in Aho and Corasick.6 Instead of searching for each keyword separately, the list of keywords was treated as a regular expression for the set of all strings that contained any occurrence of a keyword, that is:

.*(w1 + w2 + … + wn).*, where w1, …, wn are the given keywords.

Note that the dot is an extension that stands for “any character.” This expression was converted to a deterministic finite automaton. It was then possible to make a single pass over the tape, regardless of how many keywords were involved. Each title was examined once by the finite automaton to see if any of the keywords were found therein.
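
A sketch (ours) of the single-pass idea: one combined pattern built from all the keywords, applied once per title. This is not the algorithm of Aho and Corasick, which builds the automaton directly, and Python's backtracking regex engine is not a DFA, but the one-pass-per-title structure is the same; the keywords and titles shown are hypothetical.

# Combine the keywords into a single pattern and scan each title once.
import re

keywords = ["automata", "parsing", "compiler"]               # hypothetical query
pattern = re.compile("|".join(re.escape(w) for w in keywords))

titles = ["A Survey of Parsing Techniques",
          "Queueing Systems",
          "Principles of Compiler Design"]
relevant = [t for t in titles if pattern.search(t.lower())]  # one scan per title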

2.1.2. Design of a lexical-analyzer generator. In essence, a lexical-analyzer generator such as Lex uses the same idea as in Section 2.1.1. We write regular expressions for each of the tokens and then apply the union operator to these expressions. That expression is converted to a deterministic finite automaton, which is set to work starting at some point in the program. This automaton will read characters until it finds a prefix of the string that matches a token. It then removes the characters read from its input, adds that token to its output stream, and repeats the process.

There are a few additional considerations, since unlike keywords, there can be some complex interactions among tokens. For example, the string while looks like an identifier, but it is really a keyword used for control flow in a program. Thus, when seeing this sequence of characters, the lexical analyzer must return the token <while>, not <id>. In Lex, the order in which the regular expressions are listed in its input file breaks ambiguities such as this, so all you have to do is list your keywords before identifiers to make sure the keywords are treated as such, and not as identifiers. Another problem is that some tokens can be prefixes of others. We would not want to recognize < as a token if the next character in the input was =. Rather, we want to recognize <= as a token. To avoid such errors, the lexical analyzer is designed to keep reading as long as what it has seen so far is accepted by the finite automaton as a legal token.
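
A sketch (ours, with hypothetical token names) of the two disambiguation rules. Because Python's re module takes the first alternative that matches rather than the overall longest, the sketch recovers Lex-like behavior by listing <= before < and by checking each matched identifier against a keyword table:

# A tiny tokenizer illustrating keyword-versus-identifier and longest-match issues.
import re

TOKEN_SPEC = [
    ("ID", r"[a-zA-Z][a-zA-Z0-9]*"),
    ("LE", r"<="),                    # must precede "<" so that "<=" is one token
    ("LT", r"<"),
    ("EQ", r"="),
    ("WS", r"\s+"),
]
KEYWORDS = {"while", "if", "else"}
MASTER = re.compile("|".join("(?P<%s>%s)" % (name, pat) for name, pat in TOKEN_SPEC))

def tokens(source):
    for m in MASTER.finditer(source):
        kind, text = m.lastgroup, m.group()
        if kind == "WS":
            continue                  # discard whitespace
        if kind == "ID" and text in KEYWORDS:
            kind = text.upper()       # e.g., the keyword token <while>, not <id>
        yield (kind, text)

print(list(tokens("while x <= y")))
# [('WHILE', 'while'), ('ID', 'x'), ('LE', '<='), ('ID', 'y')]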

2.1.3. Lazy evaluation of the DFA. There is one more optimization that can improve the running time of algorithms using the regular expression abstraction.
