Fifty Years of Databases

Thomas Haigh

December 11, 2012

Fifty years ago a small team working to automate the business processes of the General Electric Low Voltage Switch Gear Department in Philadelphia built the first functioning prototype of a database management system. The Integrated Data Store was designed by Charles W. Bachman, who later won the ACM’s Turing Award for the accomplishment. He was the first Turing Award winner without a Ph.D., the first with a background in engineering rather than science, and the first to spend his entire career in industry rather than academia.

The exact anniversary of IDS is hard to pin down. Detailed functional specifications for the system were complete by January 1962, and Bachman was presenting details of the planned system to GE customers by May of that year. It is less clear from archival materials when the system first ran, but Bachman’s own recent history of IDS suggests that a prototype was operational by the end of that year.

According to this May 1962 presentation, initial implementation of IDS was expected to be finished by December 1962, with several months of field testing and debugging to follow. Image courtesy of Charles Babbage Institute.

The technical details of IDS, Bachman’s life story, and the context in which it arose have all been explored elsewhere in some detail. He also founded a public company, played a leading role in formulating the OSI seven layer model for data communications, pioneered online transaction processing, and devised the first data modeling notation. Here I am focused on two specific questions:

(1) why do we view IDS as the first database management system, and

(2) what were its similarities and differences with later systems?

There will always be an element of subjectivity in such judgments about “firsts,” particularly as IDS predated the concept of a database management system and so cannot be compared against definitions from the time period. I have elsewhere explored the issue in more depth, stressing the way in which IDS built on early file management and report generation systems and the further evolution of database ideas over the next decade. As a fusty historian I value nuance and am skeptical of the idea that any important innovation can be fully understood by focusing on a single moment of invention.

However, if any system deserves the title of “first database management system” then it is clearly IDS. It served as a model for the earliest definitions of “data base management system” and included most of the core capabilities later associated with the concept.

What was IDS For?

Bachman was not, of course, inspired to create IDS as a contribution to the database research literature. For one thing there was no database research community. At the start of the 1960s computer science was beginning to emerge as an academic field, but its early stars focused on programming language design, theory of computation, numerical analysis, and operating system design. The phrase “data base” was just entering use but was not particularly well established and Bachman’s choice of “data store” would not have seemed any more or less familiar at the time. In contrast to this academic neglect, the efficient and flexible handling of large collections of structured data was the central challenge for what we would now call corporate information systems departments, and was then called business data processing.

During the early 1960s the hype and reality of business computing diverged dramatically. Consultants, visionaries, business school professors, and computer salespeople had all agreed that the best way to achieve real economic payback from computerization was to establish a “totally integrated management information system.” This would integrate and automate all the core operations of a business, ideally with advanced management reporting and simulation capabilities built right in. The latest and most expensive computers of the 1960s had new capabilities that seemed to open the door to a more aggressive approach. Compared to the machines of the 1950s they had relatively large memories, featured disk storage as well as tape drives, could process data more rapidly, and some had even been used to drive interactive terminals in specialized applications. Unfortunately the reality of data processing changed much more slowly, and remained focused on simple administrative applications that batch processed large files of records to accomplish discrete tasks such as weekly payroll processing, customer statement generation, or accounts payable reports.

Many companies announced their intention to build totally integrated management information systems, but few ever claimed significant success. A modern reader would not be shocked to learn that firms were unable to create systems of comparable scope to today’s Enterprise Resources Planning and data warehouse projects using computers with perhaps the equivalent of 64KB of memory, no real operating system, and a few megabytes of disk storage. Still, even partially integrated systems covering significant portions of a business with flexible reporting capabilities would have real value. The biggest challenges to even modest progress towards this goal were the sharing of data between applications and the effective use of random access disk storage by application programmers.

Data processing techniques had evolved directly from those used with pre-computer mechanical punched card machines. The concepts of files, fields, keys, grouping, merging data from two files, and the hierarchical combination of master and detail records within a single file all predated electronic computers. These worked with magnetic tape much as they had done with punched cards, except that sorting was actually much harder with tape. Getting a complex job done might involve dozens of small programs and the generation of many working tapes full of intermediate data. These banks of whirring tape drives provided computer centers with their main source of visual interest in the movies of the era. However the formats of tape files were very inflexible and were usually fixed by the code of the application programs working with the data. Every time a field was added or changed all the programs working with the file would need to be rewritten. But if applications were integrated, for example by having order records from the sales accounting system automatically exported as input for the production scheduling application, then the resulting web of dependencies would make it ever harder to carry out even minor changes in response to shifting business needs.

This 1962 diagram, drawn by Stanley Williams, sketched the complex dependencies between different records involved in the production planning process. Courtesy Charles Babbage Institute.

The other key challenge was making effective use of random access storage in business application programs. Sequential tape storage was conceptually simple, and the tape drives themselves provided some intelligence to aid programmers in reading or writing records. But the only really practical computer applications were batch-oriented because searching a tape to find or update a particular record was too slow to be practical. Instead, master files were periodically updated with accumulated data or read through to produce reports. The arrival in the early 1960s of disk storage a programmer theoretically made it possible to apply updates one at a time as new data came in, or to create interactive systems that could respond to requests immediately. A programmer could easily instruct the drive to pull data from any particular platter or track, but the hard part was figuring out where on the disk the desired record could be found. Harnessing the power of the new technology meant finding ways to order, insert, delete, or search for records that did not simply replicate the sequential techniques used with tape. Solutions such as indexing, inverted files, hashing, linked lists, chains and so on were quickly devised but these were relatively complex to implement and demanded expert judgment to select the best method for a particular task. In addition, application programmers were beginning to shift from assembly language to high level languages such as COBOL. Business oriented languages included high level support for working with structuring data in tape files but lacked comparable support for random access storage. Without significant disk file management support from the rudimentary operating systems of the era only elite programmers could hope to create of an efficient random access application.

This image, from a 1962 internal General Electric document, conveyed the idea of random access storage using a set of “pigeon holes” in which data could be placed. Courtesy of Charles W. Bachman.

IDS was intended to substantially solve these two problems, so that applications could be integrated to share data files and ordinary programmers could effectively develop random access applications using high level languages. Bachman designed it to meet the needs of an integrated systems project run as an experimental prototype within General Electric by the group of systems-minded specialists he was working for at its corporate headquarters. General Electric had many factories spread over its various divisions, and could not produce a different integrated system for each one. Furthermore it was entering the computer business, and recognized that a flexible and generic integrated system based on disk storage would be a powerful tool in selling its machines to other companies.

Was IDS a Data Base Management System?

IDS carried out what we still consider the core task of a database management system by interposing itself itself between application programs and the files in which they stored data. Programs could not manipulate data files directly, instead making calls to IDS so that it would perform the rested operation on their behalf.

Like modern database management systems, IDS explicitly stored and manipulated metadata about the records and their relationships, rather than expecting each application program to understand and respect the format of every data file it worked with. It could enforce relationships between different record types, and would protect database integrity. Database designers would specify indexes and other details of record organization to boost performance based on expected usage patterns. However the first versions it did not include a formal data manipulation language. Instead of being defined through textual commands the metadata was punched onto specially formatted input cards. A special command told IDS to read and apply this information.

IDS was designed to be used with high level programming languages. In the initial prototype version, operational in 1962, this was General Electric’s own GECOM language, though performance and memory concerns drove Bachman’s team to shift to assembly language for the application programming in a higher performance version completed in 1964. Part of IDS remained resident in memory while application programs were executed. Calls to IDS operations such as store, retrieve, modify, and delete were interpreted at runtime against the latest metadata and then executed. As high level languages matured and memory grew less scarce, later version of IDS were oriented towards application programs written in COBOL.

This provided a measure of what is now called data independence for programs. If a file was restructured to add fields or modify their length then the programs using it would continue to work properly. Files could be moved around and reorganized without rewriting application programs. That made running different application programs against the same database much more feasible. IDS also included its own system of paging data in and out of memory, to create a virtual memory capability transparent to the application programmer.

The concept of transactions is fundamental to modern database management systems. Programmers specify that a series of interconnected updates must take place together, so that if one fails or is undone they all are. IDS was also transaction oriented, though not in exactly the same sense. It took over the entire computer, which had only 8,000 words of memory. Bachman devised an innovative transaction processing system, which he called the Problem Controller. Bachman’s original version of IDS was not loaded by application programs when needed to handle a data operation. Instead IDS reversed the relationship: it ran when the computer booted and loaded application programs as needed. Only one application program ran at a time. Requests from users to run particular programs were read from “problem control cards” and buffered as IDS records. The computer worked its way through the queue of requests, updating it after each job was finished. By 1965 an improved version of this system was in use at Weyerhauser, on a computer hooked up to a national teletype network. Requests for reports and data submissions were inserted directly into the queue by remote users.

Bachman’s original prototypes lacked strong backup and recovery systems, key features of later database management systems, but this was added as early as 1964 when IDS was first being prepared as a package for distribution to General Electric’s customers. A recovery tape logged memory pages modified by each transaction, so that the database could be restored to a consistent state if something went wrong before the transaction was completed. The same tape served as an incremental backup of changes since the last full backup.

This first packaged version of IDS did lack some features later viewed as essential for database management systems. One was the idea that specific users could be granted or denied access to particular parts of the database. This was related to another limitation: IDS databases could be queried or modified only by writing and executing programs in which IDS calls were included. There was no interactive capability to request “ad hoc” reports or run one-off queries without having to write a program. Easy to use report generator systems (such as 9PAC and MARK IV) and online interactive data management systems (such as TDMS) were created during the 1960s but they were generally seen as a separate class of software from data base management systems. (By the 1970s these packages were still popular, but included optional modules to interface with data stored in database management systems).

IDS and CODASYL

After Bachman handed IDS over to a different team within General Electric in 1964 it was made available as a documented and supported software package for the company’s 200 series computers. Later versions supported its 400 and 600 series systems. New versions followed in the 1970s after Honeywell brought out General Electric’s computer business. IDS was a strong product, in many respects more advanced than IBM’s IMS which appeared several years later. However IBM machines dominated the industry so software from other manufacturers was doomed to relative obscurity whatever its merits. In those days software packages from computer manufactures were paid for by hardware sales and given to customers without an additional charge.

During the late 1960s the ideas Bachman created for IDS were taken up by the Database Task Group of CODASYL, a standards body for the data processing industry best known for its creation and promotion of the COBOL language. Its 1969 report drew heavily on IDS in defining a proposed standard for database management systems, in part thanks to Bachman’s own service on the committee. In retrospect, the committee’s work, and a related effort by CODASYL’s Systems Committee to evaluate existing systems within the new framework, were significant primarily for formulating and spreading the concept of a “data base management system.”

CODASYL’s definition of the architecture of a database management system and its core capabilities was quite close to that included in textbooks to this day. In particular, it suggested that a data base management system should support on-line, interactive applications as well as batch driven applications and have separate interfaces. COSASYL’s initial report, published in 1969, documented foundational concepts and vocabulary such as data definition language, data manipulation language, schemas, data independence, and program independence. It went beyond early versions of IDS by adding security features, including the idea of “privacy locks” and included “sub-schemas,” roughly equivalent to views in relational systems, so that different programs could work with specially presented subsets of the overall content of the database.

Although IBM itself refused to support the CODASYL approach, continuing to favor its own IMS with its simple hierarchical data model, many other computer vendors supported its recommendations and eventually produced systems incorporating these features. The most successful CODASYL system, IDMS, came from an independent software company and began as a port of IDS to IBM’s dominant System/360 mainframe platform.

The Legacy of IDS

IDS and CODASYL systems did not use the relational data model, formulated years later by Ted Codd, which underlies today’s dominant SQL database management systems. Instead it introduced what would later be called the “network data model.” This encoded relationships between different kinds of records as a graph, rather than the strict hierarchy enforced by tape systems and some other software packages of the 1960s such as IBM’s later and widely used IMS. The network data model was widely used during the 1970s and 1980s, and commercial database management systems based on this approach were among the most successful products of the mushrooming packaged software industry.

Bachman spoke memorably in his Turing Award lecture of the “Programmer as Navigator,” charting a path through the database from one record to another. The IDS approach required programmers to work with records one at a time. Performing the same operation on multiple records mean retrieving a retrieving a record, processing and if necessary updating it, and then moving on to the next record of interest to repeat the process. For some tasks this made programs longer and more cumbersome than the equivalent in a relational system, where a task such as deleting all records more than a year old or adding 10% to the sales price of every item could be performed with a single command. In addition, IDS and other network systems encoded what we now think of as the “joins” between different kinds of records as part of the database structure rather than specifying them in each query. This made IDS much less flexible than later relational systems, but also much simpler to implement and more efficient for routine operations.

This drawing, from the 1962 presentation “IDS: The Information Processing Machine We Need,” shows the use of chains to connect record. The programmer used GET commands to navigate between related records.

IDS was a useful and practical tool for business use in the early 1960s, while relational systems were not commercially available until the 1980s. Relational systems did not become feasible until computers were orders of magnitude more powerful than they had been in 1962 and some extremely challenging implementation issues had been overcome. Even after relational systems were commercialized the two approaches were seen for some time as complementary, with network systems used for high performance transaction processing systems handling routine operations on large numbers of records (for example credit card processing) and relational systems best suited for flexible “decision support” data crunching. Although IDMS is still in use for some few very large applications it, and other database management systems based on Bachman’s network data model, have long since been superseded for new applications and for mainstream computing needs.

Still, without IDS and Bachman’s tireless championing of the ideas it contained the very concept of a “database management system” might never have taken root in the first place. When database specialists look at IDS today it is easy to see its limitations compared to modern systems. Its strengths are easy to miss because its huge influence on the software industry meant that much of what was revolutionary about it in 1962 was soon taken for granted. IDS did more than any other single piece of software to broaden the range of business problems to which computers could usefully be applied and so to usher in today’s world where every administrative transaction involves a flurry of database queries and updates rather than the filing of forms completed in triplicate.

Thomas Haigh