Copyright (c) 2000, 2002 Simon Williams
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior written permission of the copyright owner.
The right of Simon Williams to be identified as the author of this work has been asserted by him in accordance with the Copyright, Designs and Patents Act 1988.


PREFACE TO "THE ASSOCIATIVE MODEL OF DATA"
By Simon Williams


The relational model of data, invented by Ted Codd of IBM in the late 1960s, is the standard database architecture in use today for mainstream transaction processing and information systems in industry, commerce and government. Relational databases store data in the form of tables (strictly, "relations"), using a separate table for every different type of data. Each table is uniquely structured, so the programs that allow users to interact with the database must be built around the tables, and the structure of the data becomes hard-coded into the programs. This has two consequences. Firstly, each time we develop a new application, we have to create a whole new set of programs to fit the new tables. Secondly, each time we change the structure of the database to store a new type of information, every program that uses a new or altered table has to be amended and retested.

          By contrast, databases that use the associative model of data store all different types of data together in a single, consistent structure that never changes, no matter how many types of data are stored. Information about the structure of the data and the rules that govern it is stored in the database alongside the data itself. This sets the scene for a programming technique called "omnicompetent programming", whereby the data structures and the rules that govern them are no longer hard-coded into the programs, but are obtained by the programs from the database itself. Omnicompetent programs can work with any associative database without being amended or recompiled in any respect whatsoever. Thus, for the first time, such programs are truly reusable, and no longer need to be amended when the data structures change. This dramatically reduces the cost of application development and maintenance.
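The contrast between a fixed storage structure and one table per type can be illustrated with a minimal sketch. It assumes a hypothetical store in which every fact, whether data or a description of the data's structure, is held as a (source, verb, target) link. The names `Link`, `Store` and `describe` are illustrative only and are not the associative model's actual structures or interfaces. The generic `describe` function contains no knowledge of any particular type; it discovers which attributes to read from the store itself.

```python
from dataclasses import dataclass


# Hypothetical structure: every fact, data or schema, is one (source, verb, target) link.
@dataclass(frozen=True)
class Link:
    source: str
    verb: str
    target: str


class Store:
    """A single fixed structure: a set of links. New types add links, never new tables."""

    def __init__(self):
        self.links = set()

    def add(self, source, verb, target):
        self.links.add(Link(source, verb, target))

    def targets(self, source, verb):
        # Sorted so that results are deterministic regardless of set ordering.
        return sorted(
            link.target
            for link in self.links
            if link.source == source and link.verb == verb
        )


def describe(store, entity):
    """Generic reader: looks up the entity's types, then asks the store which
    attributes each type has, so no attribute names are hard-coded here."""
    result = {}
    for entity_type in store.targets(entity, "is a"):
        for attribute in store.targets(entity_type, "has attribute"):
            result[attribute] = store.targets(entity, attribute)
    return result


db = Store()
# Structural information is stored alongside the data it describes.
db.add("Customer", "has attribute", "name")
db.add("Customer", "has attribute", "city")
db.add("C1", "is a", "Customer")
db.add("C1", "name", "Acme Ltd")
db.add("C1", "city", "London")
print(describe(db, "C1"))  # {'city': ['London'], 'name': ['Acme Ltd']}

# A new type of data is introduced without touching describe() or Store.
db.add("Supplier", "has attribute", "rating")
db.add("S1", "is a", "Supplier")
db.add("S1", "rating", "A")
print(describe(db, "S1"))  # {'rating': ['A']}
```

Adding the `Supplier` type required only new links, not a new table or any change to the program that reads them. This is the property the paragraph above attributes to omnicompetent programs, shown here only in miniature.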

          Codd's seminal paper "A Relational Model of Data for Large Shared Data Banks" [4] begins with the words "Future users of large data banks must be protected from having to know how the data is organized in the machine". This was a critical step forward in allowing programmers to use their time more productively. But the demands on our limited supply of programming resource are now such that it is time for database architectures to move to the next level of abstraction. The aim of the relational model was to free programmers from having to know the physical structure of data; the aim of the associative model is to free them in addition from having to know its logical structure.

          Why is this such an important step? Notwithstanding the efficiency of modern software tools, the development and maintenance of database applications remains extremely labour-intensive, and the cost of building and deploying application software is unsustainably high. The problem is further compounded by the size and complexity of modern applications: SAP's R/3 ERP system comprises over 16,500 separate, uniquely structured tables [Source: SAP]. Such complexity comes at further expense: the implementation costs of major application packages like R/3 are typically between five and ten times the cost of the software itself.

          The high cost of software infrastructure is cause enough for concern and action in its own right, going hand in hand as it does with the difficulty that small and medium-sized enterprises face in gaining access to relevant application software. But the high cost of software is also ultimately damaging to competitiveness. Earlier in my career I spent some years developing and deploying custom-built applications for users of IBM's System/3X range of computers. The competitive edge that my customers gained from these applications convinced me that enterprises of every size and type can benefit directly and measurably from the use of software that is designed to meet their specific needs and to fit the way they work. Most pre-eminent enterprises become so not by conforming to established business models, but by radically reinventing them - would Amazon have achieved what it did if it had tried to run its web site on someone else's package? And yet, more and more companies today deploy the same mission-critical application software as their competitors, homogenising their operations and forgoing vital opportunities to gain competitive advantage. The notion that a successful business should change the way it works to fit an application package would have been anathema just a few years ago, but now such compromises are commonplace. The reason is clear: today, custom-built application software is simply beyond the economic reach of all but those with the deepest pockets.

          Why hasn't the application development industry responded to this challenge? The answer is that it has been constrained by the relational model of data. Programming languages have evolved through abstraction: machine code was abstracted into symbolic languages; symbolic languages were abstracted into third-generation high-level languages. Application development tool vendors, including Synon, which I founded in 1984, have sought to move to the next level of abstraction through fourth-generation languages (4GLs), but these typically proved too complex for general use, and modern object-oriented programming languages such as Java and C# embody no more abstraction than their non-OO predecessors. Higher levels of useful abstraction in programming languages can now be attained only through a higher level of abstraction in persistent data structures.

          The associative model of data embodies this new level of data abstraction. It frees programmers from the need to understand either the physical or the logical structure of data, and allows them to focus solely on the logic and processes by which users interact with databases, and databases interact with each other.


4. E. F. Codd: "A Relational Model of Data for Large Shared Data Banks", Communications of the ACM, Vol. 13, No. 6, June 1970.