The DOI Handbook

1 Introduction

This chapter describes the DOI® Handbook and its updating process, explains the environment that leads to the need for the DOI® System, and outlines the components and use of the DOI System.

© International DOI Foundation 2006.

 
1.1 The DOI® Handbook
1.2 Identification and the Internet
1.3 What is an Identifier?
1.4 What is a DOI® Name?
      1.4.1 DOI names are persistent identifiers
      1.4.2 DOI names are actionable identifiers
      1.4.3 DOI names are interoperable identifiers
      1.4.4 Identifying at the appropriate level
      1.4.5 Identifying copies and versions
1.5 Components of the DOI System
1.6 What can be identified by a DOI name?
      1.6.1 Intellectual Property
      1.6.2 Identification of abstractions
      1.6.3 Formal definition
      1.6.4 Granularity
1.7 Benefits of the DOI System to Publishers, Intermediaries, and users
      1.7.1 Summary of benefits
      1.7.2 Benefits in internal content management
      1.7.3 Benefits in the distribution and sales life-cycle
      1.7.4 Benefits in the production life-cycle
      1.7.5 Quantified benefits: case studies
1.8 The DOI System as social infrastructure
1.9 The DOI System as a managed system

 

1.1 The DOI® Handbook

This Handbook is intended as:

  • a definitive reference to the DOI System for the non-technical reader;
  • a central point of reference for more complex technical content, through the appendices;
  • a means of disseminating updated information through the release of new versions.

The Glossary of Terms defines selected terms unique to the DOI System, and other terms with meanings used in a specific way within the DOI System which are discussed in the Handbook.

Other introductory material may be found in:

The Handbook is regularly updated to reflect progress in DOI System development. The primary publication medium of the Handbook is the DOI.ORG Web site; users working with print versions are advised to check that they have the most up-to-date version by comparing it with the version on the DOI.ORG Web site. Earlier versions of development documentation are superseded by any later edition.

The numbering system of versions follows the convention of edition.release.update (the most significant digit on the left). Minor changes such as typographical corrections with no substantive effect will be numbered as updates; more substantive changes as releases; major changes as editions. Criteria for numbering are pragmatic: the IDF's aim is to clearly distinguish new versions for users, especially when use of an earlier version may result in error.
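The edition.release.update convention can be sketched in a few lines of Python. This is an illustrative sketch only: the class and function names, and the version strings used in the usage note, are assumptions and not part of the Handbook.

```python
from typing import NamedTuple


class HandbookVersion(NamedTuple):
    """A version number of the form edition.release.update."""
    edition: int
    release: int
    update: int


def parse_version(text: str) -> HandbookVersion:
    """Parse a string such as '4.2.0' into its three numeric parts."""
    parts = text.strip().split(".")
    if len(parts) != 3:
        raise ValueError(f"expected edition.release.update, got {text!r}")
    edition, release, update = (int(p) for p in parts)
    return HandbookVersion(edition, release, update)


def change_level(old: HandbookVersion, new: HandbookVersion) -> str:
    """Name the most significant field that differs between two versions."""
    if new.edition != old.edition:
        return "edition"
    if new.release != old.release:
        return "release"
    if new.update != old.update:
        return "update"
    return "none"
```

Because the most significant digit is on the left, comparing the parsed tuples field by field orders versions correctly, so a hypothetical 4.10.0 sorts after 4.9.3 even though it would not as plain text.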

Edition 1 of this Handbook was issued in February 2001. Edition 2 was issued in February 2002 and followed by several updated releases throughout 2002. Edition 3 was issued in May 2003, followed by several updated releases throughout 2003. This Edition 4 (first release April 2004) incorporates substantial additional material, especially on the DOI® Data Model and DOI System Applications, together with related appendices and a revision of all other chapters.

If you have any questions or suggestions relating to this Handbook, please let us know by contacting contact@doi.org; your input will help to improve future versions of the Handbook.

DOI® and DOI.ORG® are registered trademarks of the International DOI Foundation, Inc., filed with the U.S. Patent and Trademark Office, and granted registration numbers 2,360,527 and 2,360,526, respectively. The "doi>" is a trademark of the International DOI Foundation.

indecs™ is a registered trademark of the International DOI Foundation, Inc., and EDItEUR, filed with the U.K. Patent Office, and granted registration number 2257426.

The Handle System® is a registered trademark of the Corporation for National Research Initiatives, Inc. (U.S. registration number 6,135,646) and is used by permission.

© International DOI Foundation 2006. The contents of this Handbook are copyright of the International DOI Foundation, Inc.

1.2 Identification and the Internet

One of the key challenges in the move from physical to electronic distribution of content is the rapid evolution of a set of common technologies and procedures to identify and manage pieces of digital content. A widely implemented and well understood approach to naming digital objects is essential if we are to see the development of services that will enable content providers to grow and prosper in an era of increasingly sophisticated computer networking. The boundaries that currently exist between different types of content, especially at the level of the infrastructure that supports their production and distribution, will be broken down and ultimately eliminated. Instead of different physical formats requiring different content distribution infrastructures, all content will consist of streams of digital data moving over networks. Diverse content industries will increasingly find themselves sharing the same challenges and opportunities in delivering content to their customers, whether direct or through intermediaries.

"A developing trend that seems likely to continue in the future is an information centric view of the Internet that can live in parallel with the current communications centric view. Many of the concerns about intellectual property protection are difficult to deal with, not because of fundamental limits in the law, but rather by technological and perhaps management limitations in knowing how best to deal with these issues. A digital object infrastructure that makes information objects "first-class citizens" in the packetized "primordial soup" of the Internet is one step in that direction. In this scheme, the digital object is the conceptual elemental unit in the information view; it is interpretable (in principle) by all participating information systems. The digital object is thus an abstraction that may be implemented in various ways by different systems. It is a critical building block for interoperable and heterogeneous information systems. Each digital object has a unique and, if desired, persistent identifier that will allow it to be managed over time. This approach is highly relevant to the development of third-party value added information services in the Internet environment." (What Is The Internet (And What Makes It Work) -- Robert E. Kahn and Vinton G. Cerf, 1999).

The International DOI Foundation (IDF) was established in 1998 to address this challenge, assuming a leadership role in the development of a framework of infrastructure, policies and procedures to support the identification needs of providers of intellectual property in the multinational, multi-community environment of the network. The IDF has developed, and continues to evolve, a fully implemented solution to this challenge: the DOI System, using the DOI name, an "actionable identifier" for intellectual property on the Internet. The DOI System is now widely implemented by hundreds of organisations through millions of identified objects.

1.3 What is an Identifier?

For detailed information on concepts of identification and metadata, see the documents referenced in the bibliography. As the use of numbering in digital networks has developed, the use of the word "identifier" in this context has become expanded to the point where it is now used synonymously to cover several different things, all of which are useful but which actually carry different implications that need to be distinguished. It is not possible to compare two "identifiers" unless it is clear which of the following is implied by each:

(1) A single unambiguous string or "label" that references an entity (e.g. ISBN 0-19-853737-9)

(2) A numbering scheme: a formal standard, an industry convention, or an arbitrary internal system providing a consistent syntax for generating a series of labels (identifiers (1)) denoting and distinguishing separate members of a class of entities (e.g. ISBN, or DOI® Syntax NISO Z39.84). The scheme is a specification for generating a number: this resulting "number" may include alphanumeric characters, but the accepted parlance is to speak of these as numbers (e.g. ISBN=International Standard Book Number). The intention is to establish a one-to-one correspondence between the members of a set of labels (numbers), and the members of the set counted and labelled. The product of the process is enumeration, a cardinality judgement, and assigned numbers for each cardinal member. The numbering scheme may or may not be accompanied by some policy apparatus -- for example, a registration agency and maintenance agency. An important point is that the resulting number is simply a label string. It does not of itself create a string that is actionable in a digital or physical environment without further steps being taken. It may be used (and probably will be used) in databases; or it may be incorporated into another mechanism later.

Common standard numbering schemes of interest in digital content management include those standardised by ISO:

  • ISBN: ISO 2108:1992 International Standard Book Numbering (ISBN)
  • ISSN: ISO 3297:1998 International Standard Serial Number (ISSN)
  • ISRC: ISO 3901:2001 International Standard Recording Code (ISRC)
  • ISRN: ISO 10444:1997 International Standard Technical Report Number (ISRN)
  • ISMN: ISO 10957:1993 International Standard Music Number (ISMN)
  • ISWC: ISO 15707:2001 International Standard Musical Work Code (ISWC)
  • ISAN: Draft ISO 15706: International Standard Audiovisual Number (ISAN)
  • V-ISAN: Draft ISO 20925: Version Identifier for audiovisual works (V-ISAN)
  • ISTC: Draft ISO 21047: International Standard Text Code (ISTC)

Whilst these ISO TC46 identifiers were originally simple numbering schemes, of late they have also begun to adopt the notion of associating some minimal structured descriptive metadata with the identifier. Also relevant are the ISO-affiliated NISO standards including:

  • ANSI/NISO Z39.84 The Digital Object Identifier

(3) An infrastructure specification: a syntax by which any identifier (1) can be expressed in a form suitable for use with a specific infrastructure, without necessarily specifying a working mechanism (e.g. URI). This is sometimes known as creating an "actionable identifier" -- meaning that in the context of that particular piece of infrastructure, the label can now be used to perform some action: e.g. in an Internet Web browser, it can be "clicked on" and some action takes place. The set of Internet specifications known as Uniform Resource Identifiers (embracing URLs and URNs) provides mechanisms for taking labels and specifying them as actionable within the Internet. The same principles apply in the physical environment -- for example by prefixing an ISBN with the EAN sequence 978 or 979, the ISBN becomes a UPC/EAN identifier expressible as a physical bar code symbol, or a radio-frequency tag, for use in the physical supply chain.
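The ISBN-to-EAN step mentioned above can be sketched as follows. The text says only that the ISBN is prefixed with 978 or 979; this sketch additionally assumes the usual practice of dropping the ISBN-10 check character and recomputing a check digit with the EAN-13 modulo-10 rule. It handles the 978 prefix only, and the function name is hypothetical.

```python
def isbn10_to_ean13(isbn10: str) -> str:
    """Turn a 10-character ISBN label into a 13-digit EAN string (978 prefix)."""
    chars = isbn10.replace("-", "").replace(" ", "")
    if len(chars) != 10:
        raise ValueError(f"expected 10 characters, got {isbn10!r}")
    # Keep the first nine digits; the old check character is discarded.
    body = "978" + chars[:9]
    if not body.isdigit():
        raise ValueError(f"non-digit characters in {isbn10!r}")
    # EAN-13 check digit: weights alternate 1, 3, 1, 3, ... from the left.
    total = sum(int(ch) * (1 if i % 2 == 0 else 3) for i, ch in enumerate(body))
    check = (10 - total % 10) % 10
    return body + str(check)


# The ISBN used as the example label in sense (1) above.
print(isbn10_to_ean13("0-19-853737-9"))  # → 9780198537373
```

The label itself is unchanged in meaning; only its expression is adapted so that a particular infrastructure (here, bar code scanning) can act on it.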

Importantly, note here that such "identifiers" do not mandate a way of creating labels, they merely accept any labels: hence if one does not have an existing numbering scheme, it will be necessary to adopt or create one in order to form URIs. A URI specification merely ensures that a label follows the rules to become actionable in an Internet environment: a specification is not an implementation, with all the other aspects that a fully functioning identifier system (see below) may require: URI may for example specify the syntax, and specify a recording registration procedure, but not create a managed environment (e.g. by which registrations are "policed"), or carry any specifications of metadata or policy. Some identifier specifications of this form may have limited rules or requirements for implementation: so far this is limited to the URN specification including a proposed (not implemented) mechanism for resolution. The acid test one should ask of such a specification is: what does specifying my label in this particular form get me, in practical terms, in a specific infrastructure?

(4) A system for implementing labels (identifiers (1)) through a numbering scheme (identifiers (2)) in an infrastructure using a specification (identifiers (3)) and management policies (e.g. DOI System). The DOI System is an "identifier system" in the digital supply chain, just as the UPC/EAN is an "identifier system" in the physical supply chain; ISBNs for example become implemented in the physical supply chain through UPC/EAN bar codes or RFID tags. This sense of "Identifier" denotes a fully implemented identification mechanism that includes the ability to incorporate labels, conforms to an infrastructure specification, and adds to these practical tools for implementation such as registration processes, structured interoperable metadata, and a policy/governance mechanism. Such a system is necessary for practical DRM applications; since DRM deals with digital entities, structured metadata will be an essential component of such a system. The DOI System is one of the better developed, with several million DOI names currently in use by several hundred organisations.

1.4 What is a DOI® name?

A DOI® (Digital Object Identifier) name is an identifier (not a location) for an entity on digital networks. It provides a system for persistent and actionable identification and interoperable exchange of managed information on digital networks. Unique identifiers are essential for the management of information in any digital environment.

It is an identifier in sense (4) above. One of the components is a syntax specification (identifier (2)). The DOI System conforms to a URI (identifier (3)) specification. It provides an extensible framework for managing intellectual content based on proven standards of digital object architecture and intellectual property management. It is an open system based on nonproprietary standards. It has the following notable features:

1.4.1 DOI names are persistent identifiers

A DOI name differs from commonly used Internet pointers to material such as the URL -- Uniform Resource Locator, the usual means of referring to World Wide Web material -- because it identifies an object as a first-class entity, not simply the place where the object is located. A first-class entity or object in the information infrastructure is stored on one or more servers and is accessible from these servers using a globally accessible identifier (URI). An entity is referred to as first class when it represents an object, not some attribute of an object; e.g. an address is an attribute of a thing, whereas the thing itself is a first class object. The DOI System is not solely designed for use on the World Wide Web; the same functionality can be made available through any digital network and protocol, but the Web demonstrates its advantages well.

1.4.2 DOI names are actionable identifiers

The purpose of the DOI System is to make the DOI name an actionable identifier: a user can use a DOI name to do something. The simplest action that a user can perform using a DOI name is to locate the entity that it identifies. In this respect, a DOI name may look superficially like a URL. However, the technology which underlies the DOI System facilitates much more complex applications than simple location; and the DOI name identifies the intellectual property entity itself rather than its location. The ease of assigning URLs was no doubt responsible in part for the expansion of the Web -- but the fact that they are easy to create (and neglect) means they are not strong enough alone for a commercial basis. "Not Found" link messages are a scourge across the Internet: the rate at which once-valid links start pointing at non-existent addresses -- a process called "link rot" -- is reported to be as high as one sixth of all links in six months. The fact that URLs change (technically, they are not "persistent") isn't a bad thing in itself: in fact, it is very helpful to separate names from locations -- since location is only one property of (or piece of metadata about) a name which we might want to manage by the process of resolution. We want to be able to move things around -- there are legitimate reasons such as change of ownership. The problem is that using URLs alone we can't track what's changed, or use one name persistently irrespective of where the item is.

This does not imply that the DOI name will necessarily resolve to the entity that it identifies -- although that will sometimes be the case. The DOI name, though, can be used to identify classes of intellectual property -- abstract "works", physical "manifestations", performances -- that cannot be directly accessed in a digital file. Even when the DOI name does identify a digital file, this will not always be the most appropriate or useful data for the DOI name to resolve to. Even if there is no current location for a digital file, it might still be useful to know what it represented, or who owned it, or search for it elsewhere. Even if we have a location, we might want to offer other resolution results. Therefore it is very important to distinguish what the DOI name identifies from what the DOI name resolves to. They may be the same thing, but they will often be very different.

The technology used to manage the resolution of DOI names is the Handle System®; a description can be found in Appendix 2. The Handle System is unlike most other resolution technologies in supporting multiple resolution. A DOI name may have multiple data values of different types associated with it (email addresses and URLs, for example), and multiple data values of the same type (several URLs). The same DOI name can resolve to different data, depending on the way in which the Handle System is queried. This enables the DOI name, and the metadata with which it is associated, to form the foundation for many different services relating to the management of intellectual property in the network environment, to the benefit of intellectual property owners and users alike.

In order for the DOI name to be resolved, the Registrant (or the Registration Agency acting for the Registrant) needs to maintain the data associated with that DOI name in the Handle System; this data is referred to as "state data". The simplest form of state data is a single URL. However, a DOI name can resolve to many other forms of data.
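Multiple resolution can be pictured with a small in-memory model. This is a conceptual sketch only and does not use the Handle System protocol or any real client library; the class, the type labels "URL" and "EMAIL", the DOI name and the addresses are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class DoiRecord:
    """State data held for one DOI name: a list of (type, value) pairs."""
    name: str
    values: List[Tuple[str, str]] = field(default_factory=list)

    def resolve(self, value_type: Optional[str] = None) -> List[str]:
        """Return every value, or only the values of the requested type."""
        return [v for t, v in self.values if value_type is None or t == value_type]


# A hypothetical record with several values of one type and one of another.
record = DoiRecord(
    name="10.1234/example",
    values=[
        ("URL", "http://www.example.com/item"),
        ("URL", "http://mirror.example.org/item"),
        ("EMAIL", "rights@example.com"),
    ],
)

print(record.resolve("URL"))    # both URLs
print(record.resolve("EMAIL"))  # the single e-mail address
```

The same name returns different data depending on how it is queried, which is the property that lets different services be built on one identifier.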

1.4.3 DOI names are interoperable identifiers

The DOI System has been designed to interoperate with past, present and future technologies.

  • "Legacy" identifiers can form an integral part of DOI names. Businesses can continue to use familiar -- and proven -- naming or numbering systems in this new environment.
  • The Handle System is an efficient, extensible system designed to operate on any existing or future Internet service.
  • The metadata (DOI® Data Model) component of the DOI System is designed to offer maximum interoperability of data through a structured, extensible, design resulting from significant work over the past few years in the indecs project and its successors.

1.4.4 Identifying at the appropriate level

An achievement of DOI System work has been a practical implementation of the idea of rethinking the Internet as management of information, not movement of data packets. Managing information on the Internet at the appropriate level is a recurring theme in the vision of the future of the Internet.

As will be seen in what follows, the DOI System is not (only) an identifier of digital objects but (more widely) a digital identifier of objects -- that is, it facilitates digital management of any entities (focussing on those involved in intellectual property transactions). Identification of non-digital entities, such as underlying abstractions (the "work") and physical manifestations are also needed in expressing real world transactions, and any technology which considers only "digital representations" is inadequate for digital rights management. There is nothing new in using abstractions or representations in trading -- we do it all the time with physical property: representations such as deeds and mortgages are what alters (not the physical bricks etc.) when a house changes hands. Similarly with intellectual property, representations such as licences and files are traded. Digital trading of these pieces of property requires that each entity be uniquely and persistently identified, and associated with data.

The indecs framework recognises the concept of functional granularity ("it should be possible to identify an entity when there is a reason to distinguish it"); this is echoed in the DOI System treatment of an identified entity as a first class object (an object in itself, not some attribute of an object). Whereas URLs are grouped by domain name and then by some hierarchical structure (originally based on file trees), DOI names offer a more finely grained approach to naming, where each name stands on its own, unconnected to any Domain Name System (DNS) or other hierarchy.

The most common mechanism for resolution on the Internet is DNS (http as used in URL is a use of DNS). The Handle System used by the DOI System uses TCP/IP but avoids the need to use the DNS, and this has significant advantages. One advantage is that names are not implicated in trademark disputes. Another is flexibility over time as the document origins reflected in a hierarchy lose meaning, for example through a change in ownership: if acmeco.com sells some assets to newco.com, all URL filenames beginning acmeco.com/ which pertain to the sale need to be changed. This benefit has already been seen in the case of CrossRef, where millions of DOI names identified through the Academic Press IDEAL system were merged into Elsevier's Science Direct system when the companies merged.

In order to manage DOI names we have created tools that allow more flexible management of sets of DOI names, in a more useful way than as a fixed sub-domain: a DOI name, DOI® Application Profile and DOI System services can all be thought of as layers of abstraction which allow this. Functionality such as URL partial redirection and relative URLs (which assume as "known" or inherited a part of a URL / domain name address) makes a lot of sense in the context of URLs. However, since DOI names deliberately have a more finely grained approach to naming things, functionality such as partial redirection is dealt with through tools that capitalise on that finer granularity: precise definition of components and their associated services.

Identifying at the appropriate level is key to managing information. Too low a level of granularity makes it impossible to pick out important differences: too high a level of granularity makes it too complex to group similarities. Here is a good analogy from Jorge Luis Borges: "Locke, in the seventeenth century, postulated (and rejected) an impossible idiom in which each individual object, each stone, each bird and branch had an individual name; Funes had once projected an analogous idiom, but he had renounced it as being too general, too ambiguous. In effect, Funes not only remembered every leaf on every tree of every wood, but even every one of the times he had perceived or imagined it. He determined to reduce all his past experience to some seventy thousand recollections, which he would later define numerically. Two considerations dissuaded him: he thought the task was interminable and that it was useless. He knew that at the hour of his death he would scarcely have finished classifying even all the memories of his childhood." ("Funes the Memorious")

1.4.5 Identifying copies and versions

A common question is: if I identify entity A with a DOI name, and then I adapt it in some way to create entity B, should I assign a new DOI name to entity B?

The answer is: there can be no general rule which applies to all cases and each must be treated in context. If a registrant finds it useful to do so, they may. The rules of Application Profiles, and business rules of Registration Agencies, will help in deciding for DOI names registered in Application Profiles. The key point is that one should precisely specify what A is and what B is; two digital entities are never the same in any absolute sense and can be considered copies of each other only in the context of some defined purpose.

For a more detailed explanation of this fundamental topic, see the article "On Making and Identifying a Copy" http://dx.doi.org/10.1045/january2003-paskin.

1.5 Components of the DOI System

The DOI System has four components:

  • Numbering: assigning a number (or name) to the intellectual property entity that the DOI name identifies (It is more correct to talk about the DOI name as an alphanumeric string, since a DOI name may contain characters as well as numbers but we use the term "number" to apply to this string, to avoid unnecessary complexity). This is the NISO syntax, standardised as ANSI/NISO Z39.84-2000.
  • Description: creating a description of the entity that has been identified with a DOI name, through the DOI Data Model, which is based on the open indecs initiative.
  • Resolution: making the identifier "actionable" by providing information about what the DOI name should resolve to, and the technology to deliver the services that this can provide to users; this uses the Handle System. The DOI System is an implementation of URI (Uniform Resource Identifier, sometimes called Universal Resource Identifier, IETF RFC2396).
  • Policies: the rules that govern the operation of the system, in a social infrastructure.

[Figure: DOI System Components]

By combining a tool for naming "content objects" as first class objects in their own right with a mechanism to make these names actionable through "resolution", the DOI System offers persistent managed identification for any entity. But that alone is not enough: managing resources interoperably requires appropriate metadata: creating a mechanism to provide a description of what is identified in a structured way allows services about the object to be built for any purpose. The IDF has outlined, and is actively developing in more detail, a standard way of not only doing this, but linking to existing standards such as ONIX, Dublin Core and so on, allowing each community to bring its own identifiers and descriptions into play. Finally, wrapping these tools into a social and policy framework, through the Registration Agency federation, allows the development of DOI names in a consistent quality-assured way across many sectors, opening the possibility of managing multimedia objects seamlessly.
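As a rough illustration of how the numbering and resolution components meet, the sketch below splits a DOI name into two parts and expresses it as a Web address. It assumes the prefix/suffix form of the DOI syntax (a prefix beginning "10.", a slash, then a suffix), which this chapter does not itself spell out, and it uses the dx.doi.org address form that appears in the article link in section 1.4.5. The function names are hypothetical.

```python
from typing import Tuple
from urllib.parse import quote

# Address form taken from the link given in section 1.4.5.
PROXY = "http://dx.doi.org/"


def split_doi(doi: str) -> Tuple[str, str]:
    """Split a DOI name at the first slash into (prefix, suffix)."""
    prefix, sep, suffix = doi.partition("/")
    if not sep or not suffix or not prefix.startswith("10."):
        raise ValueError(f"not a DOI name of the assumed form: {doi!r}")
    return prefix, suffix


def proxy_url(doi: str) -> str:
    """Express a DOI name as a Web address that a browser can act on."""
    prefix, suffix = split_doi(doi)
    # Percent-encode characters in the suffix that are unsafe in a URL path.
    return PROXY + prefix + "/" + quote(suffix, safe="/")


print(split_doi("10.1045/january2003-paskin"))
print(proxy_url("10.1045/january2003-paskin"))
```

The name is the same in both forms; the Web address is only one way of making it actionable in one particular infrastructure.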

1.6 What can be identified by a DOI name?

1.6.1 Intellectual property

A DOI name can be used to identify any resource involved in an intellectual property transaction including, for example, text, audio, images, software, etc., and the agreements and parties involved. While the scope of intellectual property transactions is quite broad, it is unlikely that DOI names would be appropriate for identifying entities such as trucks, or for identifying people or natural objects unless they are involved in such a transaction. Intellectual property transactions don't necessarily involve money: DOI names can be used to identify free materials and transactions as well as entities of commercial value.

While a DOI name can be used like any other URI to identify "anything that has identity", the DOI System is a combination of components (identification, resolution, data model and policies) devised with the specific primary aim of identifying any "intellectual property entity". The initial focus of DOI System applications was "Creations" -- that is, resources made by human beings, rather than other types of resource (natural objects, people, places, events, etc.). Other types of resource are also necessarily involved in intellectual property transactions, and so may be identified by DOI names where appropriate. As an example, the initial aim of the DOI System was not to be used to identify natural objects (e.g., specimens in a natural history museum, or natural substances used in pharmaceutical research): but if these were involved in intellectual property interactions there may be an application of DOI name to museum artefacts or pharmaceutical components which would be appropriate. Similarly, the DOI System was not initially an identifier for agreements or licences (which in the indecs framework are types of events), but implementers may find it useful to identify these with DOI names alongside the intellectual property that they govern.

Critically, a DOI name is a persistent identifier: even if ownership of the entity or the rights in the entity change, the identification of that entity should not (and does not) change. The responsibility for managing the DOI name changes, but not the name itself.

1.6.2 Identification of abstractions

Creations may be in both tangible and intangible forms. DOI names can be assigned not only to manifestations of intellectual property (books, recordings, electronic files) but also to performances and to "abstractions" -- the concepts (often referred to as "works") that underlie all intellectual property. This may be necessary for applications such as rights management or citation. These "abstractions" are what enable us to recognize a performance of a song, or the words of a book, entirely separately from any particular performance or specific edition. In fact there is nothing new in using abstractions or representations in trading -- we do it all the time with physical property: representations such as deeds and mortgages are what alters (not the physical bricks etc.) when a house changes hands. Similarly with intellectual property, representations such as licences and files are traded. Digital trading of these pieces of property requires that each entity be uniquely identified. DOI names can be used to identify any of the various physical objects that are "manifestations" of intellectual property: for example, printed books, CD recordings, videotapes, journal articles. A DOI name can also be used to identify less tangible manifestations, the digital files that are the common form of intellectual property in the network environment. But the use of a DOI name can go beyond the identification only of "manifestations" -- it can also be used to identify performances of intellectual property or the "abstractions" that underlie the different manifestations, and other types of resources where they are involved in intellectual property transactions.

1.6.3 Formal definition

Formally, DOI System scope is defined in terms of a data model, the model underlying the indecs work: a DOI name can be assigned to any entity which is a Resource within the indecs context model. This means the type of entity must be described in terms of attributes in the dictionary (e.g., media, mode, content, subject), and become an entry in the indecs Data Dictionary used by the DOI System. The practical outcome of this is important and provides a pragmatic functional specification: a DOI name can identify any Resource, but the DOI System requires that the Resource is defined (technically and hence precisely) in terms of agreed public (RDD) attributes. This is one role of the DOI Data Model.

1.6.4 Granularity

A DOI name can be applied at any level of granularity; in other words, there is no preset definition of the size or form of an entity that may be identified with a DOI name. Rather the decision as to what a DOI name identifies is taken by the Registrant on a purely functional basis -- what is it that I need to be able to identify? This is an application of what the indecs analysis calls Functional Granularity. The principle of functional granularity proposes that "it should be possible to identify an entity whenever it needs to be distinguished".

A DOI name can equally be used to identify a complete opera, an individual aria or a single bar of music. In the same way, it can be used to identify a journal, an individual issue of a journal, an individual paper in the journal, or a single table in that paper. However, it is not always possible to identify in advance which specific elements will need to be identified. It has to be possible to identify only those elements where there is a recognized need to do so -- whenever that need is recognized.

Functional granularity should be considered in addressing any question as to application. For example, if a journal publication were to exist in English and Spanish, how many DOI names would there be per article? There is no single correct answer. This is a "functional granularity" issue, and hence ultimately a decision for the publisher. A publisher could consider the English (E) and Spanish (S) to be different "versions" of the same underlying "work" or "creation" (similar to having both a pdf and html version), in which case there would be one DOI name. Or a publisher could consider them two separate underlying works, hence two DOI names. These could perhaps be related in one or more applications using the indecs entities and relationships, or they could be grouped together under a third DOI name for the work. This latter approach is envisioned as a possible future evolution of the DOI System involving multiple resolution, in which a single DOI name for the work could be resolved to multiple additional DOI names for versions of the work (e.g., by language), and each of those DOI names could further be resolved to multiple locations. Functionally the decision comes down to this: does the publisher wish to distinguish between E and S for any purpose, e.g., to enable certain mirror sites to carry only the Spanish or English versions and not have to carry both? The safe option is always to take granularity down as low as possible (two DOI names), retaining the flexibility to aggregate them in one or more ways at a later date.
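The options in the English/Spanish example, including the possible third DOI name for the work, can be sketched as a small data structure. This is only an illustration of the multiple-resolution idea, which the text presents as a possible future evolution; the DOI names, URLs, record layout and the resolve_all function are invented for the example, and the 10.99999 prefix is hypothetical.

```python
# Hypothetical records: a DOI name resolves either to other DOI names
# (versions of a work) or to one or more locations (URLs).
RECORDS = {
    "10.99999/article.42": {"versions": ["10.99999/article.42.en",
                                         "10.99999/article.42.es"]},
    "10.99999/article.42.en": {"locations": ["http://example.org/en/42",
                                             "http://mirror.example.net/en/42"]},
    "10.99999/article.42.es": {"locations": ["http://example.org/es/42"]},
}


def resolve_all(doi_name: str) -> list:
    """Follow version links recursively and collect every location."""
    record = RECORDS[doi_name]
    locations = list(record.get("locations", []))
    for version in record.get("versions", []):
        locations.extend(resolve_all(version))
    return locations


# The work-level name reaches every version; a version-level name lets a
# mirror site carry only one language.
print(resolve_all("10.99999/article.42"))
print(resolve_all("10.99999/article.42.es"))
```

Registering the two version-level names first keeps both views available, since the work-level grouping can be added later without changing them.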

1.7 Benefits of the DOI System to Publishers, Intermediaries, and users

1.7.1 Summary of benefits

The DOI System offers a unique set of functionality:

  • Persistence, if material is moved, rearranged, or bookmarked;
  • Interoperability with other data from other sources;
  • Extensibility by adding new features and services through management of groups of DOI names;
  • Single management of data for multiple output formats (platform independence);
  • Class management of applications and services;
  • Dynamic updating of metadata, applications and services.

For users, these features provide the ability to

  1. Know what you have
  2. Find what you want
  3. Know where it exists
  4. Be able to get it
  5. Be able to use it in a transaction

Some benefits which the DOI name enables:

  • Links appear as standard hyperlinks, but unlike URLs are persistent; changes to URLs defined as resolution points may be made without affecting the DOI name and any links bookmarked using it.
  • Once DOI names are registered, they are available to everyone who wants to use them.
  • Multiple-resolution options are registered along with the DOI name, and may be added or modified at any time. Using DOI name aware software, links can pop up on a menu dynamically, in real-time, out of the global DOI System directory -- if the registrant adds an option to the DOI name record, it will show up the next time a user clicks on a link based on that DOI name.
  • DOI names may be cut and pasted: users might encounter the DOI name as text in e-mail rather than on a web site, etc.
  • A DOI name is a universal, machine-readable number allowing cross-system communication -- like the ISBN for physical books, or the CUSIP # for Securities, yet without the need for a separate standard -- and so usable in all back office functions, which are not visible on the surface -- e.g., allowing the retailer to track sales according to a specific number, then report sales back to the publisher, have the publisher's internal systems tracking sales, performing accounting, calculating royalties, etc.
  • DOI names can incorporate existing identifiers such as ISBN, ISTC, proprietary identifiers, etc., add value to them and enforce interoperability; DOI names can utilise existing metadata schemes.
  • DOI System metadata mapping is based on the indecs (interoperability of data in e-commerce) framework. Conformance with this framework facilitates the use of DOI names with MPEG- and ONIX-compliant tools for multimedia content management and digital rights management, and with other schemes following the same principles.
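The persistence and multiple-resolution points in the list above both rest on indirection: links carry the DOI name, and the registrant edits one record behind it. A minimal sketch, assuming an in-memory dictionary in place of the real DOI System directory; the Registry class, its methods, the option labels and all names and URLs are hypothetical.

```python
class Registry:
    """Toy stand-in for a resolution directory (not the real DOI System)."""

    def __init__(self):
        self._records = {}

    def register(self, doi_name, url):
        # One record per DOI name, holding labelled resolution options.
        self._records[doi_name] = {"default": url}

    def update(self, doi_name, label, url):
        # Adding or changing an option never changes the DOI name itself.
        self._records[doi_name][label] = url

    def resolve(self, doi_name):
        # Return a copy so callers cannot edit the record by accident.
        return dict(self._records[doi_name])


registry = Registry()
registry.register("10.99999/book.7", "http://old.example.com/book7")

link_in_email = "10.99999/book.7"  # stored or bookmarked by a user

# The content moves and the registrant adds a second option.
registry.update("10.99999/book.7", "default", "http://new.example.com/book7")
registry.update("10.99999/book.7", "reviews", "http://new.example.com/book7/reviews")

print(registry.resolve(link_in_email))
```

The stored link is never touched; only the record it points to changes, so the old bookmark picks up both the new location and the added option.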

Some specific benefits of DOI names in various aspects of the supply chain are described below in more detail.

1.7.2 Benefits in internal content management

DOI names and associated metadata ensure that accurate, interoperable and efficient product information is available both externally and internally, reducing costs in many places:

  • System Management cost overhead for existing systems: the costs of managing multiple systems holding the same data, system and interface maintenance, ongoing data cleansing, etc;
  • Data handling cost associated with manual entry of data, data cleaning;
  • Development efforts for new systems - costs associated with having to re-identify and validate data sources, complexity of data mapping and increased risk of errors, cleansing of data for every new project;
  • Reconciliation costs associated with having to reconcile different data sources;
  • Customer Query/Complaint cost associated with handling customer queries/complaints and loss of revenue and customers due to poor quality data;
  • Lost sales opportunities: revenue loss associated with not selling existing stock due to missing or incorrect product information;
  • Misinformation: costs associated with making decisions based on incorrect management information;
  • Dissatisfaction with the company: inability to provide accurate product information results in dissatisfaction amongst staff and customers;
  • Lack of knowledge of product suite: it is difficult to identify the entire relevant product suite, meaning that many customer-facing staff are unable to offer the complete range to customers;
  • New system development is delayed, as each project needs to address issues such as identifying and cleansing data. Increasingly this may impede product development particularly for electronic offerings. Standardisation of product type and data definitions would increase speed of new product development.

1.7.3 Benefits in the distribution and sales life-cycle

  • More sales. Any hyperlink anywhere on the Internet which refers to a product's DOI name now becomes an active, dynamically-controlled, updateable sales & marketing tool -- clicking on it can pop up a menu of all the actions or services which the Seller wants to provide to the consumer, a single click away for the consumer.
  • Managing new options. Changing these choices dynamically is easy and cheap for the Seller: in order to add another retailer, add another review, run a special discount or temporary promotion, or make any other change in the choices the consumer sees, the Seller simply updates one central DOI name record -- at which point, thousands of links all over the Internet which refer to that DOI name will now reflect the new or changed menu options, even if already stored, bookmarked, or printed.
  • Less labour, errors, and costs: Sales can be tracked more easily and more precisely (at the object level, or aggregated into new collections). Sales are tracked more cheaply and accurately when all transactions are keyed to an unambiguous identifier which computers can understand, instead of people trying to match the different IDs of each pair of partners, or trying to track sales based on descriptive metadata about the item instead of a numerical identifier.
  • Royalty and licensing revenue can be tracked more easily, cheaply and accurately, not only reducing costs but also capturing revenue more fully.
  • Cross-linking on a wide scale is feasible and inexpensive, both among a publisher's own content and between the publisher's content and other sources. This increases revenue opportunities as well as enhancing the value of the content itself through its rich hyperlinks.
  • Wider dissemination of content, through enhanced "discoverability" for all third parties and through greater use of metadata for targeting the content more precisely to different audiences whom it might not otherwise reach.
  • Greater leverage over distribution channels, both by reducing the costs of switching distributors (because set-up costs are minimised when everyone shares a universal ID) and by increasing the reach and breadth of the channels available (again, because set-up and ongoing administration are much simpler via the DOI name).
  • Additional, incremental product revenue enabled by offering highly-targeted, customized information products from the same asset base to specific audiences who might not otherwise purchase that information. The DOI System enables object-level control over digital assets - finding them, recombining them, distributing them, tracking their usage, etc.

1.7.4 Benefits in the production life-cycle

  • Unique, unambiguous, universal content ID -- to identify content objects throughout the entire content lifecycle, i.e.:
      • pre-publication (content Authoring, Aggregation, Selection, Rights Acquisition)
      • post-publication (Distribution, Syndication, Sales, Superdistribution)
      • archiving/digital asset management (for later re-use and re-purposing)
  • Ability to find content assets internally -- in order to facilitate re-purposing, recombining with other assets (internally or with external partners), future editions, etc.
  • Reduced costs of determining ownership, clearing rights, etc.
  • Reduced costs (and greater control) of licensing-out or syndicating-out
  • Reduced costs (labour and errors) of sales tracking, channel management, P&L calculation, etc.
  • Greater corporate-wide leverage over assets which are otherwise invisible behind separate divisional content management systems keyed with separate content IDs.
  • Cheap to implement: an existing content management system's internal content ID field can be used to create a DOI name, so the internal ID also becomes usable by distribution/sales partners and all others.
  • Reduces costs, streamlines efficiency, and increases the functional capabilities of all internal systems which manage digital assets; creates a foundation for digital asset management which allows those assets to be leveraged more profitably; facilitates the creation of new, additional products over that same base of assets.
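As a sketch of the "cheap to implement" point in the list above, the function below forms a DOI name from a registrant prefix and an existing internal content ID, and builds a link through the dx.doi.org resolver address that appears in this chapter. The prefix 10.99999, the internal ID and both function names are hypothetical; the prefix/suffix layout is the subject of the Numbering chapter.

```python
from urllib.parse import quote


def make_doi_name(prefix: str, internal_id: str) -> str:
    """Combine a registrant prefix and an internal ID as prefix/suffix."""
    if not prefix.startswith("10."):
        raise ValueError("expected a prefix beginning with '10.'")
    return f"{prefix}/{internal_id}"


def doi_link(doi_name: str) -> str:
    """Build a resolver URL, percent-encoding characters unsafe in URLs."""
    return "http://dx.doi.org/" + quote(doi_name, safe="/")


name = make_doi_name("10.99999", "CMS-2006-00042")
print(name)            # 10.99999/CMS-2006-00042
print(doi_link(name))  # http://dx.doi.org/10.99999/CMS-2006-00042
```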

1.7.5 Quantified benefits: case studies

A white paper "Enterprise Content Integration with the Digital Object Identifier: a business case for information publishers", (http://dx.doi.org/10.1220/whitepaper5) quantifies the business benefits for information publishers of implementing the DOI System to facilitate internal content management and to enable faster, more scalable product development, by delivering four key advantages in making it easier and cheaper to:

  1. Know what you have (users able to look at catalogues of content available throughout the enterprise);
  2. Find what you want (users able to search and browse for content to be used or re-purposed);
  3. Know where it exists (able to see where the item exists within the organization);
  4. Be able to get it (users and production tools able to retrieve the content).

This is illustrated by four examples of cost savings, each of which is supported by a worked actual case study:

  • Cost avoidance in cross-brand product development (example case study: $120K savings for a vertical market information publisher building a new cross-brand web portal);
  • Scalable product development through repurposing (example: annual incremental revenue of $700K for a periodical publisher creating books from repurposed periodical content);
  • Cost reduction in existing production processes (example: 94% reduction in staff effort = $400K for a textbook publisher building web sites to accompany textbooks);
  • Increasing revenue and market share through tool integration (example: <$1.2M incremental revenue for a financial information publisher selling documents through third party links from investment selection tools).

1.8 The DOI System as social infrastructure

The implementation of the DOI System adds value, but necessarily incurs some costs. The three principal areas of cost currently lie in the following tasks:

  • Number registration: maintenance of resolution destination(s); declaration of metadata; validation of number syntax and of metadata; liaison with the Handle System registry; customer guidance and outreach; marketing; administration
  • Infrastructure: resolution service maintenance, scaling and further development
  • Governance: common "rules of the road"; development of the generic system

There is a widespread recognition of the advantages of assigning identifiers, and a widespread misconception that an abstract specification (like a URN or URI) actually delivers a working system rather than a namespace that still needs to be populated and managed. A common misperception is that one can have such a system at no cost. It is inescapable that a cost is associated with managing persistence and assigning identifiers and data to the standards needed to ensure long-term stability. This is because of the need for human intervention and support of an infrastructure. Assigning a library catalogue record, for example, will typically cost anything up to $25. Assigning an ISBN, ISSN or National Bibliography Number will also have costs, even if these are not paid directly by the assigner. Although a DOI name is free at the point of use, there is a small fee to an assigner for creating a DOI name (a few cents). This is because we have deliberately chosen to make the DOI System a self-funding (though not for profit) system. Our task now is to show that the system offers value for money as a tool which producers of information can use: CrossRef is one proven example of a Registration Agency and Application Profile in text publishing; we expect to see other variants on this theme develop.

If adding a URL "costs nothing" (which itself ignores some infrastructure costs), why should assigning a name? It is indeed possible to use any string, assigned by anyone, as a name -- but to be useful and reliable any name must be supported by a social as well as technical infrastructure that defines its properties and utilities. URLs for example have a clear technical infrastructure (standards for how they are made), but a very loose social infrastructure (anyone can create them, with the result that they are unreliable on their own for long-term preservation, as they have no guarantee of stability let alone associated structured metadata). Product bar codes, Visa numbers, and DOI names have a tighter social (business) infrastructure, with rules and regulations, costs of maintaining and policing data -- and corresponding benefits of quality and reliability (when a credit card is presented, we can be reasonably certain that the number is valid, and has been issued only after careful correlation with associated metadata by the registrant). Such an infrastructure does not necessarily imply a centralised system -- it may be a distributed system (like domain names), but it must have some form of regulation.

Such regulation of infrastructure for a community benefits all its members; funding the development of it is often a problem, and there is no "one size fits all" solution to how this should be done. But finding a workable model for the development of an infrastructure can yield obvious benefits. There are many modern examples -- 3G telephone networks, railways -- which are struggling with the right model for supporting a common infrastructure. The Internet was largely a creation of central (US) government; the product bar code, a creation of a commercial consortium. The IDF has chosen as its model the concept of Registration Agencies, based on market models like bar codes and Visa rather than on centralised subsidy: these Agencies effectively hold a "franchise" on the DOI System: in exchange for a fee to the IDF, and a commitment to follow the ground rules of the DOI System, they are free to build their own offerings to a particular community, adding value services on top of DOI name registration and charging fees for participation.

At the outset of the DOI System development, a very simple model was introduced whereby a prefix assignment was purchased for a one-off fee from the IDF. It was recognized at the outset that this fee structure was a starting point but would be insufficiently flexible for the long term. DOI names allocated using these prefixes purchased directly from IDF are registered without structured metadata: they are now defined as being in the zero Application Profile.

We are now in a process of migration to the long term aim of a wide variety of potential business models, using third party Registration Agencies, in recognition of the fact that such a simple model is not a "one size fits all" solution. The disadvantage of using direct prefix purchase is that IDF cannot offer the level of metadata support and social infrastructure support of the type which can be given by a Registration Agency. DOI name prefixes obtained directly from IDF may however be useful if you wish to experiment or consider developing your own applications. DOI name prefixes will now only be issued through this direct route at the discretion of the Managing Agent.

Our intention is that eventually all DOI names will be registered through one of many Registration Agencies, each of which is empowered to offer much more flexible pricing structures. The pricing structures and business models of the Registration Agencies will not be determined by the IDF; each RA will be autonomous as to its business model, which could include, but not be limited to, cost recovery via direct charging based on prefix allocation, numbers of DOI names allocated, numbers of DOI names resolved, volume discounts, usage discounts, stepped charges, or any mix of these; indirect charging via cross subsidy from other value added services, agreed links, etc.

DOI names may be made available at "no charge", if the costs of doing so can be met from elsewhere (there is no such thing as "free", only "alternatively funded"). IDF itself is willing to allocate a DOI name prefix free of charge to organizations for limited experimental non-commercial uses at the discretion of the Managing Agent. For the longer term, the business model includes two separate steps: a business relationship between IDF and an RA (the "franchise fee"); and a business relationship between an RA and a DOI name registrant (the "registration fee"). The two are not directly connected; this enables the RA to offer to registrants any business model whatever, which suits its needs. This could include assigning DOI names without charge. Hence DOI names can be used in both commercial and non-commercial settings, interoperably. Like any other piece of infrastructure, an identifier system (especially one which adds much value like metadata and resolution) must be paid for eventually by someone. So an organization could, if it wished, assign DOI names freely (registration fee zero to registrants) and subsidize this added-value service by paying a franchise fee to IDF from a central fund, as an acceptable cost for supporting the service.

1.9 The DOI System as a managed system

Like Domain Name registration, assignment of DOI names requires a fee and agreement to follow the defined standard and rules. This does not make the system closed, or commercial, but it does make it managed. The IDF is a not-for-profit organization, not a commercial operation; however, the system has costs that need to be met. Persistence is a function of organizations, not technology: to support a persistent identifier system, a persistent organization needs to exist. The principal concern of a persistent organization is continuing funding; hence the model selected for a long-term position for a DOI System organization was a body that is not reliant on external sources, such as grants or membership, but is a self-funding system that can be supported in perpetuity from its own resources. The IDF is currently undergoing controlled migration from its initial member-funded organization (like W3C) to an organization that is operationally funded.

The implementation of the DOI System adds value, but the implementation necessarily incurs some resource costs in data management, infrastructure provision and governance, all of which contribute to persistence. The mechanism chosen to recoup those costs incurred by the organization is a self-funding "franchise" business model, as used by the physical bar code UCC/EAN system, and other proven systems. This is funded by a fee for participation (which may optionally be passed on to registrants, waived, or subsidised by the operating entity), but not for use of a DOI name once issued.

To make such a system work effectively requires protection of the assets within the system (1) from illicit exploitation, and (2) for assured quality control. Illicit exploitation would include someone calling something a DOI name when it is not part of the system; this could be damaging to one or both of the financial health (avoiding payment of an issuing fee) or the quality (poor data) of the system. To prevent this exploitation requires the availability of legal remedies: specifically, the DOI System relies on copyright and trademark law to protect the "DOI" brand and reputation. The DOI System is not a patented system; the IDF has not developed any patent claims on the DOI System and does not rely on patent law for remedy.

The underlying technologies used by the DOI System also have similar considerations. The Handle System is used by IDF under licence from Corporation for National Research Initiatives, who have certain intellectual property claims to protect against misuse of the Handle System; <indecs> intellectual property (IP) is assigned to, jointly and solely, IDF and EDItEUR and made available freely but under stated terms to others (an example being the <indecs>RDD work contributed to MPEG 21).

There is a widespread recognition of the advantages of assigning identifiers as well as a widespread misconception that an abstract specification (like a URN or URI) actually delivers a working system rather than a namespace that still needs to be populated and managed. URLs, for example, have a clear technical infrastructure (standards for how they are made) but a very loose social infrastructure (anyone can create them once a domain name has been obtained, with the result that they are unreliable: they have no guarantee of stability, let alone associated structured metadata). Product bar codes, Visa numbers, and DOI names have tighter social (business) infrastructures, with rules and regulations, costs of maintaining and policing data, and corresponding benefits of quality and reliability. From this need for management stem some misconceptions about the DOI System funding and business model. The most common myths are:

  • Myth: the DOI System is for, run by, or only to the benefit of, commercial publishers. The publishing community was the first to see the benefits of persistent identification and to attempt to build an open system (rather than a system for, e.g., a library or a campus); several publishers have not only joined the IDF but provided initial loan funding, and the initial Crossref application is in the publishing sector. However, there is nothing to prevent any other application, or any non-publisher involvement.
  • Myth: the DOI System is "a commercial packaging of something that is available for free elsewhere". The practical implementation offered by the DOI System is more than a collection of the underlying technical specifications.
  • Myth: the DOI System is "only for rights management". Whilst that was the initial impetus, since rights management requires an extensible system, it is in fact applicable for any use.
  • Myth: The DOI System is "untested" or unrelated to other activities. All of the components are proven in other contexts, and there are millions of working DOI names. The DOI System builds on the Handle System and <indecs>, and so it inherits the strengths and real-world testing of these: for example, the <indecs> approach has been validated by rigorous analysis in the MPEG 21 framework development. These underlying technologies (rather than the DOI System per se) are often appropriate to answer the question of "how the DOI System relates to X".
  • Myth: the DOI System "allows only one business model" (seeing a swan and claiming that all birds are white and swim). As more applications are developed, the flexibility of a system that deliberately allows any business model will be appreciated.
 
