Who Owns the Data?

Data, in various forms, drive both basic and applied research in most fields. Data collection and use require many costly steps, from conceptualization to maintenance. In academia, there exists an ideal of maintaining public-domain access to data and a strong belief that society benefits from research findings. Current proposals to tighten the controls over databases, therefore, appear antithetical to longstanding academic values.

Increasingly, academic researchers find they must pay to access databases. Biotechnology firms license out the annotations that enrich the genome projects, financial economists need comprehensive securities trading statistics, and effective scholarship in public policy, law, and taxation requires access to online subscription services. Ever-increasing privacy rights, confidentiality agreements, and national security restrictions are curtailing open access to many other important databases. Certainly some universities could enhance their royalty income by following suit: licensing their databases for use by outside parties. Should research universities embrace this trend?

The issues raised are neither simple nor straightforward. In U.S. law, ideas are “free as the air” unless they are embodied in a patented invention or kept confidential as trade secrets. It is a fundamental tenet of academic scholarship that information is a public good. In practice, U.S. copyright law protects only the selection and arrangement of the data in a database, not the data themselves. For broader protection, the courts have required recourse to trade-secret laws employing physical and contractual controls.

By contrast, the European Union's (EU) expansive Database Protection Directive, enacted in 1996, requires each EU member nation to enact sui generis protection for databases. Sui generis intellectual property rights are unique new forms of protection. They cover new “subject matter,” that is, new categories of things to be protected. Proponents argue that sui generis rights are needed to encourage creativity where established forms of intellectual property do not adequately protect it.

Sui generis database protection raises public-policy issues that reveal the depth of the conflict between open access and proprietary values. Should Congress or the states create new sui generis intellectual property rights in databases? Should the U.S. respond to European developments by attempting to harmonize its law with EU law? Several proposals for sui generis protections have already been introduced in Congress. To complicate matters, many state legislatures are likely to draft bills of their own, raising the prospect of a patchwork of inconsistent laws.

This is not really new territory. Since the 19th century, Congress has created several forms of sui generis intellectual property rights. Many of these forms are familiar to the university community: design patents (ornamental designs, 1842); plant patents (asexual reproduction, 1930); plant varieties (sexual reproduction, 1970); and semiconductor chips (maskworks, 1984). Occasionally, the courts and state legislatures have also developed new subject matter or otherwise expanded intellectual property rights, as has happened with software, business methods, plant species, and even life forms. In each case, proponents have argued that stronger property rights encourage investment that would not otherwise occur, and that such investment benefits society more than it costs. Today various sectors of the Internet “content” industry are lobbying heavily for sui generis protections. They argue that pirates can too easily “ride free” on the “sweat of the brow” of database creators. Content providers believe that weak U.S. protections encourage wholesale infringement and narrow the possibilities for profit using various business models.

Critics argue that sui generis database rights would impose substantial new costs on research, and that they are particularly inappropriate where the database results from publicly funded work. Indeed, the National Research Council (NRC) strongly cautions that sui generis database protections could retard scientific research. Furthermore, the NRC argues that current trade-secret controls and licensing restrictions afford sufficient protection.

The significance of this debate is magnified by the expanding definition of what constitutes a database. Traditional definitions are narrow; they limit databases to organized collections of numerical observations taken from tightly controlled experiments. Sui generis database laws contemplate much broader definitions, including structured content of nearly any type of information or work. In some proposals, databases are being broadly defined to include “any physical or digital collection of information or works arranged in a systematic or methodical way for retrieval or access by manual or electronic means.” Under this view, databases would include literary and artistic works, texts, sounds, images, numbers, facts, statistics, production or shipping information, transactions, financial data, health information, geographic information, and private personal data.

Recent technological developments have enhanced the value of databases considerably, which raises the stakes. Examples that expand the concept of databases include peer-to-peer database creation and file-sharing using freely available programs like Napster; automated data harvesting by Internet “bots”—computer programs functioning as electronic agents or posing as human users; and the capability for near-instantaneous aggregation and association of data from physically separate or independent databases, resulting in data “mining” and data “warehousing.”

Thomas Jefferson, an accomplished inventor who, as Secretary of State, was the first federal administrator of patent rights, remains an important influence on intellectual property rights in the U.S. Jefferson insisted that society should not suffer the “embarrassment” of granting a monopoly in intellectual property rights unless the benefits to society are clear-cut and substantial. Modern Jeffersonians expand this reasoning to argue that the burden of proof must be kept very high on proponents of sui generis rights. They argue that new rights must: (1) fit harmoniously with existing intellectual property protections; (2) be defined in a reasonably clear and satisfactory manner; (3) be based on an honest cost-benefit analysis; and (4) clearly enrich and enhance the public domain. The ultimate question is whether the benefits of new or expanded intellectual property rights will offset their costs. Many projections of such benefits are highly speculative, and too often the costs are overlooked. For example, the costs of infringement enforcement and compliance are systematically underestimated in public policy debates, largely to the benefit of intellectual property professionals.

A more thorough understanding of the diverse types of databases seems essential before any new form of sui generis intellectual property is created. Database rights may subvert scholarship if they are not more precisely targeted. An abrupt introduction of new sui generis property rights could have negative effects not only on scientific inquiry but also on small businesses and on incentives for other types of innovation.

Traditionally, the groups influencing intellectual property policy debates have been rather limited, and may not have adequately represented all affected interests. However, the digital revolution brings many more groups to the table. Why should the academic community participate in this sui generis database protection debate? At least four reasons are apparent: (1) sui generis database rights may upset traditional academic values; (2) they will likely impact research and scholarship; (3) they could conceivably provide royalty cash flow to some institutions; and (4) academics are in a good position to provide the conceptual understanding about data and databases that will be needed to refine sui generis database protections. Objective academic research into the EU experience with sui generis database protections, for example, would provide valuable information for drafting U.S. legislation.

The academic community could more effectively participate in this debate by developing some consensus on the information policy issues raised. It seems fundamental, first, to distinguish clearly between various types of databases according to such factors as the nature of the data, the source of funding for data collection and maintenance, the degree of innovation in database architecture and functionality, the likely uses for the data, and other considerations. Without a common basic understanding and some sense of others' experience with database protections, it seems unlikely that new sui generis database rights will contribute to the public interest or help society as much as proponents argue. The impact on academic values may be profound, and will likely cause dissatisfaction unless academia participates in this debate.

