Licensing your data

When you publish data, you should provide licence information to tell other users how they may use your data. What they are allowed to do by law differs around the world, and may depend on the nature of the data themselves, how they were gathered and how they were arranged. By releasing data under a licence, you remove that uncertainty and confusion. The licence enables them to use the data without fear of legal reprisals, provided they follow the conditions you set.

Agreeing a licence to use

Datasets can only be licensed by the person or organisation that holds the relevant rights. Usually the rights in datasets generated by University of Bath staff and students are held by the University, but you can assign a licence on the University's behalf so long as you follow the University of Bath Research Data Policy Guidance. In short, this means the licence should require users to acknowledge the originator of the data in any publication or derived work; it should not apply restrictions such as non-commercial or share-alike terms without a strong justification.

If you are collaborating with or funded by any third parties, the rights situation may be more complicated, so you should check the terms of your collaboration or funding agreement. If it assigns the rights in the data generated by your project to other parties, you must reach agreement with all of them over the licence to use. It is also good practice to agree on a contact person or organisation to make decisions on behalf of all rights holders, in case users have queries or request additional permissions.

If you have incorporated third-party data in your dataset, look again at the licence under which you used them. Check if you are able to distribute the resulting dataset, and if so, whether you are obliged to use a particular licence.

Choosing a licence

Attribution-only licences

In most cases, the most appropriate licence to use is an attribution-only licence. By applying such a licence, you would give users permission to derive new datasets and other resources from your data, and redistribute your data and their derivations, for any purpose. In return, they must give an appropriate attribution to the originator of the dataset.

The most widely known attribution-only licence suitable for data is CC BY; be sure to use version 4.0 or later as the earlier versions did not cater satisfactorily for data. Even though it is a permissive licence, it would prevent a user from changing your data and passing them off as the original, or from introducing new legal or technological measures that would withdraw the permissions granted by the licence.

For software, appropriate licences include the MIT (Expat) Licence and the Apache Licence. The Apache Licence is longer and provides more clarity in jurisdictions where software patents are recognised, such as the United States.

Copyleft licences

By applying a copyleft licence, you give permission for users to derive new datasets and other resources from your data, but if they distribute their derivations, they must do so under the same licence terms. This would ensure that any other research derived from your data would also be shared by the community, but would prevent your data being combined with data released under a different copyleft licence.

The CC BY-SA licence combines the attribution requirement and other protections of CC BY with a copyleft condition. As with CC BY, only version 4.0 or later should be used for data.

For software, the most commonly used copyleft licence is the GNU GPL. Note that, as a special case, it is possible to mix resources that use the CC BY-SA 4.0 and GNU GPL v3 licences and release the result under GNU GPL v3, though as the GPL does not specifically mention database rights it is not ideal for data.

Restrictive licences

It is possible to apply licences that give fewer permissions or apply more stringent conditions on how others may use the data.

Creative Commons provides the CC BY-ND licence, which allows others to use, copy and redistribute the data as they stand for any purpose, but prohibits them from distributing modified versions or derived works. This may be appropriate for materials that would lose their value or present a risk if altered, such as images. For other data, these terms are likely to be too restrictive, due to the potentially broad way in which 'derived work' may be interpreted (including, for example, graphs and other visualisations).

Public domain dedications

With a public domain dedication, you would waive your rights to the data, including copyright and the database right, to the fullest extent possible. Users would not be obliged legally to acknowledge the originator of the data, though in line with academic norms they would still be expected to acknowledge the source in papers derived from the data, or in the documentation of derived datasets.

University policy does not normally support the use of a public domain dedication for research data, but it may be the most appropriate choice where the value of the data is in their ability to be combined with others, and where to provide attribution would be a barrier to an effective research process. It is usually the most appropriate licence for metadata, and all metadata in the University of Bath Research Data Archive have been dedicated to the public domain.

The most commonly used and recognised public domain dedication suitable for use with data and software is the one provided by Creative Commons, known as CC0.

Bespoke licences

Writing your own bespoke licence is not usually recommended. It requires considerable effort, both on your part to write it and on the part of end users to understand it. It may, however, be the most appropriate option if you have specific conditions to apply. Before starting on this, you should contact Research Commercialisation and Contracts for advice.

For highly sensitive data, you may need to set up a more stringent access regime whereby you or another responsible person review all requests for access, and end users must sign an agreement before being allowed to use the data. Again, you should contact Research Commercialisation and Contracts for advice on the wording of the agreement, and ask your data archive about setting up an on-request access regime.

All rights reserved

If you choose not to apply a licence at all, this means you reserve all rights to the data. In this case, end users can only do with them what copyright law permits. This would include viewing your data to confirm research findings, but not much else.

End users do have the option to contact you for individual permissions, which you can grant in writing. If you find yourself having to do this a lot, you may be better off setting up a bespoke licence or end user agreement.

Applying a licence

The simplest way of applying a licence is to select it as an option when uploading data to an archive. The University of Bath Research Data Archive allows you to select a licence from a pre-defined list, on a per-file basis.

If your archive does not support this, or if you are using a bespoke licence, you should instead include a file called 'LICENSE.txt' in your dataset that states which licences apply to which files. This statement need not include the full terms of the licence or licences, but should provide a link to where the full terms may be read, ideally in other text files in your dataset.
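If your dataset contains many files, you may prefer to generate the 'LICENSE.txt' statement with a short script rather than write it by hand. The following is a minimal sketch in Python, not a prescribed method. The file paths, the 'licences/' folder, the layout of the statement and the function names are all hypothetical and chosen for illustration; substitute the licences and files that apply to your own dataset.

```python
from pathlib import Path

# Hypothetical example: which licence applies to which files.
# The file paths and the 'licences/' folder are assumptions for illustration.
LICENCES = {
    "CC BY 4.0": {
        "url": "https://creativecommons.org/licenses/by/4.0/",
        "local_copy": "licences/CC-BY-4.0.txt",
        "files": ["data/measurements.csv", "docs/README.txt"],
    },
    "MIT (Expat) Licence": {
        "url": "https://opensource.org/licenses/MIT",
        "local_copy": "licences/MIT.txt",
        "files": ["code/analysis.py"],
    },
}


def build_licence_statement(licences):
    """Return the text of a LICENSE.txt stating which licence covers which files."""
    lines = ["Licence information for this dataset", ""]
    for name, info in licences.items():
        lines.append(name)
        # Link to the full terms rather than reproducing them here.
        lines.append(f"  Full terms: {info['url']}")
        lines.append(f"  Copy included in this dataset: {info['local_copy']}")
        lines.append("  Applies to:")
        for file_path in sorted(info["files"]):
            lines.append(f"    {file_path}")
        lines.append("")
    return "\n".join(lines)


def write_licence_file(dataset_dir, licences):
    """Write LICENSE.txt at the top level of the dataset folder and return its path."""
    target = Path(dataset_dir) / "LICENSE.txt"
    target.write_text(build_licence_statement(licences), encoding="utf-8")
    return target


if __name__ == "__main__":
    # Print the statement for review before writing it to the dataset.
    print(build_licence_statement(LICENCES))
```

Calling write_licence_file with the path to your dataset folder would save the statement there. Remember that the text files holding the full licence terms still need to be added to the dataset separately.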

You may find the following external guidance useful.