Using third party research data

When you use the ideas and words of other authors in your research publications, you are required to observe limitations placed on you by copyright law, and provide proper attribution to avoid charges of plagiarism. The same is true when you reuse data shared by other researchers and organisations. Such data will often be provided to you under certain conditions, which you should respect.

Respecting the rights of data originators

When using third party research data, you have a responsibility to respect the various rights that may be held by other people or organisations, including copyright, sui generis database rights and moral rights.

If data are supplied to you without any sort of licence information, terms of use or usage agreement, you should interpret this as meaning that the originator has reserved all rights to the data. This has the unfortunate result of making it less than clear what you are allowed to do with them, as the legal position is quite complex and rests on several value judgements. In all likelihood, you would be limited to verifying results that had already been derived from the data. In such cases, you should ask the rights holder (usually the originator) for written permission to use the data as you intend, and also to retain and share any derivations of the data.

Normally, however, you should expect to receive the data under a licence. The contents of such licences vary but the following types of terms are common.

No derivatives

The essence of this requirement is that you can use and share the data as they stand but you are not allowed to alter or transform them in any way. As when all rights are reserved, you are unlikely to be able to do much more than verify results that have already been derived from the data.

The implementation in the Creative Commons version 4.0 licences is more nuanced, in that you are allowed to change the file format and share the result. You could also make private adaptations or remixes of the data, so long as you did not share them. It is unclear how useful that might be, though, since research results derived from the licensed data (including tables and graphs) might be considered transformations of those data and therefore you would not be able to share them.

Non-commercial

Some licences forbid data being used for commercial purposes. This is not usually an issue in academic research, but you might come across circumstances where you would not be able to use the data. Examples include performing consultancy for an external organisation, applying for a patent, or commercialising your research in some other way.

Share-alike

A licence with a copyleft or share-alike requirement allows you to make adaptations to the data, and combine them with data from different sources, but if you share the resulting dataset you must apply the same licence to it.

Some licences are stricter than others about their copyleft condition. Most allow you to use a later version of the same licence. Some allow you to use a functionally equivalent licence, or have explicit compatibility clauses:

  • Resources derived from content licensed under either or both of Creative Commons Attribution-ShareAlike (CC BY-SA) 4.0 and the Free Art Licence 1.3 may be released under either licence.
  • Resources derived from content licensed under both CC BY-SA 4.0 and the GNU General Public Licence (GPL) v3.0 may be released under the GPL v3.0, though this is not entirely satisfactory for data.
  • Resources derived from content licensed under the Non-Commercial Government Licence may be released under any other non-commercial licence.

No technological restrictions

Some licences have terms that explicitly prevent you from locking down the copies or derivations of the data that you share with others:

  • The Open Data Commons Open Database Licence requires that, if you place technological restrictions like DRM (Digital Rights Management measures) on the data, you must also distribute a version without the measures in place.
  • If you modify a resource licensed under the GNU General Public Licence and share a compiled version, you must also make the corresponding source code available, including the modifications you made.

Attribution

Most licences require you to acknowledge that you have used the resource in question, and many require an explicit acknowledgement of the originator or rights holder:

  • The Open and Non-Commercial Government Licences require you to include or link to a specified attribution statement in any derived resource, and where possible provide a link to the licence.
  • The Open Data Commons licences require you to distribute the resource's rights statements with any derived dataset, and to tell users of any non-dataset resources based on it that those resources contain information from the original and under what licence it is available.
  • The Creative Commons licences require you to preserve, in any redistribution or derivative work, any attribution statement, notices of copyright and licensing, or warranty disclaimer supplied with the resource, along with a link to the original.

Over and above the licence requirements, you are also expected to acknowledge in your research outputs any third party data underlying your results, preferably in the form of a data citation.

Using public domain data

If you use third party data that have been dedicated to the public domain, you do not have to fulfil any particular legal responsibilities in respect of those data. Nevertheless, you are still expected to act honestly regarding them, meaning you should acknowledge the source of the data in your documentation. You should also acknowledge that you used the data in any research outputs arising from them, preferably in the form of a data citation.

Archiving third party data

The approach you take regarding archiving third party data depends on how you have used the data and what permissions you have been granted.

If you have used third party data without altering them in a significant way, and they are already available from a third party archive, you do not need to archive them again. Simply cite the original dataset when you publish your results.

If they are not available from an archive, check with the data originator to see if they plan to archive the data themselves. If not, and you have permission to do so, you should archive your copy. Be sure you credit the correct creators and rights holders in the archive record, and apply the licence under which you received them.

If you used a subset of a third party dataset or database, and it would take some effort to extract the same subset again, you should consider archiving your subset. Again, ensure you have permission to retain and share your copy, credit the original creators and rights holders, and apply the licence under which you received the data.

If you have integrated a third party dataset with other data, check the licence you received it under. If you have permission to share the resulting dataset, archive it, remembering to fulfil all relevant licence terms such as those relating to onward licensing, acknowledgement, preservation of notices, and so on. If you do not have permission to share the resulting dataset, archive those components of the dataset you do have the rights or permissions to share, and in the documentation provide full instructions for how to obtain the remaining components and derive the final dataset.

Whenever you archive third party data, it is good practice to inform the data originator. They may wish to set up links to your archived copy, thereby demonstrating the impact of their own work and raising awareness of yours.