Code & Data Sharing

Science suffers from a reproducibility crisis, and machine learning studies in particular have been criticised for their lack of reproducibility. Sharing the code used for data processing and analysis, and (where possible) the data too, helps to combat this.

Code sharing tools

Some of the most popular repositories are GitHub (https://github.com/), GitLab (https://about.gitlab.com/) and Zenodo (https://zenodo.org/).


Code can also be uploaded to a relevant Open Science Framework project (https://help.osf.io/article/387-project-files).





Is my code good enough to share?


If it works - yes! Research code doesn’t have to be industry standard. However, it should be:

  • Organised
  • Simple
  • Short
  • Extensively Documented
  • Tested
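
As a rough illustration of what "short, documented and tested" can look like, here is a minimal Python sketch. The function `standardise` and its test are hypothetical examples invented for this guide, not part of any particular project; the point is only that a docstring and one small check go a long way.

```python
"""Hypothetical example of small, documented, tested research code."""
from statistics import mean, stdev


def standardise(values):
    """Return z-scores for a list of numbers.

    Each value has the sample mean subtracted and is then divided by
    the sample standard deviation (n - 1 denominator).

    Raises ValueError if fewer than two values are given, or if all
    values are identical (standard deviation of zero).
    """
    if len(values) < 2:
        raise ValueError("need at least two values")
    centre = mean(values)
    spread = stdev(values)
    if spread == 0:
        raise ValueError("values must not all be identical")
    return [(v - centre) / spread for v in values]


def test_standardise():
    """A quick check; run it with pytest or call it directly."""
    scores = standardise([1.0, 2.0, 3.0])
    expected = [-1.0, 0.0, 1.0]
    # Compare with a tolerance rather than exact float equality.
    assert all(abs(s - e) < 1e-9 for s, e in zip(scores, expected))
```

Keeping the test in the same repository as the code lets anyone who downloads it confirm that it still behaves as described.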

‘Point-and-click’ software

Analyses written in open-source languages, such as R and Python, exist as code, which is inherently shareable. However, some analysis is performed in software with a Graphical User Interface (‘point-and-click’). This is a legitimate way to do quantitative analysis, but it is not, in itself, reproducible.

Free, open-source software is available that combines the usability of a GUI with the ability to output code that can be shared, supporting reproducible and transparent analyses. Examples include JASP (https://jasp-stats.org/) and jamovi (https://www.jamovi.org).

Data sharing

Ideally, data should be shared alongside the code used to process and analyse it. Of course, this isn’t always possible, particularly with sensitive healthcare data. 

Where data can be shared, it should follow the FAIR principles (https://www.go-fair.org/fair-principles/):

  • Findable
  • Accessible 
  • Interoperable
  • Reusable

The 5-star deployment scheme (https://5stardata.info/en/) for Open Data describes the different levels, costs and benefits of open data. 

Data can be uploaded to repositories alongside the code, but make sure you abide by the licence of the data when you upload it!
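
One way to keep the licence and a description attached to a data file is to upload a small metadata file next to it. The Python sketch below is only an illustration under stated assumptions: the function `write_metadata`, the `.meta.json` suffix and the field names are hypothetical choices made for this example, not a standard required by any repository.

```python
import hashlib
import json
from pathlib import Path


def write_metadata(data_path, title, licence, description):
    """Write a JSON 'sidecar' file describing a data file.

    The field names below are illustrative assumptions, not a formal
    metadata standard. Returns the path of the metadata file.
    """
    data_path = Path(data_path)
    # A checksum lets others confirm they downloaded the same bytes.
    checksum = hashlib.sha256(data_path.read_bytes()).hexdigest()
    metadata = {
        "title": title,
        "file": data_path.name,
        "licence": licence,
        "description": description,
        "sha256": checksum,
    }
    # e.g. "scores.csv" -> "scores.csv.meta.json" in the same folder
    meta_path = data_path.with_name(data_path.name + ".meta.json")
    meta_path.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return meta_path
```

For example, calling `write_metadata("scores.csv", "Example scores", "CC-BY-4.0", "Toy data")` on a hypothetical file `scores.csv` would create `scores.csv.meta.json` beside it, recording the licence so that it travels with the data.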


Licensing

It is a good idea to attach a licence to your code when sharing it.

There are many types of licence. Some, like copyleft licences, require that derivative works of your code are distributed under the same terms, so that they remain open to the public. More permissive licences, like WTFPL (https://en.wikipedia.org/wiki/WTFPL), allow people to use your code to build closed-source commercial software. GitHub provides a tool to help you identify the licence that best suits you (https://choosealicense.com/).
