Reproducible guide
Draft
2022-02-16
1 Overview
This guide collects resources on tools for reproducible and open science, for graduate research and beyond, using R Markdown and RStudio.
Welcome!
This is a minimal example of a book based on R Markdown and bookdown (https://github.com/rstudio/bookdown).
This template provides a skeleton file structure that you can edit to create your book.
The contents inside the .Rmd files provide some pointers to help you get started, but feel free to also delete the content in each file and start fresh.
Additional resources:
The bookdown book: https://bookdown.org/yihui/bookdown/
The bookdown package reference site: https://pkgs.rstudio.com/bookdown
1.1 Working with this resource
1.2 Description
Using tools like git for version control and knitr for dynamic figure generation is a great step towards better research transparency and reproducibility.
But there is also progress to be made in improving code and preparing data for re-use. This does not necessarily require new tools; it comes from applying “best practice” guidelines.
We aim to begin compiling resources here. This is only a start, and we’d encourage anyone to suggest further guidelines, particularly those that are specific to various statistical software packages.
1.3 Guidelines for writing code
Very useful manual:
We recommend reading the full guide; here we briefly outline the sections of “RA Manual: Notes on Writing Code” as a preview. We also include one example from each section; the original manual has many more.
1.3.0.0.0.1 Code should be logical
- Despise redundancy
- Separate functional code and metadata
- Use the right data structures
- Make your functions shy
- Use overriding methods instead of switches where appropriate
Example: use the right data structures
Instead of writing a function like:
myfunction <- function(name, surname, age, height) {
if (age < 18) {}
}
We can pass a single structure, such as a named list, instead:
myfunction <- function(person) {
  if (person$age < 18) {}
}
This would make the code easier to read and maintain.
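As a minimal runnable sketch of the idea above, assuming a person is stored as a named list. The function name `is_minor` and the field values are hypothetical and only serve as illustration:

```r
# Hypothetical record: related fields bundled into one named list
person <- list(name = "Ada", surname = "Lovelace", age = 36, height = 1.65)

# The function takes the whole record and reads only the field it needs
is_minor <- function(person) {
  person$age < 18
}

is_minor(person)
```

Adding a new field to `person` later does not change the signature of `is_minor()`, which is the main maintenance benefit.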
1.3.0.0.0.2 Code should be readable
- Keep it short
- Order your functions for linear reading
- Choose descriptive names
- Use white space and indents to make code scannable
- Pay special attention to coding algebra
- Make logical switches intuitive
- Use enough comments and no more
- Be consistent
- Avoid commands that make code hard to read
Example: choose descriptive names
It might be easier for other users to read and adapt existing code if the variables, folders, classes and other elements have simple and intuitive names. So prefer writing code like:
calculate_status <- function(person) {}
Rather than:
cs <- function(p) {}
1.3.0.0.0.3 Code should be robust
- Check for errors
- Write tests
Example: check for errors
Users may run your code with different parameters and in different environments. It is good practice to include code that checks for erroneous values and provides clear feedback.
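# Without a check, a zero divisor silently returns Inf or NaN
myfunction <- function(x, y) {
  return(x / y)
}
# With a check, the caller gets a clear error message
myfunction <- function(x, y) {
  if (y == 0) {
    stop("y must be different from zero!")
  }
  return(x / y)
}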
Of course you can also write tests to make sure your verifications are working correctly ;-)
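A minimal sketch of such tests, using only base R (`stopifnot()` and `tryCatch()`). The function name `safe_divide` is hypothetical; a dedicated testing package could be used instead, but that is not assumed here:

```r
# Hypothetical helper: divide x by y, failing loudly on a zero divisor
safe_divide <- function(x, y) {
  if (y == 0) {
    stop("y must be different from zero!")
  }
  x / y
}

# Normal case: the quotient is returned
stopifnot(safe_divide(10, 2) == 5)

# Error case: capture the condition and check that it really is an error
result <- tryCatch(safe_divide(1, 0), error = function(e) e)
stopifnot(inherits(result, "error"))
```

If either `stopifnot()` call fails, the script stops, so these lines can sit in a small test file that is run after every change.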
1.3.0.0.0.4 Code should be efficient
- Profile slow code relentlessly
- Store “too much” output from slow code
- Separate slow code from fast code
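The points above can be sketched in base R, assuming a hypothetical slow function `slow_simulation()` and a cache file in a temporary directory. `system.time()` only gives a coarse timing; base R's `Rprof()` is the sampling profiler to reach for when a finer breakdown is needed:

```r
# Hypothetical stand-in for a slow computation
slow_simulation <- function(n) {
  Sys.sleep(0.1)  # pretend this step takes a long time
  cumsum(seq_len(n))
}

# Time the slow step to see where the run time goes
timing <- system.time(result <- slow_simulation(1000))
print(timing["elapsed"])

# Store the output so that fast downstream code does not recompute it
cache_file <- file.path(tempdir(), "simulation_result.rds")
saveRDS(result, cache_file)

# Later, in the fast part of the analysis: reload instead of rerunning
cached <- readRDS(cache_file)
```

Keeping the slow step and the cached reload in separate scripts means that plots and tables can be revised without paying for the simulation again.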