•Reproducible Analysis
Bibliographies
Often you want to cite books or papers in a report. You can, of course, handle citations manually, but a
better approach is to have a file with the citation information and then refer to it using markup tags. To add a
bibliography, you use a tag in the YAML header called bibliography.
---
...
bibliography: bibliography.bib
...
---
You can use several different formats here; see the R Markdown documentation
(http://rmarkdown.rstudio.com/authoring_bibliographies_and_citations.html) for a list. The suffix .bib is
used for BibLaTeX. The format of the citation file is the same as for BibTeX, and nearly every site that
provides bibliographic information lets you export citations in that format.
To cite something from the bibliography, you use [@smith04], where smith04 is the identifier used in
the bibliography file. You can cite more than one paper inside square brackets by separating the citations
with semicolons, [@smith04; @doe99], and you can add text such as chapters or page numbers,
[@smith04, chapter 4]. To suppress the author name(s) in the citation, say when you have already mentioned
the name in the text, you put a - before the @, so you write As Smith showed [-@smith04].... For in-text
citations, similar to \citet{} in natbib, you just leave out the brackets: @smith04 showed that... and you
can combine that with additional citation information as @smith04 [chapter 4] showed that....
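As a rough illustration of the syntax above, the fragment below puts the citation forms side by side in a small R Markdown document. The keys smith04 and doe99 are hypothetical placeholders that must match entries in your own bibliography.bib, and the sentences are invented for the example.

```markdown
---
title: "Citation examples"
bibliography: bibliography.bib
---

A single citation in brackets [@smith04].

Two citations separated by a semicolon [@smith04; @doe99].

A citation with a chapter reference [@smith04, chapter 4].

As Smith showed [-@smith04], the author name can be suppressed.

@smith04 showed that in-text citations need no brackets.

@smith04 [chapter 4] showed that the two forms can be combined.
```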
To specify the citation style to use, you use the csl tag in the YAML header.
---
...
bibliography: bibliography.bib
csl: biomed-central.csl
...
---
Check out the citation styles list at https://github.com/citation-style-language/styles for a large
number of different formats. Most, if not all, of what your heart desires should be there.
Controlling the Output (Templates/Stylesheets)
The pandoc tool has a powerful mechanism for formatting the documents it generates. This is achieved using
CSS stylesheets for HTML output and templates that determine the layout for all output formats.
The template mechanism lets you write an HTML or LaTeX document, say, that determines where the various
parts of the text go and where variables from the YAML header are used. This mechanism is far beyond
what we can cover in this chapter, but I mention it in case you want to start writing papers using R
Markdown. You can do this; you just need a template for formatting the document in the style a
journal wants. Journals often provide LaTeX templates, and you can modify these to work with Markdown.
There isn’t much support for this in RStudio, but for HTML documents, you can use the Output Options
command (click on the gear icon) to choose different output formatting.
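A minimal sketch of how a stylesheet or template might be wired in through the YAML header, assuming the html_document and pdf_document output formats of the rmarkdown package. The file names styles.css and journal-template.tex are hypothetical; you would supply your own files.

```markdown
---
title: "My report"
output:
  html_document:
    css: styles.css
  pdf_document:
    template: journal-template.tex
---
```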
Running R Code in Markdown Documents
The formatting so far is all Markdown (and YAML). Where it combines with R and becomes R Markdown
is through knitr. When you format a document, the first step is for knitr to evaluate the R code and create
a Markdown document. It does that by running all the R code you want executed and putting the code and its
results into the Markdown document. This translates an .rmd document into an .md document, but this
intermediate document is deleted afterward unless you explicitly tell RStudio not to do so.
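If you want to inspect the intermediate Markdown file, one option is the keep_md setting of html_document, which asks for the .md file to be kept next to the output. This assumes HTML output through the rmarkdown package and is only a sketch.

```markdown
---
output:
  html_document:
    keep_md: true
---
```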
The simplest R code you can evaluate is part of the text. If you want an R expression evaluated inline,
you use backticks but add r right after the first one. So to evaluate 2 + 2 and put the result in your
Markdown document, you write `r 2 + 2` and get the result 4 inserted into the text. You can write any
R expression there to get it evaluated. This is useful for inserting short summary statistics like means and
standard deviations directly into the text and ensuring that the summaries are always up to date with the
actual data you are analyzing.
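For instance, a sentence like the one sketched below reports summary statistics that are recomputed each time the document is compiled. The built-in cars data set is chosen only for illustration.

```markdown
The cars data set has `r nrow(cars)` observations. The mean speed is
`r mean(cars$speed)`, and the standard deviation is `r sd(cars$speed)`.
```

If the inserted numbers carry too many digits, you can wrap the expression in round(), for example round(mean(cars$speed), 1).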
For longer chunks of code, you use code blocks, delimited by three backticks. Instead of just writing:
```r
2 + 2
```
which will only display the code (highlighted as R code), you put the r in curly brackets.
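Using the same expression as above, the evaluated version of the chunk would be written like this:

````markdown
```{r}
2 + 2
```
````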
This will insert the code in your document but will also show the result of evaluating it right after the
code block. The boilerplate code you get when creating an R Markdown document in RStudio shows you
examples of this (see Figure 2-3).
Figure 2-3. Code chunk in RStudio
You can name code chunks by putting a name right after r. You don’t have to name all chunks, and if
you have a lot of chunks, you probably won’t bother naming all of them. But if you give them a name, they
are easily located by clicking on the structure button in the bar below the document (see Figure 2-4). You can
also use the name to refer to chunks when caching results, which we will cover later.
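As a small sketch, a chunk named summary-stats would be written as follows, with the name placed right after the r inside the curly brackets. The name is hypothetical, and the chunk body is just a stand-in.

````markdown
```{r summary-stats}
summary(cars)
```
````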
If you modify these options, you will see that the options are included in the top line of the chunk. You
can of course also manually control the options here, and there are more options than what you can control
with the window in the GUI. You can read the knitr documentation for all the details
(http://yihui.name/knitr/).
This dialog box will handle most of your needs, though, except for displaying tables or when you want to
cache results of chunks, both of which we return to later.
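Options set by hand go in the top line of the chunk, after the optional name and separated by commas. The sketch below uses a few standard knitr options: echo=FALSE hides the code, warning=FALSE suppresses warnings, and fig.width and fig.height set the figure size in inches. The chunk name cars-plot and the plotted data are placeholders.

````markdown
```{r cars-plot, echo=FALSE, warning=FALSE, fig.width=6, fig.height=4}
plot(cars)
```
````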
Using Chunks when Analyzing Data (Without Compiling Documents)
Before continuing, though, I want to stress that working with data analysis in an R Markdown document is
useful for more than just creating documents. I personally do all my analysis in these documents because
I can combine documentation and code, regardless of whether I want to generate a report at the end. The
combination of explanatory text and analysis code is just convenient to have.
The way code chunks are evaluated as separate pieces of analysis is also part of this. You can evaluate
chunks individually, or all chunks down to a point, and I find that very convenient when doing an analysis.
There are keyboard shortcuts for evaluating all chunks, all previous chunks, or just the current chunk (see
Figure 2-7), which makes it very easy to write a bit of code for an exploratory analysis and evaluate just that
piece of code. If you are familiar with Jupyter or similar notebooks, you will recognize the workflow.