STAT 513/413: Lecture 3 R in style and spirit
(looks are important)
One reason STAT 513 was created
Last time, we arrived at a script that looked like this
What is wrong with that?
A simple answer
Does any published book on R feature code like that? Really, does it?
Most of the code out there is typically:
monospaced
properly styled
and also
in the R spirit
and sometimes commented
(On the other hand: nothing is a dogma here, and there is almost always more than one way to do it)
But in this course we had better agree on some standards.
So let us work on improving the code.
Monospaced is easy
Just use the appropriate font, or even better, the appropriate editor, or even better, the appropriate format
m=1
n=0
for (k in 1:20) { m[k]=k
n[k]=2+3*m[k]+rnorm(1) }
plot(m,n)
Now: style
Well, there is a bylaw on that, but roughly this:
code inside braces should be indented
indent is two or four spaces (consistently throughout though)
unless you continue a function call on the next line: then you go back to where its arguments started (long lines should be broken into nicer, shorter ones)
closing brace } should have its own line
there should be spaces (around operators, after commas)
but not excessively many of them (no function ( x , y ) , say)
Some refined ones:
use <-, not =, and certainly not ->
use TRUE, FALSE, not merely T, F
Finally, comments: use them, but do not abuse them; use taste
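A minimal sketch of how these rules might look together. The names (x, total, value, shadowed) are made up for illustration and are not part of the lecture script. The last lines show one common argument for TRUE over T: T is an ordinary variable that can be overwritten, while TRUE is a reserved word.

```r
# spaces around operators and after commas; <- for assignment;
# a long call is continued where its arguments started
x <- seq(from = 0, to = 1,
         length.out = 5)

# code inside braces is indented; the closing brace has its own line
total <- 0
for (value in x) {
  total <- total + value
}

# T can be reassigned (TRUE cannot), so T is not a safe shorthand
T <- 0
shadowed <- identical(T, TRUE)   # compares our T, not the built-in one
rm(T)                            # remove our T; the built-in T is visible again
```

Running this leaves total equal to the sum of the five grid points, and shadowed records that T no longer meant TRUE while it was overwritten.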
References on the bylaw
More precisely here:
https://style.tidyverse.org
http://adv-r.had.co.nz/Style.html
https://google.github.io/styleguide/Rguide.xml
R Code Style R Bloggers (RStudio)
And the best way to do that is
via a programming editor that does it for you automagically (note: it is important that your files have the extension .R)
Some of those are
ESS with Emacs
RStudio (configurable!), Atom
It is also possible to run your code through R packages:
styler
formatR
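A hedged sketch of the package route, assuming styler is installed (the check below skips the call otherwise). The variable messy is a made-up name holding the unstyled loop from earlier; style_text() restyles code given as text, and style_file() does the same for a .R file on disk.

```r
# the unstyled loop, one string per line of code
messy <- c("for (k in 1:20) { m[k]=k",
           "n[k]=2+3*m[k]+rnorm(1) }")

if (requireNamespace("styler", quietly = TRUE)) {
  # prints the restyled code; no file is touched
  print(styler::style_text(messy))
} else {
  message("styler is not installed; skipping the demonstration")
}
```

The result follows the tidyverse style guide by default, so it may differ in details from the conventions we agreed on above.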
So: organization
m=1
n=0
for (k in 1:20) {
  m[k]=k
  n[k]=2+3*m[k]+rnorm(1)
}
plot(m,n)
This is a bit C style; some may prefer
m=1
n=0
for (k in 1:20)
{
  m[k]=k
  n[k]=2+3*m[k]+rnorm(1)
}
plot(m,n)
With us, both are fine
Ah, spacing now!
m=1
n=0
for (k in 1:20) {
  m[k] = k
  n[k] = 2 + 3 * m[k] + rnorm(1)
}
plot(m, n)
Here, there is more leeway; I personally prefer fewer spaces in formulas.
Somebody else may also add vertical space, to separate important blocks of code:
m=1
n=0

for (k in 1:20) {
  m[k] = k
  n[k] = 2 + 3*m[k] + rnorm(1)
}

plot(m, n)
And let us also do the assignments
Well, at least if you want to publish a book on R, you cannot go with =
But on the other hand, you may also be a bit fancy
m <- 1; n <- 0
for (k in 1:20) {
  m[k] <- k
  n[k] <- 2 + 3*m[k] + rnorm(1)
}
plot(m, n)

So, could I publish the book on R?
(Everybody did already…)
Well, the look of the code is OK now, but what about the contents?
For instance, you do not do loops in R: you vectorize if you can…
Rule of thumb: the fewer lines of code in R, the better.
But this is cheap:
m <- 1; n <- 0
Successful vectorization is much better; how about this
m <- 1:20
n <- 2 + 3*m + rnorm(20)
plot(m, n)
(There is no need for empty lines, which would not count anyway, when there are only three lines of code altogether)

So, what is the R spirit?
Well, this aspect is not that easily encapsulated into a few guidelines; rather, we will strive throughout this course to get an idea of what it is
But one thing we may start with immediately: avoid loops…
… think in terms of vectors/matrices, if possible
Another one, related to the previous one: use the code of experts

For now, perhaps the last touch
# points scattered about a line
m <- 1:20
n <- 2 + 3*m + rnorm(20)
plot(m, n)

But do not overdo it
Comments yes, but less is more, unlike this
# points scattered about a line
# assign 1:20 to m
m <- 1:20
# n lies on the line 2+3m + random error
n <- 2 + 3*m + rnorm(20)
# plotting the result
plot(m, n)
If at all (if you really must), then at least like this
### points scattered about a line
m <- 1:20                  # uniformly spread
n <- 2 + 3*m + rnorm(20)   # normal error
plot(m, n)

Modus operandi already mentioned: functions
A function: the input can be varied in a better way than in a script (which has to be re-edited), and the variables inside the function do not mess up your working environment (scoping)
line <- function(x, s=1, a=2, b=3)
### plots x points approximately following a line with given
### intercept and slope, plus normal error controlled by s
{
  m <- 1:x
  n <- a + b*m + rnorm(x, 0, s)
  plot(m, n)
}
This whole process enables you to vary the input (first in the script, then in the function) and thus gain some more confidence that the whole concoction does the right thing
However, once again: for this course we are just fine with scripts, although successful scripts can easily be upgraded to functions, and those are allowed as well

However: a word about packages
Packages, add-ons, are very useful at times; they may save us unnecessary work
However, this course is not about R, but about statistical computing. This implies the following rule
PACKAGES ARE NOT TO BE USED
unless (every rule has an exception) they are not essential to the understanding of what needs to be done
Example: if a problem asks for constructing a generator of random numbers with a prescribed distribution, then its solution is not finding on the internet a package that does it. That misses the point; it is better to learn something via programming it. But if such a generator is just a small component used for achieving a more complex objective, it is fine to use a package
If in doubt, better ask!
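To make the generator example concrete, here is a small sketch of what "programming it" might look like for one particular case: exponential random numbers via the inverse transform method. The choice of distribution, the method, and the name rexp_inverse are assumptions made for illustration; they are not prescribed by the course.

```r
# hypothetical hand-written generator: exponential variates by inverse transform
rexp_inverse <- function(n, rate = 1)
{
  u <- runif(n)      # uniform draws on (0, 1)
  -log(u) / rate     # inverse cdf applied to 1 - u, which is uniform as well
}

set.seed(513)
x <- rexp_inverse(10000, rate = 2)
mean(x)              # should come out close to 1/rate
```

Only base R is used here, in line with the rule above; comparing the sample mean with 1/rate is a quick sanity check, not a proof that the generator is correct.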