Data Science part IV (Functions)

Chapter 1 ■ Introduction to R Programming

The function(x) x**2 expression defines the function, and anywhere you need a function, you can write the function explicitly like this. Assigning the function to a name lets you refer to it by that name, just as assigning any other value, such as a number or a string, lets you use the name for the value.

Functions you write yourself work just like any function already part of R or part of an R package, with one exception: you will not have documentation for your own functions unless you write it, and that is beyond the scope of this chapter (but covered in Chapter 11).

The square function just does a simple arithmetic operation on its input. Sometimes you want the function to do more than a single thing.
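As a small sketch of the point above, the same function expression can be used directly or through a name. The use of sapply is an illustrative choice made here, not something the text has introduced at this point.

```r
# Call the function expression directly, without naming it.
(function(x) x**2)(4)
## [1] 16

# Pass it to a function that expects a function; sapply is a base R
# function that applies it to each element in turn.
sapply(1:3, function(x) x**2)
## [1] 1 4 9

# Assign it to a name and call it through the name.
square <- function(x) x**2
square(4)
## [1] 16
```

All three forms evaluate the same definition; naming the function only makes it easier to reuse.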
If you want the function to do several operations on its input, you need to give it a “body” of several statements, and such a body has to go in curly brackets.

    square_and_subtract <- function(x, y) {
      squared <- x ** 2
      squared - y
    }
    square_and_subtract(1:5, rev(1:5))
    ## [1] -4  0  6 14 24

(Check the documentation for rev to see what is going on here. Make sure you understand what this example is doing.)

In this simple example, we didn’t really need several statements. We could just have written the function as:

    square_and_subtract <- function(x, y) x ** 2 - y

As long as there is only a single expression in the function, we don’t need the curly brackets. For more complex functions you will need them, though.

The result of a function, that is, what it returns as its value when you call it, is the value of the last statement or expression evaluated in its body (there really isn’t any difference between statements and expressions in R; they are the same thing). You can make the return value explicit, though, using the return() expression.

    square_and_subtract <- function(x, y) return(x ** 2 - y)

This is usually only used when you want to return a value before the end of the function, so it isn’t used as much as in many other languages. To see examples of this, you really need control structures, so you will have to wait a little bit.

One important point here, though, if you are used to programming in other languages: the return() expression needs to include the parentheses. In most programming languages, you could just write:

    square_and_subtract <- function(x, y) return x ** 2 - y

This doesn’t work in R. Try it, and you will get an error.

Vectorized Expressions and Functions

Many functions work with vectorized expressions just as arithmetic expressions do. In fact, any function you write that is defined just using such expressions will work on vectors, just like the square function.
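The following sketch illustrates that claim with the two functions defined so far. The definitions are repeated so the block runs on its own, and the second call assumes a length-one y, which R applies to every element of x.

```r
# Both functions are built only from arithmetic expressions.
square <- function(x) x ** 2
square_and_subtract <- function(x, y) x ** 2 - y

# Arithmetic is element-wise, so the functions are too.
square(1:5)
## [1]  1  4  9 16 25

# The single value 1 is subtracted from each squared element.
square_and_subtract(1:5, 1)
## [1]  0  3  8 15 24
```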
Vectorization doesn’t always work, though. Not all functions take a single value and return a single value, and those that don’t cannot be used element-wise in vectorized expressions. Take for example the function sum, which adds all the values in a vector you give it as an argument (check ?sum now to see the documentation).

    sum(1:4)
    ## [1] 10

This function summarizes its input into a single value. There are many similar functions, and naturally, these cannot be used element-wise on vectors.

Whether a function works on vector expressions or not depends on how it is defined. Most functions in R either work element-wise on vectors or summarize vectors like sum. When you write your own functions, whether the function works element-wise on vectors or not depends on what you put in the body of the function. If you write a function that just does arithmetic on the input, like square, it will work in vectorized expressions. If you write a function that does some summary of the data, it will not. For example, if we write a function to compute the average of its input like this:

    average <- function(x) {
      n <- length(x)
      sum(x) / n
    }
    average(1:5)
    ## [1] 3

This function will not give you values element-wise. Pretty obvious, really. It gets a little more complicated when the function you write contains control structures, which we will get to in the next section. In any case, this would be a nicer implementation since it only involves one expression:

    average <- function(x) sum(x) / length(x)

Oh, one more thing: don’t use this average function to compute the mean value of a vector. R already has a function for that, mean, that deals much better with special cases like missing data and vectors of length zero. Check out ?mean.

A Quick Look at Control Structures

While you get very far just using expressions, for many computations you need more complex programming.
Not that it is particularly complex, but you do need to be able to choose what to do based on data (selection, or if statements) and to iterate through data (looping, or for statements).

If statements work like this:

    if (<boolean expression>) <expression>

If the Boolean expression evaluates to true, the expression is evaluated; if not, it is not.

    # this won't do anything
    if (2 > 3) "false"
    # this will
    if (3 > 2) "true"
    ## [1] "true"

For expressions like these, where we do not alter the program state by evaluating the expression, there isn’t much of an effect in evaluating the if expression. If we, for example, assign to a variable inside it, there will be an effect.

    x <- "foo"
    if (2 > 3) x <- "bar"
    x
    ## [1] "foo"
    if (3 > 2) x <- "baz"
    x
    ## [1] "baz"

If you want to have effects for both true and false expressions, you have this:

    if (<boolean expression>) <true expression> else <false expression>

    if (2 > 3) "bar" else "baz"
    ## [1] "baz"

If you want to split an if statement over several lines, whether you have an else part or not, use curly brackets. R will accept a bare body on the next line, as in

    if (2 > 3)
      x <- "bar"

but an else that starts a new line at the top level is a syntax error, because R treats the if statement as already complete. With curly brackets, the layout is unambiguous:

    if (2 > 3) {
      x <- "bar"
    }

An if statement works like an expression.

    if (2 > 3) "bar" else "baz"

This evaluates to the result of the expression in the if or the else part.

    x <- if (2 > 3) "bar" else "baz"
    x
    ## [1] "baz"
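To tie this back to the earlier remark about return(), here is a minimal sketch of a function that returns early from one branch and otherwise uses the value of an if/else expression as its result. The name describe_number is hypothetical and chosen only for illustration, and the function assumes its argument is a single number, since the condition in an if should be a single logical value.

```r
# Hypothetical helper for illustration; it expects a single number.
describe_number <- function(x) {
  # Early exit: return() ends the function here when x is negative.
  if (x < 0) return("negative")
  # Otherwise the value of the if/else expression is the result.
  if (x == 0) "zero" else "positive"
}

describe_number(-2)
## [1] "negative"
describe_number(0)
## [1] "zero"
describe_number(7)
## [1] "positive"
```

Because the last expression in the body is itself an if/else expression, no explicit return() is needed for the final two cases.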
