DAL tutorial - Week 1

R basics

1 Why R?

R can be used to analyse all sorts of data, from tabular data (also known as “spreadsheets”) to textual data, geographic data and even images.

This course will focus on the analysis of tabular data, since all of the techniques relevant to this type of data also apply to the other types.

The R community is a very inclusive community and it’s easy to find help. There are several groups that promote R in minority/minoritised groups, like R-Ladies, Africa R, and Rainbow R just to mention a few.

Moreover, R is open source and free!

2 The R console

R
  • R is a programming language.

  • We use programming languages to interact with computers.

  • You run commands written in a console and the task related to the command is executed.

We will begin our R journey with some basic concepts from computer science. The box above introduces you to three important concepts:

  • Programming languages.
  • Executing commands.
  • Console.

R comes with its own console. Open the R Console now.

It should look like the following (there will be some aesthetic differences since you are using Windows).

The Console is an interactive interface that allows you to input commands and execute them.

You know you can enter a command because the prompt (>) is displayed, and next to it you can see the text cursor (|) flashing.

Try writing the following command (you will learn more about R commands below):

cat("Hello!")

To execute the command (aka run the command), press ENTER/RETURN on your keyboard.

The command cat("Hello!") returns (aka outputs) in the console the text given between double quotes: Hello!

Congratulations, you have run your first R command! This command involved a function (more on functions below): the cat() function (no feline involvement…).

So there are different types of R commands that you can use. In the following sections you will learn about the basic types of R commands and what they can be used for.

You will learn more and more commands throughout the course. You don’t have to memorise them all at once: focus on understanding what they can be useful for and if you don’t remember the details, you can always check them!

3 R basics

In this part of the tutorial you will learn the very basics of R.

If you have prior experience with programming, you should find all this familiar. If not, not to worry! Make sure you understand the concepts highlighted in the green boxes and practise the related skills.

For this tutorial, you will just run code directly in the R Console, i.e. you will type code in the Console and press ENTER/RETURN (ENTER from now on) to run it.

In future tutorials, you will learn how to save your code in a script file, so that you can keep track of what you have run and make your work reproducible.

3.1 R as a calculator

Write this line of code 1 + 2 in the Console, then press ENTER to run it.

Fantastic! You should see that the answer of the addition has been printed in the Console, like this:

[1] 3

(Never mind the [1] part for now).

Arithmetic operations

You can run arithmetic operations using maths operators: the most common are +, -, *, / for addition, subtraction, multiplication and division.

Now, try some more operations (write one line and press ENTER, then write the following line and so on…). Feel free to add your own operations to the mix!

67 - 13
2 * 4
268 / 43

You can also chain multiple operations.

6 + 4 - 1 + 2
4 * 2 + 3 * 2
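
When operations are chained, R follows the usual order of operations from maths: multiplication and division are carried out before addition and subtraction, and round parentheses can be used to change that order. A small sketch, with numbers chosen only for illustration:

```r
# Multiplication is carried out before addition
2 + 3 * 4
# [1] 14

# Parentheses force the addition to happen first
(2 + 3) * 4
# [1] 20
```

If you are ever unsure about the order, adding parentheses makes your intention explicit.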

Quiz 2

Are the following statements true or false?

  1. 3 * 2 / 4 returns the same result as 3 * (2 / 4)

  2. 10 * 2 + 5 * 0.2 returns the same result as (10 * 2 + 5) * 0.2

3.2 Variables

Forget-me-not.

Most times, we want to store a certain value so that we can use it again later.

We can achieve this by creating variables.

Variable

A variable holds one or more values and it’s stored in the computer memory for later use.

You can create a variable by using the assignment operator <-.

Let’s assign the value 156 to the variable my_num.

my_num <- 156

Now, you can just call the variable back when you need it! Write the following in the Console and press ENTER.

my_num
[1] 156

You should see the value of my_num being printed in the console.

A variable like my_num is called a numeric vector: i.e. a vector that contains a number (hence numeric).

Vector

A vector is an R object that contains one or more values of the same type.

A numeric vector is a type of vector. However, it’s fine in most cases to use the word variable to mean vector (just note that a variable can also be something other than a vector; you will learn about other R objects from next week).

Let’s now try some operations using variables.

income <- 1200
expenses <- 500
income - expenses
[1] 700

See? You can use math operators with variables too!

And you can also go all the way with variables.

savings <- income - expenses

Now check the value of savings.

savings
[1] 700

Vectors can hold more than one item or value.

Just use the combine c() function to create a vector containing multiple values.

The following are all numeric vectors.

a <- 6
# Vector with 2 values
b <- c(6, 8)
# Vector with 3 values
c <- c(6, 8, 42)

You can check the type of vector (called class in R) with the class() function: for example, class(a) returns "numeric".

class(a)
[1] "numeric"

Numeric vector

A numeric vector is a vector that holds one or more numeric values.

Note that the following are the same:

a <- 6
a
[1] 6
d <- c(6)
d
[1] 6

Another important aspect of variables is that they are… variable! Meaning that once you assign a value to one variable, you can overwrite the value by assigning a new one to the same variable.

my_num <- 88
my_num <- 63
my_num
[1] 63

What if you want to know which variables you have created so far? Easy: use the ls() function. Just write ls() in the console and press ENTER: a list of existing variables will be returned.
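
As a rough sketch, this is what using ls() might look like. The exact output depends on which variables exist in your own session, so no output is shown here; the two variables below simply reuse names and values from earlier in this section.

```r
# Create a couple of variables (values as used earlier in this section)
my_num <- 63
income <- 1200

# List the names of all variables created so far in this session
ls()
```

The names are returned as strings, one per variable, in alphabetical order.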

Quiz 3

True or false?

  1. A vector can be created with the c() function.

  2. Not all variables are vectors.

  3. A numeric vector can only hold numeric values.

3.3 Functions

R cannot function without… functions.

We have encountered a few functions: cat(), c(), class() and ls().

Function

A function usually runs an operation on one or more specified arguments.

A function in R has the form function() where:

  • function is the name of the function, like cat.
  • () are round parentheses, inside of which you write arguments, separated by commas.

Let’s see an example with the function sum() (can you guess what it does?):

sum(3, 5)
[1] 8

The sum() function sums the numbers listed as arguments. Above, the arguments are 3 and 5.

And of course arguments can be vectors!

my_nums <- c(3, 5, 7)

sum(my_nums)
[1] 15
mean(my_nums)
[1] 5
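
To see what mean() is doing, the same result can be worked out by hand: add up the values and divide by how many there are. This sketch uses length(), a base R function not introduced above, which returns the number of values in a vector.

```r
my_nums <- c(3, 5, 7)

# Sum of the values divided by the number of values
sum(my_nums) / length(my_nums)
# [1] 5

# Same result as the mean() function
mean(my_nums)
# [1] 5
```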

Some functions work without specifying an argument, like ls().

You can also nest functions one inside the other: the output of the “lowest” function is used as the argument of the function above. Try and untangle the following.

y <- 10
u <- 6
i <- 7
o <- 2

cat(mean(c(sum(y, u), sum(i, o))))
12.5
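
One way to untangle the nested call is to run it from the inside out, storing each intermediate result in its own variable. The variable names below (first_sum, second_sum, both_sums, average) are made up for this illustration.

```r
y <- 10
u <- 6
i <- 7
o <- 2

# Innermost functions first
first_sum <- sum(y, u)    # 16
second_sum <- sum(i, o)   # 9

# Combine the two sums into one numeric vector
both_sums <- c(first_sum, second_sum)

# Take the mean of that vector, then print it
average <- mean(both_sums)
cat(average)
# 12.5
```

Both versions give the same result; the nested one is more compact, while the step-by-step one is easier to read and check.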

Quiz 4

True or false?

  1. You can use functions within functions.

  2. All function arguments must be specified.

  3. All functions need at least one argument.

If you are familiar with Python, you will soon realise that although R and Python share many concepts and types of objects, they can differ substantially. This is because R is a functional programming language (based on functions) while Python is an object-oriented programming language (based on methods applied to objects).

Generally speaking, functions look like print(x) while methods look like x.print().

3.4 String and logical vectors

Not just numbers.

We have seen that variables can hold numeric vectors. But vectors are not restricted to being numeric. They can also store strings.

A string is basically a set of characters (a word, a sentence, a full text).

In R, strings have to be quoted. This tutorial uses double quotes " " (single quotes ' ' also work).

Change the following strings to your name and surname. Remember to keep the double quotes.

name <- "Stefano"
surname <- "Coretta"

name
[1] "Stefano"

Strings can be used as arguments in functions, like numbers can.

cat("My name is", name, surname)
My name is Stefano Coretta

Remember that you can reuse the same variable name to overwrite the variable value.

name <- "Raj"

cat("My name is", name, surname)
My name is Raj Coretta

You can combine multiple strings into a character vector, using c().

Character vector

A character vector is a vector that holds one or more strings.

fruit <- c("apple", "oranges", "bananas")
fruit
[1] "apple"   "oranges" "bananas"

Use the class() function to check the vector class.

class(fruit)
[1] "character"

Another type of vector is one that contains either TRUE or FALSE. Vectors of this type are called logical vectors and their class is logical.

Logical vector

A logical vector is a vector that holds one or more TRUE or FALSE values.

groceries <- c("apple", "flour", "margarine", "sugar")
in_pantry <- c(TRUE, TRUE, FALSE, TRUE)
class(in_pantry)
[1] "logical"
data.frame(groceries, in_pantry)

TRUE and FALSE values must be written in all capitals and without double quotes (they are not strings!).

(We will talk about data frames, another type of object in R, in the following weeks.)
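
If you are curious about what data.frame() does with the two vectors above, here is a small sketch. The name pantry is made up for this example; the printed table should look like the commented lines.

```r
groceries <- c("apple", "flour", "margarine", "sugar")
in_pantry <- c(TRUE, TRUE, FALSE, TRUE)

# Put the two vectors side by side, one row per grocery item
pantry <- data.frame(groceries, in_pantry)
pantry
#   groceries in_pantry
# 1     apple      TRUE
# 2     flour      TRUE
# 3 margarine     FALSE
# 4     sugar      TRUE
```

Each vector becomes a column, and the column names are taken from the variable names.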

Quiz 5

  1. Which of the following is not a character vector?
  2. Which of the following is not a logical vector?

You can use the class() function to check the type (“class”) of a vector.

class(FALSE)
[1] "logical"
class(c(1, 45))
[1] "numeric"
class(c("a", "b"))
[1] "character"

5a

  • c(1, 2, "43") is a character vector because the last number "43" is a string (it’s between double quotes!). A vector cannot have a mix of types of elements: they have to be all numbers, or all strings, and so on, but not some numbers and some strings. Numbers are special in that if you include a number in a character vector without quoting it, it is automatically converted into a string. Try the following:

char <- c("a", "b", "c")
char <- c(char, 1)
char
class(char)

  • c(letters) is a character vector because letters contains the letters of the alphabet as strings (this vector comes with base R).

  • c(apple) is not a character vector because the variable apple holds a number, 45!
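
The automatic conversion described in 5a can be checked directly. A short sketch, where mixed is a made-up variable name:

```r
# A number added to a character vector becomes a string
char <- c("a", "b", "c")
char <- c(char, 1)
char
# [1] "a" "b" "c" "1"
class(char)
# [1] "character"

# The same happens when unquoted numbers are mixed with a quoted one
mixed <- c(1, 2, "43")
mixed
# [1] "1"  "2"  "43"
class(mixed)
# [1] "character"
```

Notice the double quotes around every value in the output: that is how you can tell the numbers have become strings.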

5b

  • "FALSE" is not a logical vector because FALSE has been quoted (anything that is quoted is a string!).
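
You can confirm the point in 5b with class(). A minimal sketch comparing the unquoted and the quoted version:

```r
# Unquoted: a logical value
class(FALSE)
# [1] "logical"

# Quoted: a string that happens to spell FALSE
class("FALSE")
# [1] "character"
```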

This course does not cover programming in R in the strict sense, but if you are curious here’s a short primer on for-loops and if-else statements in R.

For-loops

fruits <- c("apples", "mangos", "durians")

for (fruit in fruits) {
  cat("I like", fruit, "\n")
}
I like apples 
I like mangos 
I like durians 

If-else

for (fruit in fruits) {
  if (grepl("n", fruit)) {
    cat(fruit, "has an 'n'", "\n")
  } else {
    cat(fruit, "does not have an 'n'", "\n")
  }
}
apples does not have an 'n' 
mangos has an 'n' 
durians has an 'n' 

4 Summary

You made it! You completed this week’s tutorial.

Here’s a summary of what you learnt.

  • R is a programming language while RStudio is an IDE.

  • You can perform mathematical operations with +, -, *, /.

  • You can store values in variables.

  • A typical object to be stored in a variable is a vector: there are different types of vectors, like numeric, character and logical.

  • Functions are used to perform an operation on their arguments: sum() sums its arguments, mean() calculates the mean and cat() prints the arguments.

If you are interested in learning about programming in R, I recommend you go through Chapters 26-28 of the R4DS book and the Advanced R book.

Note that these topics are not covered in the course, nor will they be assessed.