Lists and Vectors
Overview
Vectors
Vectors are one of the most important data structures in R. Vectors contain values that are all the same type. The following are some examples of vectors.
# a logical vector
logical_vector <- c(F, T, TRUE, FALSE)
typeof(logical_vector)
'logical'
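Be careful! The code above could produce unexpected results. Remember, T and F are ordinary variables that merely default to TRUE and FALSE, so they can be reassigned, whereas TRUE and FALSE are reserved words. For example, if T had been reassigned to the string "I'm a string, now", the same code would produce a character vector:
[1] "FALSE" "I'm a string, now" "TRUE" "FALSE"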
# a double vector
double_vector <- c(1, 2, 3, 4)
typeof(double_vector)
'double'
# a character vector
character_vector <- c("a", "b", "c", "d")
typeof(character_vector)
'character'
As mentioned before, as soon as you try to mix types, all elements are coerced to the least specific (most general) type present. For example, adding a single string to the numeric vector below coerces every element to character.
typeof(c(0, 1, 2))
'double'
typeof(c(0, 1, 2, "ok"))
'character'
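The coercion order can be checked directly. The following is a minimal sketch; the variable names are illustrative and are not part of the examples above.

```r
# Mixing types coerces every element to the most general type present.
# From most to least specific: logical, integer, double, character.
logical_and_integer <- c(TRUE, 2L)    # logical is promoted to integer
integer_and_double  <- c(2L, 3.5)     # integer is promoted to double
double_and_string   <- c(3.5, "ok")   # double is promoted to character

typeof(logical_and_integer)   # "integer"
typeof(integer_and_double)    # "double"
typeof(double_and_string)     # "character"

# Coercion changes the stored values, not just the type label:
# TRUE becomes 1 and 3.5 becomes the string "3.5".
logical_and_integer
double_and_string
```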
Lists
Lists are vectors that can contain any class of data. For example, the following produces a list with various types of elements.
my_list <- list(TRUE, 1, 2, "OK", c(1,2,3))
typeof(my_list)
'list'
You may be thinking, "well it looks like everything will be coerced to some other type, right?" Nope. The elements of a list keep their underlying storage type.
typeof(my_list[[1]])
typeof(my_list[[2]])
typeof(my_list[[5]])
'logical' 'double' 'double'
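Rather than checking elements one at a time, the type of every element can be collected in one call. This is a small sketch using sapply; the name element_types is illustrative.

```r
my_list <- list(TRUE, 1, 2, "OK", c(1, 2, 3))

# Apply typeof() to each element and collect the results in a character vector.
element_types <- sapply(my_list, typeof)
element_types   # "logical" "double" "double" "character" "double"

# str() gives a compact summary of the structure of the whole list.
str(my_list)
```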
Indexing
Indexing enables us to access a subset of the elements in a vector or list. There are three types of indexing: positional indexing, logical indexing, and named indexing.
Vectors
- Positional
The following code demonstrates positional indexing. Here, we access the first three values in our vector, vec.
vec <- 1:10
vec[1:3]
[1] 1 2 3
It is important to note that R uses 1-based indexing: the first element in a vector is at index 1. This differs from many other popular languages, such as Python, where the first element is at index 0. For example, the following code returns an empty vector, as there is no element at index 0.
vec[0]
integer(0)
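A few more positional cases are worth trying, especially for readers coming from Python. The sketch below uses illustrative variable names; negative indices are included only as a contrast with Python, where -1 means the last element.

```r
vec <- 1:10

zero_index <- vec[0]            # index 0 selects nothing
length(zero_index)              # 0

past_end <- vec[11]             # indexing past the end gives NA, not an error
past_end

picked <- vec[c(1, 5, 10)]      # positions do not have to be contiguous
picked                          # 1 5 10

without_first <- vec[-1]        # a negative index drops that position
without_first                   # 2 3 4 5 6 7 8 9 10
```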
- Logical
The following code demonstrates logical indexing. Here, we access the first three elements in our vector, vec, but, instead of using positional indexing, we use a series of logical values.
vec <- 1:10
vec[c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE)]
[1] 1 2 3
Of course, it is not very common to manually write out a series of TRUE and FALSE values in order to index. It would be much simpler to use positional indexing. Recall, however, that we can use comparison and logical operators to generate a logical vector to use for indexing!
vec <- 1:10
my_logical <- vec <= 3
vec[my_logical]
[1] 1 2 3
Or, even more succinctly.
vec <- 1:10
vec[vec <= 3]
[1] 1 2 3
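It can help to look at the logical vector itself before using it as an index. The following sketch assumes the same vec as above; is_small is an illustrative name.

```r
vec <- 1:10
is_small <- vec <= 3

is_small             # TRUE for the first three positions, FALSE for the rest
sum(is_small)        # 3, because each TRUE counts as 1
which(is_small)      # 1 2 3, the positions where the condition holds

# Negating the condition selects everything else.
not_small <- vec[!is_small]
not_small            # 4 5 6 7 8 9 10
```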
- Named
Lastly, we have named indexing. In order to use named indexing, we must first create a vector, and give the values names by adding the names attribute to the vector. The following is an example of taking an unnamed vector, and providing it with names.
vec <- 1:10
attributes(vec)
NULL
vec <- 1:10
names(vec) <- c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j")
attributes(vec)
$names
 [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"
Once names have been added, you can use them to index.
vec['a']
a: 1
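Named indexing also accepts several names at once and can be combined with the other indexing styles. The sketch below uses the built-in letters constant as a shortcut for typing the names; the variable names are illustrative.

```r
vec <- 1:10
names(vec) <- letters[1:10]      # same names as above: "a" through "j"

first_and_last <- vec[c("a", "j")]
first_and_last                   # the elements named a and j: 1 and 10

# unname() drops the names when only the values are wanted.
unname(first_and_last)           # 1 10

# Names can themselves be selected with a logical index.
big_names <- names(vec)[vec > 8]
big_names                        # "i" "j"
```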
Lists
Lists are slightly more difficult to index than vectors. Indexing a list with a single pair of square brackets will return a list, regardless of the type of the elements in the list, and regardless of the number of elements returned. For example, the following bits of code always return a list.
my_list <- list(TRUE, 1, 2, "OK", c(1,2,3), list("OK", 1,2, F))
typeof(my_list[1:2])
typeof(my_list[3])
'list' 'list'
In order to extract the data from a list to its original type, we must use double brackets.
my_list <- list(TRUE, 1, 2, "OK", c(1,2,3), list("OK", 1,2, F))
typeof(my_list[[1]])
typeof(my_list[[3]])
'logical' 'double'
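It is important to note that you cannot use double brackets to extract more than 1 element from a list. Passing a vector of indices to double brackets indexes recursively into nested elements rather than returning several elements.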
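The difference between single and double brackets is easiest to see side by side. This is a minimal sketch using the list from above; as_list, as_vector, and nested are illustrative names.

```r
my_list <- list(TRUE, 1, 2, "OK", c(1, 2, 3), list("OK", 1, 2, FALSE))

as_list   <- my_list[5]      # single brackets: a list of length 1
as_vector <- my_list[[5]]    # double brackets: the double vector itself

typeof(as_list)              # "list"
length(as_list)              # 1
typeof(as_vector)            # "double"
length(as_vector)            # 3

# The extracted vector can be used directly; sum(as_list) would be an error.
sum(as_vector)               # 6

# A vector inside double brackets indexes recursively:
# this is the first element of the sixth element.
nested <- my_list[[c(6, 1)]]
nested                       # "OK"
```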
Lastly, if we are dealing with a named list (i.e. a list with a names attribute), we can use the $ operator with an element's name to extract that element.
my_list <- list(first=TRUE, second=1, third=2, fourth="OK", embedded_vector=c(1,2,3), embedded_list=list("OK", 1,2, F))
typeof(my_list$first)
typeof(my_list$embedded_list)
'logical' 'list'
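Names also work inside brackets, which matters when the name is stored in a variable. The sketch below uses a shortened version of the named list; the variable wanted is illustrative.

```r
my_list <- list(first = TRUE, second = 1, embedded_vector = c(1, 2, 3))

# $ and [["name"]] extract the same element.
identical(my_list$first, my_list[["first"]])   # TRUE

# When the name is held in a variable, use double brackets.
wanted <- "embedded_vector"
from_variable <- my_list[[wanted]]
from_variable                                  # 1 2 3

# $ looks for an element literally called "wanted", which does not exist.
my_list$wanted                                 # NULL

# Single brackets with names return a (named) list, as with positions.
sub_list <- my_list[c("first", "second")]
typeof(sub_list)                               # "list"
```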
Examples
Given a vector, vec, extract the values that are greater than 2.
vec <- c(1, 13, 2, 9)
vec[vec > 2]
[1] 13 9
Given a vector, vec, extract the values greater than 5 and smaller than 10.
vec[vec > 5 & vec < 10]
[1] 9
Recycling
Often operations in R on two or more vectors require them to be the same length. When R encounters vectors with different lengths, it automatically repeats (recycles) the shorter vector until the length of the vectors is the same.
vec1 <- 1:10
vec2 <- 1:5
vec1 + vec2
[1] 2 4 6 8 10 7 9 11 13 15
As you can see in the output above, first, 1 is added to 1 to get 2, 2 is added to 2 to get 4, etc. Once vec2 runs out of numbers, its values are recycled as needed. So 1 is added to 6, 2 is added to 7, etc.
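R displays a warning when the length of the longer vector is not a multiple of the length of the shorter one. When the lengths divide evenly, as in the example above, recycling happens silently. It is critical that you pay attention to this warning, as it is often a sign that something unintentional is occurring.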
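The following sketch contrasts a recycling operation that runs silently with one that triggers the warning. In base R arithmetic the warning appears when the longer length is not a multiple of the shorter; tryCatch is used here only to capture the warning text, and the variable names are illustrative.

```r
vec1 <- 1:10

# 5 divides 10 evenly, so the shorter vector is recycled without a warning.
even_fit <- vec1 + 1:5

# 3 does not divide 10, so R still recycles but also raises a warning.
warning_text <- tryCatch(
  {
    vec1 + 1:3
    "no warning"
  },
  warning = function(w) conditionMessage(w)
)
warning_text

# The result is still computed; the last recycled cycle is simply cut short.
uneven_fit <- suppressWarnings(vec1 + 1:3)
uneven_fit    # 2 4 6 5 7 9 8 10 12 11
```

Checking length() on both vectors before combining them is a simple way to avoid relying on recycling by accident.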