For better or worse, having lots of time inside during the COVID-19 Pandemic has led me to tackle some of my outstanding R coding problems. Here’s one possible solution I’ve come up with for generating random community abundance matrices following one of the suggested approaches in Ulrich and Gotelli 2010.
Implementing the IT random matrix filling / creation algorithm from Ulrich and Gotelli 2010, Ecology.
To quote from the text:
IT assigns individuals randomly to matrix cells with probabilities proportional to observed row and column abundance totals until, for each row and column, total abundances are reached.
Approach
To sample with “probabilities proportional to observed row and column abundance totals”, I used the chisq.test
function. The expected values in a matrix (or contingency table) are a function of the row and column abundances. I believe this approach is supported by Ulrich and Gotelli’s comparison of the results they found using the IT algorithm compared to \(\chi^2\) analysis for 200 random matrices (p. 3395).
Using these expected values, we can generate matrices comprised of all zero values except a single 1 value for a random matrix element. The element that is assigned a value of 1 will be decided based on a random draw from the matrix indices that is weighted by the expected values in the original matrix. Next, by adding a series of these random matrices we can make a new matrix with the same matrix sum as the original matrix. Finally, using an if
statement, we can do this while assuring that the row and column abundances remain the same as the original matrix.
Working through an example
Below I present two examples of using this approach. The first comes from the help file for the chisq.test
function, with some small modifications.
chisq.test
example
First thing to do is to use chisq.test
to get the expected counts for any matrix. These expected counts are a function of the row and column sums:
\[ \text{expected count} = \frac{\text{row sum} \times \text{column sum}}{\text{matrix sum}} \]
M <- as.table(rbind(c(762, 327, 468), c(484, 239, 477)))
Xsq <- chisq.test(M) # Prints test summary
# Xsq$observed # observed counts (same as M)
Xsq$expected # expected counts under the null
## A B C
## A 703.6714 319.6453 533.6834
## B 542.3286 246.3547 411.3166
Next, we use a while
loop to make a new matrix with a matrix sum equal to that of the original matrix. Within the while
loop, new matrices are created with all zeros except a randomly placed value of 1. We’re adding an if
statement to “throw out” new zero matrices that result in row or column sums that are greater than those for the original matrix.
# Make a new matrix of all zeros
new_mat <- M * 0
while(sum(new_mat) < sum(M)){
zero_mat <- M * 0
zero_mat[sample(1:length(Xsq$expected), size = 1, prob = Xsq$expected)] <- 1
new_mat_temp <- new_mat + zero_mat
if( !any((rowSums(new_mat_temp) - rowSums(Xsq$expected)) > 0 ) &
!any((colSums(new_mat_temp) - colSums(Xsq$expected)) > 0 )) {
new_mat <- new_mat_temp
}
}
new_mat
## A B C
## A 707 309 541
## B 539 257 404
Finally, let’s do just a few checks to make sure the matrix sums are the same and the row and column sums are the same.
sum(new_mat) == sum(M)
## [1] TRUE
all(rowSums(new_mat) == rowSums(Xsq$expected))
## [1] TRUE
all(colSums(new_mat) == colSums(Xsq$expected))
## [1] TRUE
A larger random matrix
Here I’m applying the same setup as above on a 10 x 10 matrix, with shuffled values between 1 and 100.
rand_matr <- matrix(sample(1:100), nrow = 10, ncol = 10)
rand_matr_exp <- chisq.test(rand_matr)$expected
# Make a new matrix of all zeros
new_mat <- rand_matr * 0
while(sum(new_mat) < sum(rand_matr)){
zero_mat <- rand_matr * 0
zero_mat[sample(1:length(rand_matr_exp), size = 1, prob = rand_matr_exp)] <- 1
new_mat_temp <- new_mat + zero_mat
if( !any((rowSums(new_mat_temp) - rowSums(rand_matr_exp)) > 0 ) &
!any((colSums(new_mat_temp) - colSums(rand_matr_exp)) > 0 )) {
new_mat <- new_mat_temp
}
}
new_mat
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,] 32 36 44 66 65 45 52 30 43 24
## [2,] 47 52 68 61 70 70 82 40 58 50
## [3,] 35 56 44 48 75 51 61 40 53 49
## [4,] 38 53 70 70 67 69 67 43 55 45
## [5,] 39 47 58 48 84 42 64 48 50 45
## [6,] 15 39 32 48 48 35 51 22 34 27
## [7,] 48 63 73 75 106 83 67 48 52 55
## [8,] 44 50 67 60 75 67 59 41 50 49
## [9,] 29 33 22 42 38 39 34 22 35 26
## [10,] 41 42 54 53 52 52 50 43 66 45
sum(new_mat) == sum(rand_matr)
## [1] TRUE
all(rowSums(new_mat) == rowSums(rand_matr_exp))
## [1] TRUE
all(colSums(new_mat) == colSums(rand_matr_exp))
## [1] TRUE
Making a function for this
Let’s make this into a function
it_randomize <- function(matr){
# perform chisq test to get expected values
matr_exp <- chisq.test(matr)$expected
# make a new matrix of all zeros
new_mat <- matr * 0
# use a while loop to add matrices with a single, randomly
# placed value of 1 until
# the new_mat sum is the same as the original matrix
while(sum(new_mat) < sum(matr)){
# make a random matrix
# this is done by randomly choosing a matrix index value, weighted
# by the expected matrix values at each index
zero_mat <- matr * 0
zero_mat[sample(1:length(matr_exp), size = 1, prob = matr_exp)] <- 1
# add the random matrix to the new_mat temporalily
new_mat_temp <- new_mat + zero_mat
# keep new_mat_temp if it doesn't result in over filling the
# row or column sums
if( !any((rowSums(new_mat_temp) - rowSums(matr_exp)) > 0 ) &
!any((colSums(new_mat_temp) - colSums(matr_exp)) > 0 )) {
new_mat <- new_mat_temp
}
}
return(new_mat)
}
Apply the new function
new_mat <- it_randomize(M)
new_mat
## A B C
## A 717 306 534
## B 529 260 411
sum(new_mat) == sum(M)
## [1] TRUE
all(rowSums(new_mat) == rowSums(Xsq$expected))
## [1] TRUE
all(colSums(new_mat) == colSums(Xsq$expected))
## [1] TRUE
new_mat <- it_randomize(rand_matr)
new_mat
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,] 39 38 37 49 61 51 56 32 33 41
## [2,] 49 61 51 63 90 59 61 51 50 63
## [3,] 33 45 55 59 73 56 51 40 59 41
## [4,] 31 63 70 71 76 58 70 39 52 47
## [5,] 43 53 62 59 59 58 48 39 60 44
## [6,] 38 34 43 44 33 35 35 24 36 29
## [7,] 40 67 78 73 87 68 83 51 70 53
## [8,] 35 35 54 62 75 71 82 42 59 47
## [9,] 24 31 33 32 45 37 40 26 32 20
## [10,] 36 44 49 59 81 60 61 33 45 30
sum(new_mat) == sum(rand_matr)
## [1] TRUE
all(rowSums(new_mat) == rowSums(rand_matr_exp))
## [1] TRUE
all(colSums(new_mat) == colSums(rand_matr_exp))
## [1] TRUE