Optimal sample allocation in experiments
11 October 2014
$ \newcommand{\trace}{\mathop{\rm trace}\nolimits} $

One of the recurrent problems I face at work is about sample allocation. The situation is as follows. Given a fixed sample size that is determined by an external “budget,” I have to perform an experiment to test a standard treatment against two alternatives to check which one is superior. For instance, we have a feature in place that I need to test against two possible replacements to know which one we should use moving forward. Because of that, I am not interested in the value of each treatment, as much as on the difference between the effect of two treatments. Although there are occassions in which we have more complex structures, in most situations I can get away with complete randomization.

The intuitive solution in this case would be to divide $N$, the sample size I am allowed to use, into roughly equal parts. However, in most cases it makes sense for me to attempt to use the sample allocation to optimize over a reasonable quantity, such as the variance of the estimates in which I am interested.

Following Morris (2010), say I am estimating a model such as

where $X$ is a $n\times t$ matrix, in which $X_{ij}$ is an indicator of whether case $i$ has received treatment $j$. The rest of the notation is exactly what you expect. As I said, I am usually less interested in $\tau$ than I am in $C\tau$, where $C$ is a linear transformation. For instance, with two treatments, if I am interested in testing $\tau_1 - \tau_2$ and $\tau_1-\tau_3$, the matrix $C$ would be:

Given that, the variance of the quantities of interest for me is given by

which is the estimate of $\var(\hat \tau)$ that we know from least squares but transformed by $C$.

One reasonable quantity to minimize is thus $ \trace(C (X’ X)^{-1} C’) $, the average variance (after removing constants), where it is convenient to note that $X’X$ is a square diagonal matrix $t\times t$ with the size allocated to each treatment (the vector $n$) in the main diagonal. Minimizing over $n$, subject to the constraint $N=n’ \mathbf{1}$, we can write the Lagrangian as

to get that the (continuous) number of cases for treatment $i$ is

where $s_i = \{C’C\}_{ii}$, the $i$-th element in the main diagonal of $C’C$.

Now let’s put some R code. Consider, for example, a case with three alternative treatments aside from the standard treament, in which we want to test $\tau_1 - \tau_2$, $\tau_1 - \tau_3$, and $\tau_1 - \tau_4$. Let’s first build the matrix $C$

C <- matrix(c(1, -1,  0,  0,
              1,  0, -1,  0,
              1,  0,  0, -1), byrow=TRUE, 3, 4)

With a $N=100,$ the optimal sample allocation will be given then by:

print(100*(sqrt(diag(t(C) %*% C)))/sum(sqrt(diag(t(C) %*% C))))
[1] 36.6 21.1 21.1 21.1 

which we can round to $n = (37, 21, 21, 21)$. We can now test the average variance produced by this design relative to $n = (25, 25, 25, 25):$

N1 <- c(25, 25, 25, 25)
N2 <- c(37, 21, 21, 21)

tau <- c(1.1, 2.4, 4.2, 6.2)
alpha <- 3.1

R <- 1000
s1 <- s2 <- vector('list', R)
for (i in 1:R) {
    ee <- lapply(c(sum(N1), sum(N2)), function(x) rnorm(x, 0, 1))
	X <- lapply(c(sum(N1), sum(N2)),
	            function(x) matrix(rnorm(x*4, 0, 1), x, 4))

    y <- lapply(c(1, 2),
	            function(m) alpha + X[[m]] %*% tau + ee[[m]])

    sigma <- lapply(c(1, 2),
	                function(m) summary(lm(y[[m]] ~ X[[m]]))$sigma^2)

    s1[[i]] <- sigma[[1]] * C %*% diag(1/N1) %*% t(C)
	s2[[i]] <- sigma[[2]] * C %*% diag(1/N2) %*% t(C)
}

In the simulation, a thousand replications of the experiment are performed and, for each replication, we extract the covariance matrix associated with $C\tau$. We can now calculate the mean in each of the two to check that

print(mean(sapply(s1, function(x) mean(diag(x)))))
[1] 0.0801
print(mean(sapply(s2, function(x) mean(diag(x)))))
[1] 0.0753

where we see that the diagonal elements (all elements, as a matter of fact) of the design using N2 are smaller than with N1.