Optimal sample allocation in experiments

Saturday, October 11, 2014

$$ \newcommand{\trace}{\mathop{\rm trace}\nolimits} $$

One of the recurrent problems I face at work is about sample allocation. The situation is as follows. Given a fixed sample size that is determined by an external “budget,” I have to perform an experiment to test a standard treatment against two alternatives to check which one is superior. For instance, we have a feature in place that I need to test against two possible replacements to know which one we should use moving forward. Because of that, I am not interested in the value of each treatment as much as in the difference between the effects of two treatments. Although there are occasions in which we have more complex structures, in most situations I can get away with complete randomization.

The intuitive solution in this case would be to divide $N$, the sample size I am allowed to use, into roughly equal parts. However, in most cases it makes sense for me to use the sample allocation to optimize a reasonable quantity, such as the variance of the estimates in which I am interested.

Following Morris (2010), say I am estimating a model such as

$$ y_{ij} = \alpha + \tau_j + \epsilon_{ij} $$

where individual $i$ has received treatment $j$. The rest of the notation is exactly what you expect. As I said, I am usually less interested in the individual effects than in $C\tau$, where $C$ is a linear transformation. For instance, with a standard treatment and two alternatives, if I am interested in testing $\tau_1 - \tau_2$ and $\tau_1-\tau_3$, the matrix $C$ would be:

$$ C = \begin{pmatrix} 1 & -1 & 0 \\ 1 & 0 & -1 \end{pmatrix} $$
Given that, the variance of the quantities of interest for me is given by

$$ \mathrm{var}(C\hat{\tau}) = \sigma^2 C (X'X)^{-1} C' $$

which is the covariance matrix of the estimate of $C\tau$.
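To see why the allocation only enters through the group sizes, here is a quick sketch (my own illustration, not from the original post): with a cell-means coding, one indicator column per treatment and no intercept, $X'X$ reduces to a diagonal matrix of the allocated sizes.

```r
## Sketch: under complete randomization with indicator (cell-means) coding,
## X'X is diagonal with the group sizes n on the diagonal.
n <- c(40, 30, 30)                          # hypothetical allocation
g <- factor(rep(seq_along(n), times = n))   # treatment label for each unit
X <- model.matrix(~ g - 1)                  # one indicator column per treatment
all.equal(unname(t(X) %*% X), diag(n))      # TRUE: X'X = diag(n)
```
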

One reasonable quantity to minimize is thus the trace of that covariance matrix. Under complete randomization, $X'X$ is a square diagonal matrix $t\times t$ with the size allocated to each treatment, the vector $n$, on its main diagonal. Minimizing over $n$, subject to the constraint $N=n' \mathbf{1}$, we can write the Lagrangian as

$$ \mathcal{L}(n, \lambda) = \trace\left\{C\, \mathrm{diag}(n)^{-1} C'\right\} + \lambda \left(n'\mathbf{1} - N\right) $$
to get that the (continuous) number of cases for treatment $i$ is

$$ n_i = N\,\frac{\sqrt{s_i}}{\sum_j \sqrt{s_j}} $$

where $s_i = \{C'C\}_{ii}$, the $i$-th element in the main diagonal of $C'C$.
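Spelling out the intermediate step: since $\trace\left\{C\,\mathrm{diag}(n)^{-1}C'\right\} = \sum_i s_i/n_i$, the first-order conditions are

$$ \frac{\partial}{\partial n_i}\left[\sum_j \frac{s_j}{n_j} + \lambda\Big(\sum_j n_j - N\Big)\right] = -\frac{s_i}{n_i^2} + \lambda = 0 \quad\Rightarrow\quad n_i = \sqrt{s_i/\lambda}, $$

and substituting into the constraint $N = \sum_j n_j$ gives $1/\sqrt{\lambda} = N/\sum_j \sqrt{s_j}$, which yields the allocation above.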

Now let’s put in some R code. Consider, for example, a case with three alternative treatments aside from the standard treatment, in which we want to test $C\tau$ with

C <- matrix(c(1, -1,  0,  0,
              1,  0, -1,  0,
              1,  0,  0, -1), byrow=TRUE, 3, 4)
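As a quick sanity check (with a made-up effect vector, purely illustrative), multiplying $C$ by a vector of treatment effects returns exactly the pairwise differences being tested:

```r
C <- matrix(c(1, -1,  0,  0,
              1,  0, -1,  0,
              1,  0,  0, -1), byrow = TRUE, 3, 4)
tau <- c(1.1, 2.4, 4.2, 6.2)  # hypothetical treatment effects
drop(C %*% tau)               # tau_1 - tau_2, tau_1 - tau_3, tau_1 - tau_4
# [1] -1.3 -3.1 -5.1
```
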

With $N=100$, the optimal sample allocation will then be given by:

print(100*(sqrt(diag(t(C) %*% C)))/sum(sqrt(diag(t(C) %*% C))))
[1] 36.6 21.1 21.1 21.1 

which we can round to $n = (37, 21, 21, 21)$. To check that this allocation beats the naive equal split $n = (25, 25, 25, 25)$, we can compare the two designs via simulation:

N1 <- c(25, 25, 25, 25)  # naive equal allocation
N2 <- c(37, 21, 21, 21)  # rounded optimal allocation

tau <- c(1.1, 2.4, 4.2, 6.2)
alpha <- 3.1

R <- 1000
s1 <- s2 <- vector('list', R)
for (i in 1:R) {
    ee <- lapply(c(sum(N1), sum(N2)), function(x) rnorm(x, 0, 1))
    X <- lapply(c(sum(N1), sum(N2)),
                function(x) matrix(rnorm(x*4, 0, 1), x, 4))

    y <- lapply(c(1, 2),
                function(m) alpha + X[[m]] %*% tau + ee[[m]])

    ## Estimated residual variance in each design
    sigma <- lapply(c(1, 2),
                    function(m) summary(lm(y[[m]] ~ X[[m]]))$sigma^2)

    ## Covariance of C tau-hat implied by each allocation
    s1[[i]] <- sigma[[1]] * C %*% diag(1/N1) %*% t(C)
    s2[[i]] <- sigma[[2]] * C %*% diag(1/N2) %*% t(C)
}

In the simulation, a thousand replications of the experiment are performed and, for each replication, we extract the covariance matrix associated with $C\hat{\tau}$. We can now calculate the mean of the diagonal elements in each of the two designs to check that

print(mean(sapply(s1, function(x) mean(diag(x)))))
[1] 0.0801
print(mean(sapply(s2, function(x) mean(diag(x)))))
[1] 0.0753

where we see that the diagonal elements (all elements, as a matter of fact) of the covariance matrix under the N2 design are smaller than under N1.
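The simulation agrees with the analytic criterion: plugging both allocations directly into the trace of $C\,\mathrm{diag}(n)^{-1}C'$ (taking $\sigma^2 = 1$) gives the same ordering without any sampling noise:

```r
C <- matrix(c(1, -1,  0,  0,
              1,  0, -1,  0,
              1,  0,  0, -1), byrow = TRUE, 3, 4)
## Trace of the covariance of C tau-hat implied by an allocation n (sigma^2 = 1)
crit <- function(n) sum(diag(C %*% diag(1/n) %*% t(C)))
crit(c(25, 25, 25, 25))  # equal split:      0.24
crit(c(37, 21, 21, 21))  # rounded optimum:  about 0.224
```
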
