Likelihood and entropy. An example
20 July 2014

As an illustration of the ideas in the previous post, here I run an empirical likelihood estimation in the simplest possible setting, following this example.

Assume $n=50$ observations are drawn from $x_i \sim N(4, 1)$, $i=1,\dots,n$:

N <- 50
x <- rnorm(N, 4, 1)
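
The numbers reported below depend on this particular draw; to reproduce them exactly, one would fix the seed before sampling (the value here is an arbitrary choice, not the one behind the output shown):

set.seed(1234)  # arbitrary seed for reproducibility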

The maximum likelihood estimate of the mean is simply

print(mean(x))
## [1] 4.086
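
This is the familiar result that, for a normal model, the first-order condition of the log-likelihood in $\mu$ is solved by the sample mean:

$ \frac{\partial}{\partial\mu}\left[-\frac{1}{2}\sum_i(x_i-\mu)^2\right]=\sum_i(x_i-\mu)=0 \quad\Rightarrow\quad \hat{\mu}=\frac{1}{n}\sum_i x_i $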

Maximizing the program in the previous post with respect to $\lambda_2$1 results in the following condition

$ \lambda_2(\mu)=\arg_{\lambda_2}\left[\frac{1}{n}\sum_i\left[\frac{x_i-\mu}{1+\lambda_2(x_i-\mu)}\right]=0\right] $

and we can find the zero of this implicit function by minimizing its absolute value:

lambda <- function(ll, x, mu) {
    # Absolute value of (1/n) sum (x_i - mu) / (1 + lambda2 (x_i - mu)),
    # which is zero exactly at the solution of the condition above
    n <- length(x)
    num <- x - mu
    denom <- 1 + ll * num
    result <- (1/n) * sum(num/denom)
    return(abs(result))
}
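
As a quick check, at $\mu=\bar{x}$ the condition is solved by $\lambda_2=0$, since the objective reduces to $\left|\frac{1}{n}\sum_i(x_i-\bar{x})\right|=0$:

lambda(0, x, mean(x))  # essentially zero, up to floating point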

The inverse of the probability assigned to each observation is given by:

omega <- function(x, mu, lambda) {
    # omega_i = n (1 + lambda2 (x_i - mu)), the inverse of each probability
    n <- length(x)
    return(n * (1 + lambda * (x - mu)))
}
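
With $\mu=\bar{x}$ and $\lambda_2=0$, omega returns $n$ for every observation, so each probability is $1/n$ and the weights sum to one:

w <- omega(x, mean(x), 0)
all.equal(sum(1/w), 1)  # TRUE: equal weights 1/n at the sample mean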

This means that the log-likelihood to maximize is $-\sum_i\ln\omega_i(\lambda_2,\mu)$, which defines the empirical likelihood for each value of the Lagrange multiplier, and the multiplier itself depends on $\mu$. Therefore, we can first solve the partially optimized problem with respect to $\lambda_2(\mu)$ for a given $\mu$, and then find the $\mu$ that maximizes $-\sum_i\ln\omega_i(\lambda_2,\mu)$. Four things are worth noting:

  1. Given the simplicity of the problem, grid search is a reasonable option.
  2. As in any grid search problem, we should check the behavior of the program at the boundaries.
  3. The interval in which the optimizer looks for $\lambda_2$ is defined by the upper and lower bounds that we can establish on $\lambda_2$ once we restrict the search space to the grid (derived below).
  4. The grid itself is defined by the empirical range of $x$.
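
To make the third point concrete: requiring each implied probability $1/\omega_i$ to lie in $(0, 1]$ amounts to $1+\lambda_2(x_i-\mu)\geq 1/n$ for every $i$, and the binding observations are the sample extremes, which yields the interval used in the loop below:

$ \frac{1-1/n}{\mu-\max_i x_i} \leq \lambda_2(\mu) \leq \frac{1-1/n}{\mu-\min_i x_i} $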

In this case, the vector of candidate values for $\mu$ is defined by

sgrid <- seq(min(x) + 1e-10, max(x) - 1e-10, by = 0.01)

and the actual optimization happens here:

el <- vector("numeric", length(sgrid))
ip <- matrix(NA, nrow = length(sgrid), ncol = N)
for (i in 1:length(sgrid)) {
    # Bounds on lambda2 that keep all implied probabilities in (0, 1]
    lb <- (1 - 1/length(x))/(sgrid[i] - max(x))
    ub <- (1 - 1/length(x))/(sgrid[i] - min(x))
    elambda <- optimize(lambda, interval = c(lb, ub),
                        x = x, mu = sgrid[i])$minimum
    ip[i, ] <- omega(x, sgrid[i], elambda)  # inverse probabilities
    el[i] <- -sum(log(ip[i, ]))             # empirical log-likelihood at mu
}
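
As an aside, the same two-step structure can be written without the grid by nesting two calls to optimize. This is only a sketch (profile_el is a name introduced here for illustration), and it assumes the profile log-likelihood is well behaved enough for a one-dimensional optimizer:

# Profile empirical log-likelihood: solve for lambda2 at a given mu, then evaluate
profile_el <- function(mu, x) {
    n <- length(x)
    lb <- (1 - 1/n)/(mu - max(x))
    ub <- (1 - 1/n)/(mu - min(x))
    l2 <- optimize(lambda, interval = c(lb, ub), x = x, mu = mu)$minimum
    -sum(log(omega(x, mu, l2)))
}
# Maximize over mu, strictly inside the empirical range of x
optimize(profile_el, interval = c(min(x) + 1e-10, max(x) - 1e-10),
         maximum = TRUE, x = x)$maximum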

Returning to the grid search, the estimate is

loc <- which.max(el)
print(sgrid[loc])
## [1] 4.08
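
At the optimum, the implied probabilities also satisfy the summing-up constraint, since $\sum_i 1/\omega_i = 1-\lambda_2\cdot\frac{1}{n}\sum_i\frac{x_i-\mu}{1+\lambda_2(x_i-\mu)}$ and the second term vanishes at the solution of the condition above:

print(sum(1/ip[loc, ]))  # close to 1, up to the tolerance of optimize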

The estimate coincides with the ML estimate shown above, up to the resolution of the grid. As a matter of fact, we can also see that the empirical probabilities assigned to the observations track a normal distribution nicely:

plot(sort(scale(1/ip[loc, ])),  # standardize the estimated probabilities
     (1:N)/N,
     type = 'l',
     lty = 2,
     main = 'Empirical cumulative distribution and Normal CDF',
     xlab = 'Standardized estimated probabilities',
     ylab = 'Probability')
curve(pnorm(x), add = TRUE, col = 'red')

(Figure: empirical cumulative distribution of the standardized estimated probabilities, dashed, against the standard normal CDF in red.)

And this is probably the first time in eight months that I have used R's base graphics library.

  1. It can be shown that $\lambda_1=1$.