Approximating from Sparse Discrete Observations
Problem Description
Models for categorical data often have to work with few observations spread over a distribution with large support. Asymptotic characterizations of the estimators are of little use in such situations. We are therefore interested in developing estimation procedures adapted to extracting as much information as possible from the sparse observations available.
A related problem appears when the underlying model is high-dimensional: one may know one of the marginal distributions, and the estimators should be able to incorporate this knowledge.
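As a rough illustration of what incorporating a known marginal can mean, the sketch below rescales each row of a two-dimensional estimate so that the row sums match a given marginal. This is only an assumed, simplistic adjustment for illustration, not the procedure developed in the papers listed below; the function name `impose_row_marginal` and the uniform fallback for empty rows are hypothetical choices.

```python
def impose_row_marginal(table, row_marginal):
    """Rescale each row of a 2D probability table to a known row marginal.

    Illustrative assumption: rows with no estimated mass receive their
    marginal probability spread uniformly over the row.
    """
    adjusted = []
    for row, target in zip(table, row_marginal):
        total = sum(row)
        if total > 0:
            # Keep the within-row shape, change only the row total.
            adjusted.append([target * p / total for p in row])
        else:
            # Nothing was observed in this row: fall back to uniform.
            adjusted.append([target / len(row)] * len(row))
    return adjusted


# A sparse 3 x 2 estimate whose third row received no observations.
estimate = [[0.50, 0.25],
            [0.25, 0.00],
            [0.00, 0.00]]
known_marginal = [0.5, 0.3, 0.2]
adjusted = impose_row_marginal(estimate, known_marginal)
row_sums = [sum(row) for row in adjusted]
```

After the adjustment the row sums agree with the known marginal, and the empty row is no longer assigned zero probability.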
Modelling & Computational Challenges
Having to estimate from few observations over a (comparatively) large discrete support rules out simple histogram-based approximations, even though these perform well asymptotically. Smoothing over adjacent cells can mitigate this problem. Many categorical models assume some contiguity or adjacency between the cells, which makes the idea of smoothing natural: a few observations concentrated in some region suggest that a somewhat larger region carries significant probability.
Polynomial smoothers have a serious drawback, particularly with sparse observations: they are known to perform very well asymptotically, but they can produce negative approximations of the probabilities. We therefore need to develop another class of smoothers that preserve nonnegativity and perform well in the presence of sparse observations, even if not asymptotically.
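The contrast between a histogram and smoothing over adjacent cells can be sketched with a small example. The code below is a minimal sketch, assuming cells indexed 0 to n_cells - 1 on a line and a plain three-cell moving average; the functions `histogram_estimate` and `smooth_adjacent` and the `weight` parameter are hypothetical names chosen for illustration, not the smoothers studied in this project.

```python
from collections import Counter


def histogram_estimate(observations, n_cells):
    """Relative frequencies over cells 0 .. n_cells - 1."""
    counts = Counter(observations)
    n = len(observations)
    return [counts.get(k, 0) / n for k in range(n_cells)]


def smooth_adjacent(probs, weight=0.5):
    """Keep `weight` of each cell's mass in place and share the rest
    equally between its two neighbours. At the boundary, the share that
    would leave the support stays in the cell, so total mass is kept.
    """
    n_cells = len(probs)
    out = [0.0] * n_cells
    for k, p in enumerate(probs):
        out[k] += weight * p
        side = (1 - weight) * p / 2
        for j in (k - 1, k + 1):
            target = j if 0 <= j < n_cells else k
            out[target] += side
    return out


# Four observations over ten cells: most cells are empty.
sample = [3, 3, 4, 7]
hist = histogram_estimate(sample, n_cells=10)
smoothed = smooth_adjacent(hist)
```

The histogram assigns zero probability to every unobserved cell, including cell 2 next to the cluster at cells 3 and 4, while the smoothed estimate gives that neighbouring cell positive mass and still sums to one.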
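The negativity issue can be seen even in the simplest polynomial fit. The sketch below fits a single least-squares line to the relative frequencies of a sparse sample; this global line fit is an assumed, deliberately crude stand-in for the local polynomial smoothers discussed in the literature, and the name `linear_fit_estimate` is hypothetical.

```python
def linear_fit_estimate(freqs):
    """Least-squares line fitted to relative frequencies over cells
    0 .. len(freqs) - 1, evaluated at each cell."""
    n = len(freqs)
    xs = range(n)
    x_mean = (n - 1) / 2
    y_mean = sum(freqs) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, freqs))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return [intercept + slope * x for x in xs]


# Three observations over five cells: two in cell 0, one in cell 1.
counts = [2, 1, 0, 0, 0]
freqs = [c / sum(counts) for c in counts]
fitted = linear_fit_estimate(freqs)
# The fitted values still sum to one, but the value at the last cell
# is negative, so the fit is not a probability distribution.
```

The fit preserves the total mass because a least-squares line with an intercept preserves the mean, yet it pushes the estimate below zero in the empty region far from the observations.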
Research at LCM
We have introduced families of penalized polynomial smoothers. Computational work is still needed to gain insight into the optimization. The simulations produced so far hint at good performance, especially in two-dimensional problems. The influence of the weighting functions needs to be addressed and optimized.
The theoretical aspects of the estimators remain to be addressed. We are interested in finite-sample properties rather than asymptotic ones. The expressions of the estimators are rather complex to manipulate, so several difficulties need to be overcome.
Papers & Reports
- [1] Jacob, P., Oliveira, P.E., Penalized smoothing of discrete distributions with sparse observations, Preprint 06-28 of the Department of Mathematics, University of Coimbra, 2006
- [2] Jacob, P., Oliveira, P.E., Penalized smoothing of sparse tables, Preprint 07-02 of the Department of Mathematics, University of Coimbra, 2007
- [3] Jacob, P., Oliveira, P.E., Relative smoothing of discrete distributions with sparse observations, J. Stat. Comput. Simul. 81 (2011), 109-121
- [4] Jacob, P., Oliveira, P.E., Local smoothing with given marginals, J. Stat. Comput. Simul. 82 (2012) 915-926
- [5] Martins, R., Oliveira, P.E., Schmitt, A., Estimation of age at death from the pubic symphysis and the auricular surface of the ilium using a smoothing procedure, Forensic Science International 219 (2012) 287.e1-287.e7
- [6] Martins, R., Oliveira, P.E., Schmitt, A., Révision de la méthode proposée par Schmitt (2005) pour estimer l'âge au décès des adultes à partir de la surface sacro-pelvienne iliaque, Poster at the Colloque Groupement des Anthropologues de Langue Française, Dakar, Sénégal, May 18-21 2011
Software
- An R package is under preparation
Project Team
- Pierre Jacob, Institut de Mathématiques et Modélisation de Montpellier, Université de Montpellier II, France
- Paulo Eduardo Oliveira, LCM/CMUC