StatLearn 2013 - Workshop on "Challenging problems in Statistical Learning"

Statistical learning now plays a growing role in many scientific fields and must therefore confront new problems. It is consequently important to propose statistical learning methods suited to the modern problems raised by the various fields of application. Beyond being accurate, the proposed methods should also provide a better understanding of the observed phenomena. To foster contacts between the different communities and thereby spark new ideas, an international colloquium (held in English) on the theme "Challenging problems in Statistical Learning" was organized at Université Bordeaux Segalen on 8 and 9 April 2013. The recordings of the talks given at this colloquium are available below.

This colloquium was organized by Charles Bouveyron, François Caron, Marie Chavent, Robin Genuer, Vanessa Kuentz-Simonet, Pierre Latouche & Jérôme Saracco.
Recommended for: students of the discipline, researchers
Category: conferences
Produced: 2013

Video resources
Bayesian inference for the exponential random graph model (Nial Friel)
16 May 2013 - 17 May 2013
The exponential random graph is arguably the most popular model for the statistical analysis of network data. However, despite its widespread use, it is very complicated to handle from a statistical perspective, mainly because the likelihood function is intractable for all but trivially small networks. This talk will outline some recent work in this area to overcome this intractability. In particul [...]

Clustering of variables combined with variable selection using random forests: application to gene expression data (Robin Genuer & Vanessa Kuentz-Simonet)
16 May 2013 - 17 May 2013
The main goal of this work is to tackle the problem of dimension reduction for high-dimensional supervised classification. The motivation is to handle gene expression data. The proposed method works in two steps. First, one eliminates redundancy using clustering of variables, based on the R package ClustOfVar. This first step is based only on the explanatory variables (genes). Second, the synthetic v [...]

Efficient implementation of Markov chain Monte Carlo when using an unbiased likelihood estimator (Arnaud Doucet)
16 May 2013 - 17 May 2013
When an unbiased estimator of the likelihood is used within a Markov chain Monte Carlo (MCMC) scheme, it is necessary to trade off the number of samples used against the computing time. Many samples for the estimator will result in an MCMC scheme which has similar properties to the case where the likelihood is exactly known but will be expensive. Few samples for the construction of the estimator wi [...]

Investigating on nonlinear relationship in high-dimensional setting (Frédéric Ferraty)
16 May 2013 - 17 May 2013
The high-dimensional setting is a modern and dynamic research area in Statistics. It covers numerous situations where the number of explanatory variables is much larger than the sample size. This is the case in genomics when one observes (dozens of) thousands of gene expressions; typically one has at hand a small sample of high-dimensional vectors derived from a large set of covariates. Such dataset [...]

Learning with the Online EM Algorithm (Olivier Cappé)
16 May 2013 - 17 May 2013
The Online Expectation-Maximization (EM) is a generic algorithm that can be used to estimate the parameters of latent data models incrementally from large volumes of data. The general principle of the approach is to use a stochastic approximation scheme, in the domain of sufficient statistics, as a proxy for a limiting, deterministic, population version of the EM recursion. In this talk, I will br [...]
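
The principle stated in this abstract (stochastic approximation on the sufficient statistics, followed by the usual M-step) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the talk: the model (a two-component one-dimensional Gaussian mixture with unit variances), the function name `online_em_gmm`, the step size exponent 0.6, and the short burn-in before the first parameter update.

```python
import math


def online_em_gmm(stream, init_means=(-1.0, 1.0), step_exponent=0.6, burn_in=20):
    """Online EM for a two-component 1-D Gaussian mixture with unit variances.

    Hypothetical illustration: returns (weights, means) after one pass over `stream`.
    """
    weights = [0.5, 0.5]
    means = list(init_means)
    # Running averages of the sufficient statistics:
    # s0[k] tracks E[r_k] and s1[k] tracks E[r_k * x], with r_k the responsibility.
    s0 = list(weights)
    s1 = [weights[k] * means[k] for k in range(2)]

    for n, x in enumerate(stream, start=1):
        # E-step on the new observation only, computed in log space for stability.
        log_dens = [math.log(weights[k]) - 0.5 * (x - means[k]) ** 2 for k in range(2)]
        top = max(log_dens)
        unnorm = [math.exp(v - top) for v in log_dens]
        total = sum(unnorm)
        resp = [u / total for u in unnorm]

        # Stochastic approximation step with a decreasing step size (assumed schedule).
        gamma = (n + 1) ** (-step_exponent)
        for k in range(2):
            s0[k] += gamma * (resp[k] - s0[k])
            s1[k] += gamma * (resp[k] * x - s1[k])

        # M-step: map the running statistics back to parameters.
        # Skipped during a short burn-in so the first noisy statistics are not used.
        if n > burn_in:
            norm = s0[0] + s0[1]
            weights = [s / norm for s in s0]
            means = [s1[k] / s0[k] for k in range(2)]

    return weights, means
```

Calling `online_em_gmm(data)` on any iterable of floats processes each observation once and keeps only four running statistics in memory, which is what makes this kind of scheme attractive for large volumes of data.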

Modular priors for partially identified models (Ioanna Manolopoulou)
16 May 2013 - 17 May 2013
This work is motivated by the challenges of drawing inferences from presence-only data. For example, when trying to determine what habitat sea turtles "prefer", we only have data on where turtles were observed, not data about where the turtles actually are. Therefore, if we find that our sample contains very few turtles living in regions with tall sea grass, we cannot conclude that these areas are [...]

New challenges for (biological) network inference with sparse Gaussian graphical models (Julien Chiquet)
16 May 2013 - 17 May 2013
Network inference methods based upon sparse Gaussian Graphical Models (GGM) have recently emerged as a promising exploratory tool in genomics. They give a sound representation of direct relationships between genes and are accompanied by sparse inference strategies well suited to the high-dimensional setting. They are also versatile enough to include prior structural knowledge to drive the infe [...]

Regularized PCA to denoise and visualize data (Julie Josse)
16 May 2013 - 17 May 2013
Principal component analysis (PCA) is a well-established method commonly used to explore and visualize data. A classical PCA model is the fixed effect model where data are generated as a fixed structure of low rank corrupted by noise. Under this model, PCA does not provide the best recovery of the underlying signal in terms of mean squared error. Following the same principle as in ridge regression [...]

Strategies to analyze (Benoît Liquet)
16 May 2013 - 17 May 2013
Recent technological advances in molecular biology have given rise to numerous large-scale datasets whose analysis has raised serious methodological challenges, mainly relating to the size and complex structure of the data. Considerable experience has been gained over the past decade, mainly in genetics, from the Genome-Wide Association Study (GWAS) era, and more recently in transcriptomics and met [...]