
StatLearn 2012 - Workshop on "Challenging problems in Statistical Learning"

Statistical learning now plays a growing role in many scientific fields and consequently faces new problems. It is therefore important to propose statistical learning methods suited to the modern problems raised by the various fields of application. Beyond the accuracy of the proposed methods, they should also provide a better understanding of the observed phenomena. To foster contacts between the different communities and thereby help new ideas emerge, an international colloquium (held in English) on the theme "Challenging problems in Statistical Learning" was organized at Université Paris 1 on 5 and 6 April 2012. The recordings of the talks given at this colloquium are available below.

This colloquium was organized by C. Bouveyron, Christophe Biernacki, Alain Célisse, Serge Iovleff & Julien Jacques (Laboratoire SAMM, Paris 1; Laboratoire Paul Painlevé, Université Lille 1; CNRS & Modal, INRIA), with the support of the SFdS.

Recommended for: students in the field, researchers
Category: conferences
Produced: 2012

Video resources
1.1 Dimension reduction based on finite mixture modeling of inverse regression (Luca Scrucca)
5 April 2012
Consider the usual regression problem in which we want to study the conditional distribution of a response Y given a set of predictors X. Sufficient dimension reduction (SDR) methods aim at replacing the high-dimensional vector of predictors with a lower-dimensional function R(X), with no loss of information about the dependence of the response variable on the predictors. Almost all SDR methods res […]
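The sliced-average idea behind many SDR methods can be illustrated with classical sliced inverse regression (SIR). The following is a minimal sketch under our own assumptions (function names and test data are illustrative; this is the classical estimator, not the speaker's mixture-based one):

```python
import numpy as np

def sir_direction(X, y, n_slices=10):
    """Estimate a single EDR direction by classical SIR (illustrative sketch)."""
    n, p = X.shape
    # Whiten the predictors: Z has identity covariance
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    inv_sqrt = np.linalg.inv(np.linalg.cholesky(cov)).T
    Z = (X - mu) @ inv_sqrt
    # Slice the response and average the whitened predictors within each slice
    order = np.argsort(y)
    slices = np.array_split(order, n_slices)
    M = np.zeros((p, p))
    for idx in slices:
        m = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)
    # Leading eigenvector of the between-slice covariance, mapped back
    vals, vecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    return inv_sqrt @ vecs[:, -1]

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
beta = np.array([1.0, 2.0, 0.0, 0.0, 0.0]) / np.sqrt(5.0)  # true direction
y = (X @ beta) ** 3 + 0.1 * rng.normal(size=2000)
b_hat = sir_direction(X, y)
cos = abs(b_hat @ beta) / np.linalg.norm(b_hat)  # alignment with the truth
```

With a monotone link such as the cube, the estimated direction aligns closely with the true one; the mixture-based approach of the talk refines the modeling of the inverse regression step.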

1.2 Information Visualization: An Introduction to the Field and Applications for Statistics (Petra Isenberg)
5 April 2012
Information visualization is a research area that focuses on making the structure and content of large and complex data sets visually understandable and interactively analyzable. The goal of information visualization tools and techniques is to increase our ability to gain insight and make decisions for many types of datasets, tasks, and analysis scenarios. With the increase in size and complexity of […]

2.1 Hypothesis Testing and Bayesian Inference: New Applications of Kernel Methods (Arthur Gretton)
5 April 2012
In the early days of kernel machines research, the "kernel trick" was considered a useful way of constructing nonlinear learning algorithms from linear ones, by applying the linear algorithms to feature-space mappings of the original data. Recently, it has become clear that a potentially more far-reaching use of kernels is as a linear way of dealing with higher-order statistics, by mapping proba […]
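A concrete outcome of mapping probability distributions into feature space is the maximum mean discrepancy (MMD) two-sample statistic. The following is a hedged sketch (a biased V-statistic estimate with a Gaussian kernel; all names and parameters are ours):

```python
import numpy as np

def rbf(A, B, sigma=1.0):
    """Gaussian kernel matrix between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def mmd2(X, Y, sigma=1.0):
    """Biased estimate of squared MMD: E k(x,x') + E k(y,y') - 2 E k(x,y)."""
    return (rbf(X, X, sigma).mean()
            + rbf(Y, Y, sigma).mean()
            - 2.0 * rbf(X, Y, sigma).mean())

rng = np.random.default_rng(1)
# Same distribution: statistic close to zero
same = mmd2(rng.normal(size=(200, 2)), rng.normal(size=(200, 2)))
# Shifted distribution: statistic clearly positive
diff = mmd2(rng.normal(size=(200, 2)), rng.normal(2.0, 1.0, size=(200, 2)))
```

In the hypothesis-testing setting the talk describes, the null distribution of such a statistic is calibrated (e.g. by permutation) to obtain a test threshold.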

2.2 Functional estimation in high-dimensional data: Application to classification (Sophie Dabo-Niang)
5 April 2012
Functional data are becoming increasingly common in a variety of fields. Many studies underline the importance of considering the representation of data as functions. This has sparked growing attention in the development of statistical tools adapted to analyzing such data: functional data analysis (FDA). The aims of FDA are mainly the same as in classical statistical analysis […]

2.3 Discriminative clustering for high-dimensional data (Camille Brunet)
5 April 2012
A new family of 12 probabilistic models, introduced recently, aims to simultaneously cluster and visualize high-dimensional data. It is based on a mixture model that fits the data into a latent discriminative subspace with an intrinsic dimension bounded by the number of clusters. An estimation procedure, named the Fisher-EM algorithm, has also been proposed and turns out to outperform other subsp […]

3.1 Exploring Clustering Structure in Ranking Data (Brendan Murphy)
6 April 2012
Cluster analysis is concerned with finding homogeneous groups in a population. Model-based clustering methods provide a framework for developing clustering methods through the use of statistical models. This approach allows uncertainty to be quantified using probability, and the properties of a clustering method to be understood on the basis of a well-defined statistical model. Mixture mod […]
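The mixture-model machinery underlying model-based clustering can be sketched with a minimal two-component Gaussian EM in one dimension (illustrative only; ranking-data mixtures such as those in the talk use different component densities):

```python
import numpy as np

def norm_pdf(x, m, s):
    """Univariate Gaussian density."""
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

def em_gmm2(x, iters=50):
    """EM for a two-component Gaussian mixture (crude range-based init)."""
    pi, mu = 0.5, np.array([x.min(), x.max()])
    sd = np.array([x.std(), x.std()])
    for _ in range(iters):
        # E-step: posterior responsibilities of the two components
        dens = np.stack([pi * norm_pdf(x, mu[0], sd[0]),
                         (1.0 - pi) * norm_pdf(x, mu[1], sd[1])])
        r = dens / dens.sum(axis=0)
        # M-step: weighted updates of mixing weight, means, and scales
        pi = r[0].mean()
        mu = (r * x).sum(axis=1) / r.sum(axis=1)
        sd = np.sqrt((r * (x - mu[:, None]) ** 2).sum(axis=1) / r.sum(axis=1))
    return pi, mu, sd, r.argmax(axis=0)

rng = np.random.default_rng(4)
# Two well-separated groups: N(0,1) and N(6,1)
x = np.concatenate([rng.normal(0.0, 1.0, 300), rng.normal(6.0, 1.0, 300)])
pi_hat, mu_hat, sd_hat, labels = em_gmm2(x)
```

The posterior responsibilities are exactly the probabilistic quantification of clustering uncertainty that the abstract refers to.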

3.2 Co-clustering under different approaches (Mohamed Nadif)
6 April 2012
Cluster analysis is an important tool in a variety of scientific areas, including pattern recognition, document clustering, and the analysis of microarray data. Although many clustering procedures, such as hierarchical, strict-partitioning, and overlapping clustering, aim to construct an optimal partition of objects or, sometimes, variables, there are other methods, known as co-clustering or block c […]

3.3 Complexity control in overlapping stochastic block models (Pierre Latouche)
6 April 2012
Networks are widely used to represent complex systems as sets of interactions between units of interest. For instance, regulatory networks can describe the regulation of genes by transcription factors, while metabolic networks focus on representing pathways of biochemical reactions. In the social sciences, networks are commonly used to represent relational ties between actors. Numerous graph clust […]
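As a toy illustration of the (non-overlapping) stochastic block model that the overlapping variant of the talk extends, one can sample a two-community network and check its block structure (all parameters below are illustrative):

```python
import numpy as np

def sample_sbm(z, P, rng):
    """Sample an undirected adjacency matrix from an SBM with labels z
    and block connection probabilities P (illustrative sketch)."""
    n = len(z)
    A = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            A[i, j] = A[j, i] = int(rng.random() < P[z[i], z[j]])
    return A

rng = np.random.default_rng(2)
z = np.repeat([0, 1], 30)                    # two communities of 30 nodes
P = np.array([[0.80, 0.05],
              [0.05, 0.80]])                 # dense within, sparse between
A = sample_sbm(z, P, rng)
```

Complexity control in this setting amounts to choosing the number of blocks (and, in the overlapping case, the amount of overlap) from the data.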

4.1 Data-driven penalties: heuristics, results and thoughts... (Pascal Massart)
6 April 2012
The idea of selecting a model by penalizing a log-likelihood-type criterion goes back to the early seventies, with the pioneering works of Mallows and Akaike. One can find many consistency results in the literature for such criteria. These results are asymptotic, in the sense that one deals with a given number of models while the number of observations tends to infinity. A non-asymptotic theory for […]
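A minimal sketch of penalized model selection in the Mallows/Akaike spirit: choose a polynomial degree by a BIC-type criterion n·log(RSS/n) + k·log(n). This uses a fixed penalty purely for illustration, not the data-driven penalties of the talk:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
x = rng.uniform(-1.0, 1.0, n)
y = x ** 3 + 0.1 * rng.normal(size=n)  # true model: degree 3

bics = []
for deg in range(9):
    coef = np.polyfit(x, y, deg)
    rss = ((y - np.polyval(coef, x)) ** 2).sum()
    k = deg + 1  # number of fitted coefficients
    bics.append(n * np.log(rss / n) + k * np.log(n))
best = int(np.argmin(bics))  # penalized criterion picks a parsimonious fit
```

The non-asymptotic theory discussed in the talk asks how such penalties should scale when the collection of models is allowed to grow with n.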

4.2 A sliced inverse regression approach for block-wise evolving data streams (Jérôme Saracco)
6 April 2012
In this communication, we focus on data arriving sequentially, block by block, in a stream. A semiparametric regression model involving a common EDR (Effective Dimension Reduction) direction B is assumed in each block. Our goal is to estimate this direction at each arrival of a new block. A simple direct approach consists in pooling all the observed blocks and estimating the EDR direction by the SIR (Slice […]

4.3 Transfer to an Unlabeled Task using kernel marginal predictors (Gilles Blanchard)
6 April 2012
We consider a classification problem: the goal is to assign class labels to an unlabeled test data set, given several labeled training data sets drawn from different but similar distributions. In essence, the goal is to predict labels from (an estimate of) the marginal distribution of the unlabeled data, by learning the trends present in related classification tasks that are already known. In th […]