# Courses

Three courses will be offered, each consisting of four 90-minute sessions. The speakers are Zakhar Kabluchko, Karim Lounici and Stéphane Robin.

Details of the courses can be found below.

## Geometry of convex cones with applications to high-dimensional statistics

#### Zakhar Kabluchko

This series of lectures will be devoted to applications of convex geometry to problems of high-dimensional statistics. The basic notion is that of a polyhedral convex cone in the $$d$$-dimensional Euclidean space. It is defined as an intersection of finitely many half-spaces whose bounding hyperplanes pass through the origin. In other words, a polyhedral convex cone is the set of solutions to a system of finitely many homogeneous linear inequalities. Alternatively, a polyhedral cone can be defined as the set of linear combinations $$\lambda_1 x_1+\ldots+\lambda_n x_n$$ of a finite collection of vectors $$x_1,\ldots, x_n$$ in $$\mathbb R^d$$ with non-negative coefficients $$\lambda_1,\ldots,\lambda_n\geq 0$$. By intersecting a polyhedral cone with the unit sphere centered at the origin we obtain what is called a spherical polytope.

To each polyhedral cone $$C\subset \mathbb R^d$$ one can associate the vector of conic intrinsic volumes $$v_0(C), v_1(C),\ldots,v_d(C)$$. These conic intrinsic volumes are the spherical analogues of the usual intrinsic volumes studied in convex geometry. We shall review the definition and main properties of conic intrinsic volumes and describe their applications to some problems of stochastic geometry, for example to counting faces of randomly projected cubes, simplices and other polytopes.

These problems, studied in the works of Donoho and Tanner, have applications to high-dimensional statistics, signal processing and compressed sensing, which we shall also explain. In addition, applications of conic intrinsic volumes to phase transitions in convex optimization problems with random data have been discovered in the work of Amelunxen, Lotz, McCoy and Tropp. We shall explain these applications and, if time permits, also the interconnections between conic integral geometry and various other topics such as random matrices, hyperplane arrangements and the classical Sparre Andersen arcsine laws for random walks.
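As a small illustration of conic intrinsic volumes, the sketch below treats the simplest polyhedral cone, the non-negative orthant in $$\mathbb R^d$$, which is generated by the standard basis vectors. It assumes the standard characterisation (not spelled out in the abstract) that $$v_k(C)$$ is the probability that the Euclidean projection of a standard Gaussian vector onto $$C$$ lands in the relative interior of a $$k$$-dimensional face. For the orthant this gives $$v_k = \binom{d}{k}/2^d$$. The function names are hypothetical.

```python
import random
from math import comb


def orthant_intrinsic_volumes(d):
    """Exact conic intrinsic volumes v_0..v_d of the non-negative orthant in R^d."""
    return [comb(d, k) / 2**d for k in range(d + 1)]


def estimate_orthant_intrinsic_volumes(d, n_samples=20000, seed=0):
    """Monte Carlo estimate based on projecting Gaussian vectors onto the orthant."""
    rng = random.Random(seed)
    counts = [0] * (d + 1)
    for _ in range(n_samples):
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        # Projecting onto the orthant keeps the positive coordinates and zeroes
        # the others, so the face dimension is the number of positive coordinates.
        k = sum(1 for x in g if x > 0)
        counts[k] += 1
    return [c / n_samples for c in counts]


exact = orthant_intrinsic_volumes(3)
approx = estimate_orthant_intrinsic_volumes(3)
print(exact)   # [0.125, 0.375, 0.375, 0.125]
print(approx)  # close to the exact values, up to Monte Carlo error
```

The exact values sum to one, as conic intrinsic volumes of a polyhedral cone always do; for general cones no such closed form is available and the simulation approach is only a rough check.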

## Principal Component Analysis: some recent results and applications

#### Karim Lounici

Several recent applications in statistics, machine learning and numerical analysis can be formulated as problems of processing high-dimensional matrices. Extracting information efficiently from these objects often requires developing new computationally efficient methods. Understanding how and when these methods work is a fascinating topic of research that requires combining tools from several fields of mathematics: statistics, probability, perturbation theory and convex optimization. In this course, we will review how to use these tools in the context of Principal Component Analysis to analyse the performance of the standard PCA method. Results will include concentration bounds, asymptotic distributions and minimax lower bounds for functionals of spectral projectors. Next, we will explain how to exploit this recent theory to provide some insight into new or longstanding problems in machine learning, including Gaussian mixtures, graph clustering and domain adaptation.
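To fix ideas about the objects mentioned above, here is a minimal sketch of standard PCA on a toy data set. It computes the empirical covariance matrix, its leading eigenvector and the corresponding rank-one spectral projector. Power iteration is used only to keep the example dependency-free; it is an illustrative choice, not a method taken from the course, and all function names are hypothetical.

```python
import random


def sample_covariance(rows):
    """Empirical covariance matrix of the observations (one row per sample)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / n for j in range(d)]
            for i in range(d)]


def top_eigenvector(matrix, n_iter=200, seed=0):
    """Leading unit eigenvector of a symmetric PSD matrix by power iteration."""
    rng = random.Random(seed)
    d = len(matrix)
    # A random start is almost surely not orthogonal to the leading eigenvector.
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    for _ in range(n_iter):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v


def spectral_projector(v):
    """Rank-one projector v v^T onto the span of the unit vector v."""
    return [[a * b for b in v] for a in v]


rng = random.Random(0)
# Hypothetical data: large variance along the first axis, small along the second.
data = [[3.0 * rng.gauss(0, 1), 0.5 * rng.gauss(0, 1)] for _ in range(2000)]
v = top_eigenvector(sample_covariance(data))
projector = spectral_projector(v)
```

The empirical projector obtained this way is a random perturbation of the population projector onto the first axis; quantifying that perturbation is the kind of question the concentration bounds and asymptotic distributions in the course address.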

## Statistical inference of incomplete data models to analyse ecological networks

#### Stéphane Robin

Ecological networks aim at describing the interactions between a set of species sharing the same ecological niche. The interactions constituting a network can be directly observed (e.g. via plant-pollinator contacts) or may need to be reconstructed based on the fluctuations of the species’ abundances across different sites. Statistical models are needed either to describe the organisation (or ‘topology’) of an observed network, or to infer the set of interactions that underlies the joint distribution of the abundances. Various models have been proposed for both purposes.

These lectures will focus on two emblematic families of models. Stochastic block models (SBM) are dedicated to the topological analysis of observed networks and assume that species have different roles in the network and that the interactions between them depend on their respective roles. The Poisson log-normal (PLN) model is a joint species distribution model (JSDM) that relies on a Gaussian latent layer. Interestingly, both models are incomplete data models and their statistical inference raises similar issues.
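The two models can be sketched as generative procedures, which makes the incomplete data structure visible: in both cases a hidden layer (roles or Gaussian vectors) is drawn first and only the second layer is observed. This is a minimal simulation under common textbook parametrisations, assumed here rather than taken from the lectures: Bernoulli edges for the SBM and counts $$Y_j \sim \mathcal P(\exp Z_j)$$ for the PLN model. All function and parameter names are hypothetical.

```python
import math
import random


def simulate_sbm(n_species, role_probs, connect_probs, rng):
    """Draw hidden roles, then an undirected interaction network given the roles."""
    roles = rng.choices(range(len(role_probs)), weights=role_probs, k=n_species)
    adjacency = [[0] * n_species for _ in range(n_species)]
    for i in range(n_species):
        for j in range(i + 1, n_species):
            # The interaction probability depends only on the two roles.
            p = connect_probs[roles[i]][roles[j]]
            adjacency[i][j] = adjacency[j][i] = int(rng.random() < p)
    return roles, adjacency


def sample_poisson(rate, rng):
    """Knuth's multiplication method; adequate for the moderate rates used here."""
    threshold, k, prod = math.exp(-rate), 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def simulate_pln(n_sites, mean, chol, rng):
    """Draw hidden Gaussian vectors Z, then counts Y_j ~ Poisson(exp(Z_j))."""
    d = len(mean)
    latent, counts = [], []
    for _ in range(n_sites):
        eps = [rng.gauss(0.0, 1.0) for _ in range(d)]
        # With chol lower triangular, Z = mean + chol @ eps has covariance chol @ chol^T.
        z = [mean[j] + sum(chol[j][k] * eps[k] for k in range(j + 1))
             for j in range(d)]
        latent.append(z)
        counts.append([sample_poisson(math.exp(zj), rng) for zj in z])
    return latent, counts


rng = random.Random(0)
roles, network = simulate_sbm(6, [0.5, 0.5], [[0.8, 0.1], [0.1, 0.6]], rng)
latent, abundances = simulate_pln(4, [1.0, 0.5], [[0.5, 0.0], [0.3, 0.4]], rng)
```

In practice only `network` and `abundances` would be available to the statistician; `roles` and `latent` are the unobserved variables whose conditional distribution makes inference difficult.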

After a brief reminder of the most popular methods for the inference of incomplete data models, we will show that they do not apply to SBMs or PLN models. We will introduce inference methods based on variational algorithms, which rely on an approximation of the conditional distribution of the unobserved variables given the observed data. Such algorithms have been shown to be efficient for the inference of a large class of incomplete data models, but their theoretical understanding itself remains incomplete. Finally, we will discuss several ways of combining variational approximations with statistically grounded estimation procedures.
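The role of the approximate conditional distribution can be summarised by a standard identity, given here as background and not necessarily in the form used in the lectures. The notation is an assumption: $$Y$$ denotes the observed data, $$Z$$ the unobserved variables, $$\theta$$ the model parameters and $$q$$ any distribution over $$Z$$.

$$
\log p_\theta(Y) = \underbrace{\mathbb E_q\left[\log p_\theta(Y, Z)\right] - \mathbb E_q\left[\log q(Z)\right]}_{J(\theta, q)} + \mathrm{KL}\left(q(Z) \,\|\, p_\theta(Z \mid Y)\right)
$$

Since the Kullback-Leibler divergence is non-negative, $$J(\theta, q)$$ is a lower bound of the log-likelihood, with equality when $$q(Z) = p_\theta(Z \mid Y)$$. When this conditional distribution is intractable, a variational algorithm instead maximises $$J(\theta, q)$$ over $$\theta$$ and over $$q$$ restricted to a tractable family of distributions.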