Michael I. Jordan
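Distinguished Professor in the Department of Electrical Engineering and Computer Sciences and the Department of Statistics at the University of California, Berkeley, and a member of the BAAI (智源) Academic Advisory Committee. He is a member of the U.S. National Academy of Sciences, the National Academy of Engineering, and the American Academy of Arts and Sciences, and the only scientist in machine learning to hold all three distinctions. He is a fellow of numerous leading international academic societies (AAAS, AAAI, ACM, ASA, CSS, IEEE, IMS, ISBA, SIAM). His honors include the IJCAI Research Excellence Award (2016), the David E. Rumelhart Prize (2015), and the ACM/AAAI Allen Newell Award (2009). In 2016, Semantic Scholar ranked Professor Jordan as the most influential scholar in computer science. His research interests span machine learning, statistics, and the cognitive and biological sciences; in recent years he has focused in particular on nonparametric Bayesian analysis, graphical models, spectral methods, distributed computing, natural language processing, signal processing, and statistical genetics.
Professor in the Department of Computer Science at Tsinghua University and BAAI Scholar, deputy director of the State Key Laboratory of Intelligent Technology and Systems, and adjunct professor at Carnegie Mellon University. In 2013 he was named one of "AI's 10 to Watch" by IEEE Intelligent Systems. His research focuses on machine learning, and he has published more than 80 papers in leading international journals and conferences. He serves on the editorial boards of IEEE TPAMI and Artificial Intelligence, was local co-chair of ICML 2014, and has served as an area chair for ICML, NIPS, and other international conferences.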
1. Towards a theoretical understanding of learning to learn methods
Abstract: Optimization algorithms play a central role in deep learning. A recent line of work tries to design better optimization algorithms using a meta-learning approach, in which one optimizes the performance of an optimizer. However, this approach faces challenges in both theory and practice. In this talk we investigate the learning-to-learn approach on simple objectives. We show that (a) for simple quadratic objectives, one can design a loss function whose gradient is well-behaved and on which gradient descent converges; however, auto-differentiation tools based on backpropagation run into numerical issues and fail to compute the gradient correctly. We also give a way to fix this issue. (b) Training the optimizer using validation loss is provably better than training it using training loss: the former can achieve good generalization performance, while the latter can overfit even for simple quadratic functions. We verify these results using simple experiments on synthetic data as well as MNIST.
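To make the setting concrete, the following is a minimal sketch of a meta-objective for tuning a single step size on a diagonal quadratic. It is an illustration under assumed choices, not the construction from the talk: the objective f(w) = 0.5 * sum(a_i * w_i^2), the function names, and the closed-form meta-gradient are all introduced here. It shows how the derivative of the final loss with respect to the step size can vanish or explode as the number of unrolled steps grows, which hints at why the choice of meta-loss matters.

```python
def unrolled_loss(eta, curvatures, w0, steps):
    """Run `steps` gradient descent updates on f(w) = 0.5 * sum(a_i * w_i^2)
    with step size `eta` and return f at the final iterate."""
    w = list(w0)
    for _ in range(steps):
        # The gradient of f in coordinate i is a_i * w_i.
        w = [wi - eta * a * wi for wi, a in zip(w, curvatures)]
    return 0.5 * sum(a * wi * wi for a, wi in zip(curvatures, w))


def closed_form_loss(eta, curvatures, w0, steps):
    """Same quantity in closed form, using w_T[i] = (1 - eta * a_i)^T * w_0[i]."""
    return 0.5 * sum(
        a * (1.0 - eta * a) ** (2 * steps) * wi * wi
        for a, wi in zip(curvatures, w0)
    )


def meta_gradient(eta, curvatures, w0, steps):
    """Derivative of the final loss with respect to the step size `eta`."""
    return -steps * sum(
        a * a * (1.0 - eta * a) ** (2 * steps - 1) * wi * wi
        for a, wi in zip(curvatures, w0)
    )
```

For a single coordinate with curvature `a`, the factor `(1 - eta * a) ** (2 * steps - 1)` shrinks geometrically when `|1 - eta * a| < 1` and grows geometrically when `|1 - eta * a| > 1`, so the raw meta-gradient is badly scaled on both sides of the stability threshold.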
I am an assistant professor in the Computer Science Department at Duke University. I received my Ph.D. from the Computer Science Department at Princeton University, where my advisor was Sanjeev Arora, and I was a postdoc at Microsoft Research New England. I am broadly interested in theoretical computer science and machine learning. Modern machine learning algorithms such as deep learning try to automatically learn useful hidden representations of the data. How can we formalize hidden structures in the data, and how do we design efficient algorithms to find them? My research aims to answer these questions by studying problems that arise in analyzing text, images, and other forms of data, using techniques such as non-convex optimization and tensor decompositions.
2. Near-Optimal Reinforcement Learning with Self-Play
Abstract: Self-play, where the algorithm learns by playing against itself without requiring any direct supervision, has become the new weapon in modern Reinforcement Learning (RL) for achieving superhuman performance in practice. However, the majority of existing theory in reinforcement learning only applies to the setting where a single agent plays against a fixed environment. It remains largely open how to design efficient self-play algorithms in two-player sequential games, especially when it is necessary to manage the exploration/exploitation tradeoff. In this talk, we present the first line of provably efficient self-play algorithms in a basic setting of tabular episodic Markov games. Our algorithms also achieve near-optimal sample complexity: the number of samples they require matches the information-theoretic lower bound up to a polynomial factor of the length of each episode.
Chi Jin is an assistant professor of Electrical Engineering at Princeton University. He obtained his Ph.D. in Computer Science at UC Berkeley, advised by Michael I. Jordan, and received his B.S. in Physics from Peking University. His research interests lie in theoretical machine learning, with special emphasis on nonconvex optimization and reinforcement learning. His representative work includes proving that noisy gradient descent and accelerated gradient descent escape saddle points efficiently, proving sample complexity bounds for Q-learning and LSVI with UCB, and designing near-optimal algorithms for minimax optimization.
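As a much simpler illustration of the self-play idea, the sketch below runs two copies of the Hedge (multiplicative weights) algorithm against each other on a zero-sum matrix game, which can be viewed as a Markov game with a single state and a single step. This is not the algorithm from the talk: it assumes full-information feedback and involves no exploration, and all function names are hypothetical. It relies only on the standard fact that when both players have low regret, their time-averaged strategies form an approximate Nash equilibrium.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def hedge_self_play(payoff, steps):
    """Row player maximizes payoff[i][j], column player minimizes it.
    Both run Hedge; returns the time-averaged mixed strategies."""
    n_rows, n_cols = len(payoff), len(payoff[0])
    entries = [v for row in payoff for v in row]
    scale = (max(entries) - min(entries)) or 1.0  # normalize payoffs to width 1
    eta_row = math.sqrt(8.0 * math.log(n_rows) / steps)
    eta_col = math.sqrt(8.0 * math.log(n_cols) / steps)
    row_gain = [0.0] * n_rows  # cumulative expected payoff of each row action
    col_cost = [0.0] * n_cols  # cumulative expected payoff conceded by each column
    row_avg = [0.0] * n_rows
    col_avg = [0.0] * n_cols
    for _ in range(steps):
        p = softmax([eta_row * g / scale for g in row_gain])
        q = softmax([-eta_col * c / scale for c in col_cost])
        for i in range(n_rows):
            row_gain[i] += sum(payoff[i][j] * q[j] for j in range(n_cols))
            row_avg[i] += p[i] / steps
        for j in range(n_cols):
            col_cost[j] += sum(payoff[i][j] * p[i] for i in range(n_rows))
            col_avg[j] += q[j] / steps
    return row_avg, col_avg


def duality_gap(payoff, p, q):
    """Best-response value against q minus best-response value against p.
    The gap is zero exactly at a Nash equilibrium of the zero-sum game."""
    best_row = max(
        sum(payoff[i][j] * q[j] for j in range(len(q))) for i in range(len(p))
    )
    best_col = min(
        sum(payoff[i][j] * p[i] for i in range(len(p))) for j in range(len(q))
    )
    return best_row - best_col
```

For example, on the game `[[2.0, -1.0], [-1.0, 1.0]]`, whose unique equilibrium has the row player choosing the first row with probability 0.4, calling `hedge_self_play` with a few thousand steps yields averaged strategies with a small duality gap.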
3. How Private Are Private Algorithms?
Abstract: Privacy-preserving data analysis has been put on a firm mathematical foundation since the introduction of differential privacy (DP) in 2006. This privacy definition, however, has some well-known weaknesses: notably, it does not tightly handle composition. In this talk, we propose a new relaxation of DP that we term "f-DP", which has a number of appealing properties and avoids some of the difficulties associated with prior relaxations. First, f-DP preserves the hypothesis testing interpretation of differential privacy, which makes its guarantees easily interpretable. It allows for lossless reasoning about composition and post-processing, and notably, a direct way to analyze privacy amplification by subsampling. We define a canonical single-parameter family of definitions within our class that is termed "Gaussian Differential Privacy", based on hypothesis testing of two shifted normal distributions. We prove that this family is focal to f-DP by introducing a central limit theorem, which shows that the privacy guarantees of any hypothesis-testing based definition of privacy (including differential privacy) converge to Gaussian differential privacy in the limit under composition. This central limit theorem also gives a tractable analysis tool. We demonstrate the use of the tools we develop by giving an improved analysis of the privacy guarantees of noisy stochastic gradient descent. This is joint work with Jinshuo Dong and Aaron Roth.
Assistant Professor of Statistics at the Wharton School, University of Pennsylvania. Prior to joining Penn, he received his Ph.D. in Statistics from Stanford University in 2016 and his B.S. in Mathematics from Peking University in 2011. Su's research interests include high-dimensional inference, multiple testing, statistical aspects of optimization, and private data analysis. He received an NSF CAREER Award in 2019.
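The hypothesis-testing view of Gaussian differential privacy can be sketched in a few lines. The code below assumes the standard formulas from the Gaussian differential privacy literature: the trade-off function between type I and type II errors for testing N(0, 1) against N(mu, 1), the rule that composing mu_i-GDP mechanisms gives GDP with parameter sqrt(sum of mu_i^2), and the conversion from mu-GDP to an (eps, delta) guarantee. The function names are hypothetical and chosen for illustration.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and standard deviation 1


def gdp_tradeoff(alpha, mu):
    """Smallest achievable type II error at type I error `alpha` (0 < alpha < 1)
    when testing N(0, 1) against N(mu, 1): Phi(Phi^{-1}(1 - alpha) - mu)."""
    return _STD_NORMAL.cdf(_STD_NORMAL.inv_cdf(1.0 - alpha) - mu)


def compose_gdp(mus):
    """Privacy parameter after composing mechanisms that are mu_i-GDP."""
    return math.sqrt(sum(m * m for m in mus))


def gdp_delta(eps, mu):
    """delta(eps) such that a mu-GDP mechanism (mu > 0) is (eps, delta)-DP."""
    return _STD_NORMAL.cdf(-eps / mu + mu / 2.0) - math.exp(eps) * _STD_NORMAL.cdf(
        -eps / mu - mu / 2.0
    )
```

A larger `mu` pushes the trade-off curve down, meaning the two neighboring datasets are easier to tell apart. Composition only requires adding squared parameters, which is what makes the Gaussian family convenient for analyzing many-step procedures such as noisy stochastic gradient descent.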
4. Conformal Inference of Counterfactuals and Individual Treatment Effects
Abstract: Evaluating treatment effect heterogeneity widely informs treatment decision making. At the moment, much emphasis is placed on the estimation of the conditional average treatment effect via flexible machine learning algorithms. While these methods enjoy some theoretical appeal in terms of consistency and convergence rates, they generally perform poorly in terms of uncertainty quantification. This is troubling since assessing risk is crucial for reliable decision-making in sensitive and uncertain environments. In this work, we propose a conformal inference-based approach that can produce reliable interval estimates for counterfactuals and individual treatment effects under the potential outcome framework. For completely randomized or stratified randomized experiments with perfect compliance, the intervals have guaranteed average coverage in finite samples regardless of the unknown data generating mechanism. For randomized experiments with ignorable compliance and general observational studies obeying the strong ignorability assumption, the intervals satisfy a doubly robust property which states the following: the average coverage is approximately controlled if either the propensity score or the conditional quantiles of potential outcomes can be estimated accurately. Numerical studies on both synthetic and real datasets empirically demonstrate that existing methods suffer from a significant coverage deficit even in simple models. In contrast, our methods achieve the desired coverage with reasonably short intervals.
I'm a postdoctoral researcher in the Statistics Department at Stanford University, advised by Professor Emmanuel Candes. Previously, I received my Ph.D. from UC Berkeley, advised by Professors Peter Bickel and Michael Jordan. I was also very fortunate to be supervised by Professors Noureddine El Karoui, William Fithian, and Peng Ding on particular projects. Before that, I majored in mathematics and statistics at the School of Mathematical Sciences at Peking University, with a minor in economics at the China Center for Economic Research at Peking University. I was pleased to work as a research assistant with Professor Lan Wu and to be supervised by Professor Song Xi Chen on my undergraduate thesis. My research interests include multiple hypothesis testing, causal inference, network analysis, high-dimensional statistical inference, optimization, resampling methods, time series analysis, and econometrics.
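The completely randomized case can be illustrated with plain split conformal prediction. The sketch below is a simplification and not the method from the talk: it uses absolute residuals around a least-squares line rather than weighted or quantile-based scores, and the simulated data-generating process and all function names are assumptions made for illustration. Because treatment is assigned by a fair coin flip independent of the covariate, treated and control units are exchangeable, so an interval calibrated on the treated units' outcomes should also cover the unobserved treated outcome Y(1) of control units at roughly the nominal rate.

```python
import math
import random


def conformal_radius(residuals, alpha):
    """Split-conformal quantile: the ceil((n + 1) * (1 - alpha))-th smallest
    absolute residual, or infinity when the calibration set is too small."""
    n = len(residuals)
    k = math.ceil((n + 1) * (1 - alpha))
    return math.inf if k > n else sorted(residuals)[k - 1]


def fit_line(xs, ys):
    """Ordinary least squares fit of y ~ intercept + slope * x."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return mean_y - slope * mean_x, slope


def counterfactual_coverage(n=3000, alpha=0.1, seed=0):
    """Simulate a randomized experiment and return the fraction of control
    units whose (normally unobserved) Y(1) falls inside the interval."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        # Potential outcome under treatment, with heteroscedastic noise.
        y1 = 1.0 + 2.0 * x + rng.gauss(0.0, 1.0 + abs(x))
        (treated if rng.random() < 0.5 else control).append((x, y1))
    half = len(treated) // 2
    train, calib = treated[:half], treated[half:]
    intercept, slope = fit_line([x for x, _ in train], [y for _, y in train])

    def predict(x):
        return intercept + slope * x

    radius = conformal_radius([abs(y - predict(x)) for x, y in calib], alpha)
    hits = sum(abs(y1 - predict(x)) <= radius for x, y1 in control)
    return hits / len(control)
```

The coverage guarantee here is marginal (averaged over units), matching the "average coverage" wording of the abstract; the interval has the same width at every covariate value, which is one limitation that quantile-based scores are meant to address.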
5. Shape Matters: Understanding the Implicit Bias of the Noise Covariance
Abstract: The noise in stochastic gradient descent (SGD) provides a crucial implicit regularization effect for training overparameterized models. Prior theoretical work largely focuses on spherical Gaussian noise, whereas empirical studies show that parameter-dependent noise, induced by mini-batches or label perturbation, is far more effective than Gaussian noise. In this talk, I will present recent work that theoretically characterizes this phenomenon on a quadratically-parameterized model introduced by Vaskevicius et al. and Woodworth et al. We show that in an over-parameterized setting, SGD with label noise recovers the sparse ground truth from an arbitrary initialization, whereas SGD with Gaussian noise or gradient descent overfits to dense solutions with large norms. Our analysis reveals that parameter-dependent noise introduces a bias towards local minima with smaller noise variance, whereas spherical Gaussian noise does not.
Assistant Professor of Computer Science and Statistics at Stanford University. His main research areas are machine learning and algorithms, including non-convex optimization, deep learning and its theory, reinforcement learning, representation learning, and high-dimensional statistics. He has published more than 40 high-quality papers in leading international conferences and journals. His honors include an Honorable Mention for the 2018 ACM Doctoral Dissertation Award, the NeurIPS 2016 Best Student Paper Award, and the COLT 2018 Best Paper Award. He did his undergraduate studies at the Institute for Interdisciplinary Information Sciences at Tsinghua University as a member of the 2008 cohort of the "Yao Class" (姚班), and then pursued his Ph.D. at Princeton University under Professor Sanjeev Arora.
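A small calculation shows what "parameter-dependent" means here. The sketch assumes the quadratically-parameterized model takes the form f_v(x) = <v * v, x> (elementwise square of v) with per-sample loss 0.25 * (f_v(x) - y)^2; both the exact form and the 0.25 constant are assumptions made for illustration, as are the function names. Perturbing the label by xi changes the gradient by exactly -xi * v_i * x_i in coordinate i, so the injected noise scales with the parameter and vanishes on coordinates where v_i is zero, unlike spherical Gaussian noise, which perturbs every coordinate with the same variance.

```python
def grad(v, x, y):
    """Gradient with respect to v of 0.25 * (<v * v, x> - y)^2."""
    residual = sum(vi * vi * xi for vi, xi in zip(v, x)) - y
    # d/dv_i of 0.25 * residual^2 is 0.5 * residual * (2 * v_i * x_i).
    return [residual * vi * xi for vi, xi in zip(v, x)]


def label_noise_perturbation(v, x, y, xi):
    """Difference between the gradient computed with the perturbed label
    y + xi and the gradient computed with the clean label y."""
    clean = grad(v, x, y)
    noisy = grad(v, x, y + xi)
    return [n - c for n, c in zip(noisy, clean)]
```

With `v = [2.0, 0.0, 0.5]`, `x = [1.0, 1.0, -2.0]`, `y = 1.0`, and `xi = 0.5`, the perturbation is `[-1.0, 0.0, 0.5]`: the coordinate where `v` is zero receives no noise at all.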