# Symposium on Learning, Algorithms and Complexity

#### Indian Institute of Science, Bangalore, India.

Three decades ago, the confluence of learning, algorithms and complexity led to the definition of precise mathematical frameworks for studying computational learning, followed by the development of powerful learning algorithms such as boosting and support vector machines. In today's big data era, as we design new algorithms to learn from massive amounts of data, there is a need to re-visit some of the fundamental questions at this confluence: What are the right models of computational complexity that we should consider? What are the tradeoffs between computational and statistical complexity that must be taken into account? What are new learning models that could be appropriate, and how does one design provably good algorithms under these models?

This symposium will bring together leading researchers in the areas of learning, algorithms and complexity to present and discuss the latest developments in these exciting areas, with a goal of both brainstorming on new research directions and exciting the next generation of mathematical/computer scientists in India to think across boundaries. The target audience will mostly be graduate students in computer science, and the program will be a mix of introductory, tutorial-style lectures and advanced research talks.

The program will also include joint sessions with ICTS Turing Lectures on the afternoon of Wednesday, January 7, and Microsoft Research India Theory Day on Thursday, January 8. There will be free time to go to the I. G. Sarma - Infosys Foundation Lecture on the afternoon of Tuesday, January 6.

### Program

Venue: All sessions will be held in the Biological Sciences Auditorium, New Biological Sciences Building, Indian Institute of Science unless indicated otherwise below.

Monday, January 5
08:55 - 09:00 Address by Govindan Rangarajan, NMI Convener
09:00 - 10:30 Nicolo Cesa-Bianchi
University of Milan
The Online Approach to Machine Learning
Abstract:

The online paradigm has become a standard tool in machine learning and large-scale data analysis. Online learning can be viewed as a repeated game between an adaptive agent and an ever-changing environment. Within this simple paradigm, one can model a variety of sequential decision tasks simply by specifying the interaction protocol and the resource constraints on the learner. We will start the talk by introducing, through a novel and unified analysis, the main results for two well-known settings: prediction with expert advice and multiarmed bandits. A natural extension of this analysis will take us to the model of online convex optimization. This is a very general online learning framework, whose workhorse is the powerful mirror descent algorithm. In the second half of the talk we will explore the applications of mirror descent in expert-like and bandit-like scenarios, highlighting some of the most recent results and the main open problems.

Slides [pdf, 1mb]
10:30 - 11:00 Coffee Break
Microsoft Research New England
Optimization on Noisy Data: Statistical Accuracy vs. Numerical Precision
Abstract:

Many optimization problems that arise in science and engineering are those in which we only have a stochastic approximation to the underlying problem at hand (e.g. linear regression or other problems where our objective function is a sample average). Such problems highlight some of the challenges we face at the interface of computer science and statistics: should we use a highly accurate algorithm (with costly time and space requirements yet returns numerically precise estimates) or a crude stochastic approximation scheme like stochastic gradient descent (which is light on memory and simple to implement, yet has a poor convergence rate)? The tension arises due to the statistical issue of how much accuracy we actually desire of our algorithm. In the absence of computational constraints, the minimizer on a sample average of observed data -- the empirical risk minimizer (ERM) -- is widely regarded as the estimation strategy of choice due to its desirable statistical convergence properties.

This talk will survey results on both these fronts (stochastic approximation algorithms and recent work on linearly convergent algorithms). Furthermore, this work will provide a simple to implement procedure (applicable to many estimation problems including linear regression) that, under mild regularity conditions, enjoys the following properties:

• The algorithm can be implemented in linear time with just a single pass of the observed data, using space linear in the size of a single sample.
• The algorithm achieves the same statistical rate of convergence as the empirical risk minimizer, on every problem (even considering constant factors).
• The algorithm's performance depends on the initial error at a rate that decreases super-polynomially.

Furthermore, we quantify this rate of convergence, showing that after the size of our dataset exceeds some (problem dependent) threshold, the streaming algorithm rapidly approaches the rate of ERM in expectation.

Microsoft Research New England
Machine Learning and Crowdsourcing
Abstract:
Slides [ppt, 34mb]
12:30 - 14:00 Lunch (Guest House lawns)
14:00 - 15:30 Robert C. Williamson
Australian National University
The Geometry of Machine Learning Problems
Abstract:

Machine learning researchers love coming up with new and better algorithms. But users only care about having their problems solved. How can this tension be turned into a theoretical research agenda? In the talk I will explain how. I will argue why it is important and valuable to be able to relate machine learning problems to each other. To that end I will consider the definition of a machine learning problem. A crucial element of formalising a machine learning problem is a loss function and so mapping all the problems requires mapping the structure of loss functions. I will summarise the work I have done on loss functions in the last several years, culminating in a new, intrinsic and intriguing representation in terms of convex bodies. Along the way I will present interesting relationships with divergences, surrogate regret bounds, the aggregating algorithm, and different parametrisations of loss functions.

Slides - Part 1 [ppt, 42mb]
Slides - Part 2 [pdf, 6mb]
15:30 - 16:00 Coffee Break
16:00 - 16:45 Sanjoy Dasgupta
University of California, San Diego
Universal Consistency of Nearest Neighbor in Metric Spaces, and Rates of Convergence
Abstract:

In this talk, I will give an overview of the statistical theory of nearest neighbor classification. I will also talk about some recent progress on obtaining universal consistency, and sharp rates of convergence, in metric spaces.

Bio:

Sanjoy Dasgupta is a Professor in the Department of Computer Science and Engineering at UC San Diego. He received his PhD from Berkeley in 2000, and spent two years at AT&T Research Labs before joining UCSD. His area of research is algorithmic statistics, with a focus on unsupervised and minimally supervised learning. He is the author of a textbook, "Algorithms" (with Christos Papadimitriou and Umesh Vazirani), that appeared in 2006.

Slides [pdf, 1mb]
16:45 - 17:30 Shivani Agarwal
Indian Institute of Science
Statistical Learning in Complex Prediction Spaces: What Do We Know?
Abstract:

In an increasing number of application domains, supervised learning methods are needed for making predictions in complex spaces: predicting sequences in speech recognition, predicting rankings in information retrieval and recommender systems, predicting assignments in image segmentation, and so on. How should we design statistically consistent learning algorithms for such problems? A popular learning approach is to minimize a convex surrogate loss, but how does one design surrogates that lead to consistent algorithms? While the answer to this question is well understood for simple binary classification tasks and a few other specific learning problems, a general understanding has been missing. In this talk I will describe some of our recent work on developing a unified framework for designing statistically consistent learning algorithms for complex prediction problems defined by an arbitrary loss matrix. I will introduce the notion of convex calibration dimension of a general loss matrix, which measures the smallest dimension of a surrogate space in which one can design a convex calibrated surrogate loss for a given loss matrix, and will describe some ways to design explicit convex calibrated surrogates for any given loss matrix. I will also discuss ways to achieve approximate statistical consistency when exact consistency may be computationally hard to achieve. I will conclude by describing some fundamental open questions.

Slides [pdf, 11mb]
Tuesday, January 6
09:00 - 10:30 Santosh Vempala
Georgia Institute of Technology
The Complexity of Unsupervised Learning
Abstract:

Unsupervised learning or making "sense" of unlabeled data is a leading paradigm in modern machine learning. The idea is to either to make structural/distributional assumptions on data, then recover the structure or parameters of the distribution from samples, OR to find the best fit model for a given data set. In this self-contained talk, we will summarize the main ideas and progress along these lines, with a focus on the following "classical" models:

1. Gaussian mixture models
2. Independent Component Analysis
3. Recovering planted structures (cliques, partitions, assignments)

We will discuss algorithmic techniques, lower bounds and some of the many fascinating open problems.

Slides [ppt, 4mb]
10:30 - 11:00 Coffee Break
Massachusetts Institute of Technology
Beyond Berry Esseen: Structure and Learning of Sums of Random Variables
Abstract:

It is well-known that $$\sim1/\epsilon^2$$ coin tosses are necessary and sufficient to estimate the bias of a coin. But how many samples are needed to learn the distribution of the sum of $$n$$ independent Bernoullis? I will show that the answer is still $$O(1/\epsilon^2)$$, independent of $$n$$, and will generalize this bound to sums of independent bounded integer random variables (SIIRVs). I will argue that the celebrated Berry-Esseen theorem and its variants fall short from characterizing the general structure of SIIRVs, and offer stronger finitary central limit theorems, which tightly characterizing their structure and have the afore-mentioned implications to learning.

Bio:

Constantinos Daskalakis is the x-window consortium associate professor of computer science at MIT. He holds a diploma in electrical and computer engineering from the National Technical University of Athens, and a Ph.D. in electrical engineering and computer sciences from UC-Berkeley. His research interests lie in theoretical computer science and its interface with economics and probability. Daskalakis has been honored with the 2007 Microsoft Graduate Research Fellowship, the 2008 ACM Doctoral Dissertation Award, the Game Theory and Computer Science Prize from the Game Theory Society, the 2010 Sloan Fellowship in Computer Science, the 2011 SIAM Outstanding Paper Prize, the 2011 Ruth and Joel Spira Award for Distinguished Teaching, and the 2012 Microsoft Research Faculty Fellowship. He is also a recipient of Best Paper awards at the ACM Conference on Economics and Computation in 2006 and in 2013.

11:45 - 12:30 Ankur Moitra
Massachusetts Institute of Technology
Simple, Efficient and Neural Algorithms for Sparse Coding
Abstract:
12:30 - 14:00 Lunch (Guest House lawns)
16:00 - 17:00 Silvio Micali
Massachusetts Institute of Technology
The Quest for Resilient Mechanism Design
Abstract:

Mechanism design aims at engineering games that, when played by selfish players, yield outcomes satisfying a desired property. Such engineered games, however, are typically vulnerable to computational complexity, privacy, and collusion. Developing a theory of mechanism design resilient to such "forces" will require a totally new framework: techniques, solution concepts, and benchmarks. We shall advocate this point using auctions as an example.

Bio:

Professor Silvio Micali received his Laurea in Mathematics from the University of Rome, and his PhD in Computer Science from the University of California at Berkeley. Since 1983, he has been on the MIT faculty, in Electrical Engineering and Computer Science Department, where he is Ford Professor of Engineering. Silvio's research interests are cryptography, zero knowledge, pseudo-random generation, secure protocols, and mechanism design. Silvio has received the Turing Award (in computer science), the Gödel Prize (in theoretical computer science), and the RSA prize (in cryptography). He is a member of the National Academy of Sciences, the National Academy of Engineering, and the American Academy of Arts and Sciences.

17:30 - 18:30
Poster Session (Faculty Hall)
 Adway Mitra Indian Institute of Science Temporally Coherent CRP: A Bayesian Non-Parametric Approach for Clustering Tracklets with applications to Person Discovery in Videos Anudhyan Boral Harvard University An Economic Interpretation of Edmond's Blossom Algorithm Arpit Agarwal Indian Institute of Science GEV-Canonical Regression for Accurate Binary Class Probability Estimation when One Class is Rare Arun Rajkumar Indian Institute of Science Ranking from Pairwise Comparisons: The Role of the Pairwise Preference Matrix Chandrasekhar Lakshmi Narayanan Indian Institute of Science A generalized reduced linear program for Markov decision processes. Clément Canonne Columbia University Sampling Correctors Girish Varma Tata Institute of Fundamental Research Hardness of Coloring Harikrishna Narasimhan Indian Institute of Science Online and Stochastic Gradient Methods for Non-decomposable Loss Functions Harish G. Ramaswamy Indian Institute of Science Convex Calibrated Surrogates for Low-Rank Loss Matrices with Applications to Subset Ranking Losses Lavanya Sita Tekumalla Indian Institute of Science Mining Block I/O Traces for Cache Preloading with Sparse Temporal Non­parametric Mixture of Multivariate Poisson Palash Dey Indian Institute of Science Sampling Complexity for Winner Determination in Voting Rohit Vaish Indian Institute of Science On the Statistical Consistency of Plug-in Classifiers for Non-decomposable Performance Measures Satyanath Bhat Indian Institute of Science An Optimal Bidimensional Multi-Armed Bandit Auction for Multi-unit Procurement Shahbaz Khan Indian Institute of Technology Kanpur Incremental Algorithm for Maintaining DFS Tree for Undirected Graphs Shilpa Garg Max Planck Institut Informatik Seed node selection for community discovery in social networks Shivaram Kalyanakrishnan Indian Institute of Science Improved Expected Runtime for MDP Planning Shweta Jain Indian Institute of Science An Incentive Compatible Multi-Armed-Bandit Crowdsourcing Mechanism with Quality Assurance Trapit Bansal Indian Institute of Science A Provable SVD-based Algorithm for Learning Topics in Dominant Admixture Corpus
Wednesday, January 7
09:00 - 09:45 Sandeep Juneja
Tata Institute of Fundamental Research
Efficient Rare Event Simulation Algorithms for Heavy Tailed Processes
Abstract:
Slides [pdf, 2mb]
09:45 - 10:30 Anirban Dasgupta
Indian Institute of Technology Gandhinagar
Aggregating Information from the Crowd
Abstract:
Slides [pdf, 2mb]
10:30 - 11:00 Coffee Break
11:00 - 11:45 Arnab Bhattacharyya
Indian Institute of Science
Algebraic Property Testing
Abstract:
Slides [ppt, 3mb]
12:00 - 13:30 Lunch (Guest House lawns)
[Joint sessions with ICTS Turing Lectures]
14:00 - 14:45 Sanjeev Arora
Princeton University
Overcoming Computational Intractability in Unsupervised Learning
Abstract:

Unsupervised learning - i.e., learning with unlabeled data - is increasingly important given today's data deluge. Most natural problems in this domain - e.g. for models such as mixture models, HMMs, graphical models, topic models and sparse coding/dictionary learning, deep learning - are NP-hard. Therefore researchers in practice use either heuristics or convex relaxations with no concrete approximation bounds. Several nonconvex heuristics work well in practice, which is also a mystery.

The talk will describe a sequence of recent results whereby rigorous approaches leading to polynomial running time are possible for several problems in unsupervised learning. The proof of polynomial running time usually relies upon nondegeneracy assumptions on the data and the model parameters, and often also on stochastic properties of the data (average-case analysis). Some of these new algorithms are very efficient and practical - e.g. for topic modeling.

Bio:

Sanjeev Arora is Professor of Computer Science at Princeton University. He is interested in algorithms for intractable problems, and computational complexity. Recently he has been applying these ideas to machine learning. He has won the Packard Fellowship (1997), Simons Investigator Award (2012), Gödel Prize (2001 and 2010), ACM-Infosys Foundation Award in the computing Science (2012), and the Fulkerson Prize (2012).

Slides [ppt, 2mb]
14:45 - 15:30 Robert Schapire
Microsoft Research &
Princeton University
The Contextual Bandits Problem: A New, Fast, and Simple Algorithm
Abstract:

In the contextual bandits learning problem, the learner must repeatedly decide which action to take in response to an observed context, and is then permitted to observe the received reward, but only for the chosen action. The goal is to learn through experience to behave nearly as well as the best policy (or decision rule) in some possibly very large space of candidate policies. We assume that the learner can only access this policy space using an oracle for solving empirical cost-sensitive classification problems; in practice, most off-the-shelf classification algorithms could be used for this purpose. In this very general setting, we present a new, fast, and simple algorithm that achieves a regret guarantee that is statistically optimal. Moreover, this algorithm makes very modest use of the oracle, which it calls far less than once per round, on average. These properties suggest this may be the most practical contextual bandits algorithm among all existing approaches that are provably effective for general policy classes.

This is joint work with Alekh Agarwal, Daniel Hsu, Satyen Kale, John Langford and Lihong Li.

Bio:

Robert Schapire is a Principal Researcher at Microsoft Research in New York City, currently on leave from Princeton University. He received his ScB in math and computer science from Brown University in 1986, and his SM (1988) and PhD (1991) from MIT under the supervision of Ronald Rivest. After a short post-doc at Harvard, he joined the technical staff at AT&T Labs (formerly AT&T Bell Laboratories) in 1991. Since 2002, he has been with the Computer Science Department at Princeton University, and was named the David M. Siegel '83 Professor in Computer Science in 2013. He joined Microsoft Research in 2014. His awards include the 1991 ACM Doctoral Dissertation Award, the 2003 Gödel Prize, and the 2004 Kanelakkis Theory and Practice Award (both of the last two with Yoav Freund). He is a fellow of the AAAI, and a member of the National Academy of Engineering. He mainly studies machine learning, especially theoretically well-grounded algorithms, with a particular focus on a constellation of closely related methods and topics that includes boosting, online learning, game theory, and maximum entropy.

Slides [pdf, 1mb]
15:30 - 16:00 Coffee Break
16:00 - 16:45 Ravi Kannan
Microsoft Research India &
Indian Institute of Science
Versatility of Singular Value Decomposition
Abstract:

Singular Value Decomposition (SVD) is a basic tool to deal with matrix data and has traditionally been applied in a variety of fields. In the modern setting, matrix sizes have increased, but improved sampling based algorithms are still effective. Besides, many new applications of SVD to Gaussian Mixtures, Nearest Neighbors, Topic Modeling etc. have been developed. Combined with a simple device of thresholding, SVD is useful on a new bunch of problems. The talk will discuss from first principles some recent developments.

Slides [pdf, 1mb]
Thursday, January 8
[Joint sessions with Microsoft Research India Theory Day]
09:30 - 10:30 Elchanan Mossel
University of California, Berkeley
The Surprising Power of Belief Propagation
Abstract:

Belief propagation is an extremely popular and efficient algorithm in machine learning. The talk will concentrate on recent theoretical effort of understanding why is the algorithm so effective. This research effort reveals new connections between a type of zeta function from number theory, belief propagation and novel linear algebra algorithms.

Slides [pdf, 1mb]
10:30 - 11:00 Coffee Break
11:00 - 12:00 Ronitt Rubinfeld
Massachusetts Institute of Technology &
Tel Aviv University
Testing and Correction of Distributions
Abstract:
Slides [pdf, 1mb]
12:30 - 14:00 Lunch (Guest House lawns)
14:00 - 15:00 Ashish Goel
Stanford University
Two Random Walks that Surprise
Abstract:
Slides - Part 1 [pdf, 22mb]
Slides - Part 2 [pdf, 4mb]
15:00 - 15:30 Coffee Break
15:30 - 16:30 Rocco Servedio
Columbia University
Learning from Satisfying Assignments
Abstract:

We introduce and study a new type of learning problem for probability distributions over the $$n$$-dimensional Boolean hypercube. A learning problem in our framework is defined by a class $$C$$ of Boolean functions over the hypercube; in our model the learning algorithm is given uniform random satisfying assignments of an unknown function in $$C$$ and its goal is to output a high-accuracy approximation of the uniform distribution over the space of satisfying assignments for $$f$$. We discuss connections between the existence of efficient learning algorithms in this framework and evading the curse of dimensionality'' for more traditional density estimation problems.

As our main results, we show that the two most widely studied classes of Boolean functions in computational learning theory --- linear threshold functions and DNF formulas --- have efficient distribution learning algorithms in our model. Our algorithm for linear threshold functions runs in time $$\text{poly}(n,1/\epsilon)$$ and our algorithm for polynomial-size DNF runs in quasipolynomial time. On the other hand, we also prove complementary hardness results which show that under cryptographic assumptions, learning monotone 2-CNFs, intersections of 2 halfspaces, and degree-2 PTFs are all hard. This shows that our algorithms are close to the limits of what is efficiently learnable in this model.

Joint work with Anindya De and Ilias Diakonikolas.

Slides [ppt, 3mb]
Friday. January 9
09:00 - 09:45 Deeparnab Chakrabarty
Microsoft Research India
Provable Submodular Minimization via Wolfe's Algorithm
Abstract:

Submodular functions are fundamental objects which arise in *numerous* applications including machine learning, economics, optimization, biology, information theory, etc. They are mathematically very beautiful. In particular, one can minimize them in polynomial time. This is non-trivial since the function is defined on $$2^n$$ elements and all we can do is query its values. This was first shown using the celebrated ellipsoid method in the early 80s and later on combinatorial algorithms were found.

What we study is an algorithm which predates the ellipsoid method. Wolfe's algorithm is a heuristic proposed in 1976 to find the minimum length point on a polytope, and in 1984 Fujishige showed how to use this to minimize submodular functions. Together, this Fujishige-Wolfe heuristic, also called the min-norm point heuristic, seems to have the best empirical performance currently. However, no non-trivial theoretical guarantees were known. In this work we give the first convergence analysis of Wolfe's algorithm which gives a pseudo-polynomial time guarantee for the Fujishige-Wolfe min-norm point algorithm for submodular function minimization.

Slides [ppt, 1mb]
09:45 - 10:30 Prateek Jain
Microsoft Research India
Non-convex Projection Based Approaches for High-dimensional Learning
Abstract:
10:30 - 11:00 Coffee Break
11:00 - 11:45 Navin Goyal
Microsoft Research India
Algorithms for Independent Component Analysis
Abstract:
Slides [ppt, 2mb]
11:45 - 12:30 Kavitha Telikepalli
Tata Institute of Fundamental Research
Pairwise Spanners
Abstract:
Slides [pdf, 10mb]
12:30 - 14:00 Lunch (Guest House lawns)
14:00 - 15:30 Manindra Agrawal
Indian Institute of Technology Kanpur
Algebraic Complexity Theory
Abstract:
Slides [pdf, 2mb]
15:30 - 16:00 Coffee Break
16:00 - 16:45 Neeraj Kayal
Microsoft Research India
Arithmetic Circuits: from Lower Bounds to Learning and Back
Abstract:
16:45 - 17:30 Chandan Saha
Indian Institute of Science
Lower bounds for small depth arithmetic circuits
Abstract:

An approach to proving a super-polynomial lower bound for arithmetic circuits reduces the problem to proving ‘strong enough’ lower bounds for small depth circuits, in particular (non-homogeneous) depth-3 circuits and (homogeneous) depth-4 circuits. Depth of a circuit is the number of layers of gates in it.

In this talk, we plan to discuss an exponential lower bound for (homogeneous) depth-4 circuits that comes close to being ‘strong enough’. The techniques also yield exponential lower bounds for certain (non-homogeneous) depth-3 circuits, in particular depth-3 circuits with low bottom fan-in which also answers a question posed by Shpilka and Wigderson.

Based on joint works with Neeraj Kayal, Srikanth Srinivasan and Nutan Limaye.

Slides [ppt, 1mb]

### Registration

Registrations are now closed. The list of selected participants is available at the NMI symposium page.

### Visitor Information

Map of the Indian Institute of Science (IISc) campus and symposium venues. Click on the icons and the map menu (top-left corner) for additional details on accommodation, campus facilities, directions to airport, etc.

### Student Volunteers

• Arun Rajkumar (Indian Institute of Science)
• Siddarth Ramamohan (Indian Institute of Science)
• Harish G. Ramaswamy (Indian Institute of Science)
• Aadirupa Saha (Indian Institute of Science)