Citizen science, curated databases and other kinds of social machine all require database support, sometimes on a very large scale. But apart from the problems that are always associated with scale, what are the new demands that these systems place on databases? In this talk I shall describe two new issues that came up in working with the curators of biological data: data citation and data annotation. Both of these require computational insights. Data citation is now widely advocated, but we need computational tools to generate citations automatically; and nearly every kind of social machine has some form of annotation at its core.
Peter Buneman is Professor of Database Systems in the School of Informatics at the University of Edinburgh. His work in computer science has focused mainly on databases and programming languages, specifically: database semantics, approximate information, query languages, types for databases, data integration, bioinformatics and semistructured data. He has recently worked on issues associated with scientific databases such as data provenance, archiving and annotation. In addition he has made contributions to graph theory and to the mathematics of phylogeny. Recently he has initiated a project that has provided high-speed internet access to some of the most remote communities of Scotland.
The computational complexity of machine learning is dominated by the solution of non-analytic numerical problems (large-scale linear algebra, optimization, integration, the solution of differential equations). But a converse of sorts is also true — numerical algorithms for these tasks are learning machines! They estimate intractable, latent quantities by collecting the observable results of tractable computations. Because they also decide adaptively which computations to perform, these methods can be interpreted as autonomous inference agents. This observation lies at the heart of the emerging topic of Probabilistic Numerical Computation, which applies the concepts of probabilistic (Bayesian) inference to the design of algorithms, assigning a notion of probabilistic uncertainty to the result even of deterministic computations. I will outline how this viewpoint is connected to that of classic numerical analysis, and show that thinking about computation as inference affords novel, practical answers to the challenges of large-scale, big-data inference.
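As a toy illustration of the idea (a sketch, not the speaker's own method): Bayesian quadrature estimates a deterministic integral by placing a Gaussian-process prior on the integrand, conditioning on a few tractable function evaluations, and reporting both a posterior mean and a posterior variance — the "probabilistic uncertainty over a deterministic computation" mentioned above. The kernel choice, node placement, and length scale below are illustrative assumptions; a full probabilistic-numerical method would also choose the evaluation nodes adaptively.

```python
import numpy as np
from math import erf, sqrt, pi, exp

def bayes_quadrature(f, n=10, ell=0.3):
    """Estimate the integral of f over [0, 1] under a GP prior with a
    squared-exponential kernel. Returns (posterior mean, posterior variance)
    of the integral — an uncertainty over a deterministic quantity."""
    x = np.linspace(0.0, 1.0, n)          # evaluation nodes (fixed, not adaptive)
    fx = np.array([f(xi) for xi in x])    # the tractable computations: f at the nodes

    # SE kernel Gram matrix, with a small jitter for numerical stability
    K = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * ell ** 2)) + 1e-8 * np.eye(n)

    # z_i = ∫_0^1 k(t, x_i) dt, in closed form via the error function
    c = ell * sqrt(pi / 2)
    z = np.array([c * (erf((1 - xi) / (ell * sqrt(2)))
                       + erf(xi / (ell * sqrt(2)))) for xi in x])

    # ∫_0^1 ∫_0^1 k(t, t') dt dt', also in closed form
    kk = 2 * (c * erf(1 / (ell * sqrt(2)))
              - ell ** 2 * (1 - exp(-1 / (2 * ell ** 2))))

    w = np.linalg.solve(K, z)             # quadrature weights arise from inference
    mean = w @ fx                         # posterior mean of the integral
    var = max(kk - w @ z, 0.0)            # posterior variance (clamped at zero)
    return mean, var
```

For a smooth integrand such as `math.sin`, ten evaluations already bring the posterior mean very close to the true value 1 − cos(1), and the posterior variance quantifies how much the method "trusts" its own answer.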
Philipp Hennig studied Physics in Heidelberg and London. After receiving his PhD from the University of Cambridge, UK, in 2011, he moved to the Max Planck Institute for Intelligent Systems in Tübingen, Germany, where he now runs an independent research group that develops numerical algorithms both for and as intelligent, autonomous systems. He works primarily in the machine learning community, but also has ties to applied mathematics, control engineering, and statistics.
Talks are in Amphi Estaunié. Breaks and the poster session are in E200.