Citizen science, curated databases and other kinds of social machine all require database support, sometimes on a very large scale. But apart from the problems that are always associated with scale, what are the new demands that these systems place on databases? In this talk I shall describe two new issues that came up in working with the curators of biological data: data citation and data annotation. Both of these require computational insights. Data citation is now widely advocated, but we need computational tools to generate citations automatically; and nearly every kind of social machine has some form of annotation at its core.
Peter Buneman is Professor of Database Systems in the School of Informatics at the University of Edinburgh. His work in computer science has focused mainly on databases and programming languages, specifically: database semantics, approximate information, query languages, types for databases, data integration, bioinformatics and semistructured data. He has recently worked on issues associated with scientific databases such as data provenance, archiving and annotation. In addition he has made contributions to graph theory and to the mathematics of phylogeny. Recently he has initiated a project that has provided high-speed internet access to some of the most remote communities of Scotland.
The computational complexity of machine learning is dominated by the solution of non-analytic numerical problems (large-scale linear algebra, optimization, integration, the solution of differential equations). But a converse of sorts is also true — numerical algorithms for these tasks are learning machines! They estimate intractable, latent quantities by collecting the observable results of tractable computations. Because they also decide adaptively which computations to perform, these methods can be interpreted as autonomous inference agents. This observation lies at the heart of the emerging topic of Probabilistic Numerical Computation, which applies the concepts of probabilistic (Bayesian) inference to the design of algorithms, assigning a notion of probabilistic uncertainty to the result even of deterministic computations. I will outline how this viewpoint is connected to that of classic numerical analysis, and show that thinking about computation as inference affords novel, practical answers to the challenges of large-scale, big-data inference.
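As a toy illustration of the idea (a sketch, not the speaker's own method): Bayesian quadrature estimates a deterministic integral by placing a Gaussian-process prior on the integrand, conditioning on a few tractable function evaluations, and reporting both a posterior mean and a posterior variance — the "probabilistic uncertainty over a deterministic computation" mentioned above. The kernel choice, node placement, and length scale below are illustrative assumptions; a full probabilistic-numerical method would also choose the evaluation nodes adaptively.

```python
import numpy as np
from math import erf, sqrt, pi, exp

def bayes_quadrature(f, n=10, ell=0.3):
    """Estimate the integral of f over [0, 1] under a GP prior with a
    squared-exponential kernel. Returns (posterior mean, posterior variance)
    of the integral — an uncertainty over a deterministic quantity."""
    x = np.linspace(0.0, 1.0, n)          # evaluation nodes (fixed, not adaptive)
    fx = np.array([f(xi) for xi in x])    # the tractable computations: f at the nodes

    # SE kernel Gram matrix, with a small jitter for numerical stability
    K = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * ell ** 2)) + 1e-8 * np.eye(n)

    # z_i = ∫_0^1 k(t, x_i) dt, in closed form via the error function
    c = ell * sqrt(pi / 2)
    z = np.array([c * (erf((1 - xi) / (ell * sqrt(2)))
                       + erf(xi / (ell * sqrt(2)))) for xi in x])

    # ∫_0^1 ∫_0^1 k(t, t') dt dt', also in closed form
    kk = 2 * (c * erf(1 / (ell * sqrt(2)))
              - ell ** 2 * (1 - exp(-1 / (2 * ell ** 2))))

    w = np.linalg.solve(K, z)             # quadrature weights arise from inference
    mean = w @ fx                         # posterior mean of the integral
    var = max(kk - w @ z, 0.0)            # posterior variance (clamped at zero)
    return mean, var
```

For a smooth integrand such as `math.sin`, ten evaluations already bring the posterior mean very close to the true value 1 − cos(1), and the posterior variance quantifies how much the method "trusts" its own answer.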
Philipp Hennig studied Physics in Heidelberg and London. After receiving his PhD from the University of Cambridge, UK, in 2011, he moved to the Max Planck Institute for Intelligent Systems in Tübingen, Germany, where he now runs an independent research group that develops numerical algorithms both for and as intelligent, autonomous systems. He works primarily in the machine learning community, but also has ties to applied mathematics, control engineering, and statistics.
Talks are in Amphi Estaunié. Breaks and the poster session are in E200.