## MALOTEC Séminaire de MAthématique et LOgique pour l’exTraction et le traitEment de Connaissances

### Présentation:

• Le séminaire est organisé au LORIA (CNRS, Inria N.G.E., Univ. Lorraine)
• Il cherche à fomenter les échanges entre les chercheurs français et étrangers par la présentation de leurs travaux récents
• Les manifestations des jeunes chercheurs sont bienvenues
• Il est animé par l’équipe Orpailleur : Miguel Couceiro, Amedeo Napoli
• Les thèmes abordés sont très variés et portent sur les sciences de l’information et  de la vie: mathématiques discrètes (théorie des graphes, des ensembles ordonnés, algèbre, combinatoire), mathématiques de la décision et de l’information (théorie de l’agrégation et du choix social, aide à la décision, théorie des jeux), logique (modélisation, représentation et raisonnement sur des connaissances) et intelligence artificielle (apprentissage machine, analyse de donnés, data mining)

### Programme:

• Jeudi, le 16 juin 2022, de 10h30 à 11h30, Salle: TEAMS

Pierre-Alexandre Murena: TBA
Résumé : TBA

• Jeudi, le 09 juin 2022, de 10h30 à 11h30, Salle: TEAMS

Shane, Camille, Kevin (TBC): Solving Morphological Analogies Through Generation
Résumé : In this presentation we will present a first attempt to solving morphological analogies through generation, instead of relying on retrieval approaches. Our preliminary experiments show promising results for some languages and reveal the feasibility of the approach in generating solutions of analogical equations in the morphology setting.
This talk is relies on a recent collaboration with Miguel Couceiro and Esteban Marquer.

• Jeudi, le 02 juin 2022, de 10h30 à 11h30, Salle: TEAMS

Miguel Couceiro: Galois theory of analogical classifiers
Résumé : Recent works have highlighted the role of analogical inference in improving several reasoning and classification results. This led to efforts to better understand the analogy inference principle (AIP) and to describe the classes of classifiers for which the AIP always holds (i.e., without classification errors). This description is tightly related to notion of analogy preservation and intuitively depends on the formal models of analogy considered.
In this talk we will explore this relation between formal models of analogy and the corresponding classes of analogy preserving functions, to establish a general Galois theory of analogical classifiers that does not depend on the underlying domains nor on the formal models of analogy considered. We illustrate the usefulness of this Galois framework over Boolean domains, for which we explicitly determine the Galois closed sets of analogical classifiers (i.e., classifiers that are compatible with AIP), for each pair of known models of analogy.
This talk is relies on a recent collaboration with Erkko Lehtonen, and on ongoing research work also with Esteban Marquer and Pierre-Alexandre Murena.

• Jeudi, le 26 mai 2022: Jour férié (Ascension)
• Jeudi, le 19 mai 2022, de 10h30 à 11h30, Salle: TEAMS

Justine Falque: A classification of P-oligomorphic groups: solving the conjectures of Cameron and Macpherson
Résumé : Given an infinite permutation group G, the profile of G is the function that maps any nonnegative integer n to the number of orbits of n-subsets, for the induced action of G on the subsets of elements. Cameron conjectured that this counting function is asymptotically equivalent to a polynomial whenever it is bounded by a polynomial. Another stronger conjecture was later made by Macpherson stating that, in this case, a certain graded algebraic structure on the orbits, named after Cameron, is finitely generated.
In this talk we will revisit a joint work with Nicolas Thiéry that brought a deep understanding of these groups and that solved positively the two conjectures. We will first introduce the objects and context, and will then sketch the proof of the classification of the groups with polynomially bounded profile. As we will see, by providing a finite encoding, this classification also enables an algorithmic approach and a natural implementation of these infinite groups. The tools used include the study of the block systems lattice, a computer exploration of certain parts of the problem, as well as some group theoretic formalisation tools.

• Jeudi, le 05 mai 2022, de 10h30 à 11h30, Salle: TEAMS

Guilherme Dean Pelegrina: Analysis of Trade-offs in Fair Principal Component Analysis Based on Multi-objective Optimization
Résumé : In dimension reduction problems, the adopted technique may produce disparities between the representation errors of two or more different groups. Depending on the situation, this unfair result may introduce ethical concerns. Aiming at overcoming this inconvenience, a fairness measure can be considered when performing dimension reduction through Principal Component Analysis. However, a solution that increases fairness tends to increase the reconstruction error, i.e., there is a trade-off between equity and performance. In this context, this work proposes to address this trade-off in Fair Principal Component Analysis problems by means of a multi-objective-based approach. For this purpose, we adopt a fairness measure associated with the disparity between the representation errors of different groups and investigate if the solution of a classical Principal Component Analysis can be used to find a fair projection. Numerical experiments attest that a fairer result can be achieved with a very small loss in the reconstruction error.

• Jeudi, le 28 avril 2022, de 10h30 à 11h30, Salle: TEAMS

Laura Zanella: Fine-tuning Pre-trained Transformer Language Models for Biomedical Event Trigger Detection
Résumé : The detection of biomedical events plays an important role in building diverse applications in the biomedical domain, such as in disease prevention, pathway curation and epigenetics. Biomedical event trigger detection is a critical sub-task in extracting events that identify the possible occurrence of an event. Pre-trained transformer language models, such as BERT and its variants, have obtained the state-of-the-art performance in event extraction using different biomedical annotated corpus. However, a comparison between these models for this task has not yet been done. This paper proposes to analyze the differences between the performance of BERT and four of its variants tested on seven merged annotated biomedical corpus. BioBERT emerged as the best model from the evaluation presented here, showing that using a transformer model that is pre-trained from the original model, BERT, and uses biomedical data for its training is useful for recognizing biomedical event triggers, if the training is done for enough number of epochs.

• Pause de Pâques: les 14 et 21 avril 2022
• Jeudi, le 7 avril 2022, de 10h30 à 11h30, Salle: TEAMS

Anes Bendimerad: Mining Useful Patterns in Attributed Graphs
Résumé : Attributed graphs provide a powerful abstraction that can be used to represent many datasets in an intuitive manner. Mining these graphs can be useful for many applications, such as analyzing social networks, the World Wide Web, etc. Several methods have been proposed to identify patterns in these structures. Generally, these methods define a pattern as a subgraph whose vertices satisfy some structural constraints (e.g., density, connectivity) and have a subset of attributes with homogeneous values. When mining attributed graphs, the principled integration of both graph and attribute data poses two important challenges. First, we need to define a pattern syntax that is easy to interpret and lends itself to efficient search. The second challenge is the formalization of the pattern’s interestingness. A pattern is generally relevant if it depicts some local properties that are somehow unexpected. Furthermore, the interestingness of patterns is subjective in practice, i.e., it significantly depends on the final user, her background knowledge and her preferences. In this talk, we will present different approaches that we have proposed to tackle these challenges: we define new pattern languages, interestingness models, algorithms for attributed graphs, and the use of these methods to mine different real world datasets.

• Jeudi, le 31 mars 2022, de 10h30 à 11h30, Salle: TEAMS

Youcef Remil: How can subgroup discovery help AIOps?
Résumé : The genuine supervision of modern IT systems brings new challenges as it requires higher standards of scalability, reliability and efficiency when analysing and monitoring big data streams. Rule-based inference engines are a key component of maintenance systems in detecting anomalies and automating their resolution. However, they remain confined to simple and general rules and cannot handle the huge amount of data, nor the large number of alerts raised by IT systems, a lesson learned from expert systems era. Artificial Intelligence for Operation Systems (AIOps) proposes to take advantage of advanced analytics and machine learning on big data to improve and automate every step of supervision systems and aid incident management in detecting outages, identifying root causes and applying appropriate healing actions. Nevertheless, the best AIOps techniques rely on opaque models, strongly limiting their adoption. As a part of this PhD thesis, we study how Subgroup Discovery can help AIOps. This promising data mining technique offers possibilities to extract interesting hypotheses from data and understand the underlying process behind predictive models. To ensure relevancy of our propositions, this project involves both data mining researchers and practitioners from Infologic, a French software editor.

• Jeudi, le 24 mars 2022, de 10h30 à 11h30, Salle: TEAMS

Félix Gaschi: Evaluating the multilingual alignment in contextualized embeddings
Résumé : Deep learning models like BERT have become the norm in NLP. A multilingual version of BERT, called mBERT, is capable of learning a task in one language and of generalizing it to another. This generalization enables the perspective of efficient models in languages with few annotated data, but still remains largely unexplained. We propose a new method based on in-context translated words, rather than translated sentences, in order to analyze the similarity between contextualized representations across languages. We show that the representations learned by mBERT are closer for deep layers, outperforming other representations that were specifically trained to be aligned.

• Jeudi, le 17 mars 2022, de 10h30 à 11h30, Salle: TEAMS

Esteban Marquer: data2vec, a generalisation of masking-based embedding models like word2vec, BERT, wav2vec and more
Résumé : For embedding models for text, speech, and images, the state of the art is to use pretrained models like variants of BERT (FloBERT, RoBERTa, …) wav2vec 2, or vision transformers. These use the transformer architecture and are pre-trained using self-supervised learning, before being fine-tuned in the task-specific context. Designing models based on transformers for new kinds of data have been a focus for many in the past 3 or 4 years.
On January 20th 2022, Facebook AI (now Meta AI) published an approach called data2vec, a generalisation of the usual self-supervised learning setting using self-distilation and masking. Their models have state of the art or competitive performance on benchmarks for text, speech, and images. This new framework could allow to train models on more task specific and non-standard data, with much less trouble in the design of the model.
In this talk we will survey the above mentioned article, completed with the necessary of context and without requiring background on deep learning approaches.

• Jeudi, le 10 mars 2022, de 10h30 à 11h30, Salle: TEAMS

Nyoman Juniarta: Organizing and Improving a Database of French Word Formation Using Formal Concept Analysis
Résumé : We apply Formal Concept Analysis (FCA) to organize and to improve the quality of Démonette2, a French derivational database, through a detection of both missing and spurious derivations in the database. We represent each derivational family as a graph. Given that the subgraph relation exists among derivational families, FCA can group families and represent them in a partially ordered set (poset). This poset is also useful for improving the database. A family is regarded as an anomaly if its derivational graph is almost, but not completely, identical to a large number of other families.

• Jeudi, le 24 février 2022, de 10h30 à 11h30, Salle: TEAMS

Martin Lentschat: n-Ary relations instantiation from scientific articles driven by a domain Ontological and Terminological Resource
Résumé : I will present my thesis work on the automatic extraction of knowledge in scientific articles. This thesis belongs to the research field of smart data, where we value the recognition of specific information within textual documents. I based my approach on the extraction and representation of knowledge in experimental domains (the methods were evaluated on a corpus of 50 articles in the food packaging domain). The objective of this thesis was to populate a knowledge base with instances of n-ary relations extracted from scientific textual documents. The experimental data can be represented in n-ary relations composed of symbolic and quantitative arguments. The proposed approach was based on an Ontological and Terminological Resource (OTR) and is divided into two Phases: (1) the recognition and extraction of argument instances of interest and (2) the linking of these instances in n-ary relations.
Phase (1) proposes a representation of the extracted argument instances, called SciPuRe (Scientific Publication Representation). It integrates ontological, lexical and structural descriptors that describe the context of the argument instances and allows to sort them by their relevance. The results of argument instances extraction show the importance of sorting the relevant instances using SciPuRe features. The relevance evaluation of symbolic arguments is done using the SciPuRe ontological and lexical features, while quantitative arguments relevance is best measured using lexical and structural features.
Phase (2) relies on the extraction of partial n-ary relations present in the tables of the documents then completed with the argument instances from Phase (1). The partial relation are represented in StaRe (Scientific Table Representation) using the same set of features as for SciPuRe. Three approaches are proposed and evaluated in order to identify the argument instances that should complete the relations: the use of document structure, the analysis of cooccurrences between the argument instances in the texts, and finally the use of word-embedding models to measure the similarities between the candidate argument instances and the arguments already present in the partial relations. Phase (2) experiments first confirmed the usefulness of the relevance scores computed in Phase (1). Then, we experimented with the possibility of selecting multiple candidates for each missing argument instance in a partial relation, in an approach to assist domain experts who can then determine the valid instance. These final experiments showed that the choice of the method depends on the desired expert involvement.

• Jeudi, le 17 février 2022, de 10h30 à 11h30, Salle: TEAMS (CANCELLED)

Justine Reynaud: There is a life after PhD
Résumé : Firstly, I’ll give some feedback about the after-PhD, from my thesis defense to my first steps as a « maître de conférence ». Then, I’ll briefly present my scientific environment : the CODAG team I now belong to, and the MRSH of Caen. Lastly, I will present 3 of my current works in progress, related to redescription mining.

• Jeudi, le 10 février 2022, de 10h30 à 11h30, Salle: B013 (TBC)

Mohammed Fellaji: Learning and Evaluation of Audio Embeddings
Résumé : The combination of sound recognition and artificial intelligence is promising area of research. One of the main challenges is how to extract features from a given audio signal. We will address this problem by exploring audio embeddings models as a tool to extract audio features. My main focus will be on studying audio embeddings to represent audio signals in a latent space that preserves most of the information in the audio signal. Such models are designed using neural networks and are trained on a large scale datasets. By doing so, we expect the features that the model outputs to be informative enough and that these models can be applied to any new dataset – which can be relatively small compared to the large dataset used for training the embedding model. This framework may thus facilitate downstream tasks such as that of classification in our setting.
This presentation will thus consist of three main parts. First, a detailed state-of-the-art is elaborated in order to compare different existing models with different training methods. Second, experiments on three models introduced in the first part is studied with different configurations. As we will see empirically an audio embeddings model provides competitive results fully trained deep learning models. The effect of training a smaller dataset will also be explored and we will also show that audio embeddings models outperform fully learnable models when the size of the training dataset is reduced. Lastly, we will detail a theoretical study focusing on autoencoders, variational autoencoders and normalising flows, and establish a link to audio embeddings.

• Jeudi, le 27 janvier 2022, de 10h30 à 11h30, Salle: TEAMS

Ambroise Baril: An algebraic approach towards the fine-grained complexity of graph coloring problems
Résumé : In this presentation, we are interested in the fine-grained complexity of determining whether there is an homomorphism from an input graph G to a fixed graph H (the H-coloring problem). The starting point is the observation that these problems can be formulated in the language of CSPs, and that primitive-positive definitions (also referred as pp-definitions) of binary relations are of paramount importance in the study of complexity classes of such CSPs.
Thus, we first investigate the expressivity of binary symmetric relations E_H. For irreflexive graphs we observe that there is no pair of graphs H and H’ such that E_H qfpp-defines (ie pp-defines without quantifier) E_H’, unless E_H’ is empty or H=H’, leading to the non-existence of a non-trivial fine-grained complexity preserving reduction between H-COLORING and H’-COLORING that uses the classic method of qfpp-definitions. More generally, we show that the existence of an n-ary relation R such that (i) E_H qfpp-defines R, (ii) R does not qfpp-defines E_H, and (iii) CSP(R) is NP-complete, is equivalent to the presence of an odd cycle of length at most n in H.
Motivated by this, we also describe how pp-definitions can be used as a tool to prove that a graph is a core.

• Jeudi, le 20 janvier 2022, de 10h30 à 11h30, Salle: TEAMS

Georgios Zervakis: An Analogy based Approach for Solving Target Sense Verification
Résumé : Contextualized language models are emerging as a de facto standard in natural language processing due to the vast amount of knowledge they acquire during pretraining. Nonetheless, their ability to solve tasks that require reasoning over this knowledge is limited. Certain tasks can be improved by analogical reasoning over concepts, e.g., understanding the underlying relations in “Man is to woman as King is to Queen”. In this work, we propose a way to formulate target sense verification as an analogy detection task, by transforming the input data into quadruples. We present AB4TSV (Analogy and BERT for TSV), a model that uses BERT to represent the objects in these quadruples combined with a convolutional neural network to decide whether they constitute valid analogies. We test our system on the WiC-TSV evaluation benchmark, showing that it outperforms existing approaches. Our empirical study shows the importance of the input encoding for BERT. This dependence gets alleviated by enforcing invariance to permutations of analogical proportions during training, while preserving performance and improving interpretability.

• Jeudi, le 13 janvier 2022, de 10h30 à 11h30, Salle: TEAMS

Pierre Andrieu: Rank aggregation on biological data : scalability and quality criteria
Résumé : Biologists and physicians regularly query public biological databases, for instance, when they are looking for genes that most related with a given disease. In this case, the chosen keywords are particularly important: synonymous reformulations of the same disease (for example, « breast cancer » and « breast carcinoma ») may lead to very different rankings of (thousands of) genes. The genes, sorted by relevance, can be tied (of equal importance to the disease). Additionally, some genes returned when using a synonym may be absent when using another synonym. The rankings are then called incomplete rankings with ties. The challenge is then how to combine the information provided by these different gene rankings. The problem of aggregating a list of input rankings into a consensus ranking, that is, as close as possible to the input rankings, is called the rank aggregation problem, which is known to be NP-hard. Although most frameworks focus on complete rankings without ties, we will consider incomplete rankings with ties.
In this talk, I will present three contributions pertaining to the solution of this problem. First, I will present a graph-based heuristic to divide the initial problem into independent sub-problems in the context of incomplete rankings with ties. Second, I will talk about quality criteria and present an algorithm capable of identifying common points between all the optimal consensus rankings without computing them. Finally, as missing data can be interpreted in different ways according to the context, I will present a parameterized model and present an axiomatic study rooted in social choice theory to consider these differences.

• Jeudi, le 16 décembre 2021, de 10h30 à 11h30, Salle: Teams (but room C005 is reserved)

Erkko Lehtonen: On clonoids of Boolean functions
Résumé : A clonoid is a class of functions of several arguments that is stable under right and left compositions with fixed clones. In this talk, we discuss first steps towards Post-like complete descriptions of clonoids of Boolean functions. We provide a brief introduction to the topic and present some motivations for studying clonoids.

• Jeudi, le 02 décembre 2021, de 10h30 à 11h30, Salle: B013

Lethicia Magno Massuia De Almeida: Crop modelling: how can numbers help us better understand the effects of climate change?
Résumé : The increased frequency of stressful events, such as heat waves, has been observed in recent decades during the reproductive phase of economically important agricultural crops. Many studies have highlighted the memory effects of stress events, which can be induced when plants are repeatedly subjected to similar stresses. These memory effects have been shown to be potentially beneficial, as plants are better prepared to develop a more effective response when the stress recurs.
Therefore, new climate models are encouraging the inclusion of the effects of repeated stresses in agricultural predictive modelling approaches, in order to estimate their consequences on crop yields and the quality of harvested products. In recent decades, the use of crop models has been extended to climate change predictions, although no modelling study has attempted to decipher simulation biases with respect to stress memory.
In this seminar I will present three crop modelling approaches used in our laboratory with the aim of better understanding plant-climate interactions, namely (i) the mechanistic SuMoToRI model on sulphur dynamics in oilseed rape, (ii) the application of multiple regressions on climate indicators in the field, and (iii) the hypothesis-free statistical method based on data mining (currently under development).

• Jeudi, le 25 novembre 2021, de 10h30 à 11h30, Salle: Teams

Miguel Couceiro: Extremely randomized trees for clustering complex data
Résumé : Many clustering methods are based on a measure that quantifies the (dis)similarity between objects in order to highlight homogeneous groups. Some algorithms are defined with respect to a particular measure. Other algorithms allow more freedom in the choice of the measure, such as hierarchical clustering algorithms. However, the choice of such a measure is not trivial: it depends on the type of data and often requires a pre-processing phase, such as data normalisation, elimination of outliers or correlated variables. This choice is particularly critical in the case of heterogeneous and complex data, such as biomedical data. We will present a method for learning (dis)similarity measures based on extremely random tree forests, which can be applied to heterogeneous data while simplifying (or even eliminating) data pre-processing phases. We will also illustrate how to adapt it to complex data such as attributed graphs. Empirical results on synthetic as well as real data show that the method is competitive with well-known approaches in homogeneous and heterogeneous contexts. This presentation is based in a talk at AFIA-SFC: Recent advances on unsupervised learning and the doctoral work of Kevin Dalleau (supervised by Malika Smail-Tabbone and myself).

• Jeudi, le 18 novembre 2021, de 10h00 à 11h00, Salle: Teams

Rameez Qureshi: Mitigating Unintended Bias in Masked Language Models
Résumé : Algorithmic fairness is currently one of the most debated topics in artificial intelligence. Primarily because of its massive importance and critical impact on various aspects of human lives. One of its highly discussed drawbacks is the unintended stereotypical bias we observe in complex machine learning models, and that carries adverse effects on various fields ranging from healthcare to legal policing. As bias is subjective and may vary from case to case, one major challenge is defining stereotypical bias algorithmically. Another hurdle is to avoid using human-annotated data while tuning such language models to escape the inherent human biases.
This talk will discuss a novel architecture based on Deep Reinforcement Learning to mitigate such biases. We start with defining bias algorithmically and will lay out an architecture to control unintended biases. Forwarding this way, we will discuss various experiments concluding with rather interesting results. The observations are exciting and pave the way for further fascinating avenues in the field of Fairness in AI. This constitutes ongoing research work in collaboration with Luis Galárraga (Inria Rennes) and Miguel Couceiro (LORIA).

• Mercredi, le 10 novembre 2021, de 11h00 à 12h00. Salle: B013

Mario Valencia: Complexity of the cluster deletion problem on cographs and subclasses of chordal graphs.
Résumé : We consider the following vertex-partition problem on graphs, known as the CLUSTER DELETION (CD) problem: given a graph with real
nonnegative edge weights, partition the vertices into clusters (in this case, cliques) to minimize the total weight of edges outside the clusters. The decision version of this optimization problem is known to be NP-complete even for unweighted graphs and has been studied extensively. In this talk, I will focus on the complexity of the decision CD problem for the family of chordal graphs, showing that it is NP-complete for weighted split graphs, weighted interval graphs and unweighted chordal graphs. We will also see that the problem is NP-complete for weighted cographs. Some polynomial-time solvable cases of the optimization problem will be identified, in particular CD for unweighted cographs, split graphs, unweighted proper interval graphs and weighted block graphs.
This is a joint work with Flavia Bonomo and Guillermo Duràn (University of Buenos Aires).

• Jeudi, le 21 octobre 2021, de 10h30 à 11h30, Salle: Teams

Eduardo Calò: Building a Learning Space for Language Acquisition
Résumé : Learning spaces offer a rigorous mathematical foundation for algorithms that can be employed in intelligent tutoring systems to organize the body of knowledge necessary to become proficient in a domain. These mathematical structures are particular types of partial ordered sets, and thus they can be built using formal concept analysis (FCA), which provides powerful formal tools, such as concept lattices, to order data of this nature. Learning spaces are helpful to bring to light implications between notions and uncover students’ different learning paths. These structures eventually help teachers create personalized learning plans and intelligently assess students. Learning spaces have been used to ease teaching and learning in a variety of subjects. However, applications in language acquisition remain challenging.
In this talk, we present an adaptation of this framework to language acquisition, specifically Mandarin, by leveraging the shared properties between learning space theory and FCA. The functioning of the tool realized is based on a local and dynamic exploration of the space, without the need to compute the whole structure beforehand. This way, a scalable system architecture, which is not limited by the size of the input and the power of the machine on which it runs, is achieved. We report experiments conducted using several datasets and feedback provided by teachers who used the first prototype of the system. The results we obtain are encouraging and suggest that the approach can be adapted to other languages.

• Jeudi, le 14 octobre 2021, de 10h30 à 11h30, Salle: Teams

Mehdi Kaytoue: Contributions to Pattern Mining and Formal Concept Analysis for Actionable Knowledge Discovery
Résumé : In this talk, I will present the main research I have conducted or participated in for the last 15 years in the field of knowledge discovery from data. Introducing a problem in neuroscience will rise the need of distinguishing (i) data and pattern formalization, (ii) algorithms for mining patterns and (iii) measures for assessing the interestingness of a pattern or a set of patterns. We’ll start accordingly by presenting a number of contributions in Formal Concept Analysis, providing a convenient way to formalize what we manipulate in pattern mining. Then, we’ll dive into algorithmic issues when enumerating patterns of different natures (especially numerical) that come along as well as two major propositions : mining patterns with a Monte Carlo tree search (MCTS) and mining numerical patterns in the lattice of all discretizations. To better understand the problem of designing pattern quality measures, we’ll review a certain number of applications I participated in over the last years (neuroscience, video game analytics, social media analytics), before focusing on applications in Software Engineering (SQL workload analysis, incident triaging), and other perspectives to be handled in an industrial context in company Infologic, a software editor I work for).

• Salle: TeamsJeudi, le 24 juin 2021, de 11h à 12h00, Salle: Teams

Safa Alsaidi, Amandine Decker and Puthineath Lay: A neural approach to detecting and solving morphological analogies across languages
Résumé : Analogical proportions are statements of the form « A is to B as C is to D » that are used for several reasoning and classification tasks in artificial intelligence and natural language processing (NLP). For instance, there are analogy based approaches to semantics as well as to morphology. In fact, symbolic approaches were developed to solve or to detect analogies between character strings, e.g., the axiomatic approach as well as that based on Kolmogorov complexity.  In this presentation, we propose a deep learning approach to detect morphological analogies, for instance, with reinflexion or conjugation. We present empirical results that show that our framework is competitive with the above-mentioned state of the art symbolic approaches. We also explore empirically its transferability capacity across languages, which highlights interesting similarities between them.

• Jeudi, le 17 juin 2021, de 11h à 12h00, Salle: Teams

Tatiana Makhalova: Contributions to pattern set mining: from complex datasets to significant and useful pattern sets
Résumé : In this talk, we discuss different aspects of pattern mining in binary and numerical tabular datasets. The objective of pattern mining is to discover a small set of non-redundant patterns that may cover entirely a given dataset and be interpreted as useful and significant knowledge units. We focus on such issues as (i) formal definition of pattern interestingness, (ii) the mitigation of the pattern explosion problem, (iii) measure for evaluating the performance of pattern mining, and (iv) the discrepancy between interestingness and quality of the discovered pattern sets.
The first part of the talk is devoted to a so-called closure structure and the GDPM algorithm for its computing. The closure structure allows for estimating both the data and pattern complexity.  Moreover, we discuss how the closure structure allows an analyst to understand the intrinsic data configuration before selecting an interestingness measure for pattern mining.
In the second part, we discuss the difference between interestingness and quality of pattern sets. We present the KeepItSimple algorithm that adopts the best practices of supervised learning in pattern mining and relates interestingness and the quality of pattern sets. We show that KeepItSimple allows for an efficient mining of a set of interesting and good-quality patterns without any pattern explosion.
The third part of the talk is devoted to numerical pattern mining. We present an MDL-based algorithm called Mint for mining pattern sets in numerical data. The Mint algorithm relies on a strong theoretical foundation and at the same time has a practical objective in returning a small set of numerical, non-redundant, and informative patterns. Mint has very good behavior in practice and usually outperforms its competitors.

• Jeudi, le 10 juin 2021, de 11h à 12h00, Salle: Teams

Jaume Baixeries: Data Dependencies and Pattern Structures
Résumé : Pattern Structures (a generalization of Formal Concept Analysis) has been proved to be a suitable mathematical tool to analyze many-valued datasets. In this talk we present recent results on the characterization and computation of various data dependencies with FCA and Pattern Structures. Those dependencies may show interesting patterns or regularities in the dataset. This talk is open to everyone, specially to data practitioners and data analysts. We will also present some basic concepts on Pattern Structures that may not necessarily be known by a general audience.Salle: Teams

• Salle: TeamsJeudi le 13 mai 2021, de 11h à 12h00, Salle: Teams

Rebecca Willett (Seminars MPML at IST): Machine Learning and Inverse Problems: Deeper and More Robust

• Salle: TeamsSalle: TeamsJeudi, le 06 mai 2021, de 11h à 12h00, Salle: Teams

Erkko Lehtonen: Associative spectra of graph algebras
Résumé : The associative spectrum was introduced by Csákány and Waldhauser in 2000 as a method of quantifying the non-associativity of binary operations or the corresponding groupoids. The associative spectrum of a groupoid G is an integer sequence, the n-th member of which equals the number of distinct term operations induced on G by the bracketings of n variables. Graph algebras were introduced by Shallon in 1979 and provide a useful representation of directed graphs as algebras with a binary operation.
In this talk, we report our work on associative spectra of graph algebras. We classify undirected graphs according to the associative spectra of their graph algebras; there are only three distinct possibilities: constant 1, powers of 2, and Catalan numbers. For arbitrary digraphs, the situation is considerably more complicated. We provide a necessary and sufficient condition for a graph algebra to satisfy a given bracketing identity, expressed in terms of several numerical structural parameters associated, on the one hand, with the graph and, on the other hand, with a pair of bracketings. Based on this, we establish bounds on the possible associative spectra of graph algebras; such a spectrum is either a constant sequence bounded above by 2 or it grows exponentially. This stands in stark contrast with associative spectra of arbitrary groupoids, for which other constant and subexponential spectra are also possible.
This is joint work with Tamás WaldhaSalle: Teamsersity of Szeged).

• Salle: TeamsJeudi, le 29 avril 2021, de 11h à 12h00, Salle: Teams

Luis Galarraga: Hierarchical Pattern-aided Regression
Résumé : I will briefly review the different methods of pattern-assisted regression and present HIPAR, a new regression method optimized for multimodal data with categorical and quantitative attributes. HIPAR learns hybrid rules of the form p ⇒ y = f (X) where p is the characterization of a data region and f(X) is a linear regression model on a variable of interest y. HIPAR relies on pattern mining techniques to identify regions in a data set that can be modeled via regression methods. The originality of the method lies in an exhaustive and hierarchical exploration of the space of regions. Such a strategy provides more choice and flexibility by selecting accurate and interpretable hybrid rules that characterize the entire data set. As our experimental evaluation shows, HIPAR outperforms existing pattern-based regression methods in terms of prediction error, becoming in some cases comparable to best-performing (but not interpretable) methods, such as randoSalle: Teamsnd gradient boosting trees.

• Jeudi, le 22 avril 2021: Easter break!
• Jeudi, le 15 avril 2021, de 11h à 12h00, Salle: Teams

Ambroise Barril: Hardness and tractability of the γ-Complete Subgraph problem
Résumé : Finding cohesive subgraphs is a fundamental problem for the analysis of graphs. Cliques are classical (perhaps extreme) examples of cohesive subgraphs. In this talk we will consider a natural relaxation of cliques, namely, the so-called γ-complete graphs, which are obtained by relaxing the degree constraint on clique vertices. More precisely, G[S] is a γ-complete graph, with γ∈(0,1], if every vertex of S has degree at least γ(|S|-1) in G[S].
We will consider the problem of finding γ-complete subgraphs of maximum order (that is, maximum number of vertices) and study its complexity w.r.t. to γ. We will give an overview of the proofs of the complexities by explaining the various techniques used. In particular, we show that the problem is W[1]-hard for 0.5 ≤ γ < 1 when the parameter is the order of the subgraph. Moreover, we will show that the problem is fixed-parameter tractable (FPT) when parameterized by the h-index of the input graph, thus solving an open question in the literature.

• Salle: TeamsVendredi, le 09 avril 2021, de 11h à 12h00, Salle: Teams

Riccardo Guidotti: Evaluating Local Explanation Methods on Ground Truth
Résumé : Evaluating local explanation methods is a difficult task due to the lack of a shared and universally accepted definition of explanation. In the literature, one of the most common ways to assess the performance of an explanation method is to measure the fidelity of the explanation with respect to the classification of a black box model adopted by an Artificial Intelligent system for making a decision. However, this kind of evaluation only measures the degree of adherence of the local explainer in reproducing the behavior of the black box classifier with respect to the final decision. Therefore, the explanation provided by the local explainer could be different in the content even though it leads to the same decision of the AI system. We propose an approach that allows to measure to which extent the explanations returned by local explanation methods are correct with respect to a synthetic ground truth explanation. Indeed, the proposed methodology enables the generation of synthetic transparent classifiers for which the reason for the decision taken, i.e., a synthetic ground truth explanation, is available by design. Experimental results show how the proposed approach allows to easily evaluate local explanations on the ground truth and to characterize the quality of local explanation methods.

• Jeudi, le 01 avril 2021, de 13h00 à 14h00
(Colocated with Loria’s Colloquium, TEAMS)

Mathieu d’Aquin: Data and knowledge as commodities
Résumé :While data has become increasingly available in the last few years, those data and the models used to analyse them are becoming less and less interpretable. In other words, the challenge of turning such vast amounts of data into exploitable knowledge is still present. In this presentation, I aim to describe ongoing efforts to address this challenge by combining current data mining and machine learning techniques with traditional, symbolic methods for artificial intelligence based on explicit knowledge representations and inferences. In particular, taking examples from projects in education, smart cities and the digital humanities, I show how the legacy of the semantic web, especially web-scale knowledge graphs and ontologies, can support intelligent methods for data understanding and the interpretability of machine learning models.

• Jeudi le 25 mars 2021, de 11h à 12h00, Salle: Teams

João Bento: Explainability for Sequential Decision-Making
Résumé : Machine learning has been used to aid decision-making in several domains, from healthcare to finance. Understanding the decision process of ML models is paramount in high-stakes decisions that impact people’s lives, otherwise, loss of control and lack of trust may arise. Often, these decisions have a sequential nature. For instance, the transaction history of a credit card must be considered when predicting the risk of fraud of the most recent transaction Although RNNs are state-of-the-art models for many sequential decision-making tasks, they are perceived as black-boxes, creating a tension between accuracy and interpretability. While there has been considerable research effort towards developing explanation methods for ML, recurrent models have received relatively much less attention. Recently, Lundberg and Lee unified several methods under a single family of additive feature attribution explainers. From this family, KernelSHAP has seen a wide adoption throughout the literature; however, this explainer is unfit to explain models in a sequential setting, as it only accounts for the current input not the whole sequence. In this work, we present TimeSHAP, a model-agnostic recurrent explainer that builds upon KernelSHAP and extends it to sequences. TimeSHAP explains recurrent models by computing feature-, timestep-, and cell-level attributions, producing explanations at both the feature and time axes. As sequences may be arbitrarily long, we further propose two pruning methods that are shown to dramatically decrease TimeSHAP’s computational cost and increase its reliability. We validate TimeSHAP by using it to explain preSalle: Teamstwo RNN models in two real-world fraud detection tasks, obtaining relevant insights into these models and their predictions.

• Jeudi le 18 mars 2021, de 11h à 12h00, Salle: Teams

Pierre-Alexandre Murena: Perspectives on Minimum Complexity Analogies
Résumé : Analogies are 4-ary relations of the form “A is to B as C is to D”. When A, B and C are fixed, we call analogical equation the problem of finding the correct D. Even though this task has been shown to be simple for the human cognition, it remains extremely challenging for artificial agents. In this presentation, we introduce our recent advances in solving morphological analogies on words based on a principle of minimum of complexity. The idea of our method is to find a transformation from A to B which also applies to C and is maximally simple to describe algorithmically. We will show which new perspectives this principle of minimum complexity can open for the field of analogical reasoning: For that purpose, we demonstrate the flexibility of our approach toward various related domains, such as interactive AI, case-based reasoning, AI assistance or transfer learning. As a complement, we will discuss the underlying assumptions of our framework as well as the algorithmic challenges of such an approach, and show what these limitations imply for the application of this method to other problems.

• Jeudi le 11 mars 2021, de 14h00 à 15h00, Salle: Teams

Sébastien Destercke: Imprecision and learning
Résumé : In this presentation, we will focus on the treatment and modelling of imprecision in machine learning, and will survey different approaches proposed in the literature to tackle this issue. In particular, we will focus on imprecise data management, on the induction of sets of models rather than a single model, and on how to obtain non-punctual conservative predictions. We will devote particular attention to the latter aspect, which is particularly relevant for obtaining reliable predictive models.

• Jeudi le 04 mars 2021: Winter Pause
• Jeudi le 25 février 2021, de 11h à 12h00, Salle: Teams

Adrien Blanche: Methods to evaluate heterogeneous treatment effects
Résumé : Causal inference consists in the assessment of a relation of causality between two variables: a cause (named here the treatment) and its effect. A vast literature on how to evaluate the causal effect of a treatment now exists.
In this talk, we will present the basics of Salle: Teamsd approaches to estimate the treatment effect. We will particularly focus on the measure of heterogeneous treatment effects (or HTE), which is SalSalle: Teamsamsanding of the difference in a treatment effect regarding the differences across subjects’ characteristics. We will present methods based on decision trees (Causal Trees, Causal Forests and Generalized Random Forests) and some results of the Data Challenge of the Atlantic Causal Inference Conference 2019.

• Jeudi le 18 février 2021, de 11h à 12h00, Salle: Teams

Sanjeev Arora (Seminars MPML at IST): The quest for mathematical understanding of deep learning

• Jeudi le 11 février 2021, de 11h à 12h00, sur TEAMS:

Samantha Kleinberg (Seminars MPML at IST): Data, Decisions, and You: Making Causality Useful and Usable in a Complex World

• Jeudi le 04 février 2021, de 11h à 12h00, sur TEAMS (LORIA):

Pierre Monnin: DAGOBAH – An End-to-End Context-Free Semantic Annotation System for Tabular Data
Résumé : Large parts of both internal repositories of companies and Web data are represented in tabular formats. The ability to automatically annotate tables using knowledge graphs is crucial and paves the way towards new semantic-based services. In this talk, we will present the challenges of annotating tabular data for a company such as Orange. Orange’s research works on this task, gathered within a project named DAGOBAH, will then be introduced. We will especially focus on graph embedding techniques for column typing and cell disambiguation. Lastly, we will review the community efforts on this annotation task with the ISWC SemTab challenge.

• Jeudi le 28 janvier 2021, de 11h à 12h00, sur TEAMS (LORIA):

Fabíola Pereira: Knowledge Discovery from Temporal Social Networks
Résumé : Data is structured in the form of networks. And now? How to analyze them? Extracting knowledge of network data is not a simple task and requires the use of appropriate tools and techniques, especially in scenarios that take into account the volume and evolving aspects of the network. There is a vast literature on how to collect, process, and model social media data in the form of networks, as well as key metrics of centrality. However, there is still much to be discussed in relation to the analysis of the underlying network. In this talk, we consider that data has already been collected and is already structured as a network. The goal is to discuss techniques to analyze network data, especially considering time perspective. First, concepts related to problem definition, temporal networks and metrics for network analysis will be presented. Next, in a more practical aspect will be shown techniques of visualization and processing of temporal networks. In the end, applications with real data will be discussed, illustrating how network data knowledge extraction works from start to finish.

• Jeudi le 21 janvier 2021, de 11h à 12h00, sur TEAMS (LORIA):

Nacira Abbas: Link Key Selection Guided by Pairs of Classes
Résumé : In this talk, we will discuss link keys selection in RDF datasets. A link key is used to find identity links between two entities in the web of data. Such pair of entities determines a pair of classes which in turn can guide the selection of link keys of better quality. For this purpose, we have proposed an algorithm for selecting link keys based on their pairs of classes. We will present this algorithm and we will show its potential and effectiveness by reporting experiments on different datasets.

• Jeudi le 14 Janvier 2021, de 11h à 12h00, sur TEAMS (LORIA):

Walid Hafiane: Experiments on transfer learning architectures for biomedical relation extraction
Résumé : Relation extraction (RE) consists in identifying and structuring automatically relations of interest from texts. Recently, BERT improved the top performances for several NLP tasks, including RE. However, the best way to use BERT, within a machine learning architecture, and within a transfer learning strategy is still an open question since it is highly dependent on each specific task and domain. Here, we explore various BERT-based architectures and transfer learning strategies (i.e., frozen or fine-tuned) for the task of biomedical RE on two corpora. Among tested architectures and strategies, our *BERT-segMCNN with fine-tuning reaches performances higher than the state-of-the-art on the two corpora (1.73 % and 32.77 % absolute improvement on ChemProt and PGxCorpus corpora respectively). More generally, our experiments illustrate the expected interest of fine-tuning with BERT, but also the unexplored advantage of using structural information (with sentence segmentation), in addition to the context classically leveraged by BERT.

• Vendredi le 8 Janvier 2021, de 10h30 à 11h30, sur TEAMS (LORIA):

Miguel Couceiro: Discussions

• Jeudi le 10 Decembre 2020, de 11h à 12h00, sur TEAMS (LORIA):

Sébastien Destercke: Imprecise probabilities in supervised learning: principles and challenges
Résumé : In this talk, we will first introduce and motivate the very basic principles of imprecise probabilistic (IP) approaches, that replace precise-valued probabilities and expectations by bounded intervals. The talk will then focus on the use of IP in supervised classification, using the extension of the naive Bayes classifier as an illustration. Finally, we will briefly mention some challenges concerning the application of IP to supervised learning over combinatorial domains.

• Jeudi le 03 Decembre 2020, de 11h à 12h00, Orpailleur’s server (LORIA):

Alexandre Bazin: On Algorithmic Approaches to Recognising Causal Relations
Résumé: Inferring causal relations from observational data is a challenging, near impossible, problem. A vast literature on trying to find all or part of these relations now exists. Most of these approaches use probability-theoretic notions such as conditional independence of variables. In this talk, we will instead talk about algorithmic approaches based on approximations of Kolmogorov complexities. We will present SLOPE that uses MDL, and ERGO that uses cumulative and Shannon entropy. We will also discuss  the differences between univariate and multivariate causal relations in this framework.

• Jeudi le 26 Novembre 2020, de 11h à 12h00, Orpailleur’s server (LORIA):

Esteban Marquer: Towards a Deep Learning Approximation of FCA
Résumé : In recent years there has been an increasing interest in approaches to combine formal knowledge and artificial neural networks (NNs), called neuro-symbolic approaches. Formal concept analysis (FCA) is a powerful formal tool for understanding tabular data called formal contexts (FC). FCA can be used to generate a structured view of the data, typically an partially ordered set of formal concepts called concept lattice, that can be used as an ontology support. It can also discover implications and associations within the data, and that can be used to construct rule based decision support systems. However, there are scalability issues inherent to FCA.
Together with Ajinkya Kulkarni and Miguel Couceiro, we explore deep neural approaches to solve these scalability problems with the hope of revealing implicit information not expressed by FCA. Moreover, we investigate generative neural models, and in particular GraphRNN, to produce concept lattices from formal contexts. For that we proposed a data agnostic embedding model for formal concepts, called Bag of Attributes (BoA). We will present experimental results on generated and real-word data to support our conclusions, and to illustrate the feasibility and the limitations of this approach. This research project was financed by the Inria Project Lab HyAIAI.

• Jeudi le 19 Novembre 2020, de 11h à 12h00, Orpailleur’s server (LORIA):

Laura Zanella: Explainable Deep Learning based on Relevance
Résumé : Deep neural networks are one of the most popular techniques today to solve a wide variety of problems, e.g., in image recognition and text analysis, providing novel predictive models for complex systems. However, due to their nested complex and non-linear structure, this comes with the disadvantage of acting as a black box, not giving enough information about the internal reasoning. This black box does not allow acceptance in many application domains, where understanding the individual model predictions and trust in the model’s decision are important. In this talk we will see the description of Layer-wise Relevance Propagation (LRP), a method for explaining non-linear classifier decisions by decomposing the prediction function, which can be used as a tool for comparative analyses and for revealing the performance of different prediction strategies. Also, some applications of LRP will be presented, where a qualitative evaluation can be analyzed for better interpretation of the models.

• Jeudi le 12 Novembre 2020, de 11h à 12h00, Orpailleur’s server (LORIA):

Guilherme Alves: Towards fairer ML models: Beyond LimeOut and process fairness
Résumé: In this talk, we will revisit the workflow of « LimeOut » that was proposed to tackle process fairness, which essentially measures a model’s reliance on sensitive or discriminatory features. We will present and extension of LimeOut, namely FixOut, that is based on different explanation methods: here, we will focus on SHAP. We present different experiments on multiple datasets and several state of the art classifiers, which show that FixOut’s classifiers improve (or at least maintain) not only process fairness but also other fairness metrics such as individual and group fairness, equal opportunity, and demographic parity. We will also discuss ongoing research and implementation work that we are carrying out.

• Jeudi le 05 Novembre 2020, de 11h à 12h00, Orpailleur’s server (LORIA):

Felix Gaschi: Neighbourhood-sensitive maps: a lead towards more robust embedding alignments
Résumé : With the availability of large monolingual corpora and the scarcity of parallel bilingual ones, several unsupervised or weakly-supervised methods have been proposed to align language representations learned on separate monolingual data. Some popular unsupervised machine translation models learn a linear mapping between one source language and one target language. These methods work well with standard datasets. But it has been shown that they often fail outside a high-resource same-domain setting and for more dissimilar language pairs.
In this talk, we will introduce one lead for proposed namely by Ndapa Nakashole to make more robust embedding alignment: through a neighborhood-sensitive mapping. We will firstly review quickly the mapping-based embedding alignment approaches and their pitfalls before studying in detail the NORMA model proposed by Ndapa Nakashole and discuss how it would be relevant for our work in shared language representation for low-resource data in the medical domain.

• Jeudi le 29 Octobre 2020: BDA 2020 (registration and program)
• Jeudi le 22 Octobre 2020: Pause Toussaint
• Jeudi le 15 Octobre 2020, de 11h à 12h00, TBA (LORIA):

Everyone: Open discussion

• Jeudi le 08 Octobre 2020, de 11h à 12h00, Orpailleur’s server (LORIA):

Claire Theobald: A Bayesian Neural Network based on Dropout Regulation
Résumé : Bayesian Neural Networks (BNN) have emerged recently in the Deep Learning world for estimating uncertainty in classification tasks, and they are used in many application domains such as astrophysics and autonomous driving. BNNs assume a prior over the weights of an NN instead of point estimates, enabling in this way the estimation of both aleatoric and epistemic uncertainty of the model prediction. Moreover, a particular type of BNN, namely MC Dropout, assumes a Bernoulli distribution on the weights by using Dropout.
Several attempts to optimize the dropout rate exist, e.g., using a variational approach. In this paper, we present a new method called « Dropout Regulation » (DR), which consists of automatically adjusting the dropout rate during training using a controller as used in automation. DR enables a precise estimation of uncertainty that is comparable to the state-of-the-art while remaining simple to implement.

• Jeudi le 01 Octobre 2020, de 11h à 12h00, Salle B013 & Serveur Orpailleur:

Laurine Huber: Do sentence embeddings capture discourse properties of sentences from Scientific Abstracts?
Résumé: In this talk, we will present our work on whether sentence embeddings capture discourse properties of sentences, and present some preliminary results. We will first remind what is the discourse structure of a text and explain why it is useful to consider it in an accurate document representation, and thus sentence representation. We will then present tasks to evaluate whether discourse properties of sentences can be detected from their embeddings. We show that recent contextual sentence embeddings do not perform better than Bag-of-Words for the identification of three discourse relations commonly used in sentences from Scientific Abstracts.  We will discuss what these results underline, namely that these discourse relations are based on particular phrasing that allow non-contextual embeddings to perform well.

• Jeudi le 24 septembre 2020, de 11h à 12h00, Salle B013 & Serveur Orpailleur:

Tatiana Makhalova: Data Topology and Closure Structure
Résumé : The goal of Knowledge Discovery is to analyze and discover interesting and reusable patterns from data. For binary data one usually is interested in mining patterns, i.e. itemsets, implications, and association rules. Depending on the selected interestingness measure -quality or preference criteria- one obtains very different sets of patterns that summarize a given dataset. However, these resulting sets of patterns provide a specific data description but nothing about the « intrinsic structure » of the dataset. In this presentation, we introduce the notion of « closure structure » that represents the intrinsic structure underlying a dataset and provides a succinct representation of patterns. We will discuss the benefits of the closure structure in both theoretical and practical terms, presenting theoretical properties as well as practical uses of the closure structure in applications.

• Jeudi le 17 septembre 2020, de 11h à 12h00, Salle B013 & Serveur Orpailleur
(contacter miguel.couceiroATloria.fr en cas de difficulté):

Amedeo Napoli: Computing Functional and Similarity Dependencies in the Framework of FCA (continuation)
Résumé: In May 2020, we presented some elements about Knowledge discovery (KD) in complex datasets and the mining of interesting patterns in the framework of Formal Concept Analysis (FCA). The focus was on the discovery of functional dependencies in numerical datasets. This presentation is the continuation of the preceding one.
We will first recall important elements about the discovery of functional dependencies and then discuss an important variation which is the discovery of approximate dependencies. The discovery of functional dependencies is based on the discovery of implications and can be achieved either using plain FCA and a suitable « binarization » of the numerical dataset, or using the so-called « Partition Pattern Structures ». In particular, Partition Pattern Structures provide a general basis for the discovery of several types of dependencies in datasets, such as functional, similarity, order and gradual dependencies. These two ways of formalizing dependency discovery, namely « binarization » and « partition pattern structures » will be discussed during the presentation.
In addition, an interesting parallel that can be drawn between the discovery of functional dependencies and the discovery of approximate dependencies within the formalism of pattern structures will be also discussed.

• Jeudi le 10 septembre 2020, de 11h à 12h00, Salle C103 (LORIA):

Miguel Couceiro: LimeOut: An ensemble approach to improve process fairness
Résumé: In this talk  we address the question of « process fairness ».  More precisely, we consider the problem of making classifiers fairer by reducing their dependence on sensitive features while increasing (or, at  least, maintaining) their accuracy. To achieve both, we draw inspiration from « dropout » techniques in neural based approaches, and propose a  framework that relies on « feature drop-out » to tackle process fairness. We make use of « LIME Explanations » to assess a classifier’s fairness and  to determine the sensitive features to remove. This produces a pool of classifiers (through feature dropout) whose ensemble is shown empirically  to be less dependent on sensitive features, and with improved or no impact on accuracy.
This basic framework will be presented next week at XKDD 2020 (joint work with Vaishnavi Bhargava, Amedeo Napoli and myself), but we will also informally discuss several developments being carried out in collaboration with Guilherme Alves.

• Jeudi le 18 juin 2020, de 11h à 12h00, Serveur Orpailleur
(contacter miguel.couceiroATloria.fr en cas de difficulté):

Adrien Coulet: Contributions of the PractiKPharma project to knowledge extraction and comparison in pharmacogenomics
Résumé : PractiKPharma (2016-2020) is a CS project applied to the biomedical domain of pharmacogenomics (PGx) that studies the impact of genomics on drug response phenotypes. This seminar will present our efforts on extracting, structuring and comparing elements of PGx knowledge. These tasks are of interest especially because of the highly heterogeneous quality of the available knowledge in PGx. In particular, through sources, elements of knowledge have very different levels of validation and of completeness. I will present:
i) PGxCorpus, a manually annotated corpus, designed to train automated approach for knowledge extraction from texts;
ii) PGxLOD, a linked open data mash-up; and
iii) two complementary approaches (one symbolic, one numeric) enabling the comparison of elements of PGx knowledge of various provenance.
I will end with the remaining challenges.

• Jeudi le 11 juin 2020, de 11h à 12h00, Serveur Orpailleur
(contacter miguel.couceiroATloria.fr en cas de difficulté):

Brieuc Conan-Guez: Hierarchical Community Analysis in Large Networks
Résumé : In this work, we propose a new efficient agglomerative algorithm for hierarchical clustering analysis (HCA) of large networks. This algorithm is an adaptation of one proposed in 2011 that was originally envisioned for HCA of pairwise dissimilarities. Our adaptation performs a greedy optimization of modularity, a widely used measure for network partitioning. We show that, thanks to adapted data structures, it achieves lower running time than other agglomerative algorithms, while producing comparable solutions.
We will also present its comparison to well known methods, such as that of Louvain, both on simulated and real-world data sets. For both methods, we will investigate improved solutions obtained through single-level and multilevel refinements.

• Jeudi le 04 juin 2020, de 11h à 12h00, Serveur Orpailleur
(contacter miguel.couceiroATloria.fr en cas de difficulté):

Pierre Monnin: Knowledge-Based Matching of n-ary Tuples
Résumé : An increasing number of data and knowledge sources are accessible by human and software agents in the expanding Semantic Web. Sources may differ in granularity or completeness, and thus be complementary. Consequently, they should be reconciled in order to unlock the full potential of their conjoint knowledge. In particular, units should be matched within and across sources, and their level of relatedness should be classified into equivalent, more specific, or similar. This task is challenging since knowledge units can be heterogeneously represented in sources (e.g., in terms of vocabularies). In this presentation, we focus on matching n-ary tuples in a knowledge base with a rule-based methodology. To alleviate heterogeneity issues, we rely on domain knowledge expressed by ontologies. We tested our method on the biomedical domain of pharmacogenomics by searching alignments among 50,435 n-ary tuples from four different real-world sources. These results highlight noteworthy agreements and particularities within and across sources.

• Jeudi le 28 mai 2020, de 11h à 12h00, Serveur Orpailleur
(contacter miguel.couceiroATloria.fr en cas de difficulté):

Amedeo Napoli: Computing Functional and Similarity Dependencies in the Framework of Formal Concept Analysis
Résumé : Knowledge discovery (KD) in complex datasets and, especially, the mining of interesting patterns can be performed using several frameworks. In this presentation, we will discuss the framework of Formal Concept Analysis (FCA) and focus on the discovery of functional dependencies in numerical data sets.
FCA starts with a binary context and outputs a concept lattice, which can be visualized, navigated and interpreted by human agents, as well as processed by software agents. FCA can be extended into Pattern Structures, where objects may have complex descriptions such as numbers, sequences, trees, and graphs. Descriptions can be compared and a pattern concept lattice can be built accordingly.
The discovery of functional dependencies is based on implication discovery and can be achieved either by using plain FCA and a suitable « binarization » of the numerical dataset, or by using partition pattern structures. In particular, Partition Pattern Structures provide a general basis for discovering several types of dependency in datasets, such as functional, similarity, order and gradual dependencies. These two ways of formalizing dependency discovery will be surveyed and discussed during the presentation.

• Mardi le 19 mai 2020, de 11h à 12h00, Serveur Orpailleur
(contacter miguel.couceiroATloria.fr en cas de difficulté):

Frédéric Pennerath: An Introduction to Bayesian Deep Learning
Résumé : Bayesian Deep Learning (BDL) fills an important gap in the current deep neural networks, no matter powerful they are: in figurative terms, one could say BDL gives to AI the introspective ability to assess its own level of ignorance due to a lack of observations. In more technical terms, BDL adopts the view of Bayesian statistics by replacing the weights of neural networks by distributions. While this idea is nothing new, BDL has recently undergone new developments, thanks in particular to the seminal work of Y. Gal.
This presentation is designed to be a gentle introduction of the main concepts of BDL. It introduces the required notions of Deep Learning and Bayesian statistics before developing the BDL framework. It will not assume any particular piece of knowledge, but some general notions in machine Learning and Neural Networks.

• Jeudi le 14 mai 2020, de 11h à 12h00, Serveur Orpailleur
(contacter miguel.couceiroATloria.fr en cas de difficulté):

Alain Gély: On representations of lattices
Résumé : A lattice representation is a formalism to specify a given lattice. Classical representations are include formal contexts or implicational bases. In this talk we discuss several representations suitable for particular lattice classes, starting from Birkhoff’s representation theorem of distributive lattices as well as certain generalizations of this theorem, such as that based on the notion of « core of a lattice » by V. Duquenne, or that based on the notion of « coloured poset » by L. Nourine. We will also discuss Guigues-Duquenne implicationnal bases, which constitute another noteworthy representation of lattices.

• Mercredi le 6 mai 2020, de 11h à 12h00, Serveur Orpailleur
(contacter miguel.couceiroATloria.fr en cas de difficulté):

Nyoman Juniarta: Chordalysis, a scalable method to discover multi-way associations among features.
Résumé : In this talk, we consider datasets where objects are described by a set of attributes. It could be interesting to find some correlations among attributes, either to predict the value of an attribute, or to eliminate some attributes. Log-linear analysis (LLA) is one of the methods that discover these correlations, by creating the simplest log-linear model, i.e. a model that can describe the whole dataset with a smallest number of correlations.
In this talk, we will revisit Chordalysis, originally proposed by Petitjean et al. (2013). This method was motivated by a limitation of basic LLA, which is the exponential number of operations since it considers every combination of attributes. Chordalysis solves this problem by focusing on a special case of the log-linear model: the decomposable model, whose graph is chordal (hence the name). By focusing on this model, Chordalysis can scale LLA to dataset with thousands of attributes.

• Jeudi le 30 avril 2020, de 11h à 12h00, Serveur Orpailleur
(contacter miguel.couceiroATloria.fr en cas de difficulté):

Miguel Couceiro: Analogical proportions: from Boolean to nominal and beyond…
Résumé : Analogical proportions provide a tool for formalizing analogical inference in a way that has been empirically verified to be efficient in various classification and reasoning tasks. Analogical proportions are statements of the form a is to b as c is to d, where a, b, c, d are tuples of attribute values describing objects, instances, etc. The mechanism of analogical inference started to be better understood when the characterization of the class of classifiers with which the analogical inference always agrees was established for Boolean attributes. This result was recently extended to nominal attributes.
The purpose of this talk is to discuss some key cognitive aspects related with analogical proportions (inference and creativity), recent applications in AI and ML, and present some characterization results, namely, the description of classifiers compatible with the analogical principle in the nominal case. If time allows, we will also discuss some open problems and potentially beneficial cross-field exchanges.

• Jeudi le 23 avril 2020, de 11h à 12h00, Serveur Orpailleur
(contacter miguel.couceiroATloria.fr en cas de difficulté):

Laurine Huber: Leveraging discourse structures for text mining
Résumé : In this talk, we will present little-known structures of texts that exist above the level of the sentence, namely discourse and argumentation structures. We will survey several formalisms and annotated corpus and will discuss how they could be leveraged to learn new knowledge from texts. We will also present some work on the alignments between two formalisms: Rhetorical Structure Theory and the macro-structure of argumentation.

• Jeudi le 16 avril 2020, de 11h à 12h00, Serveur Orpailleur
(contacter miguel.couceiroATloria.fr en cas de difficulté):

Esteban Marquer: Structured data representation for Deep Learning
Résumé : In this talk we will speak of the problem of data representation for Deep Learning (DL), focusing in particular on the representation of lattices and concept lattices as graphs. We will survey recent approaches to graph representation and present some peculiarities of lattices when thought of as graphs. As we will see, this will lead to an approach to efficiently represent lattices for DL. If time allows, we will also briefly discuss the problem of representing formal knowledge, namely, that of embedding knowledge and distributional information in the same space to improve the performance downstream tasks.

• Mardi le 10 mars 2020, de 13h à 14h00, Salle B013 (LORIA):

Georgios Zervakis: Integration of symbolic knowledge into DL
Résumé : In the first part of the talk we will present the problem of integrating symbolic knowledge into deep learning approaches, including a starting experimental scenario and objectives. The second part will be dedicated to related work. We will point out some methods for improving word representations derived from deep learning systems, with the addition of external knowledge. We will then explore strategies to i) learn embeddings of actions in images, and ii) use these embeddings to do inference by analogy in image retrieval tasks.

• Mardi le 25 février 2020, de 13h à 14h00, Salle B013 (LORIA):

Alexandre Bazin: Recent results and ongoing works on formal concept analysis, feature selection and knowledge representation
Résumé : In this talk, we will briefly present the contents of unpublished, unfinished or unbegun papers. Topics will vary between formal concept analysis, feature selection/interpretation and knowledge representation. The non-exhaustive list includes algorithms for covering the relations in a formal contexts with « good » concepts, bounds on the maximal number of n-dimensional concepts in a context, a condensed representation of multidimensional association rules, an approach for selecting the most predictive and discriminant features in a dataset, a multidimensional generalisation of the relational concept analysis framework, a constraint programming approach to enumerating hypergraph minimal transversals and an approach for reconstructing the knowledge in a UML diagram with a patchwork of existing ontologies.

• Mardi le 11 février 2020, de 13h à 14h00, Salle B013 (LORIA):

Claire Theobald: Measuring Uncertainty in Deep Learning Models with Bayesian Deep Learning
Résumé : Over the past years, advances in Deep Learning models have been nothing short but spectacular. Nowadays, Neural Networks are able to learn patterns from huge amounts of data, and are able to predict such patterns with incredible accuracy. However, in most real-world applications, measuring the degree of confidence over a prediction is often necessary and it is lacking in conventional Deep Learning models. For instance, predictions over noisy or unknown data should be analyzed more carefully before making a decision. Bayesian Deep Learning (BDL) offers a framework to measure such a uncertainty in Deep Learning Models. In BDL, instead of having simple point estimates of the model’s parameters, probability distributions are placed over them, which enables the use of inference techniques to compute the uncertainty that the model has over a prediction. It also allows to separate the uncertainty due to the noise inherent to the data (aleatoric uncertainty), from the uncertainty due to the model itself (epistemic uncertainty). Finally, BDL offers a framework for Active Learning, which is a learning technique where the model itself choses which data are the most informative for learning, instead of feeding data to the model randomly. Such a technique increases the efficiency of the learning process, which is useful in certain real-world applications where data is scarce.

• Mardi le 28 janvier 2020, de 13h à 14h00, Salle A006 (LORIA):

Vaishnavi Bhargava: Automatic Neighbourhood design using localized model interpretation
Résumé : In the presentation, I would be discussing the work I did during my internship at LACODAM team. Machine Learning is an emerging field and has got a lot of applications today. However, A model is too complex to be readable by a human being because the function learned by this model is itself complex. Thus, it is unreasonable to seek a simple explanation of the model as a whole. On the other hand, when we look more locally (at the neighborhood of an example), the model is more likely to be simple. This hypothesis is at the root of approaches such as LIME and Anchors which respectively look for a local linear approximation of the model, and a neighborhood of the target example on which the value predicted by the model is constant. Both approaches currently lack a process to derive the appropriate neighborhood. On the one hand, LIME lets the user search for himself the concept of a neighborhood that suits him best (without any tool defining « better »). On the other hand, Anchors build a neighborhood in which the function learned is constant, but it loses information on the conditions that would bring out of this constant area. The main aim of this project was to improve upon the LIME approach by developing a metric that could best describe the neighborhood, and find out the instances for which LIME could not find accurate explanations.

Laura Zanella: Mining PubMed abstracts on rare diseases
Résumé : Rare Diseases (RDs) are considered those that affect fewer than 1 in 2,000 persons. Today there are between 5,000 to 7,000 RDs affecting about 30 million European Union citizens (prevalence at population level is 3 to 4%). Understanding RDs is a big challenge. They represent a major public health problem. Indeed, for most of these diseases there is no available cure, they are often chronic, progressive, degenerative, disabling and life-threatening. Moreover, they involve very young children in more than 50% of cases, accounting for more than one third of all deaths occurring during infancy. However, RDs can be considered as an opportunity to better understand non-rare diseases, the role of genomics, and find new perspectives in drug usages, especially, in the context of personalized medicine. The main goal of this project is to extract and structure different features concerning a rare disease: symptoms, diseases related to the main disease, treatments and possibly the evolution of the disease. Texts that will be analyzed are coming from PubMed, a platform collecting millions of publications in the medical domain. The project aims at developing new methods and tools for supporting knowledge discovery from textual data by combining methods from Natural Language Processing (NLP) and Knowledge Discovery in Databases (KDD). A key idea is to design an interacting and convergent process where NLP methods are used for guiding text mining and KDD methods are used for analyzing textual documents. Such a synthesis can be very useful for documenting the disease and allows to consider wider collections of documents. This will be a very useful complement to the manual process which is used at the moment for documenting a rare disease.

• Mercredi le 15 janvier 2020, de 12h à 13h00, Salle B013 (LORIA):

Guilherme Alves: Similarity Measure Selection for Categorical Data Clustering
Résumé : Data clustering is a well-known task in data mining and it often relies on distances or, in some cases, similarity measures. The latter is indeed the case for real world datasets that comprise categorical attributes. Several similarity measures have been proposed in the literature, however, their choice depends on the context and the dataset at hand. In this paper, we address the following question: given a set of measures, which one is best suited for clustering a particular dataset? We propose an approach to automate this choice, and we present an empirical study based on categorical datasets, on which we evaluate our proposed approach.

• Mardi le 22 octobre 2019, de 11h à 12h00, Salle A006 (LORIA):

Rosana Veroneze: Enumerative algorithms for biclustering: expanding and exploring their potential in bioinformatics and neuroscience
Résumé : Biclustering is a data analysis technique successfully applied in various domains. Biclustering, Frequent Pattern Mining (FPM), Formal Concept Analysis and Graph Theory (in part., the problem of finding bicliques in a bipartite graph) are closely related to each other. In these areas, we have many algorithms for enumerating all maximal biclusters of ones in a binary dataset that have four interesting properties: (1) efficiency (have time complexity linear in the number of biclusters and polynomial in the input size), (2) completeness (find all maximal biclusters), (3) correctness (all biclusters obey the user-defined measure of internal consistency), and (4) non-redundancy (all the obtained biclusters are maximal and the same bicluster is not enumerated more than once). Since her PhD, Rosana Veroneze has been pursuing these four properties in the development of enumerative biclustering algorithms for numerical (not only binary, but also integer or real-valued) data matrices. In this talk we will address challenges that are intrinsic to the enumerative algorithms. Among these challenges, we can highlight: the constant pursuit for better algorithms in terms of runtime and memory usage; the selection of relevant biclusters (or ranking biclusters) since enumerative algorithms can return a huge amount of patterns and many of them are irrelevant; the provision to the expert (eg a physician) of a compact set (or list) of biclusters that are both very relevant and poorly redundant; and data pre-processing and parameter setting. We will also survey the interplay between biclustering and associative rules in the context of supervised descriptive pattern mining and associative classification. Lastly, we will dscuss applications of biclustering enumerative algorithms in real-world problems, specially in the analysis of brain activity data and gene expression data, in connection with the Brazilian Institute of Neuroscience and Neurotechnology (BRAINN).

Du mardi le 03.09.2019 au jeudi 05.09.2019 en Salle A008 ou à l’amphithéâtre (LORIA)

• Vendredi le 03 mai 2019, de 11h00 à 12h00, Salle A008 (LORIA):

Abdelkader Ouali: Pattern set mining using integer linear programming
Résumé : In this seminar, I will present a hybrid approach based on Integer Linear Programming that responds effectively to different queries of pattern set mining. The interest of this approach is illustrated by two well-known problems in data mining: conceptual clustering and tiling. Finally, I will present some perspectives on how to use constraint programming to address some multicriteria tasks in data mining.

• Jeudi le 02 mai 2019, de 11h00 à 12h00, Salle A008 (LORIA):

Alexandre Bazin: Rules in Symbolic Data
Résumé : Rules are one way of expressing the regularities found in data. They can either be analysed by humans, in order to better understand the situation modelled by the data, or used by machines to perform inference. In this presentation, we will talk about the rules that can be found in symbolic (binary) data. We will discuss how to reduce their numbers both for human analysis (with loss of information) and for inference (without loss of information).

• Mardi le 23 avril 2019, de 11h00 à 12h00, Salle A008 (LORIA):

Valia Mitsou: Limitation of treewidth for problems beyond NP
Résumé : In this seminar, we take a closer look at the parameterized complexity of problems belonging in $\Sigma^p_2$ and $\Pi^p_2$, the second level of the polynomial hierarchy. We provide tight fine-grained bounds on their complexity with respect to the most important structural graph parameter, the treewidth. We observe that these problems exhibit similar behavior: we show that a variety of diverse problems including ∃∀SAT, Choosability, as well as various problems from AI such as Abduction and Abstract Argumentation, while they admit a $2^{2^{O(tw)}}$ algorithm, they cannot be solved in time $2^{2^{o(tw)}}$ under the Exponential Time Hypothesis.

• Jeudi le 04 avril 2019, de 14h00 à 15h00, Salle C005 (LORIA):

Lydia Boudjeloud-Assala: Human in the loop: Visual -massive and temporal- data mining
Résumé : I will present a review of my research on cooperative approaches combining interactive visualization methods and automatic methods (attribute selection, clustering, biclustering, and outlier detection) for knowledge discovery in massive and temporal data. I will present different ways to involve the user in the data mining process in order to improve his confidence and understanding of the models and their results. New challenges for these methods, which must be able to handle not only increasing amounts of data but also data that can change over time, will also be presented and discussed.

• Mercredi le 03 avril 2019, de 11h00 à 12h00, Salle B013 (LORIA):

Samir Loudni: From Skypatterns to Skypatterns cube
Résumé : Discovering useful patterns from data is an important field in data mining for data analysis and is used in a wide range of applications. Many approaches have promoted the use of constraints to focus on the most promising knowledge according to a potential interest given by the final user. Using the dominance relation is a recent trend in constraint-based data mining to produce useful pattern sets.
The notion of skyline queries [Börzsönyi et al., 2001] has been recently integrated into the pattern discovery paradigm to mine skyline patterns (henceforth called skypatterns) [Soulet et al., 2011]. Given a set of measures, skypatterns are based on a Pareto-dominance relation for which no measure can be improved without degrading the others. Skypatterns are highly interesting since they do not require thresholds on the measures and the dominance relation gives them a global interest with a semantics easily understood by the user. In this talk, I will present a flexible and efficient approach to mine skypatterns using the constraint programming framework.
In practice, users do not know the exact role of each measure and it is difficult to select beforehand the most appropriate set of measures. Users would like to keep all potentially useful measures, look what happens on skypattern sets by removing or adding a measure, thus, evaluating the impact of measures and then converge to a convenient skypattern set. To address this issue we introduce the notion of Skypattern Cube which is the lattice of all possible measure combinations associated with their skypattern sets. By comparing two neighboring nodes (differentiated by a single measure), users can observe new skypatterns and those that die out, which greatly contributes to better understand the role of the measures.

• Mercredi le 27 mars 2019, de 11h00 à 12h00, Salle B013 (LORIA):

Saïd Jabbour: Data mining by constraints
Résumé : In this presentation, I will address the issue of data mining by constraints. I will present some contributions on declarative approaches for different data mining tasks including frequent itemset mining and its various condensed forms, association rules, sequential pattern mining, uncertain frequent itemset mining and gradual itemset mining. To highlight the cross-fertilization between symbolic AI and data mining, I will show how linear constraints can be used to summarize large graphs, how data mining techniques can be exploited to compress Boolean formulas and finally how the concept of symmetry, widely explored in SAT/CP, is extended to pattern mining.

• Jeudi le 28 février 2019, de 11h00 à 12h00, Salle A008 (LORIA):

Marc Plantevit : Pattern Mining in Augmented Graphs
Résumé : Graphs are a powerful mathematical abstraction that enables to depict many real world phenomena. Vertices describe entities and edges identify relations between entities. Such graphs are often augmented with additional pieces of information. For instance, the vertices or the edges are enriched with attributes describing them and are called vertex (respectively edge) attributed graphs. Graphs can also be dynamic, i.e., the structure and the values of vertex attributes may evolve through time. The discovery of patterns in such graphs may provide actionable insights and boost the user knowledge. In this talk, I will discuss the different pattern domains for augmented graphs I contributed to define. This includes the discovery of exceptional attributed subgraphs in edge or vertex attributed graphs. Then, I will discuss how to find patterns of higher interest by taking into account the domain knowledge, user feedback and user’s prior knowledge through different examples.

• Lundi le 02 juillet 2018, de 14h00 à 15h00, Salle B013 (LORIA):

Martin Trnecka : Boolean Matrix Factorization from Formal Concept Analysis Perspective
Résumé : Boolean matrix factorization (BMF) is becoming an established method for analysis and preprocessing of data. The existing BMF methods are based on various types of heuristics because the main computational problems involved are known to be provably hard. The heuristics employed, however, use only a limited theoretical insight regarding BMF. This talk attempts to show that a better understanding of the geometry of Boolean data, provided by formal concept analysis, results in a better understanding of BMF, theoretically justiﬁed heuristics, and better algorithms.

• Jeudi le 07 juin 2018, de 10h30 à 12h30, Salle B011 (LORIA):

10h30-11h15. Nyoman Juniarta: Sequential Pattern Mining using FCA and Pattern Structures for Analyzing Visitor Trajectories in a Museum
Résumé : The presentation is about mining visitor trajectories in Hecht Museum (Haifa, Israel), within the framework of CrossCult European Project about cultural heritage. We present a theoretical and practical research work about the characterization of visitor trajectories and the mining of these trajectories as sequences. The mining process is based on two approaches in the framework of FCA, namely the mining of subsequences without any constraint and the mining of frequent contiguous subsequences. Both approaches are based on pattern structures. In parallel, a similarity measure allows us to build a hierarchical classification which is used for interpretation and characterization of the trajectories w.r.t. four well-known visiting styles.

11h15-12h00. Tatiana Makhalova: A first study on what MDL can do for FCA
Résumé : Formal Concept Analysis can be considered as a classification engine able to build classes of objects with a description or concepts and to organize these concepts within a concept lattice. The concept lattice can be navigated for selecting significant concepts. Then the problem of finding significant concepts among the potential exponential number of concepts arises. Some measures exist that can be used for focusing on interesting concepts such as support, stability, and other. MDL (minimum description length) is also a good candidate that was rarely used in FCA by now for such objective. In this talk it will be argued that MDL can give FCA practitioners a good measure for selecting significant and representative concepts.

12h00-12h30. Quentin Brabant: Extracting Decision Rules from Qualitative Data  via Sugeno Utility Functionals.
Résumé : Sugeno integrals are qualitative aggregation functions. They are used in decision making, for combining several evaluations of an alternative (with respect to different criteria, or made by several agents) into a unique value. The combination of a Sugeno integral with a « rescaling » function on each criterion is called a Sugeno utility functionals (SUF). Any SUFs can be represented by a set of multi-threshold decision rules. However, not all sets of multi-threshold rules can be represented by a single SUF. I will present functions defined as the minimum or the maximum of several SUFs. These so-called max-SUFs and min-SUFs are equivalent to sets of multi-threshold decision rules. As we will see, we can make use of these max-SUF and min-SUF as intermediary models for extracting accurate and parcimonious rule sets from empirical datasets.

• Mardi le 22 mai 2018, de 11h00 à 12h00, Salle B013 (LORIA):

Kevin Dalleau: Unsupervised Extremely Randomized Trees
Résumé : Many unsupervised learning algorithms rely on a (dis)similarity measure to evaluate the pairwise distance between samples. Despite the large number of measures already described in the literature, in many applications, the set of available metrics is reduced by intrinsic characteristics of the data and of the chosen algorithm. In this talk, I will present a novel method based on extremely randomized trees. This method, UET (Unsupervised Extremely Randomized Trees), is based on the work of T.Shi and S.Horvath and can be generalized to heterogeneous data, enabling the clustering of datasets with mixed-types variables. The evaluation of the methods shows promising results, both in terms of discrimination of instances and in terms of running time.

• Mardi le 17 avril 2018, de 11h00 à 12h00, Salle B013 (LORIA):

Paul Dorbec : Autour des ensembles dominants: des graphes et des jeux
Résumé : Dans cet exposé, nous partirons d’un des principaux problèmes de théorie des graphes, le problème de l’ensemble dominant. Nous évoquerons d’abord quelques questions classiques sur le problème, notamment au sujet des graphes cubiques. Nous en verrons ensuite une variante avec un phénomène de propagation, motivée par la question de la maintenance de réseaux électriques. Nous évoquerons enfin une variante ludique, ce qui devrait nous amener à évoquer succinctement la théorie des jeux combinatoires.

• Jeudi le 12 avril 2018, de 11h00 à 12h00, Salle B013 (LORIA):

Samir Loudni : Approches déclaratives pour la fouille de données orientée motifs
Résumé : La programmation par contrainte (PPC) et la programmation linéaire en nombres entiers (PLNE) offrent un cadre générique et flexible pour résoudre des problèmes d’optimisation sous contraintes. L’utilisation de la PPC/PLNE pour exprimer des problèmes de fouille de données possède de nombreux avantages. Le premier est d’offrir à l’utilisateur un moyen simple et déclaratif pour modéliser ses problèmes. Le second est de proposer une approche générique de résolution qui permet à l’utilisateur de ne plus devoir se préoccuper de l’écriture d’un algorithme spécifique pour chaque tâche de fouille. Dans cet exposé, je vais présenter quelques contributions concernant les apports de la PPC pour la fouille de données orientée motifs et plus particulièrement sur l’extraction de motifs séquentiels et de motifs Pareto. Ensuite, je vais montrer comment la PLNE peut être exploitée pour modéliser de tâches d’optimisation en fouille de données, comme le problème du clustering conceptuel, ou encore le tiling. Enfin, je terminerai par nos travaux en cours sur la prise en compte du principe d’équité pour générer des clusterings équilibrés en utilisant des fonctions d’agrégation collectives de type OWA (Ordered Weighted Averages).

• Mercredi le 12 mars 2018, de 11h00 à 12h00, Salle B013 (LORIA):

Yann Rivault : Le Web des données pour l’élagage et la sélection de règles d’association en pharmacovigilance
Résumé : Avec l’accroissement des données cliniques disponibles apparaissent des besoins méthodologiques pour l’aide à la décision, la prescription ou encore la surveillance. Le Web des données et les standards du web sémantique offrent des opportunités pour répondre à ces besoins. Dans le cadre d’une collaboration avec le CHRU de Nancy, nous nous intéressons à des stratégies permettant l’élagage et le classement de règles d’association. En effet, l’extraction de règles d’association voit son utilité limitée par le très grand nombre potentiel de règles extraites. Nous faisons l’hypothèse que l’apport de connaissances issues du Web des données pourrait contribuer à l’identification des règles d’association les plus pertinentes pour un système de surveillance en pharmacovigilance. Un enrichissement des données de prescriptions et des diagnostics issus de la base de données hospitalière pourrait permettre de cibler les règles d’association déjà connues des professionnels (e.g. les prescriptions indiquées pour un état de santé), ou encore de souligner les règles inattendues (e.g. les contre-indications médicamenteuses). L’objectif est ainsi de pouvoir distinguer les associations triviales des associations pertinentes. Tout d’abord nous étudions les sources de connaissances du Web des données qui pourraient permettre un élagage de règles triviales et un ciblage des règles inattendues. Enfin nous investiguons l’adaptation de mesures d’intérêt des associations dans le but d’établir un classement des règles retenues.

• Mercredi le 14 février 2018, de 11h00 à 12h00, Salle C103 (LORIA):

Joël Legrand : TreeLSTM and cross-corpus training for extracting pharmacogenomic relationships from text
Résumé : A key aspect of machine learning algorithms for relationship extraction in textual data is the availability of sufficiently large training data. Manually annotated corpora are valuable resources for this task, but the time and expertise required for their development explains why only few annotated corpora are freely available. For tasks related to precision medicine, most of them are rather small (i.e., hundreds of sentences) or they only focus on specialized relationships (e.g., drug-drug interactions) that rarely fit what one wants to extract. In this talk, I will present two lines of research that I pursued to overcome the lack of annotated data for the task of pharmacogenomic relation extraction. I will first present a TreeLSTM-based transfer learning method that allows to achieve high performance for the extraction of biomedical relationships from text, for which initial resources are scarce. Then, I will present the PGxCorpus, a manually annotated training corpus designed for the supervised extraction of pharmaco-genomic relationships from text.

• Mardi le 13 février 2018, de 11h00 à 12h00, Salle B013 (LORIA):

Michel Habib : Graph classes defined via vertex orderings avoiding patterns
Résumé : Consider a graph H given with a fixed ordering on its vertices. Then a graph G will belong to the class if and only if there exists an ordering of its vertices such that no subgraph of G induces a copy of H with the given fixed ordering. Let us give first a very classical example : chordal graphs. These are defined as graphs which do not contain any induced cycle of length at least 4. Is is well known (Dirac 1961) that chordal graphs are exactly graphs which admit a simplicial elimination ordering, that is an ordering on the vertices such that the neighbors of a vertex that are placed before it in the ordering induce a clique. This corresponds to the previous definition when H is the path on 3 vertices where the middle vertex is placed last in the ordering. There exists several results of this type : proper interval graphs, interval graphs, permutation graphs, co-comparability graphs, …, and it usually gives very good recognition algorithms, whereas in general forbidden subgraph characterization of a given class does not provide good algorithms.

• Mardi le 23 janvier 2018, de 14h30 à 15h30, Salle B013 (LORIA):

Alain Gély : Distributivité d’un treillis de concepts pour les arbres phylogénétiques
Résumé : Nous nous intéressons aux treillis distributifs dans le cadre de l’analyse formelle de concepts (FCA). La motivation primitive vient de la phylogénie et des graphes médians pour représenter les dérivations biologiques et les arbres parcimonieux. La FCA propose des algorithmes efficaces de construction de treillis de concepts. Cependant, un treillis de concepts n’est pas en correspondance avec un graphe médian sauf s’il est distributif, d’où l’idée d’étudier la transformation d’un treillis de concepts en un treillis distributif. Pour ce faire, nous nous appuyons sur le théorème de représentation de Birkhoff qui nous permet de systématiser la transformation d’un contexte quelconque en un contexte de treillis de concepts distributif. Ainsi, nous pouvons bénéficier de l’algorithmique de FCA pour construire mais aussi visualiser les treillis de concepts distributifs, et enfin étudier les graphes médians associés.

• Vendredi le 15 décembre 2017, de 11h00 à 12h00, Salle B013 (LORIA):

Tatiana Makhalova : Formal Concept Analysis: some applications in computational learning theory and machine learning
Résumé : Formal Concept Analysis and closed itemsets proved to be of big impor-tance for knowledge discovery, both as a tool for concise representation of asso- ciation rules and a tool for constructing domain taxonomies and ontologies.
In this talk, FCA will be considered in an unconventional way. We will discuss the connection of FCA with Computational Learning Theory. In particular, we will present an approach to obtaining the tight theoretical bounds of probability of overfitting and discuss why the precise bounds are not applicable in practice.The second part will be devoted to a more practical question, that is rule-based classifiers. We will see how random forests relate to frequent closed itemsets. Numerous experiments have proved that the quality of the random forests depends on such parameters as the size of bootstrap samples, the minimal size of leafs or the depth of trees as well as the size of subsets of features used to grow a tree. In the talk we will examine the question of computing a set of rules in a deterministic way in order to build an ensemble of classifiers (i.e. closed itemsets) having accuracy comparable to the random forests.

• Mardi le 05 décembre 2017, de 14h00 à 15h00, Salle A006 (LORIA):

Emmanuel Jeandel : Computability in closure spaces
Résumé : In many instances in logic or computable algebra, classical theorems show that many problems are undecidable for general structures, but become de-cidable if some rigidity is imposed on the structure. This is the case for example for groups (the rigidity criterion is then simpleness), first order logic (complete-ness), or symbolic dynamics (minimality). We will present a topological frame-work based on closure spaces and Galois connections to show that many of these proofs can be obtained in a similar setting. We will show in particular that these statements can be generalized to cover arbitrary structures, with no finite or recursive presentation/axiomatization.

• Mardi le 03 octobre 2017, de 11h00 à 12h00, Salle C103 (LORIA):

Jimmy Devillet : Enumerating quasitrivial semigroups
Résumé : We investigate the class of binary associative and quasitrivial operations on a given finite set. Here quasitriviality (also known as conserva-tiveness) means that the operation always outputs one of its input values. We also examine the special situations where the operations are commutative and nondecreasing. In the latter case, these operations reduce to discrete uninorms, which are discrete fuzzy connectives that play an important role in fuzzy logic. As we will see nondecreasing, associative and quasitrivial operations are chara-cterized in terms of total and weak orderings through the so-called single-peakedness property introduced in social choice theory by Duncan Black. This will enable visual interpretaions of the above mentioned algebraic properties. Motivated by these results, we will also address a number of counting issues: we enumerate all binary associative and quasitrivial operations on a given finite set as well as of those operations that are commutative, are nondecreasing, have neutral and/or annihilator elements. As we will see, these considerations lead to several, previously unknown, integer sequences.

• Vendredi le 23 juin 2017, de 11h30 à 12h30, Salle C103 (LORIA):

Victor Codocedo : On Locality Sensitive Hashing for Sampling Extent Generators
Résumé : In this talk we introduce a method for sampling formal concepts using locality sensitive hashing (LSH). LSH is a technique used for finding approximate nearest neighbours given a set of hashing functions. Through our approach, we are able to predict the probability of an extent in the concept lattice given a set of objects and their similarity index, a generalization of the Jaccard similarity betwe- en sets. Our approach allows defining a lattice-based amplification construction to design arbitrarily discriminative sampling settings.

• Mardi le 20 juin 2017, de 14h30 à 18h00, Salle B013 (LORIA):

14h30 – 15h00 Xavier Mazur. Clustering de séries temporelles

15h15 – 15h45 Doryan Kaced. Une approche hybride pour l’identification de biomarqueurs dans des données métabolomiques

16h00 – 16h30 Yassine Namir. D’un treillis quelconque vers un treillis distributif

16h30 – 17h00 Ilias Benjelloun. Apprentissage Relationnel Statistique :
représentation de connaissances en milieu incertain

17h00 – 17h30 Nicolas Warnet. Personnalisation d’un épisode de planification de soin : Application à la prise en charge de l’insomnie

17h30 – 18h00 Tristan Gillard. Apprentissage de connaissances d’adaptation à l’aide d’exemples positifs et négatifs

• Mardi le 20 juin 2017, de 10h00 à 12h15, Salle B011 (LORIA):

10h00 – 10h30 Quentin Brabant. From meaningful orderings in the web of data to multi-level pattern structures

10h45 – 11h15 Justine Reynaud. A proposal for classifying the content of the web of data based on FCA and pattern structures

11h30 – 12h00 Pierre Monnin. Using FCA for checking the structure of an ontology in LOD: the example of DBpedia

• Lundi le 12 juin 2017, de 16h00 à 17h00, Salle B013 (LORIA)

Matěj Stehlík: Topological bounds on the chromatic number
Résumé : The chromatic number is arguably the most studied invariant in graph theory. We are going to present some general upper and lower bounds on the chromatic number, before introducing a lower bound based on ideas from algebraic topology. This bound originates from Lovasz’s celebrated proof of Kneser’s conjecture in 1977. We will explore a new approach to these bounds using what we call higher-dimensional projective quadrangulations. Joint work with Tomas Kaiser.

• Jeudi le 08 juin 2017, de 14h00 à 15h00, Salle C103 (LORIA)

Pauli Miettinen: Hyperbolic Communities: Modelling Communities Beyond Cliques
Résumé : Finding and studying communities in (social) networks is one of the key problems in (social) network analysis. Much of this research models the commu-nities as (quasi-)cliques. In recent years, there has been an increasing interest to go « beyond cliques », and consider a more varied collection of community patterns. In this talk, I will present our recent hyperbolic community model that, on the one hand, generalises the classical core-periphery community model from sociology, and on the other hand, contains graph patterns such as cliques, power-laws, and stars as special cases. The empirical evidence shows that our model fits very well to many real-world graphs, and that it also allows us to understand the shape of the communities in that graph. I will also discuss our ongoing work on finding these communities using novel methods based on tropical matrix factorisations and on our novel characterisation of nested matrices.

• Jeudi, le 06 avril 2017, de 14h00 à 15h00, Salle: C103 (LORIA):

Lydia Boudjeloud: Approches coopératives et semi-interactives pour le traitement de données massives et temporelles
Résumé :  Je présenterai un bilan de mes travaux de recherche sur les approches coopératives combinant des méthodes interactives de visualisation et des méthodes automatiques de sélection de variables, de clustering et Bi-clustering, et de détection d’outlier pour l’extraction de connaissances à partir de données temporelles et de grande taille. Je présenterai différentes possibilités d’impliquer l’utilisateur dans le processus de fouille afin d’améliorer sa confiance et sa compréhension dans les modèles ou résultats obtenus. De nouveaux challenges pour ces méthodes, qui doivent pouvoir traiter non seulement des quantités de plus en plus importantes de données mais aussi des données qui peuvent changer dans le temps, seront également présentés et discutés.

• Vendredi, le 23 mars 2017, de 14h00 à 15h00, Salle C103 (LORIA):

Thomas Kipf: Deep Learning on Graphs with Graph Convolutional Networks (P. 2)
Résumé : Neural networks on graphs have gained renewed interest in the machine learning community. Recent results have shown that end-to-end trainable neural network models that operate directly on graphs can challenge well-established classical approaches, such as kernel-based methods or methods that rely on graph embeddings (e.g. DeepWalk). In this talk, I will introduce two extensions to the basic framework of defining neural networks on graphs: graph auto-encoders [1] and relational graph convolutional networks (R-GCNs) [2]. While graph auto-encoders provide a new way of approaching problems like link prediction and clustering, R-GCNs allow for efficient modeling of directed, relational graphs, such as knowledge bases (e.g. DBPedia). If time permits, I will talk about some recent work that combines such graph-based approaches with deep learning methods from the field of natural language processing, to achieve new state-of-the-art results on the task of semantic role labeling.
[1] TN Kipf and M Welling, Variational Graph Auto-Encoders, NIPS Bayesian Deep Learning Workshop (2016)
[2] M Schlichtkrull, TN Kipf, P Bloem, R vd Berg, I Titov and M Welling, Modeling Relational Data with Graph Convolutional Networks, arXiv:1703.06103 (2017)

• Mercredi, le 15 mars 2017, de 11h à 12h, Salle: B013 (LORIA):

Maria Boritchev: On Politics and Argumentation
Résumé : Argumentation mining is a relatively new topic in the field of corpus-based discourse analysis. This talk presents a work which primarily concerns with the identification of argumentative structures that can be found within dialogical discourse. Starting from data from 2016 U.S. presidential elec- tions debates, the aim is to automatically mine intertextual correspondences between political argumentation and social media reactions. Identifying the most efficient (here, reaction-rising) political discourse techniques can not only give the politicians the possibility to adjust their discourse but also give an attentive audience clues to critical thinking. During this talk I will present the current state of this work, introducing the chosen argumentation mining techniques and showing some intermediate results. Then I will finish with a discussion of the future work on intertextual correspondence mining.

• Vendredi, le 10 mars 2017, de 14h00 à 15h00, Salle B013 (LORIA):

Jan Volec: Graph limit approach in extremal combinatorics
Résumé : In the last decade, various theories of combinatorial limits emerged, and attracted a substantial attention. In this talk, I will focus on applications of combinatorial limits and the so-called flag algebra method to problems in extremal combinatorics. I will present the main ideas behind these methods and then use them to answer a problem of Erdos on pseudo-random hypergraphs, solve a conjecture of Erdos and Sos concerned with recursive graph constructions, and to study robust versions of Turan-type problems.

• Mercredi, le 8 mars 2017, de 11h00 à 12h00, Salle C003 (LORIA):

Abdallah Saffidine: The (Parameterized) Complexity of (Positional) Games
Résumé : Classical Computational Complexity is a popular tool from Theoretical Computer Science used to characterize the difficulty of solving decision problems. I will present how a couple intuitive criteria allow one to guess the most likely complexity class of games of various sort. We’ll then move to Parameterized Complexity which addresses some of the practical shortcomings of classical com- putational complexity and has seen increased adoption for studying graph-theoretic problems. I will describe ongoing work on building a better under- standing of the Parameterized Complexity of Games. As an application, we’ll show that Short Generalized Hex is W[1]-complete parameterized by the number of moves. This disproves a conjecture from Downey and Fellows’ influential list of open problems from 1999.

• Vendredi, le 14 octobre 2016, de 13h à 14h, Salle: B011 & B013 (LORIA):

Joel Legrand: Word Sequence Modeling using Deep Learning
Résumé : For a long time, natural language processing (NLP) has relied on generative models with task specific and manually engineered features. Recently, the rapidly growing interest for deep learning has led to state-of-the-art results in various fields such as computer vision, speech processing and natural language processing. The central idea behind these approaches is to learn features and models simultaneously, in an end-to-end manner, and making as few assumptions as possible. In NLP, word embeddings, mapping words in a dictionary on a continuous low-dimensional vector space, have proven to be very efficient for a large variety of tasks while requiring almost no a-priori linguistic assumptions. In this talk, I will present the results of my research on continuous representations of segments of sentences for the purpose of solving NLP tasks that involve complex sentence-level relationships. I will first introduce the key concepts of deep learning for NLP. I will then focus on two recent empirical studies concerning the tasks of syntactic parsing and bilingual word alignment. For each of them, I will present the main challenges as well as the deep learning-based solutions used to overcome them.

• Mercredi, le 5 octobre 2016, de 15h à 16h, Salle: B013 (LORIA):

François Pirot: The bounds for the distance-t chromatic number, and the effect of removing cycles
Résumé : It is well known that you can colour a graph G of maximum degree d greedily with d+1 colors. Moreover, this bound is tight, since it is reached for example by the cliques. Johansson proved with a pseudo-random colouring scheme that you can colour triangle-free graphs of maximum degree d with no more than O(d/log d) colours. This is tight up to a multiplicative constant, since you can pseudo-randomly construct a family of graphs of maximum degree d, arbitrary large girth, and chromatic number bigger than c(d/log d), for some constant c. When you introduce the distance-t chromatic number of a graph G of maximum degree d, which is simply the chromatic number of G^t, you can be interested in finding similar bounds, and proving that they are tight. It is straight-forward that d^t is an upper bound for the distance-t chromatic number, but determining how tight this bound is is far from being an easy problem. You may also wonder what happens to the bound of the distance-t chromatic number when you forbid some fixed lengths of cycles in your graph. The goal of my talk will be to give some answers to these natural questions.

• Vendredi 3 juin 2016, de 11h à 12h00, Salle: B013 (LORIA):

Pauli Miettinen: Matrix factorization models for patterns beyond blocks
Résumé : Finding patterns, or regularities, from data and using the patterns to summarise the data are in the core of data mining. In many cases, these pattens are « block-like »: when the data can be expressed as a table of values, or as a matrix, the patterns are sub-tables (or sub-matrices) that are somehow different from the other parts of the data. Clustering, community detection, and frequent itemset mining can all be considered to find « block-like » patterns, to name a few examples. Block-like patters also fit well to the language of linear algebra, where we can model the block-like patterns as (approximate) rank-1 matrices, and data summarisations as matrix factorisations. Not all patterns or summaries in data mining are easy to express as matrix factorisations, though, and in recent years, patterns that go « beyond blocks » have become under active investigation. In this talk, I will discuss why being able to express the data mining task in matrix factorisation terms is important and cover my recent and ongoing work on different types of matrix factorisations that can be used to model patterns beyond blocks.

• Mercredi, le 18 mai 2016, de 14h à 15h00, Salle: B013 (LORIA):

Gerasimos Meletiou: Secret Sharing Through Multivariate Birkhoff’s Interpolation
Résumé : The problem of finding secret sharing schemes with multilevel structures and partially ordered sets of levels of participants is faced. We investigate the idea of generalizing previous schemes (Shamir’s, Tassa’s) through multivariate Birkhoff’s interpolation.

•  Jeudi, le 12 mai 2016, de 11h à 12h00, Salle: B013 (LORIA):

Sergei O. Kuznetsov: Optimizing Therapy in Subgroups of Patients with Pattern Structures
Résumé : The effectiveness of a therapy of serious lethal diseases is measured by survival curves. A therapy is better than another one if its survival curve lies strictly above. For standard randomisation strategies accepted in medical science, two therapies may be of same quality, however if we consider subgroups of patients defined by common physiological features, one treatment can be strictly better than another for certain subgroups. Closed descriptions is a very natural way for concise representation of subgroups, so we describe an efficient approach for finding closed descriptions of groups where some treatments are definitely preferred to others. The approach can be applied in other domains like optimal credit scoring.

• Symposium Mathematics for Decision and Discovery (M4D2)

Mercredi, 11 mai 2016 (B013, LORIA): Program

• Vendredi, le 15 avril 2016, de 14h à 15h00, Salle: B013 (LORIA):

Benjamin Momège: Autour de la connexité dans les graphes avec conflits
Résumé : Nous nous intéresserons aux graphes avec conflits (un conflit est une paire d’arêtes ne pouvant pas simultanément faire partie d’un sous-graphe), dans lesquels nous étudierons différents types de problèmes, de nature aussi bien algorithmique que combinatoire, notre ligne directrice étant la notion de connectivité. Nous verrons que plusieurs résultats, simples sans conflit, ne le sont plus lors de l’ajout de conflits. Nous présenterons : des algorithmes exacts (non polynomiaux), des résultats de NP-complétude, et des conditions suffisantes assurant l’existence de certains objets (arbre couvrant, chemin et cycle Hamiltonien) sans conflits.

• Vendredi, le 15 avril 2016, de 11h à 12h00, Salle: B013 (LORIA):

Bruno Teheux: Automates, mots et décision
Résumé : Fonctions qui transforment les mots, les automates, les problèmes de décision. Dans ce séminaire, nous mettrons en évidence des liens entre ces diffé- rents ingrédients avec une équation très connue comme pivot: l’équation d’asso-ciativité.

• Mardi, le 9 février 2016, de 11h30 à 12h30, Salle: B013 (LORIA):

Résumé : Let A be a finite set. A partial clone on A is a set of partial functions closed under composition and containing all projection functions on A. We survey some results in the theory of partial clones. In particular: (1) we show the link between The Erdös-Faber-Lovsáz conjecture for graphs and combinatorial descriptions of some maximal partial clones, and (2) we give a complete classification of certain intervals of partial clones, that solves an open problem by D. Lau. These results were obtained in collaboration with (1) C. Tardif and (2) M. Couceiro, K. Schölzel and T. Waldhauser.

• Lundi, le 7 décembre 2015, de 11h30 à 12h30, Salle: B011 (LORIA):

Fabien Labernia: Conditional preferences network : an overview
Résumé : Let V be a set of binary variables. Elicitating preferences over the set of all possible assignments for V to obtain a partial (or total) order can be a very long and difficult task. Different models have been proposed to 1) help the expert to elicit these preferences (ceteris paribus preferences), 2) represent in a compact way this exponential set (conditional preference rules) and 3) catch in a formal way some preference properties (such as importance of a variable). In this presentation we will survey different models like CP-nets, TCP-nets, CP-theories and CI-nets (the different properties they catch, active learning of these models), and discuss some current research directions on these combinatorial structures.

• Mardi, le 1 décembre 2015, de 11h à 12h00, Salle: B013 (LORIA):

Bruno Teheux: A modal logic for games with lies
Résumé : In the Rényi-Ulam game, a liar picks out a number in a given search space. A second player has to guess this number by asking Yes/No questions to the liar who is allowed to lie a maximal given number of times. It is known that states of the game can be encoded in the algebraic language using MV-algebras (which are to Łukasiewicz logic what Boolean algebras are to classical propositional logic). The aim of the talk is to present an (n+1)-valued version of Propositional Dynamic Logic (PDL) that is suitable to model the interactions between players in the Rényi-Ulam game. The ingredients of this model are modal expansions of Łukasiewicz finitely-valued logics, algebras of regular programs and many-valued Kripke models. Our main result will be a completeness result. We will also discuss the gain of expressive power allowed by the new many-valued dynamic language.

• Mardi, le 24 novembre 2015, de 13h à 14h00, Salle: B013 (LORIA):

Kevin Dalleau: Suggesting valid pharmacogenes by mining linked data
Résumé : A standard task in pharmacogenomics research is identifying genes that may be involved in drug response variability, i.e., pharmacogenes. Because genomic experiments tended to generate many false positives, computational approaches based on the use of background knowledge have been proposed. Until now, those have used only molecular networks or the biomedical literature. Here we propose a novel method that consumes an eclectic set of linked data sources to help validating uncertain drug–gene relationships. One of the advantages relies on that linked data are implemented in a standard framework that facilitates the joint use of various sources, making easy the consideration of features of various origins. Consequently, we propose an initial selection of linked data sources relevant to pharmacogenomics. We formatted these data to train a random forest algorithm, producing a model that enables classifying drug–gene pairs as related or not, thus confirming the validity of candidate pharmacogenes. Our model achieves the performance of F-measure=0.92, on a 100 folds cross-validation. A list of top candidates is provided and their obtention is discussed.

• Symposium on instant data mining and learning preferences for interactive data mining (PEPS Préfute + Approppre)

Jeudi, 19 Novembre 2015 (C005, LORIA): Program

Vendredi, 20 Novembre 2015 (B013, LORIA):

Victor Codocedo: When Cyberathletes Conceal Their Game: Clustering Confusion Matrices to Identify Avatar Aliases

• Jeudi, le 1 octobre 2015, de 11h à 12h00, Salle: B013 (LORIA):

Renato Vimieiro: Mining disjunctive emerging patterns in high-dimensional data using hypergraphs
Résumé : Pattern mining is one of the most known sub-fields of data mining. Its main goal, as originally proposed by Agrawal et al., is to identify sets of attributes that are frequent amongst objects in a dataset. Most of the applications of (frequent) pattern mining involve the analysis of market basket data. In this type of data the attributes are binary and the data table is a matrix where objects are rows, attributes columns and cells indicate whether an object has an attribute. Since the beginning, the area has been extensively investigated and a wealth of efficient algorithms proposed. Although the original proposition was very successful in many domains, it was not suitable for the analysis of labelled data, i.e., data where objects had class labels. Thus one extension to frequent pattern mining, to cope with this type of data, was proposed in 1999 by Dong and Li. Their idea, named emerging patterns, was to extract patterns that were frequent amongst objects with one class label and infrequent amongst the others. Emerging patterns showed to be effective in many applications involving labelled data, particularly those in bioinformatics. In this presentation a new extension to emerging patterns will be briefly discussed. An overview of disjunctive emerging patterns will be given and an efficient algorithm based on hypergraphs and their minimal transversals for mining these patterns will be introduced.

• Lundi, le 30 juin 2015, de 12h00 à 13h00, Salle B013 (LORIA):

12h00 – 12h30 Kenny Rivalin: Systèmes de recommandation dans le cadre de l’analyse de concepts formels

12h30 – 13h00 Benjamin Maurice: Extraction de cliques et de quasi-cliques maximales sur des graphes creux

• Mardi, le 29 juin 2015, de 11h00 à 12h00, Salle C005 (LORIA):

Maurice Pouzet:  Boolean dimension of a graph and inversion index of a tournament
Résumé : The Boolean dimension of a graph G (undirected with no loops) is the least integer m for which there exist m subsets X1, … , Xm of the vertex set V(G) of G such that the edges of G are exactly the 2-element subsets e of V(G) which are included into an odd number of the Xi‘s. If G has n vertices, its Boolean dimension is at most n-1; it is n-1 if and only if G is a path. The inversion of a subset X of the vertex set V(T) of a tournament T consists to invert all arcs of T between two vertices of X. The inversion index of T is the least number, denoted by i(T), of subsets of V(T) to invert in order to transform T into an acyclic tournament. I will present the main results and two tantalizing problems: the value of the inversion index of paths of strong connectivity, a majoration of the size of obstructions of the class I(n) of finite tournaments whose inversion index is a most n (the fact that there is a bound follows from techniques of well quasi ordering). Results presented here originate in the thesis of H. Belkhechine [1]. They have been obtained in collaboration with H. Belkhechine, M. Bouaziz and I.Boudabbous and announced in [2].
[1] H. Belkhechine, Indécomposabilité des graphes et des tournois, thèse de doctorat, 15 juillet 2009, Université Claude-Bernard et Université de Sfax.
[2] H. Belkhechine, M. Bouaziz, I. Boudabbous, M. Pouzet, Inversion dans les tournois, C. R. Acad. Sci. Paris, Ser. I 348 (2010) 703-707.

• Vendredi, le 5 juin 2015, de 11h00 à 12h00, Salle B013 (LORIA):

Abdallah Saffidine: Solving Games and all of that
Résumé : Efficient best-first search algorithms have been developed for deterministic two-player games with two outcomes. We present a formal framework to represent such best-first search algorithms. The framework is general enough to express popular algorithms such as Proof Number Search, Monte Carlo Tree Search, and the Product Propagation algorithm. We then show how a similar framework can be devised for two more general settings: two-player games with multiple outcomes, and the model checking problem in modal logic K. This gives rise to new Proof Number and Monte Carlo inspired search algorithms for these settings.Similarly, the alpha-beta pruning technique is known to be very important in games with sequential actions. We propose an extension of this technique for stacked-matrix games, a generalization of zero-sum perfect information two-player games that allows simultaneous moves.

• Jeudi, le 4 juin 2015, de 11h00 à 12h00, Salle A117 (LORIA):

11h00 – 11h30. Justine Reynaud: Définition d’un langage de transformation de requêtes SPARQL

11h30 – 12h00. Quentin Brabant: Axiomatisation des intégrales de Sugeno k-maxitives

• Mardi, le 5 mai 2015, de 14h00 à 15h00, Salle B011 (LORIA):

Erkko Lehtonen:  Homomorphisms of labeled posets
Résumé : A partially ordered set labeled with k labels (a k-poset) is a triple (P, ≤, c), where (P, ≤) is a partially ordered set and c : P → {0, …, k − 1} is a labeling function. A homomorphism of a k-poset (P, ≤, c) to another k-poset (P′, ≤′, c′) is a map h : P → P′ that preserves both the order and the labels, i.e., h(x) ≤′ h(y) whenever x ≤ y and c = c′ ⚬ h. The set of all k-posets is partially ordered by the existence of homomorphism. We present some results concerning the structure of this homomorphism order. In particular, for k ≥ 2, the homomorphism order of finite k-posets is a distributive lattice that is universal for countable posets. We also discuss the complexity of certain decision problems involving labeled posets. This talk is partly based on joint work with Léonard Kwuida.

• Mardi, le 28 avril 2015, de 12h00 à 13h00, Salle B013 (LORIA):

Michel Grabisch:  Games on concept lattices
Résumé : We introduce cooperative TU-games on concept lattices, where a concept is a pair (S,S’) with S being a subset of players or objects, and S’ a subset of attributes. Any such game induces a game on the set of players/objects, which appears to be a TU-game whose collection of feasible coalitions is a lattice closed under intersection, and a game on the set of attributes. We investigate the geometrical properties of the core (non-emptiness, boundedness, pointedness, extremal rays). In particular, we derive the equivalence of the intent and extent core for the class of distributive concepts.