+ All documents
Home > Documents > Relational concept analysis: mining concept lattices from multi-relational data

Relational concept analysis: mining concept lattices from multi-relational data

Date post: 04-Dec-2023
Category:
Upload: independent
View: 0 times
Download: 0 times
Share this document with a friend
28
Ann Math Artif Intell (2013) 67:81–108 DOI 10.1007/s10472-012-9329-3 Relational concept analysis: mining concept lattices from multi-relational data Mohamed Rouane-Hacene · Marianne Huchard · Amedeo Napoli · Petko Valtchev Published online: 21 March 2013 © Springer Science+Business Media Dordrecht 2013 Abstract The processing of complex data is admittedly among the major concerns of knowledge discovery from data (kdd). Indeed, a major part of the data worth analyzing is stored in relational databases and, since recently, on the Web of Data. This clearly underscores the need for Entity-Relationship and rdf compliant data mining (dm) tools. We are studying an approach to the underlying multi-relational data mining (mrdm) problem, which relies on formal concept analysis (fca) as a framework for clustering and classification. Our relational concept analysis (rca) extends fca to the processing of multi-relational datasets, i.e., with multiple sorts of individuals, each provided with its own set of attributes, and relationships among those. Given such a dataset, rca constructs a set of concept lattices, one per object sort, through an iterative analysis process that is bound towards a fixed-point. In doing that, it abstracts the links between objects into attributes akin to role restrictions from description logics (dls). We address here key aspects of the iterative calculation such as evolution in data description along the iterations and process termination. We describe implementations of rca and list applications to problems from software and knowledge engineering. M. Rouane-Hacene (B ) · P. Valtchev Dépt. Informatique, UQÀM, CP 8888, succ. CV, Montréal, Canada, H3C 3P8 e-mail: [email protected] P. Valtchev e-mail: [email protected] M. Huchard LIRMM (CNRS – Université de Montpellier), 161 rue Ada, 34095 Montpellier Cedex 5, France e-mail: [email protected] A. Napoli LORIA (CNRS – INRIA – Université de Lorraine), B.P. 239, 54506 Vandœuvre-lès-Nancy, France e-mail: [email protected]
Transcript

Ann Math Artif Intell (2013) 67:81–108DOI 10.1007/s10472-012-9329-3

Relational concept analysis: mining concept latticesfrom multi-relational data

Mohamed Rouane-Hacene · Marianne Huchard ·Amedeo Napoli · Petko Valtchev

Published online: 21 March 2013© Springer Science+Business Media Dordrecht 2013

Abstract The processing of complex data is admittedly among the major concernsof knowledge discovery from data (kdd). Indeed, a major part of the data worthanalyzing is stored in relational databases and, since recently, on the Web of Data.This clearly underscores the need for Entity-Relationship and rdf compliant datamining (dm) tools. We are studying an approach to the underlying multi-relationaldata mining (mrdm) problem, which relies on formal concept analysis (fca) as aframework for clustering and classification. Our relational concept analysis (rca)extends fca to the processing of multi-relational datasets, i.e., with multiple sortsof individuals, each provided with its own set of attributes, and relationships amongthose. Given such a dataset, rca constructs a set of concept lattices, one per objectsort, through an iterative analysis process that is bound towards a fixed-point.In doing that, it abstracts the links between objects into attributes akin to rolerestrictions from description logics (dls). We address here key aspects of the iterativecalculation such as evolution in data description along the iterations and processtermination. We describe implementations of rca and list applications to problemsfrom software and knowledge engineering.

M. Rouane-Hacene (B) · P. ValtchevDépt. Informatique, UQÀM, CP 8888, succ. CV, Montréal,Canada, H3C 3P8e-mail: [email protected]

P. Valtcheve-mail: [email protected]

M. HuchardLIRMM (CNRS – Université de Montpellier), 161 rue Ada,34095 Montpellier Cedex 5, Francee-mail: [email protected]

A. NapoliLORIA (CNRS – INRIA – Université de Lorraine), B.P. 239,54506 Vandœuvre-lès-Nancy, Francee-mail: [email protected]

82 M. Rouane-Hacene et al.

Keywords Formal concept analysis · Relational data · Relational concept analysis ·Concept lattices · Knowledge representation · Description logics

Mathematics Subject Classifications (2010) 06-A99 · 06-B99 · 68-R99

1 Introduction

Knowledge discovery from data (kdd) is the process of distilling useful facts from adataset. A kdd task is a multi-step process in which, beside the central step of datamining (dm) which focuses on the mechanics of pure discovery, also covers the priordata preparation as well as the subsequent interpretation of mining results [14]. Asa scientific discipline, primary concerns in dm are speed and scalability of methodsand the versatility of the practical tools, then proper processing of structure in data,reflecting the available domain knowledge in the mining process, etc. Furthermore,kdd addresses the intelligibility of discovered patterns and integration of newlydiscovered knowledge with existing sources.

Among the current research topics in dm, the work presented here pertains totwo major trends: (1) processing of data of ever increasing structural complexity and(2) exploration of domain knowledge, in particular, meta-data about the semanticsof the data, to achieve more topical and precise analysis. For instance, alternatives tointroduce structure into the basic dm data format—transactions with descriptors thatare valued attributes or set of atomic items—have been made. On one hand, varioustopological structures on the set of items, e.g., sequences [2] or general graphs [9, 36],have been examined with the corresponding pattern languages. On the other hand,descriptions of patterns based on first-order logic (fol), i.e., with variables for items,predicates and quantifiers, have been studied within the relational learning and datamining [10, 13] trend. Variants thereof comprise Datalog [11], description logics(dls) [20], and even datasets corresponding to entire relational databases [12]. Thelatter approach, called multi-relational data mining (mrdm), is particularly appealingsince a large proportion of the data worth analyzing is nowadays stored in relationaldatabases. Recently, such data has been gradually pumped into the Web, where var-ious datasets are published and interlinked with other published data using a graph-based format, the Resource Description Framework (rdf), and the encompassingLinked Data technology.1

Domain knowledge, in turn, has typically been incorporated in the mining processas a taxonomy on items, thus giving rise to generalized patterns [32]. Further popularformats for expressing such knowledge are sets of logical rules and ground facts [10](ch. 5), or, alternatively, a fully-blown domain ontology [1]. It is noteworthy thatthe availability of taxonomies or richer ontologies is independent to the presenceof structure in the data. Within a multi-relational dataset, an important part of thebackground knowledge is the underlying data schema. Moreover, when mining theindividuals of a particular relation, the links to individuals of other relations providefurther knowledge about the context of the mined data.

Recently, we designed a mrdm framework, on top of the standard formal conceptanalysis (fca) [16]. In fca, the input data, a [individuals × attributes] cross-table or

1www.w3.org/standards/semanticweb/data

Relational concept analysis 83

context, is transformed into a hierarchy of all conceptual abstractions combining setsof individuals with the sets of shared attributes. Through the unique structure ofits output, fca simultaneously addresses two dm tasks: (conceptual) clustering andpattern mining. Various attempts have been made at extending the fca to data ofcomplex structure: In [18, 21], the formal objects are graphs, whereas in [15, 29] theyare described by logical formulae.

In our own approach, called relational concept analysis (rca) [27], we focus ondataset compatible with the Entity-Relationship model or, alternatively, with the Re-source Description Framework (rdf). In rca, the input is made of several contexts,each corresponding to a specific type of individuals. Moreover, additional cross-tables describe [individuals × individuals] binary relations. Our analysis methodextends the basic fca mechanisms in many respects starting with the processingof multiple sorts of individuals. In doing that, the links between individuals arepropositionalized—in a way inspired by dls roles—and end up describing theoutput concepts much like the initially available attributes. Thus, new attributesare gradually inferred from links at several levels of depth. The resulting iterativeanalysis process deals smoothly with the circular links in the input data, a knownsource of difficulties for dm methods. In summary, rca is an original frameworkfor extracting conceptual knowledge from multi-relational data which successfullyaddresses problems such as dealing with arbitrary relations (incl. circular ones),effective construction and maintenance of the concept lattices (Hasse diagramsthereof) and intelligible representation of the extracted knowledge.

This paper extends previous work [17] with emphasis put on theoretical andalgorithmic aspects of rca including the dynamics of contexts, the concept formationprinciples, and the properties of the overall analysis process. It thus provides formaldefinitions for scaling operators that help propositionalize the initial relationaldescriptions of the formal objects. The iterative analysis process is then analyticallyexpressed and its convergence demonstrated.

The remainder of the paper is organized as follows. First we recall basic notionsfrom fca (Section 2). Then the basic rca framework is described in Section 3,whereas the iterative analysis process and the Multi-FCA method are introducedin Section 4. Related work is discussed in Section 5 while a conclusion with futurework ends the paper.

2 Formal concept analysis

fca provides a framework for designing concepts and conceptual hierarchies relatedto a context [16]. Below we recall the fundamental notions of fca and pinpoint somedifficulties in processing relations in data.

2.1 Motivating example

Throughout the paper, we use a pharmacovigilance dataset as a running example.Pharmacovigilance as a discipline studies the adverse reactions to drugs (adr)by analysing data collected by reporting systems and stored in case report data-bases [22]. A pharmacovigilance case database contains a collection of adr reports,also known as case reports. Generally, a well-documented case report captures

84 M. Rouane-Hacene et al.

Table 1 A fragment of a case report database in pharmacovigilance showing aids patients with theobserved adverse reactions that appeared after taking anti-hiv drugs

Patient Age Gender Observed adverse drug reactions

Farley 63 Male Oedema, Hives, Headache, Nausea,Heart failure, Hair loss

Lane 27 Female Fatigue, Oedema, Hives, Hair loss,Bleeding

Shana 33 Female Fatigue, Oedema, Hair lossTrudy 41 Male Fatigue, Breath disorder, Nausea,

Heart failure, Bleeding, Vomiting

suspected and concomitant product therapy details, description of the adr or diseaseexperience, including set of signs or symptoms. Further information covers thepatient characteristics, including demographic information (e.g., age, race, gender),relevant family history of diseases, baseline medical condition prior to producttherapy, etc. Table 1 depicts a fragment of a hypothetical case report databaserepresenting aids patients and the anti-hiv drugs they have taken.

Healthcare providers involved in the treatment of hiv patients examine theresponse to therapy in different categories of patients in order to successfully devise anew therapy. An analysis of the above table may yield combinations of several drugswith a single adr, called drug interactions. Drug protocols, i.e., concomitant use ofdrugs that may cause several reactions, can be discovered as well. In our example,the known adr of anti-hiv drugs are provided in Table 2 below. For instance, anti-retroviral Dactinomycin and Isentress are known to cause {Breathdisorder,Fatigue, Hairloss} and {Diarrhoea, Headache, Nausea}, respectively. In thefollowing section, we introduce fca as a method for analyzing such data and drugsinteractions.

2.2 FCA basics

fca is a mathematical framework for designing conceptual descriptions from a set ofindividuals described by unary attributes (similar to unary predicates). Formally, a

Table 2 Context KD of anti-hiv drugs involved into haart therapy with the associatedexpected adr

Actinomycin

Efavirenz

Maraviroc

Raltegravir

Ritonavir

Tenofovir

Breathdisorder

Diarrhea

Fatigue

Hairloss

Headache

HeartFailure

LiverDamage

Nausea

Rash

Vomiting

Dactinomycin × × × ×Isentress × × × ×Kaletra × × × × × × × ×Selzentry × × ×Sustiva × × × × ×Viread × × × × ×

Relational concept analysis 85

formal one-valued context K associates a set of objects (O) to a set of attributes (A)through an incidence relation I ⊆ O × A. A sample context,2 called KP, is depictedon Table 2 where formal objects are anti-hiv drugs, and attributes their expected adrsuch as Nausea, Headache, Hairloss, Rash, Liverdamage, etc. Further attributesrepresent the respective active agents behind the drugs.

A pair of derivation operators, both denoted ′, are applied to objects and at-tributes. On objects, the operator ′ is defined on ℘(O) → ℘(A) such as for X ∈℘(O), X ′ = {a ∈ A | oIa for all o ∈ X}. On attributes, the operator ′ is definedon ℘(A) → ℘(O) such as for Y ∈ ℘(A), Y ′ = {o ∈ O | oIa for all a ∈ Y}. Forinstance, in the context of drugs KD depicted on Table 2, {Sustiva,Viread}′ ={Diarrhea,Vomiting} and {Nausea, Headache}′ = {Isentress,Kaletra}.

The ′ operators induce a Galois connection [6] between ℘(O) and℘(A), whence the compound operators ′′ are closure operators over ℘(O)

and ℘(A). For instance, in KD depicted on Table 2 {HeartFailure}′′ ={HeartFailure,Maraviroc,Rash} whereas {Sustiva,Viread}′′ = {Sustiva,

Viread,Kaletra}. Thus, ′ operators induce two closure families Co ⊆ ℘(O) andCa ⊆ ℘(A) which, provided with set-inclusion order, form two complete latticeswhich are dually-isomorphic.

A pair (X, Y) of sets where X ⊆ O, Y ⊆ A, X = Y ′, and Y = X ′, is called a(formal) concept [37] with X is called extent and Y is called intent. For example, in thedrug contextKD, ({Sustiva,Kaletra}, {Diarrhea,Fatigue,Rash,Vomiting}) isa formal concept. The set CK of all concepts from K ordered by extent inclusionforms a complete lattice, LK = 〈CK,≤K〉, which is termed the concept lattice [37] ofthe context (or the Galois lattice [6] of the binary relation I).

The lattice LK is represented by a Hasse diagram where nodes represent theconcepts provided with the respective intent and extent, while segments representthe specialization relation (or subsumption) between concepts. It is noteworthy thatthe lattice represents in an exhaustive manner the structure sharing among objectsof the context: two objects “join” each other in a non-trivial concept extent iff theyshare at least one attribute.

A labeling mode of the diagram known as simplif ied or reduced labeling is used,where each attribute (resp. object) appears once, in the maximal (resp. minimal)concept—in term of size of extent—having that attribute (resp. object). The conceptlattice LD of context KD is drawn in Fig. 1. The full extent (resp. intent) of a conceptis the union of all objects (resp. attributes) whose labels are reachable starting fromthis concept and by taking downward paths (resp. upward). Thus, the conceptsrepresent all maximum factorizations of the attributes between objects. For example,the formal concept c#5 represents the nrti drug class grouping Kaletra and Vireadthat are known to cause severe liver problems, nausea, diarrhea and vomiting.

As many practical applications involve richer data descriptions, many-valuedcontexts were introduced. In such a context K = (O, A, V, J), an object o is describedby a set of attribute value pairs (a, v) where v ∈ V. Thus, J is a ternary relation

2Adapted from MedEffect, the dataset of Canada pharmacovigilance database.

86 M. Rouane-Hacene et al.

Fig. 1 Concept lattice LD of the drug context given in Table 2. The reduced intents and extents aredrawn on both sides of a node representing a concept

J ⊆ O × A × V. Table 1 illustrates a many-valued context, called KP, encoding aset of aids patients and their features such as age, gender, and adverse reactions toanti-hiv drugs.

Before being processed using standard fca algorithms, a many-valued contexthas to be transformed into a one-valued context, a transformation known as scaling[16]. Basically, the scaling encodes a many-valued attribute as a set of predicatesranging on attribute values. For example, the “Age” attribute, whose values arenumerical, can be substituted by the attributes Adult (predicate Age ≤ 60, ordinalscaling in [16]) and Senior (predicate Age > 60). Hence, the object Farley isassigned Senior while other objects are assigned Adult, as illustrated in Table 3.Similarly, Gender in Table 1 is nominally scaled with predicates Female and Male.The predicates on a many-valued attribute compose a scale. One-valued context KP

showing the scaling on patients is illustrated in Table 3 while the correspondinglattice is depicted in Fig. 2. For example, the formal concept c#9 = ({Farley, Trudy},{Male, Nausea, HeartFailure}) represents male patients having nausea and heartdisorder.

Relational concept analysis 87

Table 3 One-valued context KP encoding aids patients with the observed adr (discrepancies withTable 1 persist due to the small size of the patient sample)

Senior

Adult

Male

Female

Bleeding

Breathdisorder

Fatigue

Hairloss

Headache

HeartFailure

Hives

Nausea

Oedema

Vomiting

Farley × × × × × × × ×Lane × × × × × × ×Shana × × × × ×Trudy × × × × × × × ×

3 Relational concept analysis

rca was designed for the construction of formal concepts on top of multiple objectsets described by both proper attributes and relations [17, 27]. Below we recall theformal background of rca, including representational and analytical aspects.

Fig. 2 Concept lattice LP of context of aids patients given in Table 3

88 M. Rouane-Hacene et al.

3.1 Bringing relations to fca

The ever increasing complexity of data to be analyzed has rapidly pushed the stan-dard context-based conceptual methods to their limits. As an illustration, considerthe lattices in Figs. 1 and 2 which represent all possible attribute sharing betweenobjects in the respective contexts. They provide useful insights on what might be theclasses of anti-hiv drugs and profiles of aids patients. Yet an integrated approach tothe management of both populations of patients and anti-hiv drugs, i.e., for bettertargeted therapies, requires that profiles of patients are linked to the appropriateclasses of drugs. Conversely, the drug class is more useful when it reflects theobserved adr depending on the profile of the target patients.

The mutual dependency between patients and anti-hiv drugs, triggers the in-troduction of links between the corresponding objects into the analysis process.This specific type of data, known as relational data as it connects objects amongthemselves, is not accessible for mainstream concept analysis.

Indeed, a thorough processing of relations between different sorts of objects chal-lenges several core aspects of the fca task, among them the use of a unique context,the static set of attributes in a context, the one-shot lattice construction process,etc. These limitations of core fca motivate the design of specific methodologies forconcept analysis of relational data. As an illustration of the target knowledge struc-tures, consider the analysis of the pharmacovigilance dataset illustrated in Tables 1and 2. Using a tool that properly processes relational links between data items, thefollowing remarkable fact could be gleaned: The combination of Dactinomycin andIsentress causes several new adr that need to be investigated such as Bleeding,Vomiting, and HeartFailure, although both drugs share no direct adr in Table 2.This fact is beyond the reach of existing statistics- and fca-based analysis methods:Its discovery would require a multi-step process, involving human experts to connectdifferent knowledge sources. With rca-based analysis, this is a basic building blockof the final result that could be explored in the design of new combined medicationsthat cause minimal interactions or adverse reactions.

In what follows, we present our own approach to the above issue, which iscalled relational concept analysis (rca), and provide some fundamental results onthe method and on its computational mechanisms.

3.2 Input data format

In rca, input data is organized as a pair made of a set of objects-to-attributes contextsK = {Ki}i=1,...,n and a set of objects-to-objects binary relations R = {rk}k=1,...,m. Here,a relation r ∈ R links two object sets from two contexts, i.e., there are i1, i2 ∈{1, . . . , n} (possibly i1 = i2) such that r ⊆ Oi1 × Oi2 . Both contexts from K andrelations from R are introduced as cross-tables.3 The resulting structure, called arelational context family (rcf) can be likened to a relational database with a schema-like part comprising context, attribute and relation names, and an instance-like partcomprising the objects and the inter-object links that populate contexts and relations,respectively.

3Although this might be the result of scaling upon some of the attributes.

Relational concept analysis 89

Table 4 Left: Binary relation takes linking aids patients to anti-hiv drugs

Dactinomycin

Isentress

Kaletra

Selzentry

Sustiva

Viread

TakesFarley × ×Lane × ×Shana × ×Trudy × ×

Interacts withDactinomycinIsentressKaletra × ×Selzentry ×Sustiva × ×Viread ×

The relation is taken by (henceforth referred to as itb) is derived from takes (takes−1).Right: Binary relation interacts with (iw) that models interactions among drugs

Definition 1 (Relational Context Family (RCF)) An rcf is a pair (K, R) where:

– K = {Ki}i=1,...,n is a set of contexts Ki = (Oi, Ai, Ii) and– R = {rk}k=1,...,m is a set of relations rk where rk ⊆ Oi1 × Oi2 for some i1, i2 ∈

{1, . . . , n}.

It is noteworthy that, in the above definition, all object sets Oi (i ∈ {1, . . . , n}) arepair-wise disjoint. Moreover, Oi1 (domain of rk) and Oi2 (range of rk) are the objectsets of the contexts Ki1 and Ki2 , respectively.

Table 4 depicts the takes and is taken by (itb) relations on patient and drugobjects, as well as the relation interacts with (iw) that models antiretroviralinteractions within the drug context. Drug-drug interactions occur when two ormore drugs react with each other. This drug-drug interaction may cause patient toexperience an unexpected adverse reaction. Relations takes, itb and iw togetherwith the respective contexts KD (Table 2) and KP (Table 3) form the rcf of ourrunning example.

For practical reasons, we shall process a relation r ⊆ Oi1 × Oi2 in the form of aset-valued function r : Oi1 → ℘(Oi2). Moreover, we introduce here some auxiliaryfunctions to support relation-centric reasoning. First we define formally the domainand the range of a function from the rcf.

Definition 2 (The dom(r) and ran(r) functions for relations) Let (K, R) be an rcf. Apair of functions map relations in R to respective domain and range object sets fromthe object set family O = {O|K = (O, A, I) ∈ K}.– The domain function is dom : R → O where dom(r) = Oi1 iff for all (x, y) ∈ r,

x ∈ Oi1 ,

90 M. Rouane-Hacene et al.

– The range function is ran : R → O where ran(r) = Oi2 iff for all (x, y) ∈ r, y ∈ Oi2 .

Next, we provide a way to gather functions w.r.t. their domain, i.e., the contextfunction rel.

Definition 3 (The rel(K) function for contexts) The family of relations originating ata given context is defined by:

– rel : K → ℘(R) where rel(K = (O, A, I)) = {r ∈ R | dom(r) = O}.

Before addressing the key question of how inter-object links can be put intoconcept intents, some important conventions must be adopted. These are mainlyabout identifying the elements of an rcf along the analysis process. In fact, unlikecore fca, in rca some of the manipulated entities evolve. Evolution is broughtby the transformation of links into descriptors for formal objects: The sets Ai ofattributes in the contexts of a rcf are extended by new members. Hence, strictlyspeaking, the resulting contexts are not the same constructs as the initial ones, just asa scaled context is not equivalent to its original many-valued counterpart. In contrast,respective object sets Oi remain unchanged, so they provide a basis for identifyingthe contexts: We consider the extended attribute sets to simply define a new versionof the same contexts that share the Oi object sets. Thus, unless an explicit distinctionis necessary, all of these versions will be denoted by Ki.

Similarly, concepts c = (X, Y) from the initial contexts of the rcf, are to beopposed to concepts from the respective extended versions of these contexts. Asit will become clear in Section 3.3.4, there is a significant continuity between theformer and the latter, due to a correspondence based on extent preservation.Consequently, the concepts from different versions of a context having the sameextent are here considered to be the subsequent versions of the same concept.They are therefore assigned the same identifier in all the respective lattices (see anexample in Section 3.3.4).

To sum up, we identify the contexts by their respective object sets and conceptsby their extents. In the following, we use simple indexes for contexts and concepts asidentifiers. These are capital letters and numbers, respectively.

3.3 Scaling upon relations

The question of how to factor in relations when constructing concept descriptionsadmits a variety of answers that fit our conceptual analysis goals to diverging degrees.In the following, we first motivate our particular way of turning links into attributesand then provide a formal expression of the corresponding constructs.

3.3.1 A naive approach

A straightforward, and somewhat naïve, way of reflecting object links would beto assimilate links to standard one-valued attributes. In other words, for eachK = (O, A, I) ∈ K, we extend A with attributes ar,o corresponding to the pairsmade of a relation originating at the context, r ∈ rel(K), and an object o from itstargets, i.e., (o, o) ∈ r for some o ∈ O. To illustrate this construct, observe the contextin Table 5.

Relational concept analysis 91

Table 5 One-valued context KP encoding aids patients with the observed adr and drugs taken

Senior

Adult

Male

Female

Bleeding

Breathdisorder

. . . Vomiting

Dactinomycin

Isentress

Kaletra

Selzentry

Sustiva

Viread

Farley × × . . . × ×Lane × × × . . . × ×Shana × × . . . × ×Trudy × × × × . . . × × ×

It should be noticed that to increase readability within the above table, we haveskipped the name of the relation, takes, in the names of the new attributes. Thus,they are only identified by the target objects, i.e., the drugs taken by a patient.As it is the only relation that starts at the contexts of patients, this does not leadto any confusion. However, with more than one such relations going to the samerange context, the object names are not enough to uniquely define the correspondingattributes.

The concept lattice corresponding to Table 5 is given in Fig. 3. A quick exam-ination of this lattice and the lattice in Fig. 1 indicates that the former containsthree extra concepts. In particular, the patient Shana is now located in a conceptthat is immediately above the bottom concept. This corresponds to the intuitiveidea that the larger the attribute set of the context the (potentially) preciser theconceptual structure induced by them. While the new concepts constitute a cleargain with respect to Fig. 1, the abstraction potential of the underlying attributes israther limited.4 Indeed, in order for two patients to share such an attribute, bothof them must have taken the underlying drug. However, drugs, even without beingcompletely identical, may have similar behaviour, e.g., in terms of adr.

To translate the intake of similar drugs—instead of identical ones—a higher-levelabstraction would be necessary that allows for some properties to be shared amongthe drugs taken by each patient from a given set X. Hence, a more purposefulprocessing of the links might seem to lay in the transfer of properties from the targetobjects to the source ones, i.e., here, from drugs to patients. Spelled differently, whilethe Table 5 is a mere join of Table 3 and the one in the left-hand-side of Table 4, onemay imagine a more sophisticated procedure of bringing the attributes of the drugcontext into the patient contexts while using the takes relation as a filter. While sucha solution has its merits, it is still insufficient for our purposes, in particular, sinceit would work poorly with: (i) chains of relations between a larger set of contexts(e.g., with a context of drug producers in our example), and (ii) relations of identicaldomain and range contexts (e.g., interacts-with for drugs).

We therefore turn to the knowledge representation languages to look for ways ofexpressing links and find suitable abstractions that are both precise and shareable

4As an indication of the limits, compare the lattice here to the one drawn in Fig. 5.

92 M. Rouane-Hacene et al.

Fig. 3 The lattice of patients with drugs translated as one-valued attributes

between the source objects of a relation. Such expressions are common in the logicalrepresentation formalisms as we discuss below.

3.3.2 Role restrictions, from dls to rca

Description logic-based formalisms [5] are nowadays the de facto standard forknowledge representation, especially in ontology engineering and on the semanticweb. Such formalisms help to organize the knowledge about a domain by focusingon relevant domain abstractions, i.e., concepts and their relations called roles. Thedls offer a collection of constructors to express relational information on the schemalevel. Beside defining domains and ranges for roles, such constructors allow finerconstraints on the outgoing links of data objects to be asserted such as the exact typeof the target objects, the number of links per object, etc.

Technically speaking, relational information is modelled upon role restrictionsfrom dls [5]. In dls, two types of knowledge are expressed at the schema level,concepts and roles. By properly restricting the roles of a given concept, one caneasily define a sub-concept thereof. Typical constructs for role restriction includeexistential quantification (∃R.C), universal quantification (∀R.C), strict universalquantification (∀∃R.C), cardinality restriction (≥n R), qualified cardinality restric-tion (≥n R.C), etc. Here R stands for a role, i.e., the equivalent of a relation from anrcf, whereas C is a conceptual expression that might be either a name or a formulainvolving logical connectors on sub-formula(s). Moreover, C works like a filter on

Relational concept analysis 93

the range of the role R: the role restriction as a whole is a predicate satisfied bythe objects in the domain of R that are appropriately connected (depending on therestriction constructor) to objects from the extension (aka interpretation) of C.

Following strictly the dls definitions of role restriction constructors, r(o), theimage by r of an object from Oi1 = dom(r), must be compared to the extent of someconcept c from Ki2 = (Oi2 , Ai2 , Ii2) where Oi2 = ran(r). Some constructors can beeasily envisaged in rca:

(i) universal—r(o) is included in the extent of c,(ii) existential—r(o) has a non-void intersection with the extent of c.

Further ones are discussed below. These constructors yield a set of new descriptorsthat apply to all objects from the range context of r and behave like standard formalattributes. Since the underlying encoding of links is kin to scaling of categoricalattributes, it was named relational scaling and the constructors scaling schemes.

Such an approach is beneficial in many respects, in particular, it requires no newdefinitions of concepts nor lattices, hence all the results from fca, both theoreticaland algorithmic hold.

3.3.3 Scaling a relation from an rcf

From a mathematical point of view, the scaling of Ki along r ∈ rel(Ki) with ran(r) =Oi2 and with respect to a lattice Li2 extends Ai by adding a set of new attributes andcompletes Ii accordingly. In order to formalize this, we first introduce the auxiliaryfunction ρ that maps each relation to a given scaling operator. It is defined as ρ : R →Q with Q = {∃,∀,∀∃,≥,≥q,≤, ≤q}. Now, each new attribute has the form “q r:c”where r is the relation (name), c stands for a concept from Li2 and q is the relationalconstructor associated to r, i.e., ρ(r). We chose the “:” separator instead of “.”, theone used in dls, in an analogy to standard type definitions for variables from object-oriented languages. For example, assume a ∃-based scaling on r, i.e., ρ(r) = ∃. Theresulting operator S(r,∃),Li2

, has the following effect: For each object o from dom(r)and each concept c, o is incident to the attribute “∃r : c” whenever r(o) shares at leastone object with the extent of c.

Definition 4 (The existential scaling operator) Given K = (O, A, I) and r ∈ rel(K),let ir be such that ran(r) = Oir where Kir = (Oir , Air , Iir ). Let also Lir be the lattice ofKir . The existential scaling operator S(r,∃),Lir

maps K into the derived context K+ =(O+, A+, I+), where:

– O+ = O– A+ = {∃r : c | c ∈ Lir }, where each ∃r : c is a relational attribute– I+ = {(o, ∃r : c) | o ∈ O, c ∈ Lir , r(o) ∩ Ext(c) = ∅}

In other terms, scaling a context along a relation consists in integrating thisrelation to the context in the form of one-valued attributes using a scaling operator.For instance, let us detail how the context of anti-hiv drugs KD can be scaledalong relation itb given in the left-hand side of Table 4 with respect to the latticeof patients depicted in Fig. 2. The anti-hiv drug object Kaletra—whose imageupon the relation itb is {Farley,Shana}—will have the attributes ∃itb:ci, with i ∈{0, 2, 4, 5, 6, 7, 9} in the lattice LP (Fig. 2), since extents of concepts c0, c2, c4, c5, c6,

94 M. Rouane-Hacene et al.

Table 6 The existential scaling of the anti-hiv drug context KD along the relation itb using thelattice LP of aids patients (Fig. 2)

∃itb

:c0

∃itb

:c2

∃itb

:c3

∃itb

:c4

∃itb

:c5

∃itb

:c6

∃itb

:c7

∃itb

:c8

∃itb

:c9

∃itb

:c10

Dactinomycin × × × × ×Isentress × × × × ×Kaletra × × × × × × ×Selzentry × × × × × × ×Sustiva × × × × × × ×Viread × × × × × × ×Observe that ∃ itb:c1 is skipped as c1 is the bottom concept whose extent is void

c7 and c9 have a non-empty intersection with the image set itb (Kaletra). Table 6provides the result of scaling existentially the drug context KD upon itb .

Similarly, we define the universal scaling operator with respect to r and Lir ,denoted by S(r,∀),Lir

. The difference is that instead of non-empty intersection, theobject image r(o) must be completely included in the extent of c in order for o to getthe relational attribute ∀r : c:

I+ = {(o,∀r : c)|o ∈ O, c ∈ Lir , r(o) ⊆ Ext(c)}.

For instance, universal scaling of relation itb using the concept lattice given in Fig. 2will assign relational attributes ∀itb:c4 and ∀itb:c6 to the drug object Kaletra sincethe respective extents of both concepts c4 and c6 comprise {Farley,Shana}, theimage set of Kaletra for itb.

Yet the above definition presents on the well-known empty image problem:objects with no links for a relation r get all universally quantified attributes with r. Toillustrate this, consider a universal scaling of the relation iw (see the right-hand sideof Table 4) with the concept lattice of drugs in Fig. 1. Now observe that both drugobjects Isentress and Dactinomycin have empty-set image following the relationiw. Thus, a universal scaling of iw will assign every one of the resulting relationalattributes to Isentress and Dactinomycin.

Clearly, the mere universal quantification is of little practical use. Therefore,we introduced a more constrained version called strict universal scaling whichadditionally requires that at least one link of r starts at the source object o. Thescaling scheme, denoted S(r,∀∃),Lir

under the above assumptions, comprises a slightlymodified definition for the attributes ∀∃r : c:

I+ = {(o,∀∃r : c)|o ∈ O, c ∈ Lir , r(o) ⊆ Ext(c) and r(o) = ∅}.

It is noteworthy that ∀∃r : c corresponds to the dls expression ∀R.C � ∃R. Basedon the role restrictions from dls, further scaling operators are defined. Table 7presents the current set of such operators together with their notation, the form ofthe generated relational attributes and the condition for assigning them to an object.We assume a relation r and a lattice L on its range set of objects.

Relational concept analysis 95

Table 7 Operators for the relational scaling in rca

Operator Notation Attribute form Constraint in I+

Universal (wide) S(r,∀),L ∀ r : c r(o) ⊆ Ext(c)Existential S(r,∃),L ∃ r : c r(o) ∩ Ext(c) = ∅Universal strict S(r,∀∃),L ∀∃ r : c r(o) ⊆ Ext(c) and r(o) = ∅Cardinality restriction (max) S(r,≥),L ≥ n r : �L |r(o)| ≥ nCardinality restriction (min) S(r,≤),L ≤ n r : �L |r(o)| ≤ nQualified card. restriction (max) S(r,≥q),L ≥ n r : c r(o) ⊆ Ext(c) and |r(o)| ≥ nQualified card. restriction (min) S(r,≤q),L ≤ n r : c r(o) ⊆ Ext(c) and |r(o)| ≤ n

3.3.4 Scaling upon all outgoing relations of a context

In rca, at each step, a context K is scaled upon all the relevant relations which weassume here to be all those that originate at the context, i.e., the respective set rel(K).To formally express the context obtained by augmenting K with all the resultingrelational attributes, what we call the complete relational extension of K, one needsto factor in the available lattices on the respective range contexts of the relations inrel(K). Let the set of lattices corresponding to the contexts in K be denoted simplyby L. Let also rel(K) = {rl}l=1,...,mK and, for each rl , let Lil ∈ L be the lattice on Oil =ran(rl). Under these assumptions, the complete relational extension of K with respectto ρ and L, denoted Eρ,L, is defined as the apposition of K with the respective resultsof the scaling upon each r from rel(K). The apposition operator on two contexts K1

and K2, as defined in [16], requires that these share the same set of objects whereasits result, denoted K1|K2, is a context over these objects where the attribute set andthe incidence are obtained by union of the respective components of the argumentcontexts. Formally:

Definition 5 (Complete relational extension of a context) Given a rcf (K, R), with aset of lattices L, a scaling constructor mapping ρ, and a context K ∈ K with rel(K) ={rl}l=1,...,mK , the complete relational extension of K w.r.t. ρ and L is

Eρ,L(K) = K | S(r1,ρ(r1)),Li1(K) | . . . | S(rmk ,ρ(rmk )),Limk

(K)

For instance, consider the result of the complete extension of the drug contextKs

D = Eρ,L(KD) where ρ ={(itb,∃), (iw,∃), (takes,∃)} and L = {LP,LD}. The cor-responding context Ks

D may be obtained by putting side-by-side the initial drugcontext KD (Table 2), and the respective results of existentially scaling its relationsis-taken-by and interacts-with. These are given in Tables 6 and 8, respec-tively. Observe that in Table 8, the attribute ∃ iw:c1 is missing from the table. In fact,c1 is the bottom concept and here it is of an empty extent, hence no object could getthe attribute.

The lattice LsD corresponding to the above context is given in Fig. 4. A first

observation is that it is strictly larger than the initial lattice LD in Fig. 1 as no extentfrom the latter is missing in it. The intuitive explanation is that by adding someattributes to a context, one cannot destroy the extents from the initial context. Thus,it is by no means a coincidence, that concepts with the same identifier in both figureshave the same extents (e.g., c8 whose extent comprises Selzentry, Sustiva andKaletra).

96 M. Rouane-Hacene et al.

Table 8 The result of an existential relational scaling applied to the anti-hiv drug context KD uponthe relation iw and using its own lattice LD (Fig. 1)

∃iw

:c0

∃iw

:c2

∃iw

:c3

∃iw

:c4

∃iw

:c5

∃iw

:c6

∃iw

:c7

∃iw

:c8

∃iw

:c9

∃iw

:c10

∃iw

:c11

∃iw

:c12

∃iw

:c13

∃iw

:c14

∃iw

:c15

Dactinomycin × × ×Isentress × × × × ×Kaletra × × × × × × × × × ×Selzentry × × ×Sustiva × × × × × × ×Viread × × × × × ×The attribute ∃ iw:c1 is skipped as c1 is the bottom concept

Let us elaborate on the relationship between an initial context and its extendedversion as well as between their respective lattices. Thus, let—under the hypothesesof Definition 5—Ks = Eρ,L(K) for some K from the rcf and consider the lattices Land Ls. Since Ks is an apposed context of K, it has a larger set of attributes yet thesame set of objects. In [33], it was shown that each extent of K is also an extent of Ks

which means the lattice L is “blended” in Ls. Formally,

Fig. 4 The lattice LsD of the relationally extended Drug context Ks

D (quantifiers omitted inrelational attributes)

Relational concept analysis 97

Property 1 (Lattices of apposed contexts [33]) Given contexts K, K1 and K2 suchthat K = K1|K2, the respective lattices L, L1 and L2 satisfy that for each concept(X, Y) ∈ Li (i = 1, 2), there is an equivalent concept (X, Z ) ∈ L where Y ⊆ Z .

This basically means that by extending a context through relational scaling, itslattice expands while preserving its initial conceptual structure as a sub-order. Thissub-order is witnessed by concept extents: whatever the extension of the context,its concept set will always comprise, for each concept from the initial context K,a concept with the same extent, yet of a possibly larger intent. We therefore shallestablish concept identities throughout evolving versions of the same initial contexton the basis of their respective entities: two concepts from different versions whoseextents coincide will be considered two versions of the same concept.

3.3.5 Scaling the entire rcf

The above extension operator may now be expanded to reach over the entire set ofcontexts of an rcf. Here again, ρ and L are parameters of the multi-context operatoron the set K which works as follows:

Definition 6 (Complete relational extension of an rcf) Given a rcf (K, R) whosecontext set is K = {K1, . . . ,Kn} and whose set of lattices is L, and a scaling con-structor mapping ρ, the complete relational extension of K is a set of contextsdefined as

E∗ρ,L(K) = {Eρ,L(K1), . . . , Eρ,L(Kn)}.

For instance, the result of the above operator on our initial rcf unfolds wheneverone adds to the above three tables representing the complementary parts of the Drugcontext, the initial Patient context in Table 3 and its respective image by existentialrelational scaling w.r.t. takes (Table 9).

Clearly, applying E∗ to a context set yields a set of extended contexts where eachindividual context is an attribute-wise extension of its counterpart in the initial set.Then, to each of the extended contexts corresponds a potentially larger lattice. Inour example, the lattice of Ks

P, the Patient context completely extended w.r.t. theinitial lattice on drugs LD, is drawn in Fig. 5.

Let us notice that the lattice in Fig. 5 is substantially different from the onein Fig. 3. For instance, the extent of concept c12 from Fig. 5 groups Shana andTrudy while there is no such extent in Fig. 3: the smallest extent comprising thesepatients is in c9 and has also Lane. What distinguishes both former patients w.r.t.

Table 9 The existential scaling of KP upon takes and the lattice LD (Fig. 1)

∃tak

es:c

0

∃tak

es:c

2

∃tak

es:c

3

∃tak

es:c

4

∃tak

es:c

5

∃tak

es:c

6

∃tak

es:c

7

∃tak

es:c

8

∃tak

es:c

9

∃tak

es:c

10

∃tak

es:c

11

∃tak

es:c

12

∃tak

es:c

13

∃tak

es:c

14

∃tak

es:c

15

Farley × × × × × × × × × × ×Lane × × × × × × × ×Shana × × × × × × × × × × ×Trudy × × × × × × ×

98 M. Rouane-Hacene et al.

Fig. 5 The lattice LsP of the relationally extended Patient context Ks

P (quantifiers omitted)

the latter one is that each of them took at least one drug from the extents of drugconcepts c2 and c4 (see Fig. 1). These concepts cover the drugs causing Headacheand Nausea, respectively. Conversely, one could observe that any extent fromFig. 3 is also present in Fig. 5. We feel that the additional conceptual knowledgegained through relational scaling vindicates the technique as well as the entire rcaapproach.

3.3.6 Ref ining the scales

The above Property 1 states that the lattices of the relationally extended contexts,albeit different conceptual structures, encompass all the conceptual information fromthe respective lattices of the initial contexts. Indeed, comparing Ls

D from Fig. 4 toLD (see Fig. 1), one notices that eight new concepts have emerged as a result ofthe relational scaling, e.g., concept c17 that groups Selzentry and Sustiva. Nowa natural question arises: Given that there might be additional concepts in some ofthe expanded lattices, i.e., concepts without a counterpart in the initial lattice, shouldnot these be reused in a new round of scaling? In other terms, should one proceed byrepeating the scaling step, this time with a more thorough conceptual information touse as scaling basis? In our example, knowing Ls

D could help make the scaling upontakes even more precise. Indeed, beside the attributes corresponding to (extents of)concepts from LD (see Table 9), eight new attributes translating the aforementionedadditional concepts are yielded by the scaling. These attributes are shown in Table 10.

Relational concept analysis 99

Table 10 The existential scaling of KP upon takes and the lattice LsD (Fig. 4)

∃tak

es:c

16

∃tak

es:c

17

∃tak

es:c

18

∃tak

es:c

19

∃tak

es:c

20

∃tak

es:c

21

∃tak

es:c

22

∃tak

es:c

23

Farley × × ×Lane × × × × ×Shana × × ×Trudy × × × × × ×Only the differential part w.r.t. Table 9 is given

Using the total set of attributes from Table 9 and from Table 10 clearly results ina further extension of the KP that has a strictly larger set of attributes than Ks

P, andhence, a potentially larger set of concepts. This amounts to inserting the previouslydiscovered conceptual knowledge into the analysis process in order to refine itsresults (by refining the lattice-shaped scales).

Obviously, the process needs not stop at step two and could potentially go to anarbitrary number of such scaling/lattice construction iterations. Then, one needs toconsider the termination problem for the iterative process that arises in this way:Could one guarantee that the process would converge into a well-defined result?

Further to the idea of iterating upon ever expanding contexts and lattices, we definea general method that constructs the ultimate result of the analysis task on an rcf.

4 Context dynamics and the iterative RCA method

4.1 Iterative context expansion

As stated previously, we capture the evolution of each context Ki ∈ K in a sequenceKp

i whose zero member K0i = (O0

i , A0i , I0

i ) is the input context. From there on, eachmember of the sequence is obtained from the previous one by a complete relationalexpansion upon the relations r from rel(Ki) while using the respective constructorsfrom ρ, i.e., ρ(r), and the lattices of the p-th iteration in Lp. The set of contexts in thep-th iteration may be recursively defined as follows:

Definition 7 Given a rcf (K, R) and a mapping ρ, the vector of contexts at the stepp + 1 is

K0 = K

Kp+1 = E∗ρ,Lp(Kp)

Now each of the contexts from the fix point member K∞ is similarly defined:

Definition 8 Given a rcf (K, R), a mapping ρ, and some i ∈ {1, . . . , |K|} the i-thcontext at the step p + 1 is the extension of the context of the same rank at step p:

K0i = Ki

100 M. Rouane-Hacene et al.

Kp+1i = Eρ,Lp(Kp

i )

A first observation is that the fix point members of the context sequence are well-defined. In other terms, for each i ∈ {1, . . . , |K|} the context K∞

i is the fixpoint of thenon-decreasing sequence (Kp

i ).

Lemma 1 Given a rcf (K, R) and a mapping ρ, for each i ∈ {1, . . . , |K|} and p ≥ 0,let Kp

i = (Opi , Ap

i , I pi ) and Kp+1

i = (Op+1i , Ap+1

i , I p+1i ), then

Opi = Op+1

i , Api ⊆ Ap+1

i and I pi ⊆ I p+1

i .

To prove the above fact, we need an intermediate step that focuses on lattices:

Lemma 2 Given a rcf (K, R) and a mapping ρ, for each i ∈ {1, . . . , |K|} and p ≥ 0

∀(X, Y) ∈ Lpi , ∃(X, Z ) ∈ Lp+1

i .

The fact that contexts can only grow or remain the same is readily proved byinduction on p where the basic case is expressed as in Property 1.

This now allows us to assert that the relational attributes in Kpi for some i and

p chosen as above, are included in the relational attributes of Kp+1i . Indeed, as

indicated at the end of Section 3.2, we consider the concepts (X, Y) and (X, Z ) fromLemma 2 to be identical, which, given that only the shared extent X is used in thescaling process, constitutes no real limitation.

To demonstrate that each context sequence converges, it is now enough to observethat for a lattice sequence Lp

i , there is a natural upper bound represented by theBoolean lattice5 of Oi. And since the set of all extents from Lp

i does not decreasewhen p grows up to ∞ (as shown in Lemma 2), it necessarily converges. We can nowstate the convergence of the context sequence:

Theorem 1 Given a rcf (K, R) and a mapping ρ, the sequence (Kp) converges towarda well-def ined set of maximally extended contexts K∞.

To sum up, K∞ is obtained from K0 through an iterative process of conceptconstruction and attribute generation using the data in R and a user-provided set of(relation,constructor) pairs in ρ. Moreover, being a fix point of an expanding process,it is the closure of K0 for the implicit closure operator associated to the method.

4.2 The Multi-FCA method

The Multi-FCA method describes the step-wise construction of the fix point solutionfrom the initial rcf. The logic of our analysis method is iterative one: Wheneverthe contexts of a rcf are extended, their corresponding lattices expand as well.

5Thus, in the worst case L∞i may comprise all subsets of Oi as extents.

Relational concept analysis 101

Moreover, it represents a computational schema rather than a precise algorithm asmany algorithmic choices are left to the analyst.

Algorithm 1 Producing a lattice for each context in an rcf1: proc MULTI-FCA(2: In: (K, R) an RCF, ρ a constructor mapping3: Out: L array [1, . . . , n] of lattices)4: p ← 0 ; halt ← false5: for i from 1 to n do6: K0

i ← SCALE(Ki)7: L′

〉 ← BUILD-LATTICE(K0i )

8: while not halt do9: p = p + 110: for i from 1 to n do11: Kp

i ← EXTEND-CONTEXT(Kp−1i , ρ, Lp−1)

12: Lpi ← UPDATE-LATTICE(Kp

i ,Lp−1i )

13: halt ← ∧i=1,...,n ISOMORPHIC(Lp

i ,Lp−1i )

Here we provide a detailed description of the algorithm.At the initialization step (lines 4–7), each context K0

i is obtained from Ki byapplying a conceptual scaling to the many-valued attributes in Ki using the primitiveScale (line 6). The lattices L0

i are constructed using the primitive Build-Lattice(line 7). At the step p and for each relation rk ⊆ Oi × Oj, the lattice Lp−1

j of therange context is used to extend the domain context Kp

i using the primitive Extend-Context and then update the lattice of the domain context Lp

i using the primitiveUpdate-Lattice (lines 8–13). For both primitives of lattice construction and latticeupdate, the choice of the exact algorithms is free. The process ends whenever at twosubsequent steps all the pairs of corresponding lattices Ln

i and Ln+1i are isomorphic

(checked by the primitive Isomorphic, line 13).For example, the Multi-FCA method applied to the rcf composed of the two

contexts KP and KD, and the relations takes, itb and iw, all of them scaled withan existential constructor, yields the final lattices L2

P and L2D, shown in Figs. 7 and 6,

respectively. These have been obtained after two iterations.It is noteworthy that unlike concepts in Figs. 5 and 4, the ones of the final lattices

have undergone an additional round of label reduction. The underlying simplificationamounts to skip from the reduced intent of a concept relational attributes “q r:c1”such that there is another attribute “q r:c2” with c2 ≤ c1. The intuition behind thisreduction is that whatever the constructor q, for any object o from the domain of r,whenever o satisfies “q r:c2”, it will necessarily satisfy “q r:c1” as well. For instance,the intent of patient concept c14 in Fig. 5 comprises the takes:c4 attribute that hasbeen removed in the intent of the same concept in Fig. 7. A simple check in Fig. 1reveals that the drug concept c2 is a sub-concept of c4. In actuality, this amountsto attribute-clarifying the parts of the fix-point context generated by scaling uponindividual relations while keeping for each set of equivalent attributes the onescorresponding to minimal concepts from the relation range.

An important question focuses on the overall cost of the relational analysis ofa dataset, especially given the iterative nature of the Multi-FCA method. In fact,in many cases there will be no necessity of iterating: a proper ordering of analysis

102 M. Rouane-Hacene et al.

Fig. 6 The final lattice of drugs (L∞D ). Quantifiers are omitted in relational attributes due to

visualization limitation of Galicia

tasks for individual contexts should help avoid it. To see this, one should imagine the“schema” of an rcf, i.e., the digraph made of contexts as vertexes and relations asdirected edges. Now, provided this graph is a DAG (no circuits), a topological sortof the contexts would provide a total ordering that is compatible with the relation-induced dependencies among contexts. Thus, analysing the contexts according to thisorder will ensure a context K is only processed whenever all the lattices required forscaling the relations in rel(K) have already been constructed up to their fixed point form.

Yet in order to cover the entire spectrum of possible rcf, inclusive circular ones,we preserve the more general expression of Multi-FCA that is bound toward a fix point.

4.3 Interpreting relational concepts

New abstractions emerge in a relational lattice characterizing the sharing of links be-tween the objects (as already discussed in Section 3.1). For instance, the drug conceptc#20 in L∞

D (Fig. 6, labeled ({},{∃ itb : c10}) does not belong to the initial lattice L0D

(Fig. 1). It is a new relational concept whose objects, namely Dactinomycin andIsentress, are described in a purely relational way indicating that both drugs weretaken by patients who experienced common adverse reactions, e.g., breath disorder,heart failure, etc. (patient concept c#10 in LP).

In addition, some intents from L0D are extended by a relational part hence refining

the description of the member objects. In this way, the concept c#5 in L0D (Fig. 1)

represents nrti drug class that causes liver disorder. In the final lattice of drugs L∞D

(Fig. 6), the intent of the same concept c#5, enriched with the relational attributes

Relational concept analysis 103

Fig. 7 The final lattice of patients (L∞P ). Quantifiers are omitted in relational attributes because of

visualization limitations of Galicia

∃iw : c13, ∃itb : c0 and ∃itb : c11 indicates that both nrti drugs Kaletra and Vireadinteract with Sustiva and cause some unexpected adverse reactions includinghair loss and oedema. More generally, the relational lattices of patients and drugsprovided by rca link profiles of patients (case reports) with classes of drugs.

4.4 Complexity issues

The Multi-FCA method iterates over a set of tasks for individual contexts: latticeconstruction/maintenance, relational scaling, termination tests. Beside their individ-ual complexity estimations, potential other factors include the number of iterationsand the size of the set K.

Lattice construction is a well-known listing problem. Unlike decision problemswhich focus on a single yes/no solution, listing problems enumerate all such solutions.As the number of such solutions may vary significantly among the instances of theproblem, it is more appropriate to assess the average cost of producing a solutionor, alternatively, to include the output size as a factor. The lowest known worst-casetime complexity for the concept lattice construction is O(nc ∗ no ∗ na) where nc, na

and no are the number of the concepts in the lattice, the number of the attributes andthe number of the objects in the context [19]. To get the overall cost for the latticeconstruction in Multi-FCA, one could merely multiply by the iteration number.Yet, if incremental construction is performed instead of batch one (recall algorithmicchoices are left to the user), the construction of the fix point lattice of a context could

104 M. Rouane-Hacene et al.

be seen as a single task, split across the iterations. Its basic step is the incrementalinsertion of a single attribute into a context followed by the lattice update which,as argued in [34], amounts to a batch lattice construction at the same total cost ofO(nc ∗ no ∗ na).

A subtle difference here is that in Multi-FCA the fix point lattices are computedfrom the fix point contexts rather that the initial ones of the rcf. In a way, the fixpoint contexts are part of the rca output and there is no simple way of estimatingtheir sizes. We therefore rewrite the overall complexity of the lattice constructionpart (inclusive the initializing part, line 7 of Algorithm 1) as follows. Let the size ofthe largest lattice be ncm formal concepts, the maximal number of attributes in a fixpoint context be nam , and the maximal number of objects in a context be nom . Theoverall construction time is thus O(ncm ∗ nam ∗ nom). Notice that the above contextnumber factor was skipped as it is expected to be modest (typically, less than 7).Furthermore, this computation offsets the necessity to factor in—hence to know—the number of iterations: only the set of attributes in the fix point context matters,the way their creation is split among the iterations is immaterial.

The cost factor due to scaling is assessed similarly: First, a concept c = (X, Y) fromthe output contributes to attribute construction only at the iteration following itscreation. With the incremental approach, at subsequent iterations the scaling wouldignore c. Its contribution to scaling effort depends on the number of relations r whoseran(r) is object set behind X, the size of each dom(r) and the cost of the incidencetest for qr : c and an object o ∈ dom(r). The first one is small (in the low digits) andhence vanishes. The third factor is linear in the sizes of r(o) and X and hence ofran(r). This puts the complexity estimation of the scaling at O(ncm ∗ n2

om).

Finally, the termination (line 13 of Algorithm 1) is efficiently tested by simplycomparing the sizes of lattices on two subsequent iteration (due to the sub-orderproperty). Thus, its cost factor could be neglected.

In summary, when lattice construction and scaling are performed by incrementalalgorithms, the overall time complexity function is in O(ncm ∗ nom ∗ (nam + nom)).

The other important aspect of Multi-FCA is its convergence, i.e., the number ofsteps till the fix point is reached. Intuitively, the process halts whenever there are nomore attributes from the original sets Ai that can be propagated backwards along thechains of relational links, e.g., known adr of a drug to the patient taking that drug.Thus, the maximal length of such a chain is an indicator of the iterations Multi-FCAmight need to converge. In fact, absent circuits in these chains, a proper orderingof the contexts enables a single step construction of the fix point. However, thecalculation of the exact number in the general case remains a challenging problem.Indeed, with circularity in the links, the propagation could loop over a circuit for anumber of iterations (that depends on the length of the circuit). In the extreme cases,i.e., with particularly unlucky configurations of the links in the dataset, the numberof iterations is bound from above by the product of the length of some comparablecircuits.6 Fortunately, such cases are extremely rare: circuits, whenever present, tendto be of minimal size (two or three).

6The precise function is the least common multiple (lcm) of the circuit lengths, hence the worst-of-the-worst case is with circuits of unique lengths corresponding to prime numbers.

Relational concept analysis 105

5 Related work

Our rca approach follows several trends within the fields it contributes to. First,as an fca-based framework, rca relates to work on extending the standard inputdata format by integrating non atomic attributes in formal contexts. This trend datesback to the end of 90s, when studies on logical formulas as formal attributes wereinitiated, in particular, description logic expressions as in [24]. Simultaneously, therepresentation of relational information in fca has been standardized in the form ofa power context family (pcf) [25]. Compared to the multi-step mechanics of rca, thecore method here amounts to a static, i.e., single-step, scaling.

Independently, in [29], the relational exploration method has been defined thatexpands the basic method towards a full set of dls constructors, inclusive universaland existential quantifiers for role restrictions. A similarly founded approach hasbeen adopted in [15] within the Logical Concept Analysis trend. Indeed, despite someterminological differences the basic hypotheses and mechanisms remain essentiallythe same. Indeed, the relational descriptors in concept intents are borrowed from dlswhereas the main result relies on reasoning about syntax with role depth providingthe guidelines. The substantial difference with our approach is the relational formatof data that is taken into account and the resulting relational lattices.

In [18, 21], the processing of formal objects with graph-based descriptions isexamined. Notwithstanding the effective extension of the fca machinery to a varietyof data formats matching graphs, e.g., chemical compound models, the resultingmethods are limited to output exclusively graph-based concept intents. Thus, theycannot be easily tuned to produce the entity-centered concept descriptions studiedhere.

Finally, rca shares a name with the relational fca approach towards analysisof lexical and semantic relationships between terms (synonymy, hyponymy, hyper-onymy, etc.), as introduced in [26]. In this approach, relationships hold on entireconcepts rather than on individual objects. In contrast, rca processes relations thatmaterialize as inter-object links and lifts them to the abstract level in the form ofrelational attributes that are generated by a scaling mechanism.

rca framework has been implemented within the open source platforms Galicia7

and eRCA.8 rca has been successfully applied in software engineering and inontology engineering. In software engineering, rca is used in the analysis of UMLartifacts [3, 17], detection and correction of design defects [23], model transformationlearning from transformation examples [30], and Web service classification and com-position [4]. In ontology engineering, rca has been applied for the construction [8]and the refactoring [28, 31] of domain ontologies.

6 Conclusion and future work

rca provides a relational framework that deals with datasets where object descriptionis not restricted to one-valued contexts. The framework allows the derivation of

7http://www.iro.umontreal.ca/∼galicia8http://code.google.com/p/erca/

106 M. Rouane-Hacene et al.

potentially useful abstractions based on the inter-objects links. Using an iterativegeneralization process, these links are propagated on the level of concepts thusyielding to relational abstractions which describe in a generic way the links betweenobjects. In comparison to other extensions of fca, these links are considered atthe launch of the concept formation process thanks to the original mechanism ofrelational scaling whose scales come from concept lattices. We have shown how theprocess converges and a description of the concrete method that implements it aswell as an analytical expression of the obtained solution were proposed.

rca approach has been implemented in the Galicia and eRCA platforms and iscurrently operational for various applications. Scalability is an issue since the size ofthe lattices can grow rapidly due to the cross-fertilization of contexts. Algorithmicissues are our current primary concern as further progress can be realized with per-formances through carefully combined techniques for iterative lattice maintenanceinstead of plain construction.

At each step of the multi-FCA process, the concepts of a particular context, sayKi, or its current relational extension Kp

i , are translated into additional attributesinvolving the relations whose range is comprised in Ki. The new attributes go to thecontexts K j that constitute the respective relation domains. There they get assignedto the local objects o from O j through the corresponding images r(o). Therefore,a natural question to elucidate would be the exact dependency between objectdescription and relational attribute, i.e., how do the former define the latter? Or,alternatively, what properties of the former are reflected in the latter? In other terms,one need to examine the correctness of the iterative method, i.e., is every relationalattribute rooted in an element of the initial rcf, and its completeness, i.e., is everyrelation between objects reflected by at least one such relational attribute.

Another challenging research track lays in the combination of rca with other fcaparadigms such as fuzzy fca [7] and closed pattern mining [35].

References

1. Adda, M., Valtchev, P., Djeraba, C., Missaoui, R.: Toward recommendation based on ontology-powered web-usage mining. IEEE Internet Computing 11(4):45–52 (2007)

2. Agrawal, R., Srikant, R.: Mining sequential patterns. In: Proceedings of the 11th Intl. Conf. onData Engineering (ICDE’95), pp. 3–14. IEEE Computer Society (1995)

3. Arévalo, G., Falleri, J.-R., Huchard, M., Nebut, C.: Building abstractions in class models: formalconcept analysis in a model-driven approach. In: Nierstrasz, O., Whittle, J., Harel, D., Reggio, G.(eds.) Proc. of the 9th Intl. Conf. MoDELS. LNCS, vol. 4199, pp. 513–527. Springer (2006)

4. Azmeh, Z., Driss, M., Hamoui, F., Huchard, M., Moha, N., Tibermacine, C.: Selection of compos-able web services driven by user requirements. In: ICWS, pp. 395–402. IEEE Computer Society(2011)

5. Baader, F., Calvanese, D., McGuinness, D., Nardi, D., Patel-Schneider, P. (eds.): The DescriptionLogic Handbook. Cambridge University Press, Cambridge (2003)

6. Barbut, M., Monjardet, B.: Ordre et Classification: Algèbre et Combinatoire, vol. 2. Hachette(1970)

7. Belohlavek, R.: Concept lattices and order in fuzzy logic. Ann. Pure Appl. Logic 128, 277–298(2004)

8. Bendaoud, R., Napoli, A., Toussaint, Y.: Formal concept analysis: a unified framework forbuilding and refining ontologies. In: Gangemi, A., Euzenat, J. (eds.) Knowledge Engineering:Practice and Patterns, Proc. of the 16th Intl. Conf. EKAW, LNCS 5268, pp. 156–171 (2008)

9. Cook, D., Holder, L.: Mining Graph Data. Wiley-Interscience (2006)10. De Raedt, L.: Logical and Relational Learning (Cognitive Technologies), 1st edn. Springer

(2008)

Relational concept analysis 107

11. Dehaspe, L., Toivonen, H.: Discovery of frequent datalog patterns. Data Mining and KnowledgeDiscovery 3, 7–36 (1999)

12. Džeroski, S.: Multi-relational data mining: an introduction. SIGKDD Explor. Newsl. 5, 1–16(2003)

13. Džeroski, S., Lavrac, N. (eds.): Relational Data Mining. Springer (2001)14. Fayyad, U.M., Piatetsky-Shapiro, G., Smyth, P., Uthurusamy, R. (eds.): Advances in Knowledge

Discovery and Data Mining. AAAI/MIT Press (1996)15. Ferré, S., Ridoux, O., Sigonneau, B.: Arbitrary relations in formal concept analysis and logical

information systems. In: Mugnier, M.-L., Dau, F., Stumme, G. (eds.) Proc. of the 13th Intl. Conf.on Conceptual Structures (ICCS’05), Kassel, Germany. LNCS, vol. 3596, pp. 166–180. Springer(2005)

16. Ganter, B., Wille, R.: Formal Concept Analysis, Mathematical Foundations. Springer (1999)17. Huchard, M., Rouane-Hacene, M., Roume, C., Valtchev, P.: Relational concept discovery in

structured datasets. Ann. Math. Artif. Intell. 49(1–4), 39–76 (2007)18. Kuznetsov, S.O.: Learning of simple conceptual graphs from positive and negative examples.

In: Zytkow, J.M., Rauch, J. (eds.) Proc. of the Third European Conf. PKDD’99, Prague, CzechRepublic, vol. 1704, pp. 384–391 (1999)

19. Kuznetsov, S.O., Obiedkov, S.A.: Comparing the performance of algorithms for generatingconcept lattices. J. Exp. Theor. Artif. Intell. 14(2–3), 189–216 (2002)

20. Lehmann, J., Hitzler, P.: Concept learning in description logics using refinement operators.Mach. Learn. 78, 203–250 (2010)

21. Liquière, M., Sallantin, J.: Structural machine learning with Galois lattice and graphs. In: Proc.of the 15th Intl. Conf. on Machine Learning (ICML’98), pp. 305–313 (1998)

22. Mann, R.D., Andrews, E.B. (eds.): Pharmacovigilance. Wiley (2002)23. Moha, N., Rouane-Hacene, M., Valtchev, P., Guéhéneuc, Y.-G.: Refactorings of design defects

using relational concept analysis. In: Medina, R., Obiedkov, S. (eds.) Proc. of the 6th Intl. Conf.on Formal Concept Analysis (ICFCA’08). LNCS, vol. 4933, pp. 289–304. Springer (2008)

24. Prediger, S., Stumme, G.: Theory-driven logical scaling. In: Proc. 6th Intl. Workshop KnowledgeRepresentation Meets Databases, Heidelberg. CEUR Workshop Proc., pp. 46–49 (1999)

25. Prediger, S., Wille, R.: The lattice of concept graphs of a relationally scaled context. In: Proc. ofthe 7th Intl. Conf. on Conceptual Structures (ICCS’99), pp. 401–414. Springer (1999)

26. Priss, U.: Efficient implementation of semantic relations in lexical databases. Comput. Intell. 15,79–87 (1999)

27. Rouane-Hacene, M., Huchard, M., Napoli, A., Valtchev, P.: A proposal for combining formalconcept analysis and description logics for mining relational data. In: Kuznetsov, S., Schmidt, S.(eds.) Proc. of the 5th Intl. Conf. on Formal Concept Analysis (ICFCA’07). LNCS, vol. 4390,pp. 51–65. Springer (2007)

28. Rouane-Hacene, M., Valtchev, P., Nkambou, R.: Supporting ontology design through large-scaleFCA-based ontology restructuring. In: Proc. of the 19th International Conference on ConceptualStructures, ICCS 2011—Conceptual Structures for Discovering Knowledge—, Derby, UK, 25–29July 2011, pp. 257–269 (2011)

29. Rudolph, S.: Exploring relational structures via FLE. In: Pfeiffer, H.D., Wolff, K.E., Delugach,H.S. (eds.) Proc. of the 12th Intl. Conf. on Conceptual Structures, ICCS 2004, Huntsville (AL).LNAI, vol. 3127, pp. 196–212. Springer, Berlin (2004)

30. Saada, H., Dolques, X., Huchard, M., Nebut, C., Sahraoui, H.A.: Generation of operationaltransformation rules from examples of model transformations. In: France, R., Kazmeier, J., Breu,R., Atkinson, C. (eds.) Proc. 15th Intl. Conf. MODELS’12, Innsbruck (AT). Lecture Notes inComputer Science, vol. 7590, pp. 546–561. Springer (2012)

31. Shi, L., Toussaint, Y., Napoli, A., Blansché, A.: Mining for reengineering: an application tosemantic wikis using formal and relational concept analysis. In: Antoniou, G., Grobelnik, M.,Simperl, E., Parsia, B., Plexousakis, D., Pan, J., De Leenheer, P. (eds.) Proc. of the 8th ExtendedSemantic Web Conf. (ESWC’11). LNCS, vol. 6644, pp. 421–435. Springer (2011)

32. Srikant, R., Agrawal, R.: Mining generalized association rules. In: Proc. of the 21st Intl. Conf. onVery Large Databases (VLDB’95), Zurich (CH) (1995)

33. Valtchev, P., Missaoui, R., Lebrun, P.: A partition-based approach towards building Galois(concept) lattices. Discrete Math. 256(3), 801–829 (2002)

34. Valtchev, P., Rouane-Hacene, M., Missaoui, R.: A generic scheme for the design of efficienton-line algorithms for lattices. In: de Moor, A., Lex, W., Ganter, B. (eds.) Proc. of the11th Intl. Conf. on Conceptual Structures (ICCS’03). LNCS, vol. 2746, pp. 282–295. Springer(2003)

108 M. Rouane-Hacene et al.

35. Valtchev, P., Missaoui, R., Godin, R.: Formal concept analysis for knowledge discovery and datamining: the new challenges. In: Eklund, P. (ed.) Proc. of the 2nd Intl. Conf. on Formal ConceptAnalysis (ICFCA’04), Sydney (AU). LNCS, vol. 2961, pp. 352–371. Springer (2004)

36. Washio, T., Motoda, H.: State of the art of graph-based data mining. SIGKDD Explor. Newsl.5(1), 59–68 (2003)

37. Wille, R.: Restructuring the lattice theory: an approach based on hierarchies of concepts. In:Rival, I. (ed.) Ordered Sets, pp. 445–470. Reidel, Dordrecht-Boston (1982)


Recommended