|
Complex Networks constitute a convenient tool to model real-world complex systems. For this reason,
they have become very popular in the last decade. It is possible to encode various types of information
in a complex network, depending on the studied system and modeling objectives and needs. In its most
basic form, a complex network contains only nodes and links (plain network). But it is also possible to
enrich it by setting link directions (directed network), associating attributes to nodes (attributed network)
or links, or introducing a temporal dimension (dynamic networks).
Many tools exist to study complex networks. Among them, community detection is one of the most
important. A community is roughly defined as a group of nodes more connected internally than to the rest
of the network. In the literature, this intuitive definition has been formalized in many ways, leading to
countless different methods and variants to detect communities. In the large majority of cases, the result
of these methods is set of node groups in which each node group corresponds to a community. From
the applicative point of view, the meaning of these groups is as important as their detection. However,
although the task of detecting communities in itself took a lot of attraction, the problem of interpreting
them has not been properly tackled until now.
In this thesis, we see the interpretation of communities as a problem independent from the community
detection process, consisting in identifying the most characteristic features of communities. We
break it down into two sub-problems: 1) finding an appropriate way to represent a community and 2) objectively
selecting the most characteristic parts of this representation. To solve them, we take advantage
of the information encoded in dynamic attributed networks. We propose a new representation of communities
under the form of temporal sequences of topological measures and attribute values associated
to individual nodes. We then look for emergent sequential patterns in this dataset, in order to identify the
most characteristic community features.
Our solution to the interpretation problem takes the form of a five-stepped framework. First, we
detect the communities and process the nodal topological measures. Second, we create a database of
nodal sequences, thanks to these measures as well as the nodal attributes originating from the studied
real-world system. Third, we identify the sequential patterns contained in this database. Fourth, we filter
them using some objective criteria related to the notions of prevalence (support measure), emergence
(growth rate) and closeness. Fifth, we finally select the most characteristic patterns covering each community.
Our framework produces the characteristic sequential patterns and also allows identifying outlier
nodes.
We perform a validation of our framework on artificially generated dynamic attributed networks. At
this occasion, we study its behavior relatively to changes in the temporal evolution of the communities,
and to the distribution and evolution of nodal features. We also apply our framework to real-world
systems: a DBLP network of scientific collaborations, and a LastFM network of social and musical
interactions. Our results show that the detected communities are not completely homogeneous, in the
sense several node topic or interests can be identified for a given community. Some communities are
composed of smaller groups of nodes which tend to evolve together as time goes by, be it in terms of
individual (attributes, topological measures) or relational (community migration) features. The detected anomalies generally fit some generic profiles: nodes misplaced by the community detection tool, nodes relatively similar to their communities, but also significantly different on certain features and/or not
synchronized with their community evolution, and finally nodes with completely different interests. |