labelshilt.blogg.se

A synthetic data generator for online social network graphs

In recent years, online social networks have become a part of everyday life for millions of individuals. Data analysts have also found a fertile field for analyzing user behavior at individual and collective levels, for academic and commercial reasons. On the other hand, there are many risks for user privacy, as information a user may wish to keep private becomes evident upon analysis.

However, when data is anonymized to make it safe for publication in the public domain, information is inevitably lost with respect to the original version, and a significant aspect of social networks is the local neighborhood of a user and its associated data. Current anonymization techniques are good at identifying risks and minimizing them, but not so good at maintaining the local contextual data which relate users in a social network. Thus, improving this aspect will have a high impact on the data utility of anonymized social networks. There is also a lack of systems which facilitate the work of a data analyst in anonymizing this type of data structure and performing empirical experiments in a controlled manner on different datasets.

Two of the difficulties for data analysts of online social networks are (1) the public availability of data and (2) respecting the privacy of the users. One possible solution to both of these problems is to use synthetically generated data. However, this presents a series of challenges related to generating a realistic dataset in terms of topologies, attribute values, communities, data distributions, correlations and so on. In the following work, we present and validate an approach for populating a graph topology with synthetic data which approximates an online social network. The empirical tests confirm that our approach generates a dataset which is both diverse and a good fit to the target requirements, with realistic modeling of noise and fitting to communities. A good match is obtained between the generated data and the target profiles and distributions, which is competitive with other state-of-the-art methods. The data generator is also highly configurable, with a sophisticated set of control parameters for different “similarity/diversity” levels.

An egocentric sociogram is based on a particular focal case node (representing an individual or organization) and shows all links to that particular ego-based node, such as other individuals, themes, locations, and other mixes of data. In this type of network, the individual case is at the center of the network, and the networks are directional ones (vs. undirected networks, in which the edges do not have arrows). The egocentric sociogram depicted is a one-degree ego-neighborhood one (with direct ties between the focal node and the other vertices in the network). To create an egocentric sociogram, there must first be nodes that are case nodes (representing individuals or groups, or "egos" or "entities"). These nodes must have some defined relationships. Then, a target case node is selected as the focal one around which all extant relationships are drawn.

The second type of sociogram is a network sociogram. In this latter type, the social network may not be focused around one ego (person or organization) node. Here, a network sociogram may be composed of extracted Twitter information and the relationships (edges or ties) from reply-to, re-tweeting, follower-following, and other types of interrelationships on that microblogging platform.
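A short Python sketch can illustrate two ideas from the passages above. The first is populating an existing graph topology with synthetic attribute data under a similarity/diversity control. The second is the egocentric sociogram construction: pick a focal case node, then draw its direct ties.

This is not the authors' generator. All of the following are hypothetical and chosen only for illustration:

- the toy topology `EDGES`
- the `COMMUNITY` labels
- the per-community `PROFILES`
- the `INTERESTS` domain
- the simple Gaussian/uniform noise model
- the functions `populate` and `ego_network`

```python
import random

# Hypothetical toy topology: directed "follows" ties (source, target) among six users.
EDGES = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
# Hypothetical community assignment for each node of that topology.
COMMUNITY = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
# Hypothetical target profile per community: the typical value of each attribute.
PROFILES = {
    "A": {"age": 25, "interest": "sports"},
    "B": {"age": 45, "interest": "music"},
}
# Hypothetical domain of the categorical attribute.
INTERESTS = ["sports", "music", "travel", "tech"]


def populate(nodes, community, profiles, diversity, seed=0):
    """Attach synthetic attributes to every node of a fixed topology.

    `diversity` in [0, 1] is an illustrative similarity/diversity knob:
    0 copies each node's community profile exactly, larger values add more noise.
    """
    if not 0.0 <= diversity <= 1.0:
        raise ValueError("diversity must lie in [0, 1]")
    rng = random.Random(seed)  # seeded so that runs are reproducible
    data = {}
    for node in nodes:
        profile = profiles[community[node]]
        # Numeric attribute: community value plus Gaussian noise (sigma 10, an
        # arbitrary assumption) scaled by diversity.
        age = round(profile["age"] + rng.gauss(0, 10) * diversity)
        # Categorical attribute: replaced by a random value with probability `diversity`.
        if rng.random() < diversity:
            interest = rng.choice(INTERESTS)
        else:
            interest = profile["interest"]
        data[node] = {"age": age, "interest": interest}
    return data


def ego_network(edges, ego):
    """One-degree ego neighborhood: the focal node, its direct alters,
    and only the directed ties that start or end at the focal node."""
    ties = [(u, v) for u, v in edges if ego in (u, v)]
    members = {ego} | {node for tie in ties for node in tie}
    return members, ties


if __name__ == "__main__":
    nodes = sorted(COMMUNITY)
    print(populate(nodes, COMMUNITY, PROFILES, diversity=0.0)[2])
    print(populate(nodes, COMMUNITY, PROFILES, diversity=0.8, seed=42)[2])
    print(ego_network(EDGES, ego=2))
```

With `diversity=0.0`, node 2 receives exactly its community profile, `{'age': 25, 'interest': 'sports'}`. `ego_network(EDGES, ego=2)` returns the members `{0, 1, 2, 3}` with the ties `[(0, 2), (1, 2), (2, 3)]`.

A realistic generator would also have to reproduce correlations between attributes and the target distributions per community. This sketch only shows how a single parameter can trade similarity to a target profile against diversity.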










