Evaluating modelling tools for Software Ecosystems

Introduction

As described by recent research, software organizations can gain a lot from being aware of their Software Ecosystem (SECO) and their position within it. To gain that awareness, an organization needs a thorough understanding of its SECO. Especially in large organizations this can be difficult, because of the many actors the organization is involved with. A good modelling tool that gives an overview of the ecosystem is therefore necessary.

Several network modelling tools are currently available, but none is designed specifically for modelling SECOs. Although general network modelling tools can be used, SECO modelling requires different functionality. For example, the emphasis within SECOs is on the flows between actors. Being able to model these flows is therefore necessary, whereas in social network analysis, for example, this is less relevant.

The aim of this research was to identify the specific functionalities required for SECO modelling and find the modelling tool that has implemented these functionalities in the best way possible.

Method

The research consisted of three phases. First, the tools to be analyzed were selected. An initial list was created through literature research and by talking to experts. That list was then narrowed down to a shortlist. The process is shown in the figure below, which lists the eliminated tools together with the reasons for their elimination.

Figure 1

Figure 1: Tool selection

The second phase consisted of literature research into both software ecosystems and network analysis. A thorough understanding of these subjects resulted in a list of the important SECO characteristics and the network analysis techniques suited to modelling them. From these, a list of functionalities was derived that a modelling tool should possess in order to be suitable for modelling SECOs.

The feature list:

  • General
    • SECO size calculation
    • Role count calculation
    • Multiple views
  • Nodes
    • Color
    • Size
    • Visibility
    • Labeling
    • Shape
    • Range
    • Customized properties
    • Connecting nodes based on their properties
  • Edges
    • Weight
    • Color
    • Label
    • Multiple labels
    • Different kinds of labels
    • Shape
    • Visibility
    • Direction
    • Customized properties
  • Clustering
    • Clusters/Containers
    • Overlapping clusters
    • Subgroups
  • Scope
    • Zooming
    • Flow paths
    • Flow paths: Choice
  • Statistics
    • Centrality
    • Density
    • Distance between nodes
    • Connectedness
    • Hierarchy
    • Egocentric networks
    • Structural equivalence
    • Degree
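
Several of the statistics in the list above have simple textbook definitions. The sketch below computes degree, density and degree centrality for a small directed network of flows. The actor names and weights are made up for illustration, and the formulas are the common network analysis conventions; they are not taken from any of the evaluated tools.

```python
from collections import defaultdict

# Hypothetical toy ecosystem: (source, target, weight) flows between actors.
flows = [
    ("VendorA", "ResellerB", 3.0),
    ("VendorA", "PartnerC", 1.0),
    ("PartnerC", "ResellerB", 2.0),
    ("ResellerB", "CustomerD", 5.0),
]


def directed_stats(edges):
    """Return (density, out_degree, in_degree, degree_centrality)."""
    nodes = {n for s, t, _ in edges for n in (s, t)}
    out_deg = defaultdict(int)
    in_deg = defaultdict(int)
    for s, t, _ in edges:
        out_deg[s] += 1
        in_deg[t] += 1
    n = len(nodes)
    if n < 2:
        return 0.0, dict(out_deg), dict(in_deg), {}
    # Density of a directed graph without self-loops: m / (n * (n - 1)).
    density = len(edges) / (n * (n - 1))
    # Degree centrality: total degree (in + out) divided by (n - 1).
    centrality = {v: (in_deg[v] + out_deg[v]) / (n - 1) for v in nodes}
    return density, dict(out_deg), dict(in_deg), centrality


density, out_deg, in_deg, centrality = directed_stats(flows)
for actor in sorted(centrality, key=centrality.get, reverse=True):
    print(f"{actor}: degree centrality {centrality[actor]:.2f}")
print(f"density: {density:.2f}")
```

A tool that supports these statistics would report the same kind of values per node and per network, typically alongside the visualization.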

The third phase was the actual tool analysis. A data set consisting of 398 Dutch software vendors and their connections to their partners and to each other was used to analyze the tools. For each tool individually, the data set was used to test all the desired functionalities. In the process, quality measures such as reliability, efficiency and usability were judged.

Results

The three tools that were thoroughly analyzed were Pajek, Gephi and Cytoscape. Before the results are shown, a short explanation of each tool is provided.

Pajek is a freeware tool created in Slovenia and designed for general network analysis. According to the developers, Pajek’s strengths are abstraction, visualization and efficient data analysis. Gephi is an open source tool which also focuses on general network analysis. Usage analysis showed that Gephi is currently the most popular tool for network analysis. Cytoscape was originally designed for analyzing molecular interaction networks, but it has evolved into a more general open source tool.

Pajek

For visualization Pajek distinguishes itself by using several different data types for its analysis instead of one network file that stores all the information. Using different files makes the tool flexible, because a separate file can be loaded for every aspect. On the other hand it decreases efficiency, because the different files have to be loaded and saved separately.

The functionality of Pajek is quite good: 27 out of the 35 features were implemented sufficiently. However, Pajek lacks some quality. The user interface is old-fashioned, with unattractive and unclear visualizations. Efficiency could also be better, because Pajek uses a separate visualization window, which forces users to switch continuously between windows in order to make changes to the visualization. The figures below show the main window of Pajek, with the general user interface and the different data types, as well as a screenshot of the data set visualized in Pajek.
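
To make the idea of separate files concrete, the sketch below writes a network and a partition as two independent text documents. The layout follows the commonly documented Pajek conventions (a `*Vertices` header, numbered vertices, an `*Arcs` section for directed ties, and one class number per vertex in a partition file); the function names, actor labels and roles are hypothetical, and the exact details should be checked against the Pajek manual.

```python
def to_pajek_net(labels, arcs):
    """Build the text of a Pajek network (.net) file with directed arcs."""
    index = {label: i for i, label in enumerate(labels, start=1)}
    lines = [f"*Vertices {len(labels)}"]
    lines += [f'{i} "{label}"' for label, i in index.items()]
    lines.append("*Arcs")
    lines += [f"{index[s]} {index[t]} {w}" for s, t, w in arcs]
    return "\n".join(lines) + "\n"


def to_pajek_clu(labels, role_of):
    """Build the text of a Pajek partition (.clu) file: one class per vertex."""
    roles = sorted(set(role_of.values()))
    code = {role: i for i, role in enumerate(roles, start=1)}
    lines = [f"*Vertices {len(labels)}"]
    lines += [str(code[role_of[label]]) for label in labels]
    return "\n".join(lines) + "\n"


# Hypothetical actors, flows and roles.
labels = ["VendorA", "ResellerB", "PartnerC"]
arcs = [("VendorA", "ResellerB", 3), ("PartnerC", "ResellerB", 2)]
role_of = {"VendorA": "vendor", "ResellerB": "reseller", "PartnerC": "partner"}

print(to_pajek_net(labels, arcs))
print(to_pajek_clu(labels, role_of))
```

The network and the partition have to be kept in step by vertex order, which illustrates both the flexibility and the extra bookkeeping described above.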

Figure 2

Figure 2: Main window Pajek

Figure 3

Figure 3: Screenshot Pajek

The biggest advantages and disadvantages of using Pajek are summarised in the table below.

Table 1

Table 1: Results Pajek Summarised

Gephi

Gephi has a pleasant appearance with a modern graphical user interface, which is shown in figure 3.9. It is clearly noticeable that effort has been invested in giving the tool a good look. Furthermore, the developers focused on navigation by making sure most features are reachable in one click, so that options are at hand without having to go through many menus.

The visualized graphs in Gephi have a modern, smooth appearance. Differences in weight and node size are subtle but easily noticeable, and the colors used are carefully chosen. With the toolbar at the bottom of the graph window, the most common settings, such as enabling labels and changing the sizes of components, can easily be modified. Graphs are visualized automatically when a file is opened. Furthermore, the visualization adapts automatically when settings are changed or algorithms are run. These algorithms are executed live on screen, showing every step taken. Besides seeing what the algorithm is doing, the user can also pause or stop the algorithm once satisfied with the result. One of the visualizations of the data set is shown in the figure below.

Figure 4

Figure 4: Visualization Gephi

Gephi has implemented 28 out of the 35 features on the feature list. The most important missing feature is the ability to use node attributes in calculations. The results are summarised in the table below.

Table 2

Table 2: Summarised results Gephi

Cytoscape

It is clear that effort has been put into optimizing the visualization in Cytoscape. Networks are visualized automatically when a file is loaded. Several layout algorithms are available to position the nodes, which, together with the many style settings that can be altered, ensures a clear representation of the network.

Cytoscape does a remarkable job in visualizing edges. Even when the layout is far from ordered, it is often still clearly visible which nodes have the most connections and where those connections lead. This is shown in the figure below, which contains a visualization in which the nodes are very close together and the edges all cross one another. Still, it is very clear which nodes have the most connections, and most edges can easily be followed to their destination.

Figure 5

Figure 5: Screenshot Cytoscape

Furthermore, Cytoscape can visualize multiple networks at the same time. Users can select nodes and create a new network from them with one press of a button. These networks can then be visualized right next to the original network instead of replacing it. That makes it possible, for example, to keep parts of a layout as a second network before trying different layouts. In that way it is easier to combine layouts and, hence, get more out of the visualization. In SECO research that could, for example, be used to compare two actors and their connections, or even to compare two or more different SECOs.

Unfortunately Cytoscape lacks some functionality: only 24 of the desired features are implemented. The most important missing features are good edge labels and the possibility to alter or use single nodes. Some other pros and cons are shown in the table below, which summarises the results for Cytoscape.
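
Creating a new network from a selection can be thought of as taking a subgraph of the original. The sketch below illustrates that idea for an egocentric network: one actor, its direct neighbours, and the ties among them. It is a plain illustration with made-up actors and a hypothetical `ego_network` helper; it does not use or represent Cytoscape's own API.

```python
def ego_network(edges, actor):
    """Return the edges among `actor` and its direct neighbours."""
    neighbours = {t for s, t in edges if s == actor}
    neighbours |= {s for s, t in edges if t == actor}
    keep = neighbours | {actor}
    # Keep only ties whose both endpoints are inside the selection.
    return [(s, t) for s, t in edges if s in keep and t in keep]


# Hypothetical directed ties between actors.
edges = [
    ("VendorA", "ResellerB"),
    ("ResellerB", "PartnerC"),
    ("PartnerC", "CustomerD"),
    ("VendorA", "PartnerC"),
]

ego_a = ego_network(edges, "VendorA")
ego_d = ego_network(edges, "CustomerD")
print("VendorA:", ego_a)
print("CustomerD:", ego_d)
```

Because the original edge list is left untouched, both egocentric networks can be inspected side by side, which mirrors the comparison of two actors described above.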

Table 3

Table 3: Summarised results Cytoscape
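
The feature counts reported for the three tools can be put side by side with a little arithmetic. The sketch below assumes that the count for Cytoscape is measured against the same list of 35 features as the other two tools. It covers functionality only; the quality measures are not captured by these numbers.

```python
FEATURE_TOTAL = 35  # size of the feature list as reported in the text

# Implemented feature counts as reported per tool.
implemented = {"Pajek": 27, "Gephi": 28, "Cytoscape": 24}


def coverage(counts, total):
    """Return the share of implemented features per tool, as a percentage."""
    return {tool: 100.0 * n / total for tool, n in counts.items()}


shares = coverage(implemented, FEATURE_TOTAL)
ranking = sorted(shares, key=shares.get, reverse=True)

for tool in ranking:
    print(f"{tool}: {implemented[tool]}/{FEATURE_TOTAL} = {shares[tool]:.1f}%")
```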

Conclusion

Overall the conclusion is clear: Pajek lacks quality, Cytoscape lacks functionality and, therefore, Gephi is the tool best suited for analyzing SECOs. On the other hand, Gephi is not yet fully fit for the job, and more research has to be done, especially on usability and expandability, to make Gephi fully suitable for modelling SECOs.

When that is done, Gephi can be used in general SECO research to help with data analysis, SECO governance and comparing SECOs. Hopefully, this will yield new insights into how to manage a SECO and how software organizations can create their ideal SECO around them.