Evaluating modelling tools for Software Ecosystems

Introduction

As recent research describes, software organizations can gain a lot from being aware of their Software Ecosystem (SECO) and their position within it. To gain that awareness, an organization needs a thorough understanding of its SECO. Especially in large organizations this can be difficult because of the many actors the organization is involved with. A good modelling tool that gives an overview of the ecosystem is therefore necessary.

There are currently several network modelling tools available, but none is designed specifically for modelling SECOs. Although general network modelling tools can be used, SECO modelling requires different functionality. For example, the emphasis within SECOs is on flows between actors. Being able to model these flows is therefore necessary, whereas in social network analysis, for example, this is less relevant.

The aim of this research was to identify the specific functionalities required for SECO modelling and find the modelling tool that has implemented these functionalities in the best way possible.

Method

The research consisted of three phases. First, the tools to be analyzed were selected. An initial list was created through literature research and conversations with experts. That list was then narrowed down to a shortlist. The process is shown in the figure below, which lists the eliminated tools together with the reasons for their elimination.

Figure 1: Tool selection

The second phase consisted of literature research on both software ecosystems and network analysis. A thorough understanding of these subjects resulted in a list of the important SECO characteristics and the network analysis techniques suited to modelling them. From these, a list of functionalities was derived that a modelling tool should possess in order to be suitable for modelling SECOs.

The feature list:

  • General
    • SECO size calculation
    • Role count calculation
    • Multiple views
  • Nodes
    • Color
    • Size
    • Visibility
    • Labeling
    • Shape
    • Range
    • Customized properties
    • Connecting nodes based on their properties
  • Edges
    • Weight
    • Color
    • Label
    • Multiple labels
    • Different kinds of labels
    • Shape
    • Visibility
    • Direction
    • Customized properties
  • Clustering
    • Clusters/Containers
    • Overlapping clusters
    • Subgroups
  • Scope
    • Zooming
    • Flow paths
    • Flow paths: Choice
  • Statistics
    • Centrality
    • Density
    • Distance between nodes
    • Connectedness
    • Hierarchy
    • Egocentric networks
    • Structural equivalence
    • Degree
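
A few of the statistics in the list above can be illustrated on a toy network. The following is a minimal sketch in Python, not the output of any of the evaluated tools. The actor names and flows are hypothetical, and density is computed with the common definition for directed graphs without self-loops (number of edges divided by n(n-1)).

```python
from collections import defaultdict

# Hypothetical directed flows between SECO actors (source -> target).
flows = [
    ("VendorA", "PartnerX"),
    ("VendorA", "PartnerY"),
    ("VendorB", "PartnerX"),
    ("PartnerX", "VendorA"),
]

def degree_counts(edges):
    """Return (in_degree, out_degree) dictionaries for a directed edge list."""
    in_deg, out_deg = defaultdict(int), defaultdict(int)
    for source, target in edges:
        out_deg[source] += 1
        in_deg[target] += 1
    return dict(in_deg), dict(out_deg)

def density(edges):
    """Density of a directed graph without self-loops: m / (n * (n - 1))."""
    nodes = {node for edge in edges for node in edge}
    n = len(nodes)
    return len(set(edges)) / (n * (n - 1)) if n > 1 else 0.0

in_deg, out_deg = degree_counts(flows)
# SECO size, taken here simply as the number of distinct actors.
seco_size = len({node for edge in flows for node in edge})
```

In this toy network, actors with a high in-degree receive many flows and actors with a high out-degree supply many, which is the kind of distinction the edge direction feature is meant to make visible.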

The third phase was the actual tool analysis. A data set consisting of 398 Dutch software vendors and their connections to their partners and to each other was used to analyze the tools. For each tool individually, the data set was used to test all the desired functionalities. Along the way, quality measures such as reliability, efficiency and usability were judged.

Results

As explained before, the three tools that were thoroughly analyzed were Pajek, Gephi and Cytoscape. Before showing the results, a short explanation of each tool is provided.

Pajek is a freeware tool created in Slovenia and designed for general network analysis. According to the developers, Pajek’s strengths are abstraction, visualization and efficient data analysis. Gephi is an open source tool which also focuses on general network analysis. Usage analysis showed that Gephi is currently the most popular tool for network analysis. Cytoscape was originally designed for analyzing molecular interaction networks, but it has evolved into a more general open source tool.

Pajek

For visualization Pajek distinguishes itself by using several different data types for its analysis instead of one network file that stores all the information. Using different files makes the tool flexible, because separate files can be loaded for every aspect. On the other hand it decreases efficiency, because the different files have to be loaded and saved separately.

Pajek’s functionality is quite good: 27 out of the 35 features were implemented in a sufficient way. However, Pajek lacks some quality. The user interface is old-fashioned, with unattractive and unclear visualizations, and efficiency could also be better, because Pajek uses a separate visualization window that forces users to switch between windows continuously in order to change the visualization. The figures below show the main window of Pajek, with the general user interface and the different data types, as well as a screenshot of the data set visualized in Pajek.

Figure 2: Main window Pajek

Figure 3: Screenshot Pajek

The biggest advantages and disadvantages of using Pajek are summarised in the table below.

Table 1: Results Pajek Summarised

Gephi

Gephi has a pleasant appearance with a modern graphical user interface, which is shown in figure 3.9. It is clearly noticeable that effort has been invested in giving the tool a good look. Furthermore, the developers focused on navigation by making most features reachable in one click, so that options are at hand without going through many menus.

The visualized graphs in Gephi have a modern, smooth appearance. Differences in weight and node size are subtle but easily noticeable, and the colors used are carefully chosen. With the toolbar at the bottom of the graph window, the most common settings, such as enabling labels and changing the size of components, can easily be modified. Graphs are visualized automatically when files are opened. Furthermore, the visualization adapts automatically when settings are changed or algorithms are run. These algorithms are executed live on screen, showing every step taken. Besides showing what the algorithm is doing, Gephi also lets the user pause or stop the algorithm at the point where the result is satisfactory. One of the visualizations of the data set is shown in the figure below.

Figure 4: Visualization Gephi

Gephi has implemented 28 out of the 35 features of the feature list. The most important missing feature is the ability to use node attributes in calculations. The results are summarised in the table below.

Table 2: Summarised results Gephi

Cytoscape

It is clear that effort has been put into optimizing the visualization in Cytoscape. Networks are visualized automatically when a file is loaded. Several layout algorithms are available to position the nodes neatly, which, together with the many style settings that can be altered, ensures a clear representation of the network.

Cytoscape does a remarkable job in visualizing edges. Even when the layout is far from ordered, it is often still clearly visible which nodes have the most connections and where those connections lead. This is shown in the figure below, a visualization in which the nodes are very close together and the edges all cross one another. Still, it is very clear which nodes have the most connections, and most edges can easily be followed to their destinations.

Figure 5: Screenshot Cytoscape

Furthermore, Cytoscape can visualize multiple networks at the same time. Cytoscape allows users to select nodes and create new networks with one press of a button. These networks can then be visualized right next to the original network instead of replacing it. That makes it possible, for example, to keep parts of a layout as a second network before trying different layouts. In that way it is easier to combine layouts and, hence, get more out of the visualization. In a SECO that could, for example, be used to compare two actors and their connections, or even to compare two or more different SECOs.

Unfortunately Cytoscape lacks some functionality: only 24 of the desired features are implemented. The most important missing features are good edge labels and the possibility to alter or use single nodes. Some other pros and cons are shown in the table below, which summarises the results for Cytoscape.

Table 3: Summarised results Cytoscape

Conclusion

Overall the conclusion is clear: Pajek lacks quality, Cytoscape lacks functionality and, therefore, Gephi is the tool best suited for analyzing SECOs. On the other hand, the conclusion can also be drawn that Gephi is not yet entirely fit for the job and that more research is needed, especially on usability and expandability, to make Gephi fully suitable for modelling SECOs.

When that is done, Gephi can be used in general SECO research to help in data analysis, SECO governance and comparing SECOs. Hopefully this will yield new insights into how to manage a SECO and how software organizations can create their ideal SECO around them.

Multihoming in Software Ecosystems

Open source software ecosystems generally have a large number of developer contributors. These relationships are not exclusive: developers may come and go, or work in different (even competing) ecosystems at the same time (multihoming) instead of limiting themselves to a single ecosystem (singlehoming). Multihoming can improve the popularity of the code; participating in two ecosystems means more users, which is an advantage for the developer. Little is known about the effects of multihoming on software ecosystems. This paper describes a study of open source developers in the Eclipse ecosystem to determine whether they work on various projects within the ecosystem or on comparable projects in other ecosystems.

A tool named ‘Darkharvest’ was created in Ruby to mine information about the Eclipse ecosystem via the GitHub API. The source of Darkharvest is available through a link in the paper we wrote on this subject. Darkharvest collects the meta-data (id, name, description) of all public repositories managed by the Eclipse Foundation on GitHub. Next, all contributors to the Eclipse repositories are identified and their public meta-data (id, name) gathered. Finally, Darkharvest collects all personal, publicly available repositories of these contributors.
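
The three collection steps can be sketched as follows. This is an illustrative Python sketch, not Darkharvest’s actual Ruby code. The `fetch` callable is a hypothetical stand-in for an authenticated GitHub API client that already handles pagination, the organization name `eclipse` is an assumption, and the contributor name is taken from the `login` field. The three paths follow the GitHub REST API endpoints for listing organization repositories, repository contributors and user repositories.

```python
def mine_ecosystem(fetch, org="eclipse"):
    """Collect org repositories, their contributors, and the contributors' own repositories.

    `fetch(path)` is assumed to return the decoded JSON list for a GitHub
    REST API path, with pagination already handled by the caller.
    """
    # Step 1: meta-data of all public repositories of the organization.
    repos = [
        {"id": r["id"], "name": r["name"], "description": r.get("description")}
        for r in fetch(f"/orgs/{org}/repos")
    ]

    # Step 2: contributors of every repository, de-duplicated by user id.
    contributors = {}
    for repo in repos:
        for user in fetch(f"/repos/{org}/{repo['name']}/contributors"):
            contributors[user["id"]] = user["login"]

    # Step 3: personal public repositories of each contributor.
    personal = {
        login: [r["name"] for r in fetch(f"/users/{login}/repos")]
        for login in contributors.values()
    }
    return repos, contributors, personal


# Tiny in-memory stand-in for the API; all names and values are hypothetical.
FAKE_API = {
    "/orgs/eclipse/repos": [{"id": 1, "name": "example-repo", "description": "demo"}],
    "/repos/eclipse/example-repo/contributors": [{"id": 10, "login": "alice"}],
    "/users/alice/repos": [{"name": "dotfiles"}, {"name": "vim-plugin"}],
}
repos, contributors, personal = mine_ecosystem(FAKE_API.__getitem__)
```

Passing the client in as a function keeps the sketch independent of any particular HTTP library and makes it possible to try the logic on canned data.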

Pie chart showing the multihoming combinations and their share of all repositories. Please note that we also identified triplehomers, but only a small number.

Darkharvest found 565 Eclipse Foundation repositories in the initial step. From these repositories, we collected the profiles of a total of 736 contributors. These contributors controlled a total of 9756 repositories (excluding the initial 565 Eclipse repositories), for a grand total of 10321 repositories in our data set.

Most multihoming developers (87.79%) work in only one competing ecosystem besides Eclipse. A minority (12.21%) work in two competing ecosystems besides Eclipse, while no developers contribute to four or more ecosystems. The average number of ecosystems in which a multihoming Eclipse developer works is 2.12.
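
The average follows directly from the two shares. As a small worked check in Python, assuming Eclipse itself counts as one ecosystem (so the two groups work in two and three ecosystems respectively):

```python
# Shares of multihoming developers, taken from the text above.
share_one_other = 0.8779   # Eclipse plus one other ecosystem -> 2 ecosystems
share_two_others = 0.1221  # Eclipse plus two other ecosystems -> 3 ecosystems

# Weighted average number of ecosystems per multihoming developer.
average_ecosystems = share_one_other * 2 + share_two_others * 3
print(round(average_ecosystems, 2))  # 2.12
```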

The data furthermore showed two correlations: one between repositories at the Aptana ecosystem and at the Eclipse ecosystem, being .574 with two-tailed significance of .000, the other between repositories at the Eclipse ecosystem and the Textmate ecosystem, being .356 with a .012 significance. It is unknown why those correlations exist.

Next, multihoming behavior is analyzed: the results show that more active developers tend to have more repositories outside of Eclipse. However, if only the ratio between Eclipse and the other ecosystems to which a developer contributes is calculated, it is the other way around: for every Eclipse repository, active developers have fewer non-Eclipse repositories than less active developers.

Multihoming ratio (number of Eclipse repositories divided by number of non-Eclipse repositories) per total number of repositories
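
The ratio in the caption can be written down as a one-line function. The sketch below is only an illustration: the developer names and repository counts are hypothetical, and returning `None` for developers without non-Eclipse repositories is an assumption about how singlehomers might be handled.

```python
def multihoming_ratio(eclipse_repos, other_repos):
    """Eclipse repositories divided by non-Eclipse repositories.

    Returns None when there are no non-Eclipse repositories,
    because the ratio is undefined in that case.
    """
    if other_repos == 0:
        return None
    return eclipse_repos / other_repos

# Hypothetical developers: (Eclipse repositories, non-Eclipse repositories).
developers = {"dev_a": (2, 8), "dev_b": (3, 30), "dev_c": (1, 0)}
ratios = {name: multihoming_ratio(e, o) for name, (e, o) in developers.items()}
```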

To identify relationships, we created a Gephi visualization. The colors identify the different ecosystems, the nodes represent the repositories, and a node grows in size as more developers work on the repository. Lines represent developers who work on both connected repositories; a line is thicker when more developers share the same two repositories.
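
The node sizes and line weights described above can be derived from a mapping of developers to repositories. The following Python sketch shows one way to do that. It is not the script used for the visualization, and the developer and repository names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical input: developer -> repositories the developer works on.
developer_repos = {
    "alice": ["eclipse/repo1", "vim/plugin"],
    "bob": ["eclipse/repo1", "vim/plugin", "textmate/bundle"],
    "carol": ["eclipse/repo1"],
}

def build_graph(dev_repos):
    """Return (node_sizes, edge_weights) for a repository co-contribution graph."""
    node_sizes = Counter()    # repository -> number of developers
    edge_weights = Counter()  # (repo_a, repo_b) -> number of shared developers
    for repos in dev_repos.values():
        unique = sorted(set(repos))
        node_sizes.update(unique)
        # Every pair of repositories shared by one developer adds weight 1.
        edge_weights.update(combinations(unique, 2))
    return node_sizes, edge_weights

node_sizes, edge_weights = build_graph(developer_repos)
```

Sorting the repositories of each developer before pairing them ensures that the same two repositories always produce the same edge key, regardless of input order.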

Ecosystem       Color
Eclipse         Yellow
Aptana          Red
Intellij        Green
Sublime         Cyan
Textmate        Blue
Vim             Orange
Visual Studio   Grey

The Gephi visualization, displaying repositories as nodes and users as lines between nodes

Some interesting ‘islands’ are visible. Some of them consist of only one very active developer (the blue cluster to the east), while other islands consist of multiple developers working on one cluster.

Because the screen resolution matters a lot for readability and the visualization has a resolution of 3000×3000 pixels, we provide multiple sizes.

Link to visualization   Resolution
Link                    3000×3000 pixels (large)
Link                    2000×2000 pixels (medium)
Link                    1500×1500 pixels (smaller)
Link                    1000×1000 pixels (small)

The Perceived Dependence of Software Developing Organizations on Open Source Rich Client Platforms

This post presents the main findings of the paper ‘The Perceived Dependence of Software Developing Organizations on Open Source Rich Client Platforms’ by Fortuin and Moret (2014).

In this paper the authors investigate how commercial software developing organizations perceive dependence on an open source rich client platform (RCP). In a qualitative analysis, employees of 13 commercial software developing organizations using the Eclipse Rich Client Platform were interviewed. All the researched companies are small to medium sized (1 to 25 FTE).

The results demonstrate the pros and cons of commercial software development on a rich client platform in the Eclipse ecosystem. It can be inferred from the results that all participants are pleased with the Eclipse platform; only two participants mentioned that the platform did not fully meet their expectations. Furthermore, interviewees mentioned many more advantages than disadvantages. The research provides both developers and the Eclipse community with opportunities and threats resulting from the use of the Eclipse Rich Client Platform for commercial product development.

Tables 5 and 6 below present the main findings: the advantages and disadvantages perceived by the interviewed companies. The numbers represent the different companies, and an X indicates that the advantage or disadvantage was mentioned by that company. The results shown are common denominators derived from all provided answers.

In total, 14 different advantages were mentioned across the 13 companies.

Eclipse RCP advantages

In total, 11 different disadvantages were mentioned across ten different companies.

Eclipse RCP disadvantages

It is striking that most of the disadvantages are mentioned only once, whereas most of the advantages are mentioned multiple times. This could indicate that most of the disadvantages are product specific, whereas the advantages are more generic.

Additionally, this study uncovered that (I) most of the interviewees indicated that alternatives were not considered within their company, (II) where alternatives were considered, they were rejected because the Eclipse RCP outperformed them in different areas, (III) only two interviewees mentioned that the expectations regarding the Eclipse RCP were not met within their company, (IV) not a single interviewee stated that replacing the Eclipse RCP is being considered within their company, and (V) most of the contribution takes the form of bug reporting, bug fixing, plug-in development, membership, committership or writing blogs.

If you are interested in reading about the research and results in more depth, please contact one of the authors.

Participation in the Eclipse ecosystem: What drives strategic members?

The phenomenon of software ecosystems (SECO) is relatively new and evolves rapidly in the field of software engineering. A SECO is a co-innovation approach by developers, software organizations and third parties that share a common interest in the development of software technology. Consequently, such systems are driven by strategic organizations that join as members. Leading software companies such as IBM, Google, Oracle and SAP are members of the Eclipse Ecosystem.

In the Eclipse Ecosystem, strategic membership is the highest type of membership organizations can opt for. This specific group of members potentially has the biggest impact on the health and direction of the Eclipse ecosystem. A relevant question is what moves companies to participate in the Eclipse ecosystem. To our knowledge, studies on this particular matter are lacking.

Therefore, the research aim of this paper was to identify the key drivers for strategic Eclipse members to participate in the open source Eclipse software ecosystem from a partner perspective. Specifically, the study investigated why strategic partners participate in the Eclipse Ecosystem and what benefits they obtain from a partnership with Eclipse. Following a literature review, semi-structured interviews were conducted with partners currently involved in the Eclipse ecosystem.

Membership benefit                                            Type of membership benefit relates to
Eclipse is a technical platform and community                 All membership types
Eclipse functions as a marketing channel                      All membership types
Seat in board of directors and Requirements Council           Strategic membership
Devoting developers to Eclipse is beneficial to the company   Strategic membership
Close contact with management helps to improve business       Strategic membership

Table 1. Derived Eclipse membership benefits

In accordance with the findings of this study, we can conclude that strategic partners are driven to participate in the Eclipse Ecosystem by two essential drivers. The first driver relates to contributing to the Eclipse ecosystem in general; the second is more specific to the motivation for strategic membership. The first motivational driver is the state of the Eclipse ecosystem: the technical platform to support companies’ products, and a community as a marketing channel. The second motivator is influence: corporations choose to pay the annual fees for strategic membership mainly for the increased influence they gain from having a seat on the board of directors and the Requirements Council.

The Reality of an Associate Model – Comparing Partner Activity in the Eclipse Ecosystem

This post presents the main findings of the paper ‘The reality of an associate model – comparing partner activity in the Eclipse Ecosystem’ by Aarnoutse, Renes and Snijders (2014). We researched the partnership model of Eclipse and the activity of the different types of partners, at both the platform and the Ecosystem level.

Eclipse is an open source community in which individuals and organizations can collaborate. The projects of Eclipse “are focused on building an open development platform comprised of extensible frameworks, tools and runtimes for building, deploying and managing software across the lifecycle”.

We identified that Eclipse has 205 formal partners, of which 6 are sponsors and 202 are members of Eclipse. Google, IBM and Oracle are both sponsors and members of Eclipse. For the exact distribution see the table below.

These partners differ in how they contribute to the Eclipse platform and Ecosystem. We analyzed several items that represent activity, for example the average number of projects a partner contributes to, the average number of active committers per company, the average number of commits, and the average number of commits per committer. The table below shows part of this comparison of activity between the members of Eclipse in the platform and in the Ecosystem.

EclipseActivity

The pie charts below show the total number of commits and committers for the Eclipse platform as well as the ecosystem. Based on the availability of the data, we give an overview of the distribution across the different membership types, as well as the people who code for the Eclipse Foundation on the platform. In addition, for the ecosystem we show the distribution across the membership types together with the people who code for a company that is not a member of Eclipse.

Eclipse Distribution

Based on this research we conclude that a higher level of membership is related to more activity. Strategic and associate membership both relate significantly to a higher amount of activity, while this is not the case for solution members. Companies without a membership are more active than the associate members, which suggests that Eclipse should improve its partnership model by motivating associate members and incorporating active non-member companies.

If you are interested in reading more of the analysis, additional conclusions or implications of our research, please contact one of the authors: {f.Aarnoutse, C.Renes, R.Snijders, S.Jansen}@uu.nl.

Merits of a Meritocracy in Open Source Software Ecosystems

The Eclipse open source ecosystem has grown from a small internal IBM project into one of the biggest Integrated Development Environments on the market. Open source communities and ecosystems do not follow the standard governance forms typically used in big organizations. A meritocracy is a frequently occurring form of governance at different levels in open ecosystems. In this paper we research how this form of governance influences the health of projects within the Eclipse ecosystem in terms of the number of commits per month. We analyzed the hierarchy of Eclipse, how merits are conceptualized within the ecosystem, and the effect of the appointment of mentors and project leads on the number of commits. From our research, we can conclude that this system is not always as fair as it seems; merits are only a benefit in some cases.

Sample of project lead data with commit averages in months

Table 3 provides an example of the analyzed data on the effect of project lead appointments per project. The change column shows whether the project was less productive (‘<’), equally productive (‘=’) or more productive (‘>’) during the assignment of the project lead.
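
How a value in the change column could be derived is sketched below in Python. The post does not state the exact baseline, so the comparison against the average monthly commits before the appointment is an assumption, as are the commit counts and the `tolerance` parameter.

```python
def change_symbol(avg_before, avg_during, tolerance=0.0):
    """Compare average monthly commits before and during a project lead's assignment."""
    if abs(avg_during - avg_before) <= tolerance:
        return "="
    return ">" if avg_during > avg_before else "<"

# Hypothetical monthly commit counts for one project.
before = [12, 15, 9]
during = [20, 18, 22]
symbol = change_symbol(sum(before) / len(before), sum(during) / len(during))
```

With these numbers the averages are 12 and 20 commits per month, so the project would be marked as more productive during the assignment.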

We concluded that while commit data is important when a committer wants to climb the ladder to the top, it is certainly not the only factor. Rather, personality and connections are even more important. The meritocracy itself is also not as ‘fair’ as it seems. The closer to the top you get, the less it is about merits and the more it is about politics.

Eckhardt, A.E. & Kaats, E.J. (2014) The Merits of a Meritocracy in Open Source Ecosystems. Proceedings of Seminar Software Ecosystems 2013-2014, Utrecht, The Netherlands. Retrieved on 18/04/2014 from http://bit.ly/SECOmeritocracy

The Open Source Community Health Analysis Method

Built on a developed understanding of open source community (OSC) vitality, this post introduces a systematic approach for software product managers that enables them to evaluate the vitality of candidate open source communities.

The authors find it important to highlight that the primary objective of the following health analysis method is not to present a universally applicable method that is suitable for any type of open source community or any kind of software company. Instead, the focus lies on developing a situational understanding of each case and then selecting a number of method fragments that are best suited for the respective situation. Thus the aim of the health analysis method is to provide software product managers with a toolset that enables them to make the best assessment in a variety of scenarios.

Within this post we will introduce the concept and the features of a method fragment pool and method assembly (Harmsen, Brinkkemper, & Oei, 1994). These will be utilized to create situational methods for assessing OSC vitality. Furthermore, we present a selection rationale for choosing and assembling appropriate method fragments. Finally, the post contains a collection of varying fragments that can be used individually, or in combination to perform an OSC vitality assessment, as well as a sub-section for method calibration.

Method Fragment Pool

Derived from the domain of method engineering, this segment introduces the concept of situational method design, as well as that of a method fragment pool containing base elements for method assembly. While the general principles of situational method engineering (Van De Weerd & Brinkkemper, 2008) fully apply, this section focuses on adapting them to the specific needs of Software Product Managers and their responsibilities with regard to OSC dependency management.

Built upon the work of Harmsen, Brinkkemper and Oei (1994), Figure 9 depicts a process flow aimed at creating a situational method for OSC vitality assessment. Furthermore, the diagram includes an iterative cycle, focused on the customization of method fragments towards a community’s unique characteristics.

The building blocks of a situational method are so-called method fragments, which are stored within the method fragment pool. Each element within the pool focuses on assessing one specific OSC vitality aspect and can be used either by itself or in combination with other fragments. Each fragment is linked to situational factors that make it best suited for assessing a given community type or for determining its applicability to a given Software Product.

Fragment selection begins with an assessment of OSC characteristics, which is necessary for narrowing down the total number of candidate method fragments for the given case. Together with the requirements and the resources of the respective Software Product, these characteristics form the selection rationale for choosing suitable method fragments from the existing fragment pool.

Figure 9 – Situational OSC Vitality Method Design Process

Following the selection process, the chosen fragments are assembled into a situational method. Although a particular order is not a requirement, a logical structure or segmentation into fragment groups can contribute to the method’s usability. Depending on the relative significance of community aspects, fragment importance can vary. Thus, with a view to the overall quality of the software product, individual fragments can be given a greater weight than less significant ones.

After method assembly, the individual fragments and the entire method can undergo an iterative process aimed at improving the accuracy of the assessed OSC vitality. Here, historic or present community data can be used to review fragment and method accuracy. If a given fragment is found to generate inaccurate or contradictory vitality data, it can be adjusted, its result can be weighted less, or it can be removed from the situational method entirely. All community-specific adjustments can then be updated and re-inserted into the fragment pool. Should another software product, with different project characteristics, require an assessment of this particular OSC, the new and more accurate fragments can be used.

Fragment Categorization and Selection Rationale

The following sub-section introduces a number of criteria that can be used to select situation-specific method fragments for a given OS community or software development project. The main reason for this approach is the intention to create a practical heuristic that can be quickly utilized for method assembly. Furthermore, the approach does not enforce the selection of specific fragments based on the presented rationale, but merely suggests using it as an indicator for assessing fragment suitability. The three criteria have been selected for their simplicity of use by software product managers and their universal applicability to varying types of OSCs. The first two indicators stem from the analyzed scientific literature and focus on specific community characteristics. The third indicator is based on the authors’ estimate of required fragment resources and relates to a software product’s development scope.

Community Size

The most evident indicator for fragment selection is community size. Hereby the size can be measured by the number of OSC developers, the number of core developers or alternatively the total number of registered community members. While this indicator needs to be adjusted for each community’s nature or characteristics, it can help to quickly exclude fragments that are only applicable to large or very small OSC types. Each method fragment is labeled with one or many community size indicators, suggesting the most suitable community sizes.

Figure 10 – Fragment Selection Indicator – Community size

Community Maturity

As shown before, community age is a direct indicator of the probability of its survival and was found to have a significant positive correlation with the maturity of its software development processes and its governance structure. This in turn makes it a suitable indicator for selecting applicable method fragments, as it can indicate which fragments are best suited for mature or young OSCs. As with the previous indicator, each fragment can have one or multiple labels describing matching OSC criteria.

Figure 11 – Fragment Selection Indicator – Community maturity

Fragment Costs

From the perspective of software product managers, available time and financial resources can vary greatly with each managed software product. Thus, even if a method fragment for vitality assessment is promising, it may still not be feasible to use. This can be the case if the given fragment exceeds the available project resources or if the overall analysis would benefit from focusing the effort on less resource-intensive fragments. In contrast to the previous indicators, each fragment is labeled with only one cost indicator, which expresses the authors’ estimate of the financial or time investment needed to execute the corresponding method fragment.

Figure 12 – Fragment Selection Indicator – Fragment costs
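
The three selection indicators above can be read as labels on each fragment. The Python sketch below illustrates one possible way to filter a fragment pool by them. It is not part of the method itself, which only suggests the indicators as guidance: the label values, the 1 to 5 cost scale and the pool contents are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Fragment:
    name: str
    sizes: frozenset       # suitable community sizes, one or many labels
    maturities: frozenset  # suitable community maturity levels, one or many labels
    cost: int              # single cost label; the 1 (cheap) to 5 (expensive) scale is assumed

def select_fragments(pool, size, maturity, max_cost):
    """Keep fragments whose labels match the community and fit the budget."""
    return [
        f for f in pool
        if size in f.sizes and maturity in f.maturities and f.cost <= max_cost
    ]

# Hypothetical pool; names and labels are illustrative only.
pool = [
    Fragment("external metrics", frozenset({"small", "medium", "large"}),
             frozenset({"low", "medium", "high"}), 4),
    Fragment("cyclical correction", frozenset({"medium", "large"}),
             frozenset({"medium", "high"}), 2),
]
selected = select_fragments(pool, size="small", maturity="high", max_cost=4)
```

Because size and maturity can carry several labels while cost carries exactly one, the first two are modelled as sets and the third as a single value.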

Situational Method Fragments

By combining selection rationale and vitality indicators, this sub-section presents a number of OSC health method fragments that can be used individually or in combination by software product managers. In the following sub-sections, we present five fragment categories, each aimed at different identified vitality indicators.

With regard to fragment completeness, it is noteworthy that three vitality indicators have not been included in the fragment pool. OSC license utilization, user interest and bug report frequency have been deliberately excluded from the method base. It is the authors’ belief that these areas are either not relevant for software product managers or have been reviewed with contradictory results in the scientific literature. As an example, OSCs with commercially unsuitable licenses are unlikely to be considered for integration by software product managers.

Furthermore, the authors would like to highlight that this method does not attempt to dictate specifically how to extract or quantify data, but presents a guideline that can and should be adjusted to the particular characteristics of each analyzed OSC.

Common Method Base

All OSC Health Analysis Methods contain two base fragments that act as a foundation for the entire vitality assessment. Figure 13 depicts the first fragment that focuses on data availability, as well as the second fragment that serves the purpose of establishing a base performance index for interval based method fragments.

Figure 13 – OSC Health Method: Common method base

Both fragments can be seen as a prerequisite for structured data analysis and are thus not labeled with a selection rationale. Furthermore, the fragments should be seen as mere guidance for adjusting the freely selected OSC fragments to a community’s accessible data, as well as for standardizing data operationalization in order to achieve consistent and reproducible results across all index-based fragments. This is achieved by calibrating each interval variable on one common starting date and standardized base points. Similar to established practice in stock market indices, this approach enables detailed performance tracking of each vitality variable over time. Furthermore, it enables the use of aggregate constructs for certain vitality aspects or for the entire OSC vitality.
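
The calibration on a common starting date and standardized base points can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors’ operationalization: the base of 100 points, the unweighted mean used for the aggregate, and the sample data are all hypothetical.

```python
def to_index(series, base_date, base_points=100.0):
    """Rescale a {date: value} series so that the value at base_date equals base_points."""
    base_value = series[base_date]
    if base_value == 0:
        raise ValueError("base value must be non-zero")
    return {date: base_points * value / base_value for date, value in series.items()}

def aggregate_index(indices):
    """Unweighted mean of several indices that share the same dates."""
    dates = next(iter(indices.values())).keys()
    return {d: sum(idx[d] for idx in indices.values()) / len(indices) for d in dates}

# Hypothetical monthly observations for two vitality variables.
downloads = {"2014-01": 2000, "2014-02": 2500, "2014-03": 3000}
commits = {"2014-01": 50, "2014-02": 40, "2014-03": 60}

indices = {
    "downloads": to_index(downloads, "2014-01"),
    "commits": to_index(commits, "2014-01"),
}
overall = aggregate_index(indices)
```

Once every variable starts at the same base points, variables with very different magnitudes (thousands of downloads, tens of commits) become directly comparable and can be combined into one aggregate.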

Performance Index based Method Fragments

Looking at distinctly quantifiable vitality indicators, this sub-section introduces a number of method fragments that can be used to generate interval based vitality indices.

The first fragment, depicted in Figure 14, aims at external OSC metrics and is most suitable for measuring outward facing OSC activities. This includes actions, such as extracting the number of OSC downloads over time, tracking online search popularity or monitoring release frequency. All activities produce individual results, as well as an aggregate index, representing the overall metrics performance.

The fragment is suitable for a broad range of OSCs with varying degrees of maturity and size. However, due to the necessity to establish and track changes to the indices, the fragment is relatively expensive and has the second highest cost rating.

external osc metrics

Figure 14 – OSC Health Method Fragment: Monitor external metrics

In contrast to the fragment in Figure 14, the following fragment (Figure 15) measures internal OSC activity and attempts to accurately depict internal community processes. The fragment measures OSC aspects such as bug or feedback response times, mailing list or forum activities and source code contributions. Analogous to the external metrics, each activity of this fragment produces a vitality index for the respective variable; these indices are in turn aggregated into an internal OSC performance index. Due to its nature, this fragment is labeled with a selection rationale identical to the one in Figure 14.

It is noteworthy that the indicated fragment costs for Figure 14 and Figure 15 are based on pessimistic assumptions regarding data availability. Initial experiments with OSC vitality analysis have shown that even among popular and well established communities, broad data availability should not be considered a given.

internal OSC activity

Figure 15 – OSC Health Method Fragment: Monitor Internal community activity

The final index driven method fragment serves the purpose of correcting the generated indices for forms of cyclical community activity. It can correct activity levels that are too low or too high by taking a community’s natural activity cycle into account and adjusting the individual and aggregate indices accordingly. Due to the need for historic data, as well as for mature OSC processes, this fragment is only suited for medium to high OSC maturity levels. However, given that the indices solely need adjusting and no new data operationalization is necessary, this fragment is less expensive than its two predecessors.

recurring activity cycles

Figure 16 – OSC Health Method Fragment: Activity cycles

Scale Based Method Fragments

After having introduced the interval based vitality fragments, this section presents a number of ordinal scale Health Method fragments, based on vitality indicators that cannot be represented by means of a distinct unit scale. Instead, the indicators rely on a number of Likert scales that are individually designed and adjusted for each fragment. Although this approach does not support a detailed monitoring of distinct variable changes, it allows for the tracking of proven indicators that are otherwise difficult to operationalize, thus broadening the OSC analysis.

review code quality

Figure 17 – OSC Health Method Fragment: Source Code quality

Figure 17 evaluates OSC vitality from the perspective of code quality. After creating a definition of quality for the OSC in question, the method establishes a Likert scale that indicates improvement or decline of code quality over time. Once the code has been reviewed, the results are documented and the current state of the code quality is rated. Given that code reviews are seen as good practice when integrating software components, the fragment is suitable for communities of virtually all sizes and degrees of maturity. However, it is noteworthy that manual code reviews are also very expensive, as considerable resources need to be invested, typically at the expense of development efforts.

Similar to code quality, Figure 19 depicts a method fragment aimed at evaluating the state of OSC project documentation. While only suited for medium to very mature OSCs, this approach is less resource intensive than the manual code review. After establishing a suitable scale for documentation quality, the results can be rated and re-evaluated at different points in time. Documentation quality was mentioned by a majority of the interviewed open source experts as a quick indicator for well-practiced community governance. Furthermore, it serves as an indicator for the necessary time investment when integrating an OSC.

core developer reputation

Figure 18 – OSC Health Method Fragment: Core developer reputation

The purpose of the final two scale driven method fragments is to review OSC core developer reputation and motivation.

Analogous to the previous approaches, the fragment in Figure 18 first establishes a reputation ranking for core developers and then progresses to developer identification and interviews with domain experts. Both steps are necessary for establishing a reputation ranking. As core developers play an important role in most OSCs, this fragment is suited for a broad scope of maturity levels. However, it becomes less significant for large and very large communities, as other vitality factors, such as community governance or a growing member base, gain in significance. Due to its qualitative nature and the typically small number of core developers, this fragment only requires a medium time investment.

documentation quality

Figure 19 – OSC Health Method Fragment: Documentation quality

The final fragment within this category (Figure 20) reviews one of the main drivers for OSC vitality: core developer motivation. After clearly identifying core developers within a project’s source code repositories or documentation, the fragment proposes the design of a core developer questionnaire and a matching motivation scale. Based on this, identified candidates can be interviewed and their motivation to contribute to the project tracked over time. As this approach is similar to the one within the developer reputation fragment, the recommended OSC maturity and age indicators are identical. The fragment costs are directly linked to the core developers’ interest in cooperating with the product manager and should be reviewed in light of previous cooperation attempts.

Figure 20 – OSC Health Method Fragment: Core Developer motivation

Software Development Practices Checklist

In contrast to the index or Likert scale based method fragments, this subsection introduces a fragment that purely records the binary state of a number of vitality indicators within OSC software development practices. As frequently mentioned by both the reviewed scientific literature and the interviewed OS experts, the software development processes within OSCs are a strong indicator for code quality and community vitality. Suitable for all but the least mature and youngest communities, this fragment can depict a community’s stability quickly and without great time investment. Additionally, communities that improve their development practices are also likely to grow in maturity and thus increase their probability of survival.

software development practices

Figure 21 – OSC Health Method Fragment: Software development practices

Alignment and Resource Optimization with Other OSC Stakeholders

The final fragment of the introduced fragment pool aims at reducing OSC monitoring costs by aligning business interests and dividing monitoring efforts among all commercial stakeholders sharing an interest in an OSC. Figure 22 illustrates an approach that systematically searches for fellow commercial stakeholders within a given OSC and attempts to identify mutually beneficial opportunities for all involved parties. Given that open source is equally available to all software integrators, the authors assume that the identified commercial parties often do not engage in direct competition over access to the OSC but are interested in its vitality and software quality. Thus, sharing monitoring costs can significantly contribute to reducing individual project manager workloads. Alternatively, such an approach can lead to a more accurate vitality analysis, as the increased cumulative resources allow for using a larger number of method fragments. Based on the fragment’s nature, the authors find it suitable for all community sizes and maturity levels. Furthermore, given the fragment’s benefits, the resource investment is marginal and only involves search costs for identifying other stakeholders and the time investment for initial discussions.

align with commercial stakeholders

Figure 22 – OSC Health Method Fragment: Align goals with existing commercial stakeholders

Method Assembly and Calibration

Depending on fragment selection, the final situational method can contain any combination of index or scale based elements, as well as binary results. Additionally, two fragments with the purpose of either correcting activity indexes for cyclical trends or for optimizing OSC monitoring efforts can be included.

The only fragment required for method assembly is the method base. Consequently, any number or combination of fragments can be utilized within the situational method. Based on the chosen selection rationale, software product managers can group fragments and are free to either aggregate method results or compare them on an individual basis.

Looking specifically at index driven methods, the produced performance indices can be aggregated to create a unified OSC performance index. In addition to the overall OSC health, individual vitality indicators can be monitored separately in order to identify trends within vitality areas. Furthermore, the approach supports index adjustment by correcting the individual weight of vitality indices according to their importance for a given OSC. Thus, a situational method can be tuned not only by means of fragment selection but also by adjusting fragments to local OSC realities.

Method artifacts generated by non-index fragments lack a distinct scale; however, they can be used to register changes in vitality. By individually tracking variations in the Likert ratings, each method fragment can be used as an indicator for changes in OSC health. The same applies to the specific case of monitoring software development practices. Despite the binary nature of the results, each can be compared to past practices of the OSC. Thus, added practices can be seen as an improvement to community health or code quality, while the abandonment of practices can be seen as a decline in community performance.
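The comparison described above can be sketched in a few lines of Python. The function names, the practice labels used as dictionary keys and the convention that a lower Likert level means better quality are assumptions made for illustration; the method itself does not prescribe a data format.

```python
def likert_change(previous_rating, current_rating, lower_is_better=True):
    """Classify the change between two Likert ratings of the same fragment."""
    if current_rating == previous_rating:
        return "unchanged"
    got_lower = current_rating < previous_rating
    return "improved" if got_lower == lower_is_better else "declined"


def compare_practices(previous, current):
    """Return the practices added and abandoned between two assessments.

    Both arguments map a practice name to True (present) or False (absent).
    """
    added = sorted(p for p in current
                   if current[p] and not previous.get(p, False))
    abandoned = sorted(p for p in previous
                       if previous[p] and not current.get(p, False))
    return added, abandoned


# Hypothetical assessments at two points in time.
earlier = {"Defined Release Schedule": True, "Systematic Testing Methods": False}
later = {"Defined Release Schedule": False, "Systematic Testing Methods": True,
         "Formal Requirements Management": True}
added, abandoned = compare_practices(earlier, later)
```

Added practices would then be read as an improvement and abandoned practices as a decline, in line with the interpretation given above.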

Although not necessary, fragment grouping can contribute to the ease of use of the method and provides it with a clear structure. For illustrative purposes, all presented method fragments have been assembled into a single situational method, covering all vitality aspects introduced within this thesis.

Method Application

In order to present a practical example of the method’s application, a representative OSC was selected for data analysis. Given the Health Analysis Method’s situational nature and the great diversity among OSCs, the main reason for choosing a single OSC was to introduce a set of steps that can be taken in order to apply the introduced method fragments to a realistic assessment scenario. Furthermore, the aim was to present an analysis of a broadly known OSC and thus make the example as relatable as possible for project managers familiarizing themselves with the new method.

OSC selection was performed based on the popularity index of one of the most widespread public source code repositories, namely GitHub. The platform hosts a vast number of open source projects and provides additional features, such as social components, project collaboration tools, issue tracking or code reviews. Furthermore, GitHub is built around the distributed revision control system Git, which provides an additional source of data. Due to Git’s fully distributed architecture, revision data can be directly extracted from every cloned copy of a repository.

Among the highest rated community projects on GitHub, the jQuery JavaScript library was selected for analysis. Although it ranked third in the index (8 May 2013), the first two projects, Twitter Bootstrap and Node.js, were excluded due to either their direct ties to a commercial entity or the scope of available data and community maturity. An additional reason for excluding the two most popular projects was the authors’ intention to choose a typical OSC that had achieved a sufficient degree of maturity and had been exposed to a minimal degree of external influences affecting its development.

The selected OSC is a widely popular JavaScript library first introduced in 2006 by John Resig, a former developer at the Mozilla Foundation. The community’s age, its data availability and its established mature development practices make the community ideal for the exemplified OSC vitality analysis.

The analysis within the following sections is based on community data available on GitHub, the jQuery Git repository, Ohloh (a public directory of open source software owned by Black Duck Software, Inc.), the GHTorrent project, as well as community pages, blog entries, the mailing list and similar documentation of the jQuery project. The jQuery Git repository was mined with the open source tool GitStats[1].

The following subsections contain illustrative data that can be used for OSC vitality analysis of the representative OSC. Hereby, the aspect of data availability will be discussed within the respective method sections.

Index based analysis

Looking at external OSC metrics, the relative number of OSC committers, community online popularity and release frequency can be derived from publicly available sources and are thus used for establishing a base performance index in the respective categories.

Figure 23 depicts the number of active contributors per month and thus illustrates changes in commit activity over time. Furthermore, the graph highlights the activity peaks around each major release (see Table 7) and shows the overall growth in activity when the project transitioned to a major release (v. 1.5) in 2010.
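As a rough illustration of how such a monthly contributor count could be derived directly from Git, the following Python sketch parses log output with one commit per line. This is not the GitStats-based procedure used for the analysis. It assumes the log was exported with a command such as `git log --date=short --format=%ad%x09%an`, which prints the author date and author name separated by a tab; the function name and sample data are hypothetical.

```python
from collections import defaultdict


def contributors_per_month(log_text):
    """Count unique commit authors per calendar month.

    Each non-empty line is expected to look like 'YYYY-MM-DD<TAB>Author Name'.
    """
    authors_by_month = defaultdict(set)
    for line in log_text.splitlines():
        if not line.strip():
            continue
        day, author = line.split("\t", 1)
        authors_by_month[day[:7]].add(author.strip())  # key is 'YYYY-MM'
    return {month: len(authors)
            for month, authors in sorted(authors_by_month.items())}


# Hypothetical log excerpt, for illustration only.
sample_log = (
    "2013-04-18\tAlice\n"
    "2013-04-02\tBob\n"
    "2013-04-01\tAlice\n"
    "2013-03-20\tAlice\n"
)
monthly = contributors_per_month(sample_log)
```

Note that the same person may commit under several names or e-mail addresses, so counts obtained this way should be treated as an approximation.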

number of active contributors

Figure 23 – Sample OSC: Number of unique contributors per month

Although the most recent GitHub API does offer the opportunity to extract the overall number of downloads, this data attribute is not tracked over time and does not take downloads from other mirrors into account. Unfortunately, the jQuery foundation does not provide an aggregate overview of the total number of downloads for historic releases. Thus, this vitality indicator will not be included in this analysis.

jQuery online popularity

Figure 24 – Sample OSC: Online popularity in Google trending topics

A direct approach for assessing an OSC’s online popularity is Google Trends. The tool depicts the overall trend for queries mentioning a particular word or topic. While the use of Google Trends can be seen as problematic or potentially inaccurate with keywords that can also be found within unrelated expressions or sentences, this risk is unlikely to be relevant for the unique name of jQuery. However, it is noteworthy that a rising search trend can have positive as well as negative causes. Thus, a community that has received negative publicity or bad reviews could also appear as a positive search trend. Figure 24 depicts the trend performance for jQuery. The y-axis represents percent values with 100% set at the highest frequency of occurring searches.

The final vitality indicator within the external OSC metrics fragment is release frequency. Such information is available through a number of sources, including GitHub or OS project trackers. For this analysis the data was directly extracted from the project’s community blog. Table 7 depicts a list of all major software releases, as well as the elapsed time in months between each release.

Version number    Release date          Elapsed time in months since previous release
2.0.0             18 April 2013         2 months
1.9.1             04 February 2013      1 month
1.9.0             15 January 2013       6 months
1.8.0             09 August 2012        9 months
1.7               03 November 2011      6 months
1.6               03 May 2011           4 months
1.5               31 January 2011       12 months
1.4               14 January 2010       12 months
1.3               14 January 2009       4 months
1.2               10 September 2007     8 months
1.1               14 January 2007       5 months
1.0               26 August 2006        -

Table 7 – Sample OSC: Release frequency

As suggested by the method fragment, all measured indices are aggregated into one external performance index (Figure 25). For this purpose, each metric was first recorded with the corresponding activity for each month between August 2007 and April 2013. Due to the different measuring units, the performance was converted to changes in percent, relative to a standardized base value. The index for the number of active contributors per month is operationalized by changes in percent, compared to an average of 8 contributors per month. This value is based on the actual project average for its entire lifespan and has been rounded to an integer value. Release frequency is recorded relative to a release frequency of 2 releases per year or 0.16 releases per month. As online popularity is already measured in percent values, relative to an established base value, the index has been included in the analysis as is.
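The operationalization described above can be sketched as follows. The two base values are taken from the text; everything else is an assumption made for illustration, in particular that each metric is expressed as a percentage of its base value (100 = base) and that the aggregate is the unweighted mean of the three component indices. The sample month is hypothetical.

```python
CONTRIBUTOR_BASE = 8.0   # average contributors per month (from the text)
RELEASE_BASE = 0.16      # releases per month (from the text)


def relative_index(value, base):
    """Express a raw value as a percentage of its base value."""
    return 100.0 * value / base


def external_index(contributors, releases, popularity):
    """Aggregate the three external metrics for one month.

    popularity is assumed to be a percent value already and is used as is.
    """
    parts = [
        relative_index(contributors, CONTRIBUTOR_BASE),
        relative_index(releases, RELEASE_BASE),
        popularity,
    ]
    return sum(parts) / len(parts)


# Hypothetical month: 12 active contributors, no release, popularity at 60%.
example = external_index(contributors=12, releases=0.0, popularity=60.0)
```

In this sketch a month without a release pulls the aggregate down noticeably, which is consistent with the release-driven spikes visible in an index of this kind.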

It is noteworthy that such an approach is only suitable for a one-time analysis, as the 100% mark of the popularity index is always set to the highest search frequency in a community’s history. Thus, should an OSC reach a new record in search frequency, historic data must be adjusted in order to maintain index consistency. Furthermore, the index is only designed to indicate trends and therefore uses average community performance for activity comparison. Although minimum activity can also be used as a reference point for indicator performance, this was avoided in order to prevent an overly steep upward trend on the graph that would make long-term trends less observable.
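The need to adjust historic data can be made concrete with a small Python sketch. It assumes access to raw search frequencies, which the popularity tool does not necessarily expose; the function name and the figures are purely hypothetical.

```python
def rescale_to_peak(raw_counts):
    """Rescale a series so that its highest value corresponds to 100%."""
    peak = max(raw_counts)
    if peak <= 0:
        raise ValueError("the series needs at least one positive value")
    return [100.0 * value / peak for value in raw_counts]


# Hypothetical raw search frequencies for three periods.
history = [20, 40, 50]
before_record = rescale_to_peak(history)

# A new record in the fourth period shifts the 100% mark,
# so every earlier percentage changes as well.
after_record = rescale_to_peak(history + [100])
```

In the example, the third period drops from 100% to 50% once the new record is included, although the underlying search frequency has not changed. This is why previously stored index values cannot simply be extended with new observations.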

In addition to the regular aggregate index, two further indices were added in order to illustrate the impact of adjusting the weight of individual variables. The second index highlights the effect of doubling the weight of the number of active contributors. The last index illustrates the effect of halving the weight of online search popularity. All occurring spikes are linked either to activity related to a new version release (see Table 7) or to a significant increase in release frequency. The latter is especially noticeable for October 2012 and February 2013. Both spikes are primarily caused by a drastic change in release frequency to one and two months respectively. These releases occurred significantly more frequently than the average of only two new versions per year.
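The effect of the two weight adjustments can be illustrated with a weighted mean. Normalizing by the sum of the weights is an assumption of this sketch, as the exact aggregation formula is not stated; the component values for the sample month are hypothetical.

```python
def weighted_index(values, weights):
    """Weighted mean of component indices, normalized by the weight sum."""
    total_weight = sum(weights[name] for name in values)
    return sum(values[name] * weights[name] for name in values) / total_weight


# Hypothetical component indices for one month (100 = base value).
month = {"contributors": 150.0, "releases": 0.0, "popularity": 60.0}

equal = weighted_index(
    month, {"contributors": 1, "releases": 1, "popularity": 1})
doubled_contributors = weighted_index(
    month, {"contributors": 2, "releases": 1, "popularity": 1})
halved_popularity = weighted_index(
    month, {"contributors": 1, "releases": 1, "popularity": 0.5})
```

With these sample values, doubling the contributor weight raises the aggregate considerably, while halving the popularity weight changes it only slightly, because popularity lies close to the aggregate anyway.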

Sample OSC - External Performance Index

Figure 25 – Sample OSC: External performance index

In contrast to the external metrics, data availability for internal OSC performance indicators is limited. One of the reasons for the lack of consistent data is the age of the community. The OSC has undergone a number of hosting transitions, e.g. for its forums, and, unlike the data available via Git repositories, these sources do not provide consistent long term records. Furthermore, the community lacks a dedicated mailing list and often refers support questions to other platforms, such as the Stack Overflow [2] community, which covers a large number of jQuery related support discussions.

However, the jQuery foundation does list short term activity data for forum activity and ticket based issue tracking for a time span of the last week or month. Therefore, suitable data could be accumulated by means of long term data mining.

The only vitality indicator selected for evaluation that covers the entire OSC life span is the number of code contributions. The indicator serves two purposes: it highlights the number and frequency of commits, and it visualizes OSC activity cycles over a time span of 6 years (see Figure 26).

The figure highlights a seasonally cyclical activity pattern with peaks towards the end and the beginning of five out of six years and low activity levels towards the summer periods. The only exception to the pattern occurred in summer 2012, when the community released a new library version outside its typical pattern.

Community activity cycles

Figure 26 – Sample OSC: Activity cycle in commits per month

By combining the external performance indicators with the available internal data for commit frequency, an aggregate index can be generated that incorporates all long term community activity into a single index. Figure 27 depicts a unified index that combines the external vitality indicator of Figure 25 with the OSC commit frequency introduced in Figure 26. The commit frequency is expressed as a relative percent value based on an average commit frequency of 60 commits per month. Furthermore, all of the aggregated variables have been included with equal relative weight, as the internal OSC vitality indicators are only represented by one variable. Otherwise, commit frequency would have a disproportionate influence on the aggregate index. In addition to the ordinary aggregate index, a second index was added that is corrected for peak activities in winter periods and low activities during summer time. The activity corrections apply to three winter and three summer months, which have been reduced or increased by 20% in significance respectively.
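The seasonal correction can be sketched as a simple multiplicative adjustment. The text does not state which three months count as winter or summer, so the month sets below (December to February and June to August) are an assumption, as are the function name and the sample values.

```python
WINTER_MONTHS = {12, 1, 2}   # assumed peak-activity months
SUMMER_MONTHS = {6, 7, 8}    # assumed low-activity months


def correct_for_season(index_by_month, adjustment=0.20):
    """Dampen winter peaks and lift summer lows by a fixed share.

    index_by_month maps (year, month) tuples to index values.
    """
    corrected = {}
    for (year, month), value in index_by_month.items():
        if month in WINTER_MONTHS:
            factor = 1.0 - adjustment
        elif month in SUMMER_MONTHS:
            factor = 1.0 + adjustment
        else:
            factor = 1.0
        corrected[(year, month)] = value * factor
    return corrected


# Hypothetical aggregate index values for three months of one year.
raw_index = {(2012, 1): 150.0, (2012, 4): 100.0, (2012, 7): 50.0}
corrected_index = correct_for_season(raw_index)
```

A fixed 20% adjustment is a coarse correction. It makes the long term trend easier to read but does not remove seasonality entirely.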

Aggregate OSC Performance Index

Figure 27 – Sample OSC: Aggregated performance Index and linear trend lines

It is noteworthy that when evaluating the graph, measurement consistency is most important while exact performance values lack significance. All index activity values are measured in relation to an established base value and could indicate stronger or weaker growth if the base is modified.

Questionnaires and Likert scales

Following the primarily data driven analysis of OSC activity, this section presents a more qualitative view of the sample OSC’s practices and covers aspects of code and documentation quality, as well as core developer reputation. Given the illustrative nature of this example, no community members were interviewed for OSC vitality assessment. Instead, the purpose of this sub-section is to introduce a number of techniques and Likert scales that illustrate a possible application of the OSC Health Analysis method.

The first fragment within this category deals with OS code quality and requires pre-defined quality criteria in order to rate the component in question. While code quality ratings can be seen as controversial and heavily depend on the programming language, the reviewers’ preferences or the application’s purpose, generally acceptable criteria include the architecture of the application, adherence to the DRY principle (don’t repeat yourself), component modularization, coding practices, e.g. for documentation, or the use of best-practice software patterns.

In order to allow for simple comparison of changes in quality, a five level Likert scale can be utilized. In the case of the sample OSC, the scale can be ranked as follows:

  1. Highly structured and optimized code
  2. Well-structured code with little repetition
  3. Moderately structured code without best practices
  4. Manageable code structure with a number of complex dependencies
  5. Low code quality with bad practices applied

Although a code review by a development team can bring valuable insights into a community’s suitability for a given project, it is a highly time consuming and expensive approach. An alternative source for OSC quality assessment is aggregated online reviews written by OSC users on platforms such as ohloh.net. Due to the great popularity of the sample OSC, the library had received a total of 799 reviews (as of 17 May), rating it with an average of 4.75 out of 5 stars. Applied to the above scale, this rating would rank the OSC’s code as highly structured and optimized.

Analogous to code quality, the purpose of the documentation quality fragment is to create a Likert scale that describes the state of an OSC’s documentation and consequently to rate its performance over time. In contrast to code quality, documentation quality is less expensive to evaluate and requires little content specific expertise. Factors influencing quality can be the degree of detail, the presence of code examples, update frequency or ease of use and accessibility. A sample scale for documentation quality can be structured as follows:

  1. Very well-structured, exhaustive and up to date documentation
  2. Well-structured, exhaustive and mostly up to date documentation
  3. Mostly well-structured but partially outdated documentation
  4. Only partially structured or outdated documentation
  5. Badly or unstructured documentation

In the case of the example OSC, the authors rate the community with the highest quality level. It provides both very well-structured documentation that is fully up to date and an additional dedicated learning center, designed to quickly familiarize new developers with the library’s capabilities and to introduce examples of new code features.

The third method fragment within the category deals with core developer reputation. Starting with a general ranking, the following scale can be used for assessing top OSC developers:

  1. Widely recognized expert known beyond his/her domain of expertise
  2. Recognized domain expert
  3. Developer with acknowledged expertise (within the OSC)
  4. Active contributor to the OSC
  5. Less known developer

An exemplary approach for core developer identification is Git repository data mining. Indicators such as the number or share of repository commits can be used as an approximation for the significance of a developer within the community. An analysis of the sample OSC revealed the ranking depicted in Table 8. Evidently, most contributions to the OSC have been made by John Resig. Resig is widely considered the founder of the community and enjoys broad recognition within a number of technical domains related to web technologies. Furthermore, Resig is a familiar actor within the broader open source community and counts the Mozilla Corporation as a former employer. Based on these observations, it is the authors’ belief that he can be considered for the highest rating within the above scale. Although less known than Resig, other top contributors within the community enjoy a solid reputation and can be ranked within the second or third categories of the above Likert scale.

Name               Number of commits (% of total contributions)   Number of lines of code added   Number of lines of code removed
John Resig         1714 (32.99%)                                  106571                          92494
Dave Methvin       478 (9.20%)                                    6851                            7260
Timmy Willison     404 (7.78%)                                    6034                            4857
Julian Aubourg     330 (6.35%)                                    17875                           15188
Jörn Zaefferer     327 (6.29%)                                    22681                           21149
Rick Waldron       308 (5.93%)                                    6734                            12142
Brandon Aaron      250 (4.81%)                                    11102                           5408
Ariel Flesler      200 (3.85%)                                    18053                           3317
Richard Gibson     138 (2.66%)                                    5401                            4751
Oleg Gaidarenko    104 (2.00%)                                    1927                            1387

Table 8 – Sample OSC: Core developers ranked by number of commits
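A ranking of this kind can be derived from per-author commit counts, for example as exported from a repository mining tool. The following Python sketch uses hypothetical developer names and counts rather than the figures of the sample OSC. It assumes that the share is calculated against the sum of all supplied counts.

```python
def rank_by_commit_share(commits_by_author):
    """Rank authors by commit count and add their share of all commits.

    Returns a list of (author, commits, share in percent) tuples,
    highest commit count first.
    """
    total = sum(commits_by_author.values())
    if total == 0:
        return []
    ranked = sorted(commits_by_author.items(),
                    key=lambda item: item[1], reverse=True)
    return [(author, count, round(100.0 * count / total, 2))
            for author, count in ranked]


# Hypothetical commit counts, for illustration only.
ranking = rank_by_commit_share({"dev_b": 30, "dev_a": 50, "dev_c": 20})
```

Commit counts favor developers who commit in small increments, so the lines of code added and removed are worth considering as a complementary indicator.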

The final proposed method fragment of this sub-section deals with core developer motivation. It suggests identifying community leaders, designing a questionnaire to assess their motivation for contributing to the community and subsequently performing interviews to acquire the necessary data. The approach for core developer identification overlaps with the activities of the core developer reputation review and thus does not need to be repeated at this step. The leading OSC contributors identified there are depicted in Table 8.

Given a five-stage Likert scale, ranging from “does not apply” to “fully applies”, a set of suitable interview questions can be introduced, asking to what degree the following statements apply:

  • I am motivated to contribute to the community.
  • I have sufficient time to participate in community activities and to invest development time.
  • I perceive the contribution to the community as rewarding and feel appreciated.
  • I feel that the community is moving forward in a good direction.
  • I feel that I am being supported by fellow OSC developers.
  • I feel that I am being supported by all community members.
  • I feel that the created software is valued by the community.
  • I feel that my participation in the community is improving my professional skills.

The purpose of the above questions is to assess the core developer’s motivation to continue leading the project. Ideally, the measurement should be repeated over time in order to identify positive or negative trends within the core team. The results and implications for the entire OSC can be reviewed in light of the dependencies of the community interaction model, introduced in section 4.02.
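Tracking the questionnaire results over time can be sketched as follows. The sketch assumes that the five stages are coded from 1 (“does not apply”) to 5 (“fully applies”) and that the arithmetic mean over all eight statements is an acceptable summary. Both are assumptions, and since Likert data is ordinal, the median may be the more defensible choice. The interview rounds and answers are hypothetical.

```python
from statistics import mean


def motivation_score(responses):
    """Average the answers of one developer for one interview round."""
    for answer in responses:
        if not 1 <= answer <= 5:
            raise ValueError("answers must lie between 1 and 5")
    return mean(responses)


# Hypothetical answers of one core developer to the eight statements.
rounds = {
    "first interview": [4, 5, 4, 4, 3, 4, 5, 4],
    "second interview": [3, 4, 4, 3, 3, 3, 4, 4],
}
scores = {label: motivation_score(answers)
          for label, answers in rounds.items()}
trend = scores["second interview"] - scores["first interview"]
```

A negative trend value indicates declining motivation between the two rounds and could prompt a closer look at the individual statements.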

Community practices

Given its age and acquired level of maturity, the sample OSC follows all of the suggested software development practices. In particular, a number of development processes have been consistently standardized across jQuery core, as well as other jQuery projects, such as jQuery UI or jQuery Mobile. An overview of the evaluated activities is depicted in Table 9.

OSC Development Practice                       Presence
Defined Release Schedule                       yes
Systematic Testing Methods                     yes
Formal Requirements Management                 yes
Standardized Software Development Processes    yes

Table 9 – Sample OSC: Software development practices in place

Suitable partners for business alignment

The final fragment of the OSC Health Analysis Method does not deal directly with vitality assessment but with business alignment with other stakeholders sharing an interest in the same OSC. The purpose of the associated activities is to optimize resource allocation for vitality assessment, which can be used either to reduce monitoring costs or to increase measurement accuracy or scope.

Given the domain of the sample OSC, the primary target groups for monitoring alignment are software developers with a focus on web development solutions. This can include providers of content management systems and web frameworks, as well as front-end developers. Alternatively, smaller organizations that strongly depend on the library can also be considered suitable partners for shared monitoring activities.

In the case of the sample OSC, it is particularly easy to find suitable candidate organizations. The jQuery foundation, which is the umbrella organization for jQuery core and other jQuery based OS projects, publicly lists corporate foundation members. Among the listed gold, silver and bronze partners, a number of non-profit as well as commercial organizations can be found. All of them can be considered strong supporters of jQuery.