Do Missing Link Community Smell Affect Developers Productivity: An Empirical Study

Article history: Received 05 June 2021; Revised 24 June 2021; Accepted 20 July 2021; Published online 17 August 2021

Missing link smell occurs when developers contribute to the same source code without communicating with each other. Existing studies have analyzed the relationship of missing link smells with code smells and developer contribution. However, the productivity of developers involved in missing link smell has not been explored yet. This study investigates how productivity differs between smelly and non-smelly developers. For this purpose, the productivity of smelly and non-smelly developers of seven open-source projects is analyzed. The result shows that the developers not involved in missing link smell are more productive than the developers involved in smells. The observed difference is also found to be statistically significant.
This is an open access article under the CC BY-SA license (https://creativecommons.org/licenses/by-sa/4.0/).

Keywords: Community Smell, Empirical Study, Missing Link Smell, Productivity

counterpart. Then, the developers involved with each smell are identified by extracting the instances of the smell, and the developers are categorized into smelly and non-smelly developers. Besides, the productivity of individual developers is measured by the number of changes per active day. Finally, statistical analysis is performed on the productivity of smelly and non-smelly developers.
The study results show that there is a significant difference between the productivity of smelly and non-smelly developers. The average productivity of non-smelly developers is significantly higher than that of smelly developers.

A. Missing Link Community Smell
Missing link community smell occurs when two developers collaborate on the same part of the source code but do not communicate with each other [3]. This smell can be detected by finding those collaborations for which no communication is found in the defined communication channel, e.g., a mailing list. The occurrence of missing link smell is described below with a sample software development community.
A sample software development community of six developers is illustrated in Figure 1. The example is taken from [17]. Developers are connected through a solid line in the network if they communicate with each other. The dashed lines connect developers to the source code on which they work. The development community can be used to generate two types of Developer Social Network (DSN), namely the communication DSN and the collaboration DSN. Firstly, the communication DSN can be generated from Figure 1 by considering only communication links, which is displayed in Figure 2. Then, the collaboration network can be generated by linking developers who work on the same part of the source code. Figure 3 represents the collaboration DSN for the considered development community. For example, developer A and developer B work on the same source code file (Figure 1), so they are connected in the collaboration DSN (Figure 3).
Missing link smell can now be detected by comparing the collaboration network with the communication network. It can be easily observed that one link, EF, in the collaboration network (Figure 3) does not have a corresponding counterpart in the communication network (Figure 2). Hence, it represents an instance of a missing link smell between developer E and developer F.
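The comparison of the two networks can be illustrated with a minimal sketch. The file assignments and communication links below are hypothetical and are not the actual contents of Figures 1 to 3; they are chosen only so that, as in the example above, A and B share a file and the link EF has no communication counterpart. The function names are also illustrative.

```python
from itertools import combinations


def collaboration_edges(files_by_dev):
    """Link every pair of developers who touched at least one common file."""
    edges = set()
    for a, b in combinations(sorted(files_by_dev), 2):
        if files_by_dev[a] & files_by_dev[b]:
            edges.add(frozenset((a, b)))
    return edges


def missing_links(collaboration, communication):
    """Collaboration edges that have no counterpart in the communication DSN."""
    return collaboration - communication


# Hypothetical community of six developers (not the data behind Figure 1).
files_by_dev = {
    "A": {"f1"}, "B": {"f1"},
    "C": {"f2"}, "D": {"f2"},
    "E": {"f3"}, "F": {"f3"},
}
communication = {
    frozenset(pair)
    for pair in [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
}

collaboration = collaboration_edges(files_by_dev)
smells = missing_links(collaboration, communication)
print(sorted(sorted(edge) for edge in smells))  # [['E', 'F']]
```

Edges are stored as unordered pairs, so the set difference directly yields the collaboration links that lack communication.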
In recent times, community smells have been studied to incorporate the organizational and social aspects of the software development community in software engineering research. Some studies focused on identifying [1][6][5] and predicting [9][10][11] these smells in open-source projects. Besides, a few studies investigated the relationship and the impact of community smells on different software artifacts such as code smells and bugs [2][13][18].
The concept of community smell was first introduced in an industrial case study [1]. The authors defined nine different community smells and proposed a list of possible mitigations of these smells, such as learning community, cultural conveyor, stand-up voting, etc. Later, Magnoni [3] proposed the identification pattern of four community smells and developed a tool named Codeface4Smells (https://github.com/maelstromdat/CodeFace4Smells), extending an existing socio-technical network analysis tool, Codeface (http://siemens.github.io/codeface). The enhanced tool detected both community smells and code smells in an automated approach [7]. Besides detection, a few studies [9][10][11] tried to predict community smells. Palomba et al. [9] worked on the prediction of community smells from socio-technical factors. Almarimi et al. [11] also built a model to predict community smells using Ensemble Classifier Chain (ECC) and Genetic Programming (GP) techniques.
Tamburri et al. [5] explored the diffuseness of community smells and developers' perception of the presence and effect of community smells. The authors found that the diffuseness of community smells is high in open-source projects, and developers recognized community smells as an obstacle that may hinder software evolution. The authors also analyzed the relationship between community smells and different socio-technical factors, such as socio-technical congruence, turnover, and truck factor.
Catolino et al. [14] investigated the role of gender diversity and women's participation in community smells. The authors found that gender-diverse teams had fewer community smells than non-gender-diverse teams, and the involvement of women in teams can reduce the number of community smells. In another study, Catolino et al. [16] suggested some refactoring strategies to deal with community smells in practice, such as mentoring, creating communication plans, and restructuring the development community. In a recent study, Catolino et al. [8] investigated the impact of socio-technical factors on community smells and found that communicability is essential in most cases to prevent the increase of community smells.
Ahammed et al. [18] investigated how missing link community smell was related to the introduction of bugs, i.e., Fix-Inducing Changes (FIC), in the system. The authors found that the number of smelly commits (commits by developers involved in community smells) and the number of FIC commits are positively correlated. The authors also found that the severity of bugs was highest for those introduced by developers involved in missing link smells. In another study [17], the same authors conducted an exploratory study on seven Apache projects on the engagement of developers in missing link community smell. They found that the contribution activities of developers are positively correlated with their involvement in missing link smell.
Existing studies investigated the impact of community smells on technical artifacts such as code smell intensity [2] or bugs [13] by employing a community-aware prediction model. Palomba et al. [2] conducted an empirical study on nine open-source projects. They measured how community smells impact code smell intensity by proposing a code smell intensity prediction model. They found that community smells contribute to the intensity of code smells. Eken et al. [13] conducted an empirical investigation on ten open-source projects to find how community smells can predict bugs. The authors found that community smells are a contributing factor in predicting bug-prone classes. The current study aims at understanding the impact of community smell from the perspective of developers, i.e., how they perform in the software project. The study performs an empirical investigation on 1004 developers from 7 open-source projects, where the history of each project is divided into six-month windows. The study reveals how missing link community smell affects the productivity of developers in open-source projects by measuring productivity in terms of the number of changes per active day.

B. Proposed Framework
This study aims to understand how missing link smell affects the productivity of developers. First, missing link smells are detected from the project repository and mailing list. Then, the developers involved in the detected missing link smells are extracted. Thus, the developers of the project can be divided into two categories: smelly and non-smelly developers. Next, the number of changes made by individual developers to the repository is computed. The productivity of individual developers is calculated as the number of changes per active day. Finally, the productivity of smelly and non-smelly developers is compared to identify the effect of missing link smell. The overview of the methodology is illustrated in Figure 4.

1) Data collection
The data is collected from 7 open-source projects. The choice of these projects is guided by the availability of the source code and the developer mailing list archive. The source code of the selected projects is available on GitHub, and the development mailing list archive is available on Gmane, a mailing list archive. The names of the projects, source code repositories, number of commits, number of files, lines of code, analyzed periods, project ages, and number of developers are reported in Table 1. The analyzed projects have different sizes in terms of KLOC (ranging from 483 to 1392 KLOC) and different community sizes (from 44 to 438 developers).

2) Missing link smell detection
Missing link smells are detected in the projects according to the identification pattern introduced by Magnoni [3]. First, the source code repository of a project is cloned locally from GitHub (https://github.com/), and the mailing list archive is downloaded from Gmane (http://gmane.io/). The projects are analyzed using a six-month window. For each window, a collaboration DSN is generated by analyzing the project's repository. All commits are analyzed, and developers who contribute to the same part of the source code within that window are connected through an edge. Next, a communication DSN is constructed by analyzing the mailing list of the project. All emails in the mailing list are analyzed, and developers who replied to the same email within a given window are connected. Finally, the collaboration DSN and the communication DSN are compared to find missing link smells. For each edge in the collaboration network, the corresponding counterpart is searched for in the communication DSN. Any edge that is present in the collaboration DSN but absent in the communication DSN is identified as a missing link smell.
The steps mentioned above are performed on the selected projects using the Codeface4Smells tool. The tool preprocesses the provided artifacts, i.e., the source code repository and the mailing list, and generates the developers' collaboration and communication networks [3]. The generated networks are then used to detect the occurrence of missing link smells. The tool returns the list of missing link smells along with the developers involved in these smells for each evaluated project. The developers involved in at least one missing link smell are identified as smelly developers, and the rest are considered non-smelly developers.
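The windowed detection and the classification of developers can be sketched as follows. This is not the Codeface4Smells implementation; it is a simplified illustration under stated assumptions: commits and emails are given as hypothetical `(developer, key, day)` records, where the key is a file path or an email thread identifier, windows are aligned to calendar months counted from an assumed start date, and all names and data are invented for the example.

```python
from collections import defaultdict
from datetime import date
from itertools import combinations


def window_index(day, start, months=6):
    """Index of the `months`-long window containing `day` (month granularity)."""
    return ((day.year - start.year) * 12 + (day.month - start.month)) // months


def edges_by_window(events, start):
    """Connect developers who share a key (file or thread) within one window."""
    groups = defaultdict(set)
    for dev, key, day in events:
        groups[(window_index(day, start), key)].add(dev)
    edges = defaultdict(set)
    for (win, _), devs in groups.items():
        for a, b in combinations(sorted(devs), 2):
            edges[win].add((a, b))
    return edges


def classify(commits, emails, start):
    """Return (smelly, non_smelly) developer sets over all windows."""
    collab = edges_by_window(commits, start)
    comm = edges_by_window(emails, start)
    everyone = {dev for dev, _, _ in commits}
    smelly = set()
    for win, edges in collab.items():
        # A collaboration edge without a communication edge is a missing link.
        for edge in edges - comm.get(win, set()):
            smelly.update(edge)
    return smelly, everyone - smelly


# Hypothetical data for illustration only.
start = date(2020, 1, 1)
commits = [
    ("alice", "core.py", date(2020, 1, 10)),
    ("bob", "core.py", date(2020, 3, 2)),
    ("carol", "util.py", date(2020, 2, 1)),
    ("dave", "util.py", date(2020, 9, 15)),   # different window than carol
    ("erin", "io.py", date(2020, 8, 1)),
    ("dave", "io.py", date(2020, 10, 3)),
]
emails = [
    ("erin", "thread-1", date(2020, 8, 5)),
    ("dave", "thread-1", date(2020, 8, 6)),
]

smelly, non_smelly = classify(commits, emails, start)
print(sorted(smelly), sorted(non_smelly))
```

In this toy data, alice and bob collaborate on `core.py` in the first window without any email exchange, so both are classified as smelly; carol and dave touch `util.py` in different windows and are therefore not linked.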

3) Measuring productivity
The productivity of an individual can be measured as the amount of output generated per unit time [19]. The most straightforward approach to measure the contribution of a developer is to count the number of commits. However, assessing the contribution of developers using the number of commits is not a viable measurement because not all commits are equal in size. Therefore, the size of commits should be taken into account while measuring the developer's contribution. The total number of modified lines in a commit is used to measure the size of that commit. A previous study also used a similar approach to measure the developer's contribution [20].
The contribution of a developer is extracted from the project repository. First, all the commits of an individual developer and all the files modified in these commits are identified. Then the number of changes, i.e., the sum of added and deleted lines, in the modified files is calculated. The total number of changes is computed as the sum of all changes of a developer. Next, the number of active days of the individual developer is measured by analyzing the commit history of that developer. The number of active days is the count of days on which the developer made at least one commit in the repository. Then the productivity is calculated as the number of changes per active day by a developer. Equation (1) shows how productivity is measured.

\[ \text{Productivity} = \frac{\text{Total number of changes}}{\text{Number of active days}} \tag{1} \]
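As an illustration of this measure, the following minimal sketch computes changes per active day from a list of commit records. The record layout `(developer, day, added_lines, deleted_lines)` and the sample values are hypothetical; in practice such records would have to be extracted from the repository history.

```python
from collections import defaultdict
from datetime import date


def productivity(commits):
    """Changes (added + deleted lines) per active day for each developer."""
    changes = defaultdict(int)
    active_days = defaultdict(set)
    for dev, day, added, deleted in commits:
        changes[dev] += added + deleted
        active_days[dev].add(day)  # several commits on one day count once
    return {dev: changes[dev] / len(active_days[dev]) for dev in changes}


# Hypothetical commit records for illustration only.
commits = [
    ("alice", date(2020, 1, 10), 120, 30),
    ("alice", date(2020, 1, 10), 10, 0),
    ("alice", date(2020, 1, 12), 40, 0),
    ("bob", date(2020, 2, 1), 5, 5),
]

result = productivity(commits)
print(result)  # alice: 200 changes / 2 days = 100.0; bob: 10 / 1 = 10.0
```

Storing active days in a set ensures that a day with several commits is counted only once, which matches the definition of an active day given above.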

4) Data analysis
This study aims at understanding whether smelly developers exhibit different productivity compared to non-smelly developers. The following null hypothesis is formulated to investigate the impact of missing link smell on developers' productivity:

H0: The productivity of smelly and non-smelly developers is not significantly different.
To test H0, the Wilcoxon Rank Sum Test, a non-parametric statistical test, is used. This test can determine whether two independent samples of ordinal or interval data differ significantly without assuming that the data are normally distributed. The test statistic (W) indicates a significant difference between two sample sets if the ranks of the two sets differ significantly. The test is used to assess whether the observed difference between the productivity of smelly developers and non-smelly developers is statistically significant. The result is considered significant if the p-value is less than 0.01.
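The mechanics of the rank sum test can be illustrated with a small, self-contained sketch. It is not the procedure or software used in the study: it relies on the normal approximation without tie or continuity correction, and it defines W as the rank sum of the first sample minus n1(n1+1)/2, which is an assumption about the convention and may differ from the one behind the reported statistic. The sample values are invented.

```python
import math


def average_ranks(values):
    """1-based ranks of `values`; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Return (W, two-sided p-value) under the normal approximation."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return w, p


# Hypothetical productivity values for illustration only.
smelly = [10, 20, 30, 40, 50]
non_smelly = [60, 70, 80, 90, 100, 110]

w, p = rank_sum_test(smelly, non_smelly)
print(w, p < 0.01)
```

For samples as small as this toy example, an exact test would normally be preferred; the normal approximation is more appropriate for group sizes in the hundreds, as in this study.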

III. Results and Discussions
This section presents and discusses the results obtained through the experimentation on the selected projects. The experimentation is performed according to the methodology stated above. The resulting dataset consists of 1004 developers from seven different projects. The number of smelly and non-smelly developers of all evaluated projects is reported in Table 2. In the evaluated projects, the total number of smelly developers is 468, and the total number of non-smelly developers is 536. Figure 6 illustrates the project-wise ratio of smelly and non-smelly developers.
The productivity of both smelly and non-smelly developers is measured as the number of changes per active day. Thus, the dataset contains two developer groups, i.e., smelly and non-smelly, with their corresponding productivity values. Then the Wilcoxon Rank Sum Test is performed to assess the null hypothesis, H0, which states that the productivity does not differ between these two groups. The p-value obtained from the test is used to decide whether the null hypothesis can be rejected. The mean productivity of these two groups is also calculated.
The productivity of smelly and non-smelly developers is reported in Table 3. The mean productivity of smelly developers is 333.90, whereas the mean productivity of non-smelly developers is 445.84. The observed difference is found to be significant by the Wilcoxon Rank Sum Test (W = 72374, p-value < 0.01). The p-value indicates that the null hypothesis H0 can be rejected. Thus, the result implies that the productivity of smelly developers and non-smelly developers is significantly different. The mean productivity of non-smelly developers is significantly higher than that of smelly developers. The result suggests that the developers involved in missing link smell show lower productivity in terms of the number of changes per active day than the developers who are not involved in missing link smell. These results indicate that missing link smell affects the productivity of developers negatively. The lower productivity of developers can increase the cost of the software project. Hence, missing links should be monitored carefully, and steps should be taken to mitigate these smells if necessary.

IV. Conclusion
This study investigates the effect of missing link smell on developers' productivity. The productivity of 1004 developers from seven open-source projects is analyzed. Missing link smells are identified in these projects, and the developers are categorized into two groups, i.e., smelly and non-smelly. Productivity is measured as the number of changes performed by a developer per active day.
The Wilcoxon Rank Sum Test result shows that the productivity differs significantly between smelly and non-smelly developers. The developers who are not involved in any missing link smell show higher productivity than the developers involved in the smell. The result suggests that missing link smells should be monitored, and necessary steps should be taken to mitigate them, in order to maintain productivity and control software cost.
The missing link smells detected by Codeface4Smells are directly included in the study without further verification. Moreover, this tool uses a mailing list to generate the communication network as the source of communication data. The result can be different if other communication channels exist, such as Skype and Slack. However, according to contribution guidelines of evaluated projects, a mailing list is the primary communication channel in these communities.
In the future, more open-source projects can be analyzed to generalize the result. Moreover, other types of community smells, such as Organizational Silo and Radio Silence, can also be considered to see their effect on productivity.