
Introduction

The undocumented evolution of a software project and its underlying architecture underscores the need to recover the architecture from the software’s implementation-level artifacts. Various software remodularization techniques exist, but they are often inaccurate, and evaluating their effectiveness is difficult because accurate “ground-truth” architectures or reference models are rarely available. In this paper, we propose Automated Construction of Reference Model (ACRM), an approach that automatically constructs reference models for software projects from the metadata of all software versions and historical maintenance records. We evaluate ACRM both quantitatively and qualitatively. The experimental results show that the generated reference models are reasonable, as confirmed by the relationship between the proposed reference models and architecture-level bugs or code smells. A qualitative study involving industrial developers and students further validates the generated reference models: on average, 89% of the survey participants agree with the reference models generated by ACRM. Moreover, we propose an improved metric, wc2c, and use it together with the proposed reference models to analyze the strengths and weaknesses of different types of software clustering techniques. Finally, we discuss the potential benefits of using ACRM in the analyzed projects, particularly in terms of improving software quality, reducing maintenance costs, and enhancing developer productivity.

Studied Subjects

| ID | Project | # Versions | # Major Versions | # Stars | KLOC (Avg) | # Classes (Avg) | # Commits |
|---|---|---|---|---|---|---|---|
| 1 | Activemq | 64 | 2 | 1,764 | 324.9 | 3,057 | 11,309 |
| 2 | Activemq-artemis | 32 | 2 | 602 | 518.3 | 3,324 | 9,680 |
| 3 | Aeron | 86 | 2 | 5,065 | 51.1 | 330 | 15,842 |
| 4 | Alluxio | 62 | 3 | 4,613 | 248.0 | 916 | 30,937 |
| 5 | Apktool | 34 | 2 | 10,220 | 16.6 | 179 | 1,648 |
| 6 | Assertj-core | 50 | 3 | 1,756 | 109.9 | 2,600 | 2,870 |
| 7 | Atmosphere | 204 | 3 | 3,430 | 40.6 | 259 | 5,931 |
| 8 | Atomix | 95 | 3 | 1,901 | 55.6 | 619 | 4,265 |
| 9 | AxonFramework | 99 | 4 | 2,020 | 93.0 | 724 | 5,951 |
| 10 | Beam | 83 | 2 | 3,998 | 389.6 | 1,063 | 27,132 |
| 11 | Bisq | 86 | 2 | 3,102 | 111.1 | 892 | 11,168 |
| 12 | Byte-buddy | 202 | 2 | 3,485 | 117.0 | 581 | 5,200 |
| 13 | Calcite | 52 | 2 | 1,894 | 211.5 | 869 | 4,175 |
| 14 | Camel | 154 | 3 | 3,242 | 680.0 | 7,981 | 45,096 |
| 15 | Cas | 218 | 4 | 7,620 | 91.1 | 1,219 | 16,869 |
| 16 | Cassandra | 241 | 4 | 5,950 | 189.2 | 775 | 25,297 |
| 17 | Conversations | 215 | 3 | 3,541 | 54.6 | 150 | 6,274 |
| 18 | Cxf | 153 | 2 | 642 | 527.7 | 4,618 | 15,722 |
| 19 | Dbeaver | 108 | 4 | 13,652 | 286.0 | 2,233 | 16,052 |
| 20 | Debezium | 73 | 2 | 3,265 | 75.5 | 363 | 3,125 |
| 21 | Discovery | 76 | 3 | 2,954 | 17.4 | 289 | 2,403 |
| 22 | Dropwizard | 147 | 3 | 7,657 | 44.0 | 509 | 5,430 |
| 23 | Eclim | 76 | 2 | 1,026 | 33.2 | 326 | 4,849 |
| 24 | Flink | 101 | 2 | 13,149 | 698.3 | 4,037 | 22,170 |
| 25 | Fresco | 40 | 2 | 16,207 | 89.2 | 547 | 2,531 |
| 26 | Grakn | 45 | 2 | 2,107 | 76.6 | 570 | 4,291 |
| 27 | Guacamole-client | 33 | 2 | 1,004 | 19.5 | 281 | 5,378 |
| 28 | Hadoop | 293 | 4 | 10,489 | 972.6 | 1,784 | 23,874 |
| 29 | Hawtio | 137 | 2 | 1,138 | 63.3 | 199 | 8,803 |
| 30 | Hive | 40 | 2 | 3,174 | 850.3 | 2,345 | 14,501 |
| 31 | Java-tron | 51 | 3 | 2,380 | 80.2 | 849 | 14,129 |
| 32 | Karaf | 82 | 3 | 480 | 80.0 | 655 | 8,197 |
| 33 | Maxwell | 170 | 2 | 2,141 | 68.8 | 123 | 3,110 |
| 34 | Nifi | 88 | 2 | 2,066 | 60.1 | 693 | 5,286 |
| 35 | Okhttp | 95 | 4 | 37,252 | 50.3 | 167 | 4,645 |
| 36 | Openapi-generator | 53 | 3 | 5,446 | 374.2 | 542 | 14,218 |
| 37 | Orientdb | 157 | 3 | 4,154 | 368.1 | 2,329 | 19,352 |
| 38 | Pdfbox | 52 | 2 | 1,162 | 134.7 | 939 | 8,962 |
| 39 | Pmd | 70 | 2 | 2,887 | 184.3 | 1,415 | 16,532 |
| 40 | Powermock | 42 | 2 | 3,121 | 36.8 | 590 | 1,607 |
| 41 | Redisson | 163 | 3 | 13,242 | 74.7 | 486 | 5,675 |
| 42 | Rest-assured | 56 | 3 | 4,748 | 20.0 | 180 | 1,959 |
| 43 | Speedment | 67 | 2 | 1,832 | 95.3 | 1,537 | 4,674 |
| 44 | Spotbugs | 41 | 2 | 1,894 | 227.6 | 1,891 | 16,206 |
| 45 | Spring-framework | 175 | 3 | 37,411 | 502.5 | 3,773 | 20,896 |
| 46 | Spring-security | 143 | 4 | 4,843 | 145.0 | 1,231 | 8,732 |
| 47 | Storm | 33 | 2 | 6,078 | 160.0 | 920 | 10,316 |
| 48 | Testcontainers-java | 73 | 2 | 3,805 | 8.3 | 175 | 2,008 |
| 49 | Tika | 56 | 2 | 1,002 | 82.0 | 526 | 4,747 |
| 50 | Traccar | 31 | 2 | 2,392 | 25.9 | 415 | 6,227 |

Empirical Analysis

Download: Empirical Analysis

Reasonability of Reference Model (RQ1)

Quantitative evaluation

Download: Statistical results for the location weight distribution of classes of the reference model, number of bugs, and architectural smells.xlsx

Download: Spearman rank correlation between the location weights of classes in reference models and architectural smells or bugs.xlsx
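
For readers who want to reproduce this kind of analysis on their own data, the following is a minimal sketch of a Spearman rank correlation between per-class location weights and per-class bug counts. It is not the authors' analysis script. The function names (`average_ranks`, `pearson`, `spearman`) and the toy values are hypothetical and chosen only for illustration; the real values are in the spreadsheet.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the rank vectors."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical toy data: one entry per class (not taken from the study).
location_weights = [0.9, 0.7, 0.4, 0.2, 0.1]
bug_counts = [12, 9, 9, 3, 0]

rho = spearman(location_weights, bug_counts)
print(f"Spearman rho = {rho:.3f}")
```

Computing the correlation on ranks rather than raw values makes the result insensitive to the scale of the weights and to outliers in bug counts, which is presumably why a rank correlation was chosen here.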

Qualitative evaluation

Download: Questionnair.zip

Applicability of Reference Model (RQ2)

Download: wc2c_cvg results.xlsx

Download: c2c_cvg.results.xlsx
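
The weighted wc2c metric is not defined on this page, so it is not reproduced here. As background for reading the c2c_cvg results, the sketch below shows one common formulation of unweighted cluster-to-cluster (c2c) similarity and coverage from the architecture recovery literature. The exact definition, coverage direction, and threshold used in the paper may differ. The function names, the 0.5 threshold, and the toy clusters are all assumptions for illustration.

```python
def c2c(cluster_a, cluster_b):
    """Overlap of two clusters relative to the larger one (0.0 to 1.0)."""
    largest = max(len(cluster_a), len(cluster_b))
    if largest == 0:
        return 0.0
    return len(cluster_a & cluster_b) / largest


def c2c_coverage(recovered, reference, threshold=0.5):
    """Fraction of recovered clusters that have some reference cluster
    with c2c similarity strictly above the threshold (assumed direction)."""
    if not recovered:
        return 0.0
    matched = [
        cluster
        for cluster in recovered
        if any(c2c(cluster, ref) > threshold for ref in reference)
    ]
    return len(matched) / len(recovered)


# Hypothetical toy architectures: each cluster is a set of class names.
recovered = [{"A", "B", "C"}, {"D", "E"}, {"F"}]
reference = [{"A", "B"}, {"C", "D", "E"}, {"F", "G", "H"}]

coverage = c2c_coverage(recovered, reference, threshold=0.5)
print(f"c2c coverage = {coverage:.2f}")
```

In this toy case the first two recovered clusters each share two of three classes with a reference cluster, while the third shares only one of three, so two of the three recovered clusters count as covered.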