Abstract

Identifying important nodes in complex networks is essential in disease transmission control, network attack protection, and valuable information detection. Many evaluation indicators, such as degree centrality, betweenness centrality, and closeness centrality, have been proposed to identify important nodes. Some researchers assign different weights to different indicators and combine them to obtain the final evaluation result. However, the weights are usually assigned subjectively based on the researcher’s experience, which may lead to inaccurate results. In this paper, we propose an entropy-based self-adaptive node importance evaluation method to evaluate node importance objectively. Firstly, based on complex network theory, we select four indicators to reflect different characteristics of the network structure. Secondly, we calculate the weights of the indicators based on information entropy theory. Finally, the node importance is obtained as the weighted average of the indicators. The experimental results show that our method performs better than the existing methods.

1. Introduction

Complex networks play an important role in our daily life. People communicate with each other and stay in touch with old friends through online social networks. The Internet connects the whole world, so information now spreads faster and wider than before. Electricity companies build their own networks to provide electricity for production and living. Police officers cooperate through their internal networks to catch criminals. Identifying important nodes in such networks is a critical issue in various situations. For example, if the important nodes are well isolated during disease transmission, the epidemic can be controlled effectively. In traffic networks, we can ease congestion by taking corresponding measures to split traffic flow at certain important nodes. However, evaluating node importance is not an easy task, especially when the network has layered but complicated relationships instead of a flat hierarchy. A typical example is a mobile edge computing network, where smaller edge clouds connect with each other but at the same time follow arrangements from larger edge clouds, while mobile users may connect to and accept service from both of them [1].

To solve this problem, many researchers have proposed different methods to identify important nodes in complex networks [2]. Considering the local information of nodes and their neighbors, degree centrality [3], semilocal centrality [4], and k-shell decomposition [5] have been proposed to characterize node importance. Because these indicators only consider the local information of nodes, their computational complexity is low. However, they cannot accurately reflect the characteristics of the whole network, and they do not take into account the layer information among nodes of a mobile edge computing network. To consider more global characteristics, closeness centrality [6], betweenness centrality [7], etc. have been proposed. These indicators consider paths and information flows and can therefore reflect more characteristics of the whole network structure. However, their computational complexity is high. To reduce the time complexity, some researchers proposed methods that divide networks into several parts, such as community-based methods [8] and cluster-based methods [9]. Besides the aforesaid methods, researchers also proposed methods that consider not only the number of neighbors but also the importance of neighbors [10–13]. These methods often iterate step by step to obtain a steady result for each node.

From the above discussion, we can see that most of the indicators listed above reflect only one characteristic of the network and cannot produce a comprehensive evaluation. To make up for this, we can combine several indicators and assign different weights to them to obtain a better evaluation result. However, the weights are usually selected based on the subjective experience of the researchers rather than on a sufficient scientific basis, which is likely to lead to inaccurate evaluation results.

In this paper, we propose an information entropy theory-based self-adaptive node importance evaluation method (EBSAM), which evaluates node importance objectively by adaptively assigning weights to different indicators. Firstly, we select four indicators to evaluate node importance separately based on complex network theory. Secondly, the weights of the indicators are calculated based on information entropy theory. Finally, the four indicators are combined to indicate the node importance. In this way, EBSAM combines the strengths of existing node importance indicators and computes their weights to create a more comprehensive indicator. To study the effectiveness of our method, we conducted experiments on three different networks and compared the results with other methods. The results show that the proposed method performs better than the existing methods and improves the evaluation accuracy.

The rest of the paper is organized as follows. Section 2 gives a brief overview of the related work. Section 3 presents the problem definition and other basic preliminaries. In Section 4, we propose the entropy-based self-adaptive evaluation method and explain its technical details. The comparative simulation experiments, followed by the result analysis, are given in Section 5. Section 6 concludes this paper and points out future work.

2. Related Work

Researchers have proposed many methods to evaluate node importance from different perspectives. In this section, we look into some of the most recent and important research works on node importance evaluation in complex networks.

Liu et al. [14] proposed a node ranking method based on the importance of lines. Firstly, the method calculates the importance of the lines between nodes from their topological properties. In addition, the contribution of each node to the line importance is recorded. The final ranking result is a combination of the node degree and its contribution to the line importance. Important bridge nodes can be well identified with lower computational complexity. The proposed method performs better than current single local centrality measures, but it still does not consider enough global information for a more accurate evaluation. Hu et al. [15] applied the Locally Linear Embedding (LLE) algorithm [16] to node importance evaluation. LLE, which is often used in machine learning, is a nonlinear dimensionality reduction technique. Since several centrality measures have been proposed to identify the important nodes in a complex network, the input of the algorithm is a matrix constructed by calculating these centrality measures for the nodes in the network. However, due to the limitations of LLE, this algorithm places some requirements on the distribution of the input data. Xu et al. [17] proposed a comprehensive node importance evaluation approach that classifies nodes into several types according to their functions in the network. Different measure indices are applied to evaluate the importance of different types of network nodes. The paper takes power transmission grids as an example and divides nodes into three types: power supply nodes, connection nodes, and terminal load nodes. For each type of node, the ranking result is obtained based on different centrality measures according to its function in the network. Although this method can evaluate node importance precisely, it is only applicable when the nodes in the network can be divided into several different types. For networks where the functionality of nodes is hard to distinguish, the method performs badly. Zhang et al. [18] proposed a node importance evaluation method that combines betweenness centrality and closeness centrality. They believe that two types of factors determine the importance of a node. The first factor is its location in the network, and the second factor is the contribution of its neighboring nodes. Betweenness centrality has an important impact on the location of a node, and closeness centrality determines the contribution of neighboring nodes. The final node importance is the sum of the two factors. Ping et al. [19] believe that the importance contributions from both adjacent and nonadjacent nodes have an important impact on the node importance. They divide nodes into different layers according to their distance from the evaluated node. In addition, two parameters are defined to indicate the dependence strength between two nodes. The contribution probability from one node to another is denoted by the importance correlation parameter. The impact of the layer on the dependence strength is reflected by the strength correlation parameter. The final result combines both the importance of the evaluated node and the contribution of other nodes in the network. The above methods mainly exploit either local information or global information to evaluate the node importance.

Yu et al. [20] evaluate the node importance by considering both the closeness centrality and the degree of a node. The global importance of nodes is represented by closeness centrality. The local importance of nodes is characterized by the importance contribution between adjacent nodes. Therefore, both local attributes and global attributes are considered during the node importance evaluation process. Hu et al. [21] proposed a method that combines the k-shell decomposition algorithm with community centrality. The method considers not only the local information of the node but also the community structure it belongs to. The final result is a weighted combination of these two indicators. However, the weights are set based on personal experience with the network structure. Therefore, the evaluation result is very subjective. Zhang et al. [22] proposed a new algorithm that combines betweenness centrality and Katz centrality. The proposed method comprehensively considers both the local node importance and the global node importance. It overcomes the limitation of betweenness centrality, which only considers shortest paths. In addition, it overcomes the limitation of Katz centrality with respect to local optima. However, the weights of the two indicators are selected by conducting a large number of experiments on the dataset with different weight values, which is not a practical way to determine the weights. Yang and Xie [23] proposed a node importance evaluation method based on the multiobjective decision method. They select several different representative indicators. Each node in the network is regarded as a solution, and the different indicators of each node are regarded as the solution properties. The evaluation result is obtained by calculating the closeness degree of each node in the network to the ideal solution. In this method, the weights of the indicators are calculated using the Analytic Hierarchy Process. Therefore, the accuracy is highly dependent on the researchers’ personal experience. Similarly, Liu et al. [24] proposed a multiattribute ranking method for node importance evaluation in complex networks. They select four representative indicators and also assign the weights by using the Analytic Hierarchy Process. The final result is obtained using the Technique for Order Preference by Similarity to Ideal Object (TOPSIS). The method is similar to the method proposed in [23]; the difference between the two lies in the selection of representative indicators. All of the above methods share the problem that their accuracy is highly dependent on the researchers’ personal experience.

Therefore, how to assign appropriate weights for different indicators in different networks objectively and adaptively is still a problem to be solved. We will address the problem in this paper.

3. System Model and Node Importance Indicators

3.1. The Topology of Complex Networks

A complex network can be modelled as an undirected and unweighted network. We define an undirected and unweighted network as $G = (V, E)$, where $V$ denotes the set of nodes in the complex network and $E$ denotes the set of edges in the complex network. $n = |V|$ is the total number of nodes in the network.

3.2. The Definition of Node Importance Indicators

There are two types of methods in network node importance evaluation. The first type considers only local node information, that is, the node itself and the number of its neighbors. The second type considers the hierarchical structure of the network and the position of each node in it, which means that the global information of a node is considered. To absorb their respective advantages and evaluate node importance effectively, we adopt two local-information-related indicators and two global-information-related indicators. Degree centrality and improved K-shell decomposition reflect the local information of a node, while closeness centrality and betweenness centrality reflect its global information.

3.2.1. Degree Centrality

Degree centrality [3], namely $DC$, is defined as the ratio of the number of edges that connect to a node directly to the number of other nodes in the network:

$$DC(i) = \frac{k_i}{n - 1}, \tag{1}$$

where $k_i$ is the number of edges connecting to node $v_i$ directly and $n$ is the total number of nodes in the network. A larger value of $DC(i)$ indicates that node $v_i$ has more neighbors. Therefore, $v_i$ can influence more nodes in the network and is more important.
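As an illustration, degree centrality can be sketched in a few lines of Python. The adjacency-dictionary input format and the function name `degree_centrality` are assumptions made for this example, and the normalization by $n - 1$ follows the common convention for this indicator.

```python
def degree_centrality(adj):
    """Return k_i / (n - 1) for every node of an undirected graph.

    `adj` maps each node to the set of its neighbours (hypothetical format).
    """
    n = len(adj)
    if n < 2:
        return {v: 0.0 for v in adj}
    return {v: len(neighbours) / (n - 1) for v, neighbours in adj.items()}


# Toy star graph: node 0 is the hub of three leaves.
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
dc = degree_centrality(star)
```

In the toy star graph the hub is adjacent to every other node and therefore obtains the largest possible value, while each leaf is adjacent to only one of the three other nodes.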

3.2.2. Closeness Centrality

Closeness centrality (CC) [6] is defined to represent the average distance of node $v_i$ to all other nodes in the network. Suppose $d_{ij}$ denotes the length of the shortest path from the source node $v_i$ to the destination node $v_j$. The average shortest distance $d_i$ from node $v_i$ to all other nodes in the complex network can be calculated by

$$d_i = \frac{1}{n - 1} \sum_{j \neq i} d_{ij}. \tag{2}$$

The smaller $d_i$ is, the more important $v_i$ is. The closeness centrality of node $v_i$ is defined as the reciprocal of $d_i$:

$$CC(i) = \frac{1}{d_i} = \frac{n - 1}{\sum_{j \neq i} d_{ij}}. \tag{3}$$

If there is no path between $v_i$ and $v_j$, $d_{ij}$ is set to 0. A larger value of $CC(i)$ indicates that node $v_i$ is closer to the centre of the network. In other words, the position of node $v_i$ is very important in the network.
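A minimal sketch of closeness centrality for an unweighted network is given here, using breadth-first search to obtain shortest-path lengths. The adjacency-dictionary format and the function names are assumptions for illustration; unreachable pairs contribute a distance of 0, following the convention stated above, and an isolated node is assigned a closeness of 0 as a further assumption.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path lengths (in hops) from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def closeness_centrality(adj):
    """Return (n - 1) / sum_j d_ij for every node i."""
    n = len(adj)
    cc = {}
    for v in adj:
        total = sum(bfs_distances(adj, v).values())  # unreachable nodes add 0
        cc[v] = (n - 1) / total if total > 0 else 0.0
    return cc


# Toy path graph 0 - 1 - 2: the middle node is closest to the others.
path = {0: {1}, 1: {0, 2}, 2: {1}}
cc = closeness_centrality(path)
```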

3.2.3. Betweenness Centrality

Betweenness centrality (BC) [7] is defined to represent the importance of a node in data transmission. Suppose $v_s$ and $v_t$ are two nodes in the network. The betweenness centrality is defined as follows:

$$BC(i) = \sum_{s \neq i \neq t} \frac{g_{st}(i)}{g_{st}}, \tag{4}$$

where $g_{st}$ denotes the number of shortest paths from $v_s$ to $v_t$, and $g_{st}(i)$ denotes the number of shortest paths (from node $v_s$ to node $v_t$) that go through node $v_i$. A larger $BC(i)$ indicates that more shortest paths travel through node $v_i$. Therefore, $v_i$ is more important in the data transmission process.
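One well-known way to compute betweenness centrality on an unweighted network is Brandes' accumulation scheme, sketched here. This is an illustrative implementation, not necessarily the one used in the experiments; the adjacency-dictionary format and the function name are assumptions, and each unordered node pair is counted once.

```python
from collections import deque


def betweenness_centrality(adj):
    """Unnormalized betweenness of every node in an undirected, unweighted graph."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s: count shortest paths (sigma) and record predecessors.
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        preds = {v: [] for v in adj}
        sigma[s], dist[s] = 1, 0
        order = []
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for w in adj[u]:
                if dist[w] < 0:
                    dist[w] = dist[u] + 1
                    queue.append(w)
                if dist[w] == dist[u] + 1:
                    sigma[w] += sigma[u]
                    preds[w].append(u)
        # Accumulate pair dependencies from the farthest nodes back to s.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Every unordered pair was processed from both endpoints, so halve the sums.
    return {v: value / 2.0 for v, value in bc.items()}


# Toy star graph: all three leaf-to-leaf shortest paths pass through the hub.
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
bc = betweenness_centrality(star)
```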

3.2.4. Improved K-Shell Decomposition

K-shell decomposition [5] is employed to identify the position of a node. The schematic diagram of K-shell decomposition is illustrated in Figure 1(a). Firstly, remove all nodes whose degree is 1 from the network, and set their $Ks$ value to 1. Repeat this operation until the degree of all nodes remaining in the network is larger than 1. Then increase $Ks$ to 2, and continue the removal in the same way until all nodes have been removed from the network. The larger $Ks$ is, the more important the node is in the network.

As can be seen from the definition, K-shell decomposition assigns the same value to all nodes when the network is a star network or a tree network. To overcome this limitation, improved K-shell decomposition (IKs) was proposed by Liu et al. in [24]. The process of improved K-shell decomposition is illustrated in Figure 1(b). Firstly, $IKs$ is initialized to 1. Then, all the nodes whose degree is currently the minimum are removed from the network and assigned the current value of $IKs$, and $IKs$ is increased by 1. This operation is repeated until all nodes have been removed from the network. The improved K-shell decomposition overcomes the limitation of K-shell decomposition and reflects the characteristics of the network structure more precisely.
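The improved K-shell procedure described above can be sketched as follows. The sketch assumes that the nodes removed in a round receive the value that $IKs$ holds before it is increased, and that degrees are recomputed on the remaining network after every round; the function name and input format are hypothetical.

```python
def improved_k_shell(adj):
    """Assign an IKs value to every node by repeated minimum-degree removal."""
    remaining = {v: set(neighbours) for v, neighbours in adj.items()}
    iks = {}
    level = 1
    while remaining:
        min_degree = min(len(neighbours) for neighbours in remaining.values())
        batch = [v for v, neighbours in remaining.items()
                 if len(neighbours) == min_degree]
        for v in batch:
            iks[v] = level
        # Remove the whole batch and update the degrees of the survivors.
        for v in batch:
            for w in remaining.pop(v):
                if w in remaining:
                    remaining[w].discard(v)
        level += 1
    return iks


# Toy star graph: the leaves leave first, then the hub.
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
iks = improved_k_shell(star)
```

On the star graph the hub receives a larger value than the leaves, whereas the classic K-shell decomposition would assign the same value to all four nodes.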

4. Our Proposed Method

We illustrate the technical details of our entropy-based self-adaptive node importance evaluation method in this section.

4.1. Attribute Matrix of Nodes

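The nodes in a complex network can be denoted by $V = \{v_1, v_2, \ldots, v_n\}$. The indicators that are chosen to evaluate the node importance are defined as $A = \{a_1, a_2, \ldots, a_m\}$, where $m$ is the total number of indicators. In our method, $m = 4$, and the attributes of node $v_i$ can be expressed as $(x_{i1}, x_{i2}, \ldots, x_{im})$, where $x_{ij}$ is the value of indicator $a_j$ for node $v_i$. Therefore, the attribute matrix is defined as follows:

$$X = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1m} \\ x_{21} & x_{22} & \cdots & x_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nm} \end{pmatrix}. \tag{5}$$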

4.2. Data Normalization

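The values of different indicators can vary over different ranges. For example, the value of $DC$ is a decimal number in $[0, 1]$, while the values of other indicators can be larger than 1. The data should therefore be normalized before they are combined, to allow for a uniform measurement. Common normalization methods include decimal scaling, Gaussian normalization, zero-mean normalization, min-max normalization, etc. The min-max normalization method is employed to normalize the attribute matrix, defined as follows:

$$r_{ij} = \frac{x_{ij} - \min_{i} x_{ij}}{\max_{i} x_{ij} - \min_{i} x_{ij}}. \tag{6}$$

The normalized attribute matrix is as follows:

$$R = \begin{pmatrix} r_{11} & r_{12} & \cdots & r_{1m} \\ r_{21} & r_{22} & \cdots & r_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ r_{n1} & r_{n2} & \cdots & r_{nm} \end{pmatrix}. \tag{7}$$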
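The column-wise min-max normalization can be illustrated with a short Python sketch. The list-of-rows matrix format and the function name are assumptions; mapping a constant column to 0 is also an assumption of this sketch, since the text does not specify that case.

```python
def min_max_normalize(matrix):
    """Scale every column (indicator) of `matrix` to the range [0, 1]."""
    columns = list(zip(*matrix))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [
            # Assumption: a constant column carries no spread and maps to 0.
            (x - lo) / (hi - lo) if hi > lo else 0.0
            for x, lo, hi in zip(row, lows, highs)
        ]
        for row in matrix
    ]


# Three nodes (rows) and two indicators (columns) on very different scales.
X = [[0.25, 5.0], [0.5, 1.0], [0.75, 3.0]]
R = min_max_normalize(X)
```

After normalization both columns span the same range, so that neither indicator dominates the combination merely because of its scale.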

4.3. Weights Calculation

Introduced by Claude E. Shannon in 1948, entropy is a measure of unpredictability and uncertainty in information [25, 26]. For example, the entropy is zero when we toss a two-headed coin, because there is a 100% chance of getting heads. The entropy reaches its maximum value when we toss a fair coin: since the chance of getting tails is equal to the chance of getting heads, there is no way to predict what will come next. In entropy-based weighting, a larger value of entropy for an attribute indicates greater uncertainty and therefore less useful information content [27–31]. In a multiattribute decision-making problem, we need to assign a larger weight to an attribute with more useful information than to an attribute with greater uncertainty. By analyzing the probability distribution of the original data, we can obtain the entropy objectively. Calculating the weight of each attribute based on entropy is therefore more reasonable than setting it subjectively.
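The coin example can be checked numerically with a small sketch of Shannon entropy measured in bits. The function name is an assumption for illustration, and terms with zero probability are skipped because $0 \log 0$ is conventionally taken as 0.

```python
import math


def shannon_entropy(probabilities):
    """Shannon entropy in bits of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


fair = shannon_entropy([0.5, 0.5])        # fair coin: maximum uncertainty
two_headed = shannon_entropy([1.0, 0.0])  # two-headed coin: no uncertainty
biased = shannon_entropy([0.9, 0.1])      # biased coin: somewhere in between
```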

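In this paper, node importance is decided by four indicators, and their weights are obtained based on entropy theory. Suppose the weights of the indicators are expressed as $W = (w_1, w_2, \ldots, w_m)$. According to Shannon entropy theory, the entropy of each indicator can be calculated as follows:

$$p_{ij} = \frac{r_{ij}}{\sum_{i=1}^{n} r_{ij}}, \tag{8}$$

$$E_j = -\frac{1}{\ln n} \sum_{i=1}^{n} p_{ij} \ln p_{ij}, \tag{9}$$

where $r_{ij}$ is the normalized $j$th indicator value of node $v_i$, $E_j$ is the entropy of the $j$th indicator, and $p_{ij} \ln p_{ij}$ is taken as 0 when $p_{ij} = 0$.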

As mentioned above, the larger the entropy is, the less useful information the indicator contains. Therefore, its weight should be smaller. The weight of each indicator is calculated by the following:

$$w_j = \frac{1 - E_j}{\sum_{k=1}^{m} (1 - E_k)}. \tag{10}$$
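The entropy and weight computation can be sketched in Python under the standard entropy weight formulation, which is assumed here: each column of the normalized matrix is turned into a distribution, its entropy is scaled by $\ln n$, and the weight is proportional to one minus the entropy. The function name, the treatment of all-zero columns, and the equal-weight fallback are assumptions of this sketch.

```python
import math


def entropy_weights(R):
    """Return (entropies, weights) for the columns of a normalized matrix R."""
    n, m = len(R), len(R[0])
    entropies = []
    for j in range(m):
        column = [row[j] for row in R]
        total = sum(column)
        if total <= 0 or n < 2:
            # Assumption: a column without spread is treated as uninformative,
            # i.e. maximum entropy and therefore zero weight.
            entropies.append(1.0)
            continue
        e = 0.0
        for r in column:
            p = r / total
            if p > 0:  # 0 * ln(0) is taken as 0
                e -= p * math.log(p)
        entropies.append(e / math.log(n))
    diversity = [max(0.0, 1.0 - e) for e in entropies]
    spread = sum(diversity)
    if spread <= 0:
        weights = [1.0 / m] * m  # assumption: fall back to equal weights
    else:
        weights = [d / spread for d in diversity]
    return entropies, weights


# First indicator is concentrated on one node, second is spread more evenly.
R = [[1.0, 1.0], [0.0, 1.0], [0.0, 0.5], [0.0, 0.0]]
entropies, weights = entropy_weights(R)
```

In this toy matrix the first indicator has the lower entropy and therefore receives the larger weight, which matches the relationship between entropy and weight described in the text.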

We now illustrate the relationship between entropy and weight by taking the campus network of Beijing University of Posts and Telecommunications (BUPT) as an example. The topology of the BUPT campus network is illustrated in Figure 2, where the dots denote the main nodes of the network. The relationship between entropy and weight is illustrated in Figure 3. As we can see, the larger the entropy is, the smaller the weight is. The smaller the entropy is, the more useful information an indicator provides. Therefore, the indicator with smaller entropy has a larger weight.

4.4. Node Importance Ranking

The node importance is calculated by the following:

$$I_i = \sum_{j=1}^{m} w_j r_{ij}. \tag{11}$$

The larger $I_i$ is, the more important the node $v_i$ is.
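The weighted combination and the subsequent ranking can be sketched as follows. The function name, the node labels, and the example weights are hypothetical; the score of a node is assumed to be the weighted sum of its normalized indicator values.

```python
def rank_nodes(R, weights, node_ids):
    """Score each node by the weighted sum of its row and sort in descending order."""
    scores = {
        node: sum(w * r for w, r in zip(weights, row))
        for node, row in zip(node_ids, R)
    }
    ranking = sorted(scores, key=scores.get, reverse=True)
    return ranking, scores


# Three nodes, two normalized indicators, and illustrative weights.
R = [[0.0, 1.0], [0.5, 0.0], [1.0, 0.5]]
ranking, scores = rank_nodes(R, [0.75, 0.25], ["a", "b", "c"])
```

Node "c" ranks first here because it scores highest on the more heavily weighted indicator.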

The general node importance calculation and node ranking steps in complex networks are shown in Algorithm 1. Firstly, we determine the indicators and calculate the values of the four indicators for all nodes in the complex network. Secondly, we construct the attribute matrix based on equation (5). Thirdly, we calculate the normalized attribute matrix based on equations (6) and (7). Fourthly, we calculate the entropy of each indicator based on equations (8) and (9) (lines 1–11). Then, the weights of all indicators are obtained based on equation (10) (lines 14–16), and the node importance is calculated based on equation (11) (lines 17–19). Finally, all nodes are ranked based on the node importance (lines 20–21). The head of the node list is the most important node. The time complexity of our algorithm is , where n denotes the number of nodes in the network.

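Algorithm 1: Node importance calculation and ranking.
Input: the normalized attribute matrix $R = (r_{ij})_{n \times m}$
Output: the ranking result
(1) for each indicator $a_j$ in $A$ do
(2)   $s_j \leftarrow 0$;
(3)   for each node $v_i$ in $V$ do
(4)     $s_j \leftarrow s_j + r_{ij}$;
(5)   end
(6)   $E_j \leftarrow 0$;
(7)   for each node $v_i$ in $V$ do
(8)     $p_{ij} \leftarrow r_{ij} / s_j$;
(9)     $E_j \leftarrow E_j + p_{ij} \ln p_{ij}$;
(10)  end
(11)  $E_j \leftarrow -E_j / \ln n$
(12)  $D \leftarrow D + (1 - E_j)$, where $D$ is initially 0
(13) end
(14) for each indicator $a_j$ in $A$ do
(15)   $w_j \leftarrow (1 - E_j) / D$;
(16) end
(17) for each node $v_i$ in $V$ do
(18)   $I_i \leftarrow \sum_{j=1}^{m} w_j r_{ij}$
(19) end
(20) Rank the node list based on $I_i$
(21) return the ranked node list;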

5. Experiments

We conducted experiments on three real networks and compared the results of our method with the random selection method (Random) and the TOPSIS-RE method in [24]. The experimental results show that our method performs better.

5.1. Experiment Setup

The selected networks are the campus network of Beijing University of Posts and Telecommunications (BUPT), Shanxi Water Network, and Shanxi Railway Network. First, we demonstrate the effectiveness of our method by experimenting on the BUPT campus network. Then we present the experimental results on Shanxi Water Network and Shanxi Railway Network to see how the proposed method works in more complicated cases. The experiments are conducted on a PC with an Intel Core i5-3470 3.2 GHz CPU and 4 GB of RAM.

TOPSIS-RE employs the Technique for Order Preference by Similarity to Ideal Object (TOPSIS) to evaluate the node importance. The core idea of TOPSIS-RE is to construct a positive ideal object and a negative ideal object from the original data. The positive ideal object is calculated from the maximum values of the indicators, and the negative ideal object is calculated from the minimum values of the indicators. All methods are implemented using the network analysis software Cytoscape together with the Java programming language.
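For readers unfamiliar with TOPSIS, the idea of positive and negative ideal objects can be sketched with a generic TOPSIS closeness computation. This is not the TOPSIS-RE implementation of [24]: the function name, the Euclidean distance, and the equal weights in the example are assumptions made purely for illustration.

```python
import math


def topsis_closeness(R, weights):
    """Relative closeness of each row of R to the positive ideal object."""
    V = [[w * r for w, r in zip(weights, row)] for row in R]
    columns = list(zip(*V))
    best = [max(col) for col in columns]    # positive ideal object
    worst = [min(col) for col in columns]   # negative ideal object
    closeness = []
    for row in V:
        d_plus = math.sqrt(sum((x - b) ** 2 for x, b in zip(row, best)))
        d_minus = math.sqrt(sum((x - lo) ** 2 for x, lo in zip(row, worst)))
        denom = d_plus + d_minus
        closeness.append(d_minus / denom if denom > 0 else 0.0)
    return closeness


# Three nodes, two normalized indicators, equal illustrative weights.
R = [[1.0, 1.0], [0.0, 0.0], [0.5, 0.5]]
closeness = topsis_closeness(R, [0.5, 0.5])
```

A node that coincides with the positive ideal object obtains a closeness of 1, and a node that coincides with the negative ideal object obtains 0.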

In the experiments, all nodes are ranked based on the node importance. Then, the nodes are removed one by one from the network according to the ranking results. The Number of Connected Components (NCC) is employed to evaluate the effectiveness of the methods. A connected component of an undirected network is a subgraph in which any two nodes are connected to each other by a path. After we remove one or more nodes from a network, the network may be divided into several disconnected subgraphs. Any node inside a subgraph is reachable from the other nodes in the same subgraph, and there is no path between two nodes belonging to different subgraphs. NCC is the number of these disconnected subgraphs and reflects the connectivity of the network. The robustness of a network can also be measured by calculating the size of the largest connected component after removing a fraction of the nodes [32–34]. A larger value of NCC means that the network has been divided into more disconnected subgraphs, which indicates that the removed nodes are more important with respect to network connectivity. Therefore, a larger value of NCC indicates a better performance of the ranking method. A node is considered more important if the number of connected components increases more after it has been removed.
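The NCC-based evaluation can be sketched as follows: nodes are removed in ranking order, and the number of connected components of the remaining network is recorded after each removal. The function names and the adjacency-dictionary format are assumptions for illustration.

```python
def count_components(adj, removed):
    """Number of connected components among the nodes not in `removed`."""
    seen = set(removed)
    components = 0
    for start in adj:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:  # depth-first traversal of one component
            u = stack.pop()
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
    return components


def ncc_curve(adj, ranking):
    """NCC after removing the first 1, 2, ... nodes of `ranking`."""
    removed = set()
    curve = []
    for node in ranking:
        removed.add(node)
        curve.append(count_components(adj, removed))
    return curve


# Toy star graph: removing the hub first fragments the network immediately.
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
hub_first = ncc_curve(star, [0, 1, 2, 3])
leaf_first = ncc_curve(star, [1, 2, 3, 0])
```

A ranking that places the hub first yields a higher NCC early in the removal process than a ranking that starts with the leaves, which is the behaviour that the evaluation rewards.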

5.2. Experimental Results
5.2.1. Experimental Results on BUPT Campus Network

The topological structure of the BUPT campus network is illustrated in Figure 2. The number in each node serves only as an identifier that distinguishes the nodes in the graph and carries no other meaning.

The node importance ranking results of EBSAM, TOPSIS-RE, and a random selection algorithm (Random) are listed in Table 1. We only list the top 16 nodes because the ranking results of the remaining nodes are the same in EBSAM and TOPSIS-RE.

According to the ranking result, we remove the nodes one by one until all nodes have been removed from the network, and we calculate the number of connected components after each removal. The removal process of EBSAM is shown in Figure 4, which displays the topological structure of the network after every four removals. The NCC of the EBSAM, TOPSIS-RE, and Random methods is shown in Figure 5. As we can see, the number of connected components of the Random method is much smaller than that of the other two methods. Therefore, Random is less effective in destroying the network by attacking the important nodes. We can also see that the number of connected components of EBSAM is larger than that of TOPSIS-RE. Hence, the connectivity of the network is worse after an attack guided by EBSAM, which means that attacking the network based on the ranking result of EBSAM is more effective than attacking it based on TOPSIS-RE. That is because we obtain the weights of the four indicators objectively and adaptively rather than assigning fixed values subjectively.

5.2.2. Experimental Results on Shanxi Water Network

As shown in Figure 6, Shanxi Water Network plays a vital role in normal production and living activities. The green line in Figure 6 denotes the water supply network. Shanxi Water Network helps guarantee the water demand of north China, and its topological structure is shown in Figure 7. As shown in Figure 7, Shanxi Water Network is composed of 82 nodes. The experimental result is shown in Figure 8. As we can see, the connectivity of the network has been destroyed after the top 50 nodes have been attacked. However, the NCC of our method is larger than that of the other two compared methods. Therefore, EBSAM performs better than the other compared methods.

5.2.3. Experimental Results on Shanxi Railway Network

Finally, we conduct experiments on Shanxi Railway Network. As shown in Figure 9, Shanxi Railway Network is a part of the transportation network in Shanxi. It provides great convenience for people’s travel and commodity trading. The topological structure of Shanxi Railway Network is shown in Figure 10. The experimental result is shown in Figure 11. The network is close to breaking down after the top 60 nodes have been attacked. The NCC of Shanxi Railway Network rises the most with our method. Therefore, EBSAM performs better than the other methods.

6. Conclusions and Future Work

In this paper, we proposed an entropy theory-based self-adaptive node importance evaluation method for complex networks. Firstly, we select four centrality measures that reflect different characteristics of a node as node importance evaluation indicators. Then, we combine them with appropriate weights calculated by an entropy theory-based algorithm. The algorithm shows strong adaptability and can thus be widely applied to different kinds of networks. In traditional methods, the weights are selected based on the subjective experience of the researchers rather than on a sufficient scientific basis, which can lead to inaccurate evaluation results. The proposed method is better because it utilizes entropy theory to calculate the weight of each indicator. A larger value of entropy indicates that the corresponding attribute carries greater uncertainty and less useful information. In a multiattribute decision-making problem, we need to assign a larger weight to an attribute with more useful information than to an attribute with greater uncertainty. With this algorithm, we can therefore assign proper weights to different attributes. The experimental results on three types of real-world complex networks show that our method performs better than the compared methods. Our ongoing research will focus on investigating the effectiveness of our method in more complex environments.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This study was supported by the National Key R&D Program of China (Funding No. 2018YFB1402800) and the Natural Science Foundation of China (No. 61571066).