Abstract

This article proposes combining Wi-Fi time of flight (ToF) with a deep learning approach to build a cheap, practical, and highly accurate indoor positioning system (IPS). Rather than using a classic geometrical approach (such as multilateration), it uses a data-driven approach, namely the location fingerprinting technique. The fingerprint of a location, in this case, is the set of Wi-Fi ToFs between the target device and the access points (APs); the number of APs in the area therefore dictates the set size. The location fingerprinting technique requires a collection of fingerprints from various locations in the area to build a reference database or map. This database contains the information used to carry out the main task of the technique, namely, estimating the position of a device based on its location fingerprint. For that task, we propose using a fully connected deep neural network (FCDNN) model as the positioning engine. The model takes a location fingerprint as its input and produces the estimated location coordinates as its output. We conduct experiments to analyze the impact of the number of AP pairs available in the dataset (1 AP pair, 2 AP pairs, and more), comparing the performance of weighted k-nearest neighbor (WKNN) and the FCDNN. Our experimental results show that our IPS, DeepIndoor, achieves an average positioning error, or mean square error, of 0.1749 m and a root mean square error (RMSE) of 0.5740 m in scenario 3, where the raw dataset with 1–10 AP pairs is used.

1. Introduction

The goal of 5G and IoT applications is to improve people’s daily lives by transforming a variety of things from conventional to intelligent [1, 2]. Through efficient packet radio access and adjustable bandwidth, 5G will offer better data speeds and reduced latency [3]. Among the enabling techniques, deep learning is considered one of the most promising for tackling massive high-dimensional data [4]. Many of those applications require location-related information to deliver their services. The majority of location-based services (LBSs) for outdoor environments are possible due to Global Navigation Satellite Systems (GNSSs) [5–7] such as the Global Positioning System (GPS) [8]. Outdoor positioning technologies can nowadays be considered mature [9] and sufficient for fulfilling the related service requirements, such as QoS [10] and user mobility [11], which depend on the network architecture used [12]. Unfortunately, that is not the case for indoor scenarios. Localizing an object or route indoors using GPS is usually not feasible due to the loss of the signal emitted by its satellites [13]. The complexity of indoor environments, with walls and various objects, contributes to this phenomenon. Given the numerous potential applications that could be enabled, e.g., indoor wayfinding, asset tracking, and crowd monitoring, it is unfortunate that there is no standardized solution for indoor positioning systems (IPSs) yet [5].

However, indoor positioning solutions have gained substantial interest in industry and academia [14–17]. The current landscape of connectivity technologies underlying IPSs mainly consists of Bluetooth, Wi-Fi, Zigbee, RFID (radio frequency identification), and UWB (ultra-wideband). Each of these technologies has its own characteristics in enabling an IPS, which are listed in Table 1.

When the users are people, Bluetooth and Wi-Fi are typically preferred over the other technologies, as both are available in most smartphones nowadays. Regarding deployment cost, Wi-Fi can be preferred over Bluetooth, as Wi-Fi access points (APs) are more commonly deployed in indoor facilities than Bluetooth beacons. This makes deploying a Wi-Fi-based IPS cheaper than a Bluetooth-based IPS, since there is no need to install new infrastructure in the area. UWB, for its part, leads in terms of accuracy (see Table 1).

However, UWB is not yet commonly available in smartphones, and its deployment cost is high [22]. This makes a UWB-based IPS less practical than either a Wi-Fi-based or a Bluetooth-based IPS. Although the building blocks of an IPS are available, realizing a cheap, practical, and highly accurate IPS remains a challenge [23–26]. This article proposes DeepIndoor, a Wi-Fi-based IPS utilizing the time of flight of Wi-Fi signals and a deep learning approach. DeepIndoor leverages the practicality and low deployment cost of a Wi-Fi-based IPS and combines them with a deep learning approach to improve accuracy. To work with deep learning, it uses a data-driven location inference technique, i.e., location fingerprinting.

The location fingerprinting technique can be considered more robust than the classic geometrical approaches (such as multilateration), since it does not rely on line-of-sight (LOS) communication to make a good estimation. In location fingerprinting, a location is estimated based on its fingerprint (or set of features), which in this case is a set of Wi-Fi times of flight (ToFs). For that task, we propose using a fully connected deep neural network (FCDNN) model as the positioning engine. The model takes a location fingerprint as its input and produces the estimated location coordinates as its output. The successful applications of deep learning in various domains [27], together with computing resources becoming cheaper and more available, encourage us to apply it in this domain. Our major contributions are as follows:

(i) We design a cheap, practical, and highly accurate IPS using Wi-Fi ToF and a deep learning approach.

(ii) We conduct extensive experiments to evaluate scenarios with different numbers of available AP pairs and to optimize the performance of the WKNN algorithm and of our positioning engine, DeepIndoor, on a publicly available dataset that can be accessed in [28].

(iii) We detail the structure and configuration of our positioning engine to encourage its application in other testbeds, or with features other than Wi-Fi ToF, in future developments.

The rest of this article is divided into several sections: Section 2 presents some previous related works, Section 3 details our system model, Section 4 covers our experiment settings, Section 5 shows our results and findings, and Section 6 provides the conclusions of this research.

2. Related Works

An indoor positioning system (IPS) can be radio-frequency based or non-radio-frequency based. In radio-frequency-based systems such as Wi-Fi, the localization parameters are either distance based or direction based [29]. Distance-based parameters are in turn signal based (RSSI and CSI) or time based (ToF and RTT). The ToF is the difference between the time of departure at the AP and the time of arrival at the user device. Its disadvantages are the time synchronization needed between the APs and the user device and a higher cost; its strengths are great resistance to multipath effects and high localization accuracy. The received signal strength indicator (RSSI) estimates distance from the power loss, i.e., the drop in signal strength between the APs and the user. Its weakness is that it is prone to noise, multipath effects, and NLoS conditions; its strengths are that it is easy to implement and needs neither time synchronization nor additional hardware. Hence, if the priority is high accuracy, ToF can be preferred over RSSI, whereas RSSI might be preferred if the priority is low cost. To estimate the user position, a positioning algorithm is needed to process the localization parameter.

Range-based methods such as trilateration and range-free methods such as fingerprinting make use of the localization parameter. With either kind of method, estimating the user position requires measurements from at least three APs in 2D space [30] and from at least four APs in 3D space.

Ma et al. [31] proposed a novel positioning algorithm to improve the positioning results of Wi-Fi RTT ranging. They also explained the characteristics of Wi-Fi fine time measurement (FTM). The proposed approach achieved a localization error of 1.20 m for static positioning and 1.31 m for dynamic positioning.

Zhou et al. [32] proposed a novel indoor-positioning system algorithm with matrix completion and anchor selection. From the results, the proposed approach achieved a localization error of 1.52 m.

Fingerprinting algorithms follow either a deterministic approach, such as the Kalman filter, NN, KNN, WKNN, SVM, DT, PCA, and neural networks, or a probabilistic approach, such as Gaussian distribution, particle filter, kernel method, hidden Markov model, and Naive Bayes method. For example, using a fingerprinting algorithm and a Kalman filter, Giovanelli et al. [33] proposed a novel indoor positioning system with ToF and RSSI data fusion. The proposed system utilizes both ToF and RSSI as location-dependent characteristics. The mean RMS error with data fusion is about 50% lower than with RSSI data alone (2.78 m versus 5.69 m). The ToF measurement might have large fluctuations because of jitter, the limited resolution of time intervals, or a combination of both; these fluctuations can be reduced by averaging [34–36]. Moreover, while the impact of ToF fluctuations on range measurements may decrease with distance, the uncertainty of RSSI measurements may grow with distance. Thus, the RSSI and ToF data may complement each other.

Rizk et al. [37] present an indoor positioning system with Wi-Fi RTT and RSSI. The system is designed to address signal fluctuations, interference from fingerprinting, multipath propagation errors, and NLOS transmissions, and it achieved a localization error of 0.51 m and 0.59 m for office and lab environments, respectively.

Singh et al. [38] present an overview of machine learning-based indoor positioning systems with Wi-Fi RSSI fingerprints. The survey covers ML-based Wi-Fi RSSI fingerprinting for indoor localization and compares the performance of the approaches. ML prediction models such as DT, SVM, KNN, ANN, MLP, CNN, RNN, and DQN are compared based on classification accuracy, positioning error, robustness, scalability, complexity, localization space, and database used. The authors also found that CNN [39] has high robustness, high scalability, and low complexity. From their summary table of indoor localization schemes, it can be concluded that KNN could increase the robustness and decrease the positioning error, while PCA could decrease the complexity and thus reduce the computational time. They also summarized the available open-source datasets.

Chin et al. [40] proposed MIMO-based indoor positioning with CSI data using artificial neural networks. They compare the performance of GCNN, CNN, and FCNN. More than 90% of the error distances are below 0.2 m for the GCNN and 75% are below 0.2 m for the proposed CNN, whereas all error distances are above 0.4 m for the FCNN.

3. System Model

3.1. Location Fingerprinting

Location fingerprinting is a location inference technique that utilizes location-dependent characteristics to infer where the estimated location is [41]. A fingerprint, in this context, is a set of characteristics or features that characterize a location. As this research utilizes Wi-Fi ToF, a fingerprint is a set of Wi-Fi ToFs. The location fingerprinting technique consists of two stages: (i) the offline stage and (ii) the online stage. In the offline stage, the features of various locations in the testbed are collected to build a reference database or map. This database contains various fingerprints with their respective location coordinates. In the online stage, the features of an unknown location are collected to create its fingerprint. The fingerprint of the unknown location is then compared with the fingerprints stored in the reference database to estimate where the unknown location is. A high-level view of the location fingerprinting technique is depicted in Figure 1.
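To make the two stages concrete, the following minimal sketch matches a query fingerprint against a small reference database with a weighted k-nearest-neighbor (WKNN) rule. The function name `wknn_estimate`, the inverse-distance weighting, and all toy values are illustrative assumptions; they are not the configuration used in the experiments.

```python
import math

def wknn_estimate(database, query, k=3, eps=1e-9):
    """Estimate coordinates as the inverse-distance-weighted mean of the
    k reference locations whose fingerprints are closest to the query."""
    # Euclidean distance in fingerprint (feature) space to every entry
    scored = sorted(
        (math.dist(fingerprint, query), location)
        for fingerprint, location in database
    )
    nearest = scored[:k]
    # Closer fingerprints receive larger weights; eps avoids division by zero
    weights = [1.0 / (d + eps) for d, _ in nearest]
    total = sum(weights)
    dims = len(nearest[0][1])
    return tuple(
        sum(w * loc[axis] for w, (_, loc) in zip(weights, nearest)) / total
        for axis in range(dims)
    )

# Offline stage: hypothetical reference map of (fingerprint, (x, y, z)) pairs
reference_db = [
    ((10.0, 20.0), (0.0, 0.0, 1.4)),
    ((12.0, 18.0), (1.0, 0.0, 1.4)),
    ((30.0, 5.0), (5.0, 4.0, 1.4)),
]

# Online stage: fingerprint collected at an unknown location
estimate = wknn_estimate(reference_db, (11.0, 19.0), k=2)
print(estimate)
```

In this toy case the query lies equally far from the first two reference fingerprints, so the estimate falls midway between their locations.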

3.2. Wi-Fi ToF

Measuring the ToF of a Wi-Fi signal has been possible since the fine time measurement (FTM) protocol was introduced in the IEEE 802.11-2016 standard. The communication between a client device and an AP under the FTM protocol is shown in Figure 2.

The ToF between two devices is perceived as half of the round-trip time (RTT). Thus, the ToF between a client device and an AP (as illustrated in Figure 2) can be calculated as follows:

$$\mathrm{ToF} = \frac{\mathrm{RTT}}{2} = \frac{(t_4 - t_1) - (t_3 - t_2)}{2}, \tag{1}$$

where $t_1$, $t_2$, $t_3$, and $t_4$ are timestamps recorded on the local device denoting the time of arrival (ToA) or time of departure (ToD) of the corresponding message.
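As a worked illustration of the half-RTT relation, the sketch below computes a ToF from four timestamps and converts it to a distance. It assumes the usual FTM convention in which t1 and t4 are the departure and arrival times at one device and t2 and t3 are the arrival and departure times at the other; the function name and the timestamp values are hypothetical.

```python
SPEED_OF_LIGHT = 299_792_458.0  # metres per second

def ftm_tof(t1, t2, t3, t4):
    """Return the ToF as half of the RTT.

    (t4 - t1) is the full exchange time seen by the first device, and
    (t3 - t2) is the turnaround time at the replying device, which is
    subtracted because it is not propagation time.
    """
    rtt = (t4 - t1) - (t3 - t2)
    return rtt / 2.0

# Hypothetical timestamps in nanoseconds
tof_ns = ftm_tof(t1=0.0, t2=40.0, t3=1040.0, t4=1080.0)

# Convert the ToF to a distance in metres
distance_m = tof_ns * 1e-9 * SPEED_OF_LIGHT
print(tof_ns, distance_m)
```

Because only differences taken on the same device appear in the formula, the two clocks do not need a common time origin in this sketch.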

This research uses a publicly available dataset (accessible at [28]) for our experiments. The dataset consists of records of Wi-Fi signals traveling from a transmitting device to a receiving device. The ToD and ToA of the corresponding Wi-Fi signal are available in each record. The ToF-based feature of each record can be formulated as follows:

$$\psi = t_{\mathrm{ToA}} - t_{\mathrm{ToD}} = \mathrm{ToF}_{\mathrm{Tx} \to \mathrm{Rx}} + e, \tag{2}$$

where $\psi$ is the ToF-based feature used in the fingerprint, $\mathrm{ToF}_{\mathrm{Tx} \to \mathrm{Rx}}$ is the ToF from the transmitting device $\mathrm{Tx}$ to the receiving device $\mathrm{Rx}$, $t_{\mathrm{ToA}}$ and $t_{\mathrm{ToD}}$ are the measured ToA and ToD of the signal, and $e$ is the measurement error of the path. Note that the value of $e$ is not provided in the dataset. However, we consider $e$ part of the characteristics to be captured, since it represents the area condition.

3.3. FCDNN as a Positioning Engine

The role of the FCDNN as the positioning engine is to estimate the position of the client device based on its location fingerprint. In the input layer of the FCDNN, the number of neurons is equal to the number of features in the dataset. The number of neurons in the output layer depends on whether 2-D or 3-D space coordinates are used. In our case, there are 6 available features in the dataset and 10 APs in the area, and we use the 3-D Cartesian coordinate system to represent the client device location. However, we used only 5 features, omitting the column that holds the AP index, in order to evaluate the simulated scenarios. The structure of the FCDNN for such a case is shown in Figure 3. The number of hidden layers and their neurons is not specified at this point; it is presented in a later section.

3.4. Using the FCDNN

To use the FCDNN, we feed it a location fingerprint as input. As shown in the input layer in Figure 3, each element of the input is connected to a neuron. Therefore, for the input $x = (x_1, x_2, \ldots, x_n)$, the activation value of each neuron in the input layer is as follows:

$$a_j^{(1)} = x_j, \tag{3}$$

where $x_j$ denotes the jth element of the input and $a_j^{(1)}$ denotes the activation value of the jth neuron in the input layer.

For the remaining layers, each neuron is connected to all the neurons in the previous layer (see Figure 3). The activation value of a neuron in the hidden and output layers is determined as follows:

$$a_j^{(i)} = \max\left(0, \sum_{k} w_{jk}^{(i)} a_k^{(i-1)} + b_j^{(i)}\right), \quad i = 2, \ldots, L, \tag{4}$$

where $a_j^{(i)}$ is the activation value of the jth neuron in the ith layer, $w_{jk}^{(i)}$ is the weight of the link from $a_k^{(i-1)}$ to $a_j^{(i)}$, $b_j^{(i)}$ is the bias of the jth neuron in the ith layer, and $L$ is the number of layers. Notice that equation (4) implements the rectified linear unit (ReLU) function in calculating the activation value of the neurons.

The output of the FCDNN is generated based on the values of the neurons in the output layer. As shown in Figure 3, each neuron in the output layer is connected to an output element. Therefore, the output of the FCDNN is calculated as follows:

$$\hat{y} = \left(a_1^{(L)}, a_2^{(L)}, \ldots, a_{n_L}^{(L)}\right), \tag{5}$$

where $\hat{y}$ denotes the output, which is the estimated location coordinates (or label), $a_j^{(L)}$ is the activation value of the jth neuron in the output layer, and $n_L$ denotes the number of neurons in the output layer.
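The forward computation described in this subsection can be sketched in a few lines of plain Python. The sketch copies the input into the first layer, applies a ReLU-activated weighted sum in every later layer as the text describes, and returns the output-layer activations as the estimated coordinates. The function names, layer sizes, weights, and biases are hypothetical toy values, not the trained model.

```python
def relu(z):
    """Rectified linear unit: max(0, z)."""
    return max(0.0, z)

def forward(fingerprint, layers):
    """Propagate a fingerprint through fully connected layers.

    `layers` is a list of (weights, biases) pairs, one per non-input layer;
    weights[j][k] links neuron k of the previous layer to neuron j of the
    current layer.
    """
    activations = list(fingerprint)  # input layer: a_j = x_j
    for weights, biases in layers:
        activations = [
            relu(sum(w * a for w, a in zip(row, activations)) + b)
            for row, b in zip(weights, biases)
        ]
    return activations  # output layer: estimated location coordinates

# Toy network: 2 inputs -> 2 hidden neurons -> 3 outputs (x, y, z)
toy_layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 1.0]),
    ([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.0, 0.0, 0.4]),
]

output = forward((3.0, 1.0), toy_layers)
print(output)
```

For the toy input, the hidden activations are 2 and 3, so the three outputs are 2, 3, and 5.4.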

3.5. Optimizing the FCDNN

The FCDNN is optimized by changing the model’s parameters $\theta$, i.e., its weights and biases. This process aims to make the model better at doing its task. For that purpose, we use gradient descent to minimize the cost function by updating each element of $\theta$ in the opposite direction of the gradient of the cost function w.r.t. that element [41].

Since we treat the task as a regression problem, the model’s cost is the root mean square error (RMSE):

$$J(\theta) = \sqrt{\frac{1}{m} \sum_{i=1}^{m} \left\lVert y^{(i)} - \hat{y}^{(i)} \right\rVert_2^2}, \tag{6}$$

where $m$ is the sample count, $\theta$ denotes the weights and biases, $y^{(i)}$ is the true label of the ith sample, and $\hat{y}^{(i)}$ is the estimated label of the ith sample. Note that the weights and biases are treated equally, as both belong to the model’s parameters.
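A small numerical sketch of an RMSE cost over coordinate labels follows. It assumes the squared error of a sample is the squared Euclidean distance between its true and estimated coordinates, averaged over the samples before taking the square root; the function name and the sample values are hypothetical.

```python
import math

def rmse_cost(true_labels, estimated_labels):
    """Root mean square error over m samples of coordinate tuples."""
    m = len(true_labels)
    squared_error = sum(
        sum((y - y_hat) ** 2 for y, y_hat in zip(true, est))
        for true, est in zip(true_labels, estimated_labels)
    )
    return math.sqrt(squared_error / m)

# Two hypothetical samples: the first is off by 3 m in x and 4 m in y,
# the second is estimated exactly
y_true = [(0.0, 0.0, 1.4), (1.0, 1.0, 1.4)]
y_est = [(3.0, 4.0, 1.4), (1.0, 1.0, 1.4)]

cost = rmse_cost(y_true, y_est)
print(cost)
```

Here the summed squared error is 25 over 2 samples, so the cost is the square root of 12.5.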

To update the parameters using gradient descent, we adopt the stochastic gradient descent (SGD) algorithm [41]; using minibatches additionally improves computing efficiency. We use estimates of the moments of the gradients to hasten convergence and slow down the quick decay of learning rates [42, 43]. First, calculate $g_t$, the gradient of the cost function w.r.t. the parameters at timestep $t$, as follows:

$$g_t = \nabla_{\theta} J(\theta_{t-1}). \tag{7}$$

After obtaining $g_t$, compute the values of $m_t$ and $v_t$, which are the exponential moving averages of the gradient and the squared gradient, respectively. The computations are as follows:

$$m_t = \beta_1 m_{t-1} + (1 - \beta_1)\, g_t, \qquad v_t = \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2, \tag{8}$$

where $\beta_1, \beta_2 \in [0, 1)$ are the hyperparameters that control the exponential decay rates of the corresponding moving averages, and $m_t$ and $v_t$ are estimates of the first moment (the mean) and the second raw moment (the uncentered variance) of the gradient, respectively.

Since $m_t$ and $v_t$ are initialized as 0, bias correction is applied to both in order to counteract moment estimates that are biased towards zero at the initial timesteps. The bias-corrected values are calculated as follows:

$$\hat{m}_t = \frac{m_t}{1 - \beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1 - \beta_2^t}, \tag{9}$$

where $\hat{m}_t$ and $\hat{v}_t$ denote the bias-corrected first and second moment estimates, respectively. Finally, the model’s parameters are updated as follows:

$$\theta_t = \theta_{t-1} - \eta\, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}, \tag{10}$$

where $\theta$ is the parameter to be estimated, namely the one that minimizes $J(\theta)$, and $\epsilon$ is a small constant that prevents division by zero. Each summand of the cost is typically associated with the ith observation in the dataset used for training. The hyperparameter $\eta$, or the learning rate, controls the step size for each iteration.
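The moment-based update described above corresponds to an Adam-style optimizer step, which can be sketched as follows. The function name `adam_step` is illustrative, and the default values for the learning rate, decay rates, and epsilon are the commonly used Adam defaults, assumed here for illustration rather than taken from the experiments. The toy objective is likewise hypothetical.

```python
import math

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one update to a list of parameters; t is the 1-based timestep."""
    new_theta, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(theta, grad, m, v):
        # Exponential moving averages of the gradient and squared gradient
        m_i = beta1 * m_i + (1.0 - beta1) * g
        v_i = beta2 * v_i + (1.0 - beta2) * g * g
        # Bias correction for the zero-initialized moment estimates
        m_hat = m_i / (1.0 - beta1 ** t)
        v_hat = v_i / (1.0 - beta2 ** t)
        # Parameter update scaled by the learning rate
        new_theta.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_theta, new_m, new_v

# Toy objective f(theta) = (theta - 3)^2 with gradient 2 * (theta - 3)
theta, m, v = [0.0], [0.0], [0.0]
first_theta = None
for t in range(1, 301):
    grad = [2.0 * (theta[0] - 3.0)]
    theta, m, v = adam_step(theta, grad, m, v, t, lr=0.1)
    if t == 1:
        first_theta = theta[0]
print(first_theta, theta[0])
```

After bias correction, the very first step has a magnitude close to the learning rate regardless of the gradient scale, which is why the first update moves the parameter by about 0.1.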

4. Experiment Settings

4.1. The Testbed Map

The dataset used in this research is obtained from [28]. It consists of ToF measurement samples at 4410 different locations in the given area. Before creating the fingerprint of each location, we check which APs are accessible at each of those 4410 locations, i.e., whether 1, 2, 3, or more AP pairs can be heard there. Based on the number of AP pairs available at each location, we simulate 3 scenarios: 1 AP pair, 2–10 AP pairs, and 1–10 AP pairs (the raw dataset), as shown in Table 2. In our experiment, the goal is to estimate the client device location based on the Wi-Fi ToFs between the client device and the available APs in the area. Table 3 illustrates the measurement data at a specific place: for every client location (X, Y, Z), there is a ToF-based feature ψ between the Tx device (the user device) and each Rx device (AP). These features are given as distances (m) in the dataset. A client location (X, Y, Z) may have 1, 2, 3, 4, 5, or more ToF-based features available from nearby APs. For example, as shown in Table 4, the dataset contains 4 ToF-based features from 4 nearby APs for each of the client locations (−0.74769843 m, 7.46585460 m, 1.4 m) and (−0.72188836 m, 8.71135620 m, 1.4 m).

4.2. The Training, Validation, and Testing Dataset

From the ToF measurement samples, we create 4410 fingerprints, each belonging to a location in the testbed. Those fingerprints are split into separate sets for training and testing: 80% are used for training and the remaining 20% for testing. For scenario 1 (see Table 4), the 1006 fingerprints with 1 AP pair were filtered from the dataset, giving 805 training and 201 testing fingerprints. For scenario 2, the 2689 fingerprints with 2 up to 10 AP pairs were filtered, giving 2151 training and 538 testing fingerprints. For scenario 3, all 4410 fingerprints with 1 up to 10 AP pairs are used, giving 3528 training and 882 testing fingerprints. Notice that rounding is applied. For these 3 scenarios, two algorithms, WKNN and the proposed FCDNN, were simulated to compare their performance in predicting user positions.
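The split sizes quoted above follow from rounding 80% of each scenario's fingerprint count. The sketch below reproduces that arithmetic and shows one possible way to perform a shuffled split; the function name, the fixed seed, and the use of a random shuffle are assumptions, since the splitting procedure itself is not specified here.

```python
import random

def train_test_split(items, train_fraction=0.8, seed=0):
    """Shuffle the items and split them into training and testing lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]

# Fingerprint counts per scenario, as reported in the text
scenario_sizes = {"scenario 1": 1006, "scenario 2": 2689, "scenario 3": 4410}

split_sizes = {}
for name, size in scenario_sizes.items():
    # Integer indices stand in for the actual fingerprints
    train, test = train_test_split(range(size))
    split_sizes[name] = (len(train), len(test))
print(split_sizes)
```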

4.3. The Model’s Hyperparameters

The value of each hyperparameter of our model is detailed in Table 5. The table lists the options for batch size, epochs, number of hidden layers, number of neurons in the hidden layers, and the learning rate.

5. Results

5.1. Exploration of Model Structures

The experiments started by training the WKNN and the proposed FCDNN models, each on a different combination of AP pairs filtered from the dataset. To obtain their accuracy, 3 scenarios were simulated to predict user locations (X, Y, Z), so that the models are tested for accuracy as a function of the number of AP pairs. Note that there is a trade-off between accuracy and overfitting; therefore, we aim for the right balance between the two. The positioning error is calculated as the L2 norm between the estimated and the ground-truth position, as in Equation (11):

$$E_i = \left\lVert \hat{y}^{(i)} - y^{(i)} \right\rVert_2, \tag{11}$$

where $E_i$ denotes the positioning error of the ith example, $\hat{y}^{(i)}$ is its estimated position, and $y^{(i)}$ is its ground-truth position.
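The per-example positioning error can be computed directly as a Euclidean distance, as in the sketch below. Aggregating the errors into a mean and a root mean square value is shown as one common choice and is an assumption here, as are the function name and the sample coordinates.

```python
import math

def positioning_errors(estimated, ground_truth):
    """L2 norm between each estimated and ground-truth position."""
    return [math.dist(p_hat, p) for p_hat, p in zip(estimated, ground_truth)]

# Hypothetical estimated and true positions (x, y, z) in metres
estimated = [(3.0, 4.0, 1.4), (1.0, 1.0, 1.4)]
ground_truth = [(0.0, 0.0, 1.4), (1.0, 1.0, 1.4)]

errors = positioning_errors(estimated, ground_truth)

# Aggregate statistics over all examples
mean_error = sum(errors) / len(errors)
rms_error = math.sqrt(sum(e * e for e in errors) / len(errors))
print(errors, mean_error, rms_error)
```

The root mean square value is never smaller than the mean, and the gap between the two grows when a few examples have large errors.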

5.2. Weighted k-Nearest Neighbor

Figure 4 shows the positioning error of X, Y, and Z for the WKNN algorithm. The more closely the true values cluster around the prediction lines and run linearly along them, the more accurate the algorithm can be considered. Because every user’s z location is the same (1.4 m), the WKNN algorithm identifies this distribution and predicts all user z positions at a single value of almost 1.4 m. It may be inferred that the WKNN was able to discriminate between the distributions of the user x, y, and z positions: the x and y positions have diverse distributions, whereas the z positions all share the same value under a single distribution. Figures 5 and 6 show the WKNN loss for K = [1, 39]. The lowest values, 0.9424 m loss and 1.3635 m RMSE, occur in scenario 3 with K = 3, where the dataset contains locations with 1 up to 10 AP pairs. For scenarios 1 and 2, the loss and RMSE are lowest with K = 2. Scenario 1 with K = 2 has the greatest values, 3.1136 m loss and 2.4771 m RMSE. Thus, for WKNN to achieve a lower loss and RMSE, the locations with 2 and even only 1 AP pair need to be included, as in scenario 3.

5.3. Fully Connected Deep Neural Network

Figure 7 shows the positioning error of X, Y, and Z for the proposed FCDNN algorithm. The more closely the true values cluster around the prediction lines and run linearly along them, the more accurate the algorithm can be considered. Although every user’s z location is the same (1.4 m), the FCDNN predicts user z positions at a variety of points with values ranging from 1.39 m to 1.49 m, and it is thus unable to discern the user’s z position distribution. It may therefore be inferred that the FCDNN was unable to discriminate between the position distributions: the user x and y positions have diverse distributions, whereas the z positions all share the same value under one distribution. Figures 8 and 9 show the proposed FCDNN loss over 1000 epochs for both training and testing. The lowest values, 0.1749 m loss and 0.5740 m RMSE, occur in scenario 3, where the dataset contains locations with 1 up to 10 AP pairs, and the highest values, 0.6277 m loss and 1.1231 m RMSE, occur in scenario 1. Thus, for the FCDNN to achieve a lower loss and RMSE, the locations with 2 and even only 1 AP pair need to be included, as in scenario 3.

5.4. Performance Comparison of WKNN and FCDNN

As Table 6 shows, the FCDNN has a lower loss and RMSE than the WKNN in all 3 scenarios, and the performance of both algorithms is likely to improve as more diverse APs become accessible. The WKNN has its lowest loss and RMSE, 0.9424 m and 1.3635 m, respectively, in scenario 3 (1–10 AP pairs, the raw dataset). The FCDNN has its lowest loss and RMSE, 0.1749 m and 0.5740 m, respectively, in the same scenario.

6. Conclusion

The proposed IPS, DeepIndoor, which combines Wi-Fi ToF and a deep learning approach, achieves the goal of this research, namely, enabling the realization of a highly accurate, cheap, and practical IPS. The high accuracy comes from the deep learning model, an FCDNN built for this purpose. The average positioning error of DeepIndoor is 0.1749 m, and its RMSE is 0.5740 m. The realization of DeepIndoor is also cheap and practical since it utilizes Wi-Fi as the underlying technology: the availability of Wi-Fi in most smartphones and the deployment of Wi-Fi networks in many indoor facilities contribute to these advantages. Finally, the results indicate that accuracy increases when a larger variety of AP pair distributions is available in the dataset used for training and testing.

Data Availability

The Wi-Fi ToF dataset used to support the findings of this study is available in the GitHub repository (https://github.com/intel/WiFi-Location-Core-PE-and-Measurement-Database).

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors thank Telkom University, Bandung, for providing the research funding and publication costs. This work was funded by Telkom University and partially funded by FCT/MCTES.