Spatiotemporal-based clusters as a method for dengue surveillance

ABSTRACT Objectives. To develop and demonstrate the use of a new method for epidemiological surveillance of dengue. Methods. This was a retrospective cohort study using data from the Health Department of São José do Rio Preto (São Paulo, Brazil). The geographical coordinates were obtained using QGIS™ (Creative Commons Corporation, Mountain View, California, United States), based on patient addresses in the dengue notification system of the Government of Brazil. SaTScan™ (Martin Kulldorff, Boston, Massachusetts, United States) was then used to run a space-time scan analysis to find statistically significant clusters of dengue. These results were plotted and visualized using Google Earth™ mapping service (Google Incorporated, Mountain View, California, United States). Results. More clusters were detected when the maximum number of households per cluster was set to 10% (11 statistically significant clusters) rather than 50% (8 statistically significant clusters). The cluster radius varied from 0.18–2.04 km and the period of time varied from 6 days to 6 months. The infection rate was more than 0.5 cases/household. Conclusions. When using SaTScan for space-time analysis of dengue cases, the maximum number of households per cluster should be set to 10%. This methodology may be useful for optimizing dengue surveillance systems, especially in countries where resources are scarce and government programs have not had much success controlling the disease.

incorrect patient addresses, and an unwillingness of residents to welcome health workers (1). In this context, it is essential to use established methods, as well as innovative approaches, to combat dengue fever (5,6).
One innovation with potential is the geoprocessing of dengue data, a process that can improve visualization of the spatial and temporal distribution of events. This information can, in turn, provide a more comprehensive understanding of a situation and can aid in generating new hypotheses and formulating preventive measures (6-9). It can also allow health officials to easily identify priority areas and better direct efforts to control dengue (1,6,10,11). Similar processes have been proposed to combat malaria in the Brazilian Amazon (12), as well as infections in hospitals (13). However, no studies have used a geoprocessing approach in conjunction with free programs that can map dengue cases based solely on the probable address of the infection site. Therefore, the purpose of this study was to develop and demonstrate the use of a new, low-cost, epidemiological method for dengue surveillance based on geoprocessing techniques.

MATERIALS AND METHODS
This was a retrospective study of confirmed cases of dengue in the Health Department of São José do Rio Preto (São Paulo, Brazil) in 2009. The area has an estimated population of 409 000, with a population density of 945 inhabitants per km² (14). The study used information on dengue notifications together with free software programs to develop an auxiliary method of surveillance based on patient addresses and to identify priority areas for dengue control teams.

Data source
Information was collected from patient medical records available in the Sistema de Informação de Agravos de Notificação (Information System for Notifiable Diseases; SINAN) for the Health Department of São José do Rio Preto. Every dengue case confirmed in January–December 2009 was evaluated. All dengue cases were confirmed by enzyme-linked immunosorbent assay, non-structural protein 1, or polymerase chain reaction amplification. Notifications that did not contain adequate information about the patient's address were excluded. In total, 72 such records were excluded and 1 051 cases were included in the analysis. The year 2009 was chosen because it had the most reliable information available for validating the proposed method. The patient's address was used as the probable place of infection.

Geoprocessing of cases and statistical analyses
The address of the primary residence for each dengue patient was entered into QGIS™ version 2.2 (Creative Commons Corporation, Mountain View, California, United States), an open-source geographic information system used to locate geographic coordinates. Then, SaTScan™ version 9.4.2 (Martin Kulldorff, Harvard Medical School and Harvard Pilgrim Health Care Institute, Boston, Massachusetts, United States) was used to perform the space-time scan analysis. SaTScan systematically creates circular windows throughout the geographic area over time and then identifies statistically significant clusters by comparing the incidence of cases inside each window with the incidence outside it (15). To identify clusters, the time aggregation length was set at 7 days. The radius of the circular scanning window around each geographical point was limited by the maximum percentage of households that could be included in a cluster during the study period. This maximum was set at 50% in order to detect large clusters and at 10% to detect smaller clusters. The maximum duration of temporal clusters was 50% of the study period.
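The window construction described above can be illustrated with a minimal Python sketch. It is not SaTScan's implementation: the function names, the haversine distance, and the rule of centring one circle on every case location are assumptions made only for illustration. The sketch groups dates into 7-day intervals and enumerates circles that contain no more than a chosen fraction of all locations.

```python
import math
from datetime import date

def week_index(day, start, step_days=7):
    """Return the 0-based 7-day aggregation interval that contains `day`."""
    return (day - start).days // step_days

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def candidate_circles(points, max_fraction):
    """Yield (centre_index, radius_km, member_indices) for each circle centred
    on a point and holding at most `max_fraction` of all points."""
    limit = min(len(points), max(1, int(max_fraction * len(points))))
    for i, centre in enumerate(points):
        ranked = sorted((haversine_km(centre, p), j) for j, p in enumerate(points))
        for k in range(1, limit + 1):
            radius = ranked[k - 1][0]
            # If the next point is equally far away, it belongs to the same
            # circle, so wait until it is included before yielding.
            if k < len(ranked) and ranked[k][0] == radius:
                continue
            yield i, radius, [j for _, j in ranked[:k]]

# Hypothetical coordinates, not taken from the study data.
points = [(0.0, 0.0), (0.0, 0.01), (0.0, 0.03), (0.0, 1.0)]
circles = list(candidate_circles(points, max_fraction=0.5))
```

Lowering `max_fraction` from 0.5 to 0.1 shrinks the largest admissible circle, which is the mechanism behind the two scales compared in this study. A full scan would pair each circle with every admissible time interval and evaluate a likelihood for each resulting cylinder.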
The location and dimensions of the window associated with the largest likelihood value were used to define the most likely cluster. The significance of this cluster was tested by a probabilistic analysis using the Monte Carlo method (999 replications), as well as a retrospective space-time analysis. Clusters with a P-value < 0.05 were considered statistically significant.
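The Monte Carlo step can be sketched as a rank-based test: the statistic of the observed most likely cluster is compared with the maximum statistic from each simulated data set. The Python snippet below is a minimal illustration under that assumption; the function name and the placeholder statistics are hypothetical and do not come from the study.

```python
import random

def monte_carlo_p_value(observed, simulated):
    """Rank-based p-value: the share of all statistics (observed plus
    simulated) that are at least as large as the observed one."""
    at_least = sum(1 for s in simulated if s >= observed)
    return (at_least + 1) / (len(simulated) + 1)

# Placeholder statistics standing in for 999 simulated maxima.
rng = random.Random(0)
simulated = [rng.random() for _ in range(999)]

# The observed value exceeds every placeholder, so the p-value is 1/1000.
p = monte_carlo_p_value(1.5, simulated)
```

With 999 replications the smallest attainable P-value is 0.001, so the 0.05 threshold corresponds to the observed statistic ranking among the 50 largest of the 1 000 values.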
SaTScan™ then generated databases of clusters that were plotted for spatial visualization using Google Earth™ mapping service version 7.1.2.2041 (Google Incorporated, Mountain View, California, United States).
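One way to prepare cluster circles for display in Google Earth is to write them as KML polygons. The Python sketch below is an illustration only, not the export path used in this study: the function names are hypothetical, the example coordinates are invented, and the circle is approximated with a flat-earth conversion that is adequate for radii of a few kilometres.

```python
import math
from xml.sax.saxutils import escape

def circle_ring(lat, lon, radius_km, steps=36):
    """Approximate a circle as a closed ring of (lon, lat) vertices."""
    km_per_deg_lat = 111.32
    km_per_deg_lon = 111.32 * math.cos(math.radians(lat))
    ring = []
    for s in range(steps):
        theta = 2 * math.pi * s / steps
        ring.append((lon + radius_km * math.sin(theta) / km_per_deg_lon,
                     lat + radius_km * math.cos(theta) / km_per_deg_lat))
    ring.append(ring[0])  # KML rings must end on their first vertex
    return ring

def cluster_placemark(name, lat, lon, radius_km):
    """Return one KML Placemark holding the cluster circle as a polygon."""
    coords = " ".join(f"{x:.6f},{y:.6f},0"
                      for x, y in circle_ring(lat, lon, radius_km))
    return (f"<Placemark><name>{escape(name)}</name>"
            "<Polygon><outerBoundaryIs><LinearRing>"
            f"<coordinates>{coords}</coordinates>"
            "</LinearRing></outerBoundaryIs></Polygon></Placemark>")

def clusters_to_kml(clusters):
    """Wrap (name, lat, lon, radius_km) tuples in a KML document string."""
    body = "".join(cluster_placemark(*c) for c in clusters)
    return ('<?xml version="1.0" encoding="UTF-8"?>'
            '<kml xmlns="http://www.opengis.net/kml/2.2"><Document>'
            f"{body}</Document></kml>")

# Invented centre point; only the 0.18 km radius echoes a value from the text.
kml = clusters_to_kml([("Cluster 1", -20.82, -49.38, 0.18)])
```

Saving the returned string to a `.kml` file produces a document that Google Earth can open, with one outlined circle per cluster. KML lists coordinates as longitude, latitude, altitude, which is why the ring stores longitude first.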

Ethics
This study was authorized by the Ethics Committee of União das Faculdades dos Grandes Lagos (São José do Rio Preto, São Paulo, Brazil). Informed consent was not required because patient data remained confidential. Only the researcher involved in the statistical analysis had access to all of the information.

RESULTS
The space-time analysis with the maximum number of households set at 50% identified eight statistically significant clusters within the study period (Figure 1). Two additional clusters were also identified, but these were not statistically significant (P = 0.44 and P = 0.96, respectively; Table 1). The 1 051 cases of dengue were related to 924 households, representing an overall infection rate of 1.14 cases/household. The radii of the statistically significant clusters ranged from 0.18–2.04 km. These clusters were scattered around the study area and occurred in different time intervals. A large temporal cluster of 6 months (cluster 4) was detected for May–November 2009. This cluster was also the second largest spatial cluster, with a radius of 1.5 km (Table 2). Infection rates in the clusters were relatively high; three-quarters of them had infection rates above 0.5 cases/household. In particular, cluster 1, which included 86 households, had a very high infection rate (0.86 cases/household). Cluster 5 had the highest infection rate (1.0 cases/household), but contained only five households (Table 1).
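The infection rate reported here is the number of confirmed cases divided by the number of affected households. A short Python check of the overall figure, using a hypothetical helper name:

```python
def infection_rate(cases, households):
    """Return confirmed cases per affected household."""
    if households <= 0:
        raise ValueError("households must be positive")
    return cases / households

# Overall figures reported in the text: 1 051 cases in 924 households.
overall = infection_rate(1051, 924)
print(f"{overall:.2f} cases/household")  # prints "1.14 cases/household"
```

A rate above 1.0 means that, on average, an affected household reported more than one confirmed case.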
When the maximum was set at 10% of households per cluster, the analysis identified 11 statistically significant clusters (P < 0.05; Figure 2), in addition to 9 that were not statistically significant (P > 0.05; Table 1). The radii of the statistically significant clusters were similar whether the maximum was set at 10% or 50%. Clusters 5 and 8 increased in size, but cluster 4 decreased to less than one-half of its original size when the maximum number of households was lowered to 10%. At 10%, cluster 5 had the highest temporal coverage during the study period (Table 2), and 8 of the 11 statistically significant clusters (72.7%) had infection rates > 0.5 cases/household. In particular, cluster 1 had an infection rate of 0.86 cases/household with the maximum set at either 10% or 50%. Cluster 11 showed the same infection rate as cluster 1, but included fewer cases of dengue and fewer affected households (Table 1).

DISCUSSION
Although the popularity of geoprocessing to study the dynamics of diseases, especially dengue (16), is increasing, no studies have simultaneously used three open-access programs as an epidemiological surveillance method. In our study, more clusters were detected when the maximum number of households was set to 10% than when it was set to 50% (11 statistically significant clusters versus 8 statistically significant clusters, respectively). In addition, the radii varied (0.18–2.04 km), as did the time duration (6 days to 6 months), and the infection rate was greater than 0.5 cases/household.
The application of such resources for understanding the spatial and temporal distribution of dengue is becoming more popular, and work is being conducted by researchers in several countries (16,17). For example, in 2014, Banu and colleagues (1) used data from 82 countries in the Asia-Pacific region collected in 1995–2004 and a similar methodology to demonstrate the importance of processing dengue data for better prevention and control. However, our approach to identifying clusters using a space-time scan analysis with two scales is more appropriate than a purely spatial scan (1,11,18). Similarly, although other units (zip codes, neighborhoods, cities, states) may be used, geographical coordinates are widely applied as the spatial unit of aggregation in geoprocessing studies of infectious and non-infectious diseases (11). In addition, we set the aggregation time as 7 days to facilitate weekly government surveillance surveys and, in turn, to help guide field teams; however, times as short as a day may also be used (11), especially in areas with a high incidence of dengue.
Furthermore, setting the maximum number of households at 50%, as in our study, has previously been employed and questioned by other authors (19). They agree that this may contribute to the loss of smaller, statistically significant clusters (because they remain undetected). Thus, it is quite reasonable to reduce this percentage to 10%, as in our study. In addition, although any Geographic Information System program can geolocate data points on a map (20), Google Earth™ is more accurate and more informative, and the exact address of cases can be properly georeferenced, including the location, as well as a single view of each cluster household.
Limitations. Despite several strengths, this study also has some limitations. For instance, it was based on retrospective data taken from information collected by various health professionals over a year. However, the data were carefully filtered and only records with no conflicting information were used, i.e., the case of dengue was confirmed and the patient's primary address was properly recorded.
In addition, the primary residence may not have been the location where the person was bitten by the infected mosquito; however, this may not be problematic since dengue infections are more likely to involve several members of a household (11). Therefore, these limitations may have little influence on the results.

Conclusions
Our results show that this auxiliary methodology of dengue surveillance may more accurately detect priority areas for intervention when the maximum number of families included in each cluster is adjusted to 10% of total households. This method may be extremely relevant in countries where resources are scarce and government programs have not