Estimating national excess mortality from subnational data: application to Argentina

ABSTRACT This paper presents a method to estimate excess mortality where national data are missing for some or all of the coronavirus disease 2019 (COVID-19) pandemic period but subnational data exist, as in Argentina. Given the stability of the regional distribution of deaths, data on deaths in Córdoba province were used to project excess deaths in Argentina from March 2020 up to the end of 2021. The number of excess deaths was estimated at 134 504, which is 14.8% higher than the reported number of COVID-19 deaths in Argentina for the same time period.


METHODS
To project from $D_l$ (deaths in the local unit) to $D_c$ (deaths in the country), we require an estimate of the current ratio $D_l/D_c$, which is unknown because $D_c$ is not available. Therefore, $\widehat{D_l/D_c}$ (the estimate of $D_l/D_c$) is obtained from the historical ratio. To make this concrete, we introduce a second subscript for time, so that our estimate of the ratio between local unit deaths and total national deaths in 2021, i.e. $D_{l,2021}/D_{c,2021}$, might be $D_{l,2019}/D_{c,2019}$, or $D_{l,2020}/D_{c,2020}$ if data exist for 2020. In simple terms, the ratio can be taken from before the pandemic or from a period during the pandemic for which information on both local and nationwide mortality is available. We can establish bounds on this estimate by using the standard first-order Taylor expansion (delta method) for the variance of the ratio estimator $\widehat{D_l/D_c}$, in which $D_l$ and $D_c$ are the estimates of the total number of deaths at the local unit level and country level, respectively.
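For reference, the generic textbook form of the first-order (delta method) approximation to the variance of a ratio is sketched below. It is an assumption that this matches the exact expression used in this study. The symbols $\mu_l$ and $\mu_c$, which are not part of the notation above, denote the expected values of $D_l$ and $D_c$, and the covariance term is kept for generality.

```latex
\operatorname{Var}\!\left(\frac{D_l}{D_c}\right) \approx
\left(\frac{\mu_l}{\mu_c}\right)^{2}
\left[
  \frac{\operatorname{Var}(D_l)}{\mu_l^{2}}
  - \frac{2\,\operatorname{Cov}(D_l, D_c)}{\mu_l\,\mu_c}
  + \frac{\operatorname{Var}(D_c)}{\mu_c^{2}}
\right],
\qquad
\text{95\% bounds: }\;
\widehat{D_l/D_c} \pm 1.96\,\sqrt{\widehat{\operatorname{Var}}\!\left(D_l/D_c\right)}
```

In practice the expectations, variances and covariance are replaced by their sample counterparts from the period in which both local and national counts are observed.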
This method is appropriate given two important conditions. First, the pandemic's spatial distribution is similar in the local unit and at the national level, i.e. the local unit follows a trajectory of COVID-19 deaths similar to that of the country as a whole. Second, the share of deaths is stable, meaning that movement between local units (migration) is low; state- or province-level local units are therefore more suitable than village-, town- or city-level local units, where the share of deaths is likely to be less stable.
This method was applied to Argentina and the province of Córdoba, where these conditions are satisfied. Argentina is a federation of 24 provinces, which serve as the local units. Córdoba is the second largest province by population. La Voz del Interior, a newspaper in Córdoba, published the monthly number of deaths from all causes in the province from January 2019 to December 2021 (3). These data were available long before Argentina released its 2020 all-cause mortality counts and extend further in time, as the latest available national-level data only go up to December 2020 (4). Additional data on the annual number of deaths by province and the daily number of COVID-19 deaths by province were obtained from the open-data portal of the Argentinian Ministry of Health (5).

RESULTS
From 2005 to 2019, Córdoba's share of annual all-cause deaths in Argentina was stable, ranging from 8.6% to 8.8%. In 2019, Córdoba's share of monthly registered deaths in Argentina ranged from 8.1% (January) to 10.5% (December), with an annual share of 9.0%. In 2020, this monthly share ranged from 7.7% (August) to 10.3% (October), with the annual share also being 9.0% (Figure 1, panel A). The course of COVID-19 deaths over time is very similar in Córdoba and in the rest of Argentina, with peaks and troughs almost perfectly aligned (Figure 1, panel B). Thus, projecting national mortality from Córdoba during the COVID-19 pandemic satisfies both conditions mentioned earlier, and the projection is likely to hold.
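The two checks described above can be sketched in Python. The counts below are made up for illustration and are not the Argentine data, and the function names are hypothetical. Pearson correlation is used here only as one simple stand-in for comparing peaks and troughs; it is not stated to be the measure used in this study.

```python
import math


def share_summary(local, national):
    """Return (min, max, pooled) share of local deaths in national deaths."""
    if len(local) != len(national) or not local:
        raise ValueError("series must be non-empty and of equal length")
    monthly = [l / n for l, n in zip(local, national)]
    pooled = sum(local) / sum(national)
    return min(monthly), max(monthly), pooled


def pearson(x, y):
    """Pearson correlation between two equally long series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up monthly all-cause deaths (illustration only).
local_all_cause = [2400, 2300, 2600, 2900]
national_all_cause = [28000, 27000, 29000, 31000]
low, high, pooled = share_summary(local_all_cause, national_all_cause)
print(f"monthly share {low:.1%} to {high:.1%}, pooled {pooled:.1%}")

# Made-up monthly COVID-19 deaths: local unit vs. rest of the country.
local_covid = [10, 80, 200, 60]
rest_covid = [150, 900, 2100, 700]
alignment = pearson(local_covid, rest_covid)
print(f"trajectory correlation: {alignment:.2f}")
```

A narrow range of monthly shares and a correlation close to 1 would point in the same direction as the two conditions, although neither number replaces inspecting the series themselves.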
Estimating $D_l/D_c$ from the monthly data on all-cause mortality nationally and in Córdoba (Figure 1, panel A), the share was estimated at 9.0% (95% confidence interval (CI) 8.7% to 9.3%). Multiplying the reported monthly number of deaths in Córdoba by the reciprocal of this estimated share gives the projections of national monthly mortality (Figure 2). There were significant differences in August and September 2020, when registered national mortality was higher than projected, but these were offset by higher projected mortality in October and November, such that total annual mortality is very similar: 378 995 nationally registered deaths and 379 386 deaths as projected from Córdoba. In addition, the method used in this study estimated that peak monthly mortality in Argentina occurred around June 2021, with about 55 500 projected deaths, which is 22 400 deaths more than expected. This corresponds with the reported COVID-19 trajectory, in which deaths peaked during the same period (Figure 1, panel B).
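The share estimate and the projection step can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in this study: the function names are hypothetical, the counts are made up, and the months are treated as independent observations when the delta-method variance is computed, which is a simplification.

```python
import math


def ratio_with_ci(local, national, z=1.96):
    """Estimate sum(local)/sum(national) with a delta-method interval."""
    n = len(local)
    if n != len(national) or n < 2:
        raise ValueError("need two equally long series with at least 2 periods")
    mean_national = sum(national) / n
    ratio = sum(local) / sum(national)
    # First-order (delta method) variance of a ratio of means, written
    # through the residuals local_i - ratio * national_i.
    resid_ss = sum((l - ratio * c) ** 2 for l, c in zip(local, national))
    variance = resid_ss / (n * (n - 1) * mean_national ** 2)
    half_width = z * math.sqrt(variance)
    return ratio, ratio - half_width, ratio + half_width


def project_national(local_deaths, ratio, ratio_low, ratio_high):
    """Project national deaths as local deaths times 1/ratio."""
    # A larger share implies a smaller national total, so the bounds swap.
    return (local_deaths / ratio,
            local_deaths / ratio_high,
            local_deaths / ratio_low)


# Made-up counts for a period with both local and national data.
local_hist = [90, 100, 110]
national_hist = [1000, 1100, 1200]
ratio, ratio_low, ratio_high = ratio_with_ci(local_hist, national_hist)

# Made-up local count for a month with no national data yet.
point, lower, upper = project_national(120, ratio, ratio_low, ratio_high)
print(f"share {ratio:.3f} ({ratio_low:.3f} to {ratio_high:.3f})")
print(f"projected national deaths {point:.0f} ({lower:.0f} to {upper:.0f})")
```

The interval here reflects only the uncertainty in the share; it does not include the uncertainty in the expected (baseline) mortality that an excess-death estimate also carries.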
The total number of excess deaths in Argentina from March 2020 to December 2021 (as computed by the world mortality method (2)) was 134 504 (95% CI 108 202 to 162 766). COVID-19 deaths for the same period, as reported to the World Health Organization (6), numbered 117 111, representing an undercount ratio of 1.15, i.e. excess deaths were 14.8% higher than reported COVID-19 deaths.
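The undercount ratio is simply the estimated excess deaths divided by the reported COVID-19 deaths. The short Python check below uses the two totals given above; the variable names are illustrative.

```python
# Totals for March 2020 to December 2021, as given in the text.
excess_deaths = 134_504
reported_covid_deaths = 117_111

# Undercount ratio: excess deaths per reported COVID-19 death.
undercount_ratio = excess_deaths / reported_covid_deaths

# The same quantity expressed as a percentage above the reported count.
percent_above_reported = (undercount_ratio - 1) * 100

print(f"undercount ratio: {undercount_ratio:.2f}")
```

A ratio close to 1 means that reported COVID-19 deaths account for most of the estimated excess, while ratios near 2 or above mean that the excess is at least double the reported count.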

DISCUSSION
The main result of this study is that the number of excess deaths in Argentina from March 2020 to the end of December 2021, as estimated by a projection from actual all-cause mortality data in the province of Córdoba, was 134 504. The number of COVID-19 deaths in Argentina reported to the World Health Organization for the same period was 117 111, representing an undercount ratio of 1.15, i.e. excess deaths were 14.8% higher than reported COVID-19 deaths. This undercount ratio places Argentina among the Latin American countries with generally low undercount ratios, indicating few potentially missing COVID-19 deaths. Argentina's estimated undercount ratio is similar to those of Brazil (1.11), Paraguay (1.15), and Peru (1.07), and higher than those of Chile (0.99) and Panama (1.01). It is much lower than the undercount ratios observed in Bolivia (2.51), Ecuador (2.03), and Mexico (1.99), where proper certification of COVID-19 deaths has been an issue due to limited testing (7).
The excess deaths estimate for Argentina in the present study is comparable to other currently available estimates from the Institute for Health Metrics and Evaluation (IHME) and The Economist (8,9), which are 125 694 and 154 403, respectively. For Argentina, both the IHME and The Economist models rely only on national all-cause mortality data up to the end of 2020, together with the relation, observed in other countries, between all-cause mortality and COVID-19 and socioeconomic variables; this relation is then used to project to countries without available all-cause mortality data.
The main limitation of the method used in this study is the requirement for a similar trajectory of COVID-19 deaths between the relevant subnational unit and the national level. Indeed, in many countries, COVID-19 has spread very differently across regions, such that the method shown here would not be well suited to projecting national excess deaths. Prominent examples include the United States of America, where both reported COVID-19 deaths and excess deaths during the first wave of the pandemic (March-April 2020) were limited to eastern states such as New York and New Jersey. In Peru, excess deaths in the region of Lima started well before those in the region of Tacna (10). In Ecuador, spread began in the coastal regions before it reached inland (7).
The method shown in this paper was used to derive an up-to-date estimate of excess mortality in Argentina, overcoming the absence of all-cause mortality data for 2021. It may be used in other settings, within Latin America and beyond, to derive similar estimates if national-level data are delayed or do not exist at all for the COVID-19 period. In addition, if civil registration and vital statistics systems in some regions are disrupted or delayed, national data aggregated from regional data will be delayed as well. Using information from well-functioning regional civil registration and vital statistics systems with the method shown here provides an up-to-date estimate of excess mortality. When national-level all-cause mortality data in Argentina for 2021 are published, they may be used to validate the proposed method. Furthermore, as information on COVID-19 from Córdoba for 2022 is also expected to be available earlier than the national information for 2022, this method can continue to be used to project and track excess mortality in Argentina.
In several countries, subnational all-cause mortality data exist where national data have yet to be published, or may never be published because of low coverage of national vital statistics registration. The method shown here may also be applied to countries such as India (11), Indonesia (12), the Syrian Arab Republic (13), Turkey (14), Yemen (15), and others.