Saturday, June 24, 2017 - 09:00 to 15:45
Probabilistic integration of large Brazilian socioeconomic and clinical databases
Integrating large, heterogeneous socioeconomic and clinical databases is now considered essential for capturing and modeling the longitudinal and social aspects of diseases. However, such integration poses significant challenges: databases are often stored in disparate locations, use different identifiers, vary in data quality, record information in bespoke, purpose-specific formats, and carry different levels of associated metadata. Novel computational methods are required to integrate such databases and enable their statistical analysis for clinical research. In this paper, we describe a probabilistic approach for constructing a very large population-based cohort of 114 million individuals by linking clinical databases from the National Health System with other administrative databases from various government entities, in order to facilitate epidemiological research. We discuss and evaluate the design and validation of our data integration model and the probabilistic data linkage methods used to create research data marts that can be statistically analyzed.
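For readers unfamiliar with probabilistic linkage, the general idea can be sketched as follows. This is an illustration only, written in the classical Fellegi-Sunter style; the abstract does not specify the scoring method the authors used, and the field names, the m/u probabilities, and the thresholds below are all hypothetical.

```python
import math

# Hypothetical per-field probabilities (illustrative values, not from the paper):
#   m = P(field agrees | both records refer to the same person)
#   u = P(field agrees | records refer to different people)
FIELD_PARAMS = {
    "name": (0.95, 0.01),
    "mother_name": (0.90, 0.005),
    "birth_date": (0.98, 0.003),
    "municipality": (0.85, 0.10),
}


def match_weight(rec_a, rec_b, params=FIELD_PARAMS):
    """Sum the log2 likelihood ratios over all compared fields."""
    total = 0.0
    for field, (m, u) in params.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if a is None or b is None:
            continue  # missing value: contributes no evidence either way
        if a == b:
            total += math.log2(m / u)              # agreement raises the score
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement lowers it
    return total


def classify(weight, lower=0.0, upper=10.0):
    """Map a weight to a decision using two assumed cut-off thresholds."""
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "non-link"
    return "review"
```

Record pairs scoring above the upper threshold are treated as the same individual, pairs below the lower threshold are rejected, and the band in between is typically left for manual or secondary review. At the scale described here, a real system would also need blocking or indexing to avoid comparing every pair of records.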
Marcos Barreto
University College London (UK)
Clicia Pinto
Robespierre Pita
George Barbosa
Samila Sena
Rosemeire Fiaccone
Leila D. A. F. Amorim
Maria Yuri Ichihara
Mauricio Barreto
Spiros Denaxas
University College London (UK)
Sandra Reis
Bruno Araujo
Juracy Bertoldo