
Item Description Khan, Mahmudul Huq en_US 2009-07-15T17:17:19Z 2009-07-15T17:17:19Z 1991 en_US
dc.description Thesis (Ph. D.)--University of Hawaii at Manoa, 1991. en_US
dc.description Includes bibliographical references (leaves 128-132) en_US
dc.description Microfiche. en_US
dc.description xiii, 132 leaves, bound ill. 29 cm en_US
dc.description.abstract Two data files can be compared only if they have common variables. Pairs are formed by combining two records, one from each file. The actual status of a pair of records is either a match or a nonmatch, but is usually unknown. Record linkage methodology is used to decide the actual status of each observed pair. A decision error is committed whenever the actual status and the decision differ. The new decision rule is based on minimizing the sum of these misclassification errors. Discrete discriminant analysis has been used to discriminate between match and nonmatch pairs. It is assumed that there are chance factors associated with any pair of records belonging to the sets of match and nonmatch pairs. A high value of the chance factor implies that the reliability of the pair belonging to the corresponding set is low. This is the basic premise of this research. Two discriminant scores are computed from the conditional probabilities of the observed values among match and nonmatch pairs, and the true proportions of match and nonmatch pairs. A pair is classified as a link or a nonlink, depending on which score is larger. The discriminant scores are estimated by using the chance probabilities and applying Bayes' theorem on conditional probability. In order to reduce the number of unnecessary comparisons, the files are often blocked by one or more variables. A new method for choosing the best blocking scheme is proposed in this study. Some of the actual match pairs may be missed because they fall in different blocks. The idea is to select the scheme that misses the fewest match pairs. In comparison to some of the commonly used linkage procedures, the new method is statistically sound, yet it requires only the frequency counts of the variables in the files. Simulated experiments using the new method have been very successful. Without any data errors, the results were 100% correct. Even with 10% data errors, 80% of the true matches were correctly decided. en_US
dc.language.iso en-US en_US
dc.relation Theses for the degree of Doctor of Philosophy (University of Hawaii at Manoa). Biomedical Sciences (Biostatistics - Epidemiology); no. 2600 en_US
dc.rights All UHM dissertations and theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission from the copyright owner. en_US
dc.subject Medical record linkage en_US
dc.subject Statistics -- Data processing en_US
dc.title Maximizing the use of blocking in record linkage : theory and simulation en_US
dc.type Thesis en_US
dc.type.dcmi Text en_US
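
The decision rule and the blocking criterion described in the abstract above can be illustrated with a minimal Python sketch. This is not the thesis's implementation. It assumes, purely for illustration, that each variable is compared as agree/disagree and that agreements are conditionally independent given match status (an assumption the abstract does not state). The function names, the probability lists m_probs and u_probs, and the use of known true match pairs to count missed matches are all hypothetical; the thesis itself reports needing only frequency counts of the variables.

```python
from math import prod


def discriminant_scores(pattern, m_probs, u_probs, match_share):
    """Return (match_score, nonmatch_score) for one agreement pattern.

    pattern     -- 1/0 (or True/False) per variable: did the pair agree?
    m_probs     -- assumed P(agree on variable | pair is a match)
    u_probs     -- assumed P(agree on variable | pair is a nonmatch)
    match_share -- assumed proportion of all pairs that are matches
    """
    # Bayes' theorem numerator for each class: prior times likelihood.
    # Conditional independence across variables is an illustrative assumption.
    match_score = match_share * prod(
        m if agrees else 1.0 - m for agrees, m in zip(pattern, m_probs)
    )
    nonmatch_score = (1.0 - match_share) * prod(
        u if agrees else 1.0 - u for agrees, u in zip(pattern, u_probs)
    )
    return match_score, nonmatch_score


def decide(pattern, m_probs, u_probs, match_share):
    """Classify a pair as 'link' or 'nonlink' by the larger score."""
    match_score, nonmatch_score = discriminant_scores(
        pattern, m_probs, u_probs, match_share
    )
    return "link" if match_score > nonmatch_score else "nonlink"


def missed_matches(true_match_pairs, blocking_vars):
    """Count true match pairs that land in different blocks."""
    return sum(
        1
        for rec_a, rec_b in true_match_pairs
        if any(rec_a[var] != rec_b[var] for var in blocking_vars)
    )


def best_blocking_scheme(true_match_pairs, schemes):
    """Pick the scheme that fails to pick up the fewest true matches."""
    return min(schemes, key=lambda s: missed_matches(true_match_pairs, s))
```

Choosing the class with the larger prior-weighted likelihood is the standard Bayes rule for minimizing total misclassification probability, which is consistent with the abstract's stated goal of minimizing the sum of the decision errors. In a simulation, where the true match status is known, best_blocking_scheme could be called with the true pairs and a list of candidate variable tuples such as ("sex",) or ("sex", "year").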

Item File(s)

Description                    File                     Size      Format
Restricted for viewing only    uhm_phd_9129682_r.pdf    3.339Mb   PDF
For UH users only              uhm_phd_9129682_uh.pdf   3.301Mb   PDF
