Doll-Hill Progress Report No. 1
1 September 1969

The following report covers the period from the inception of the study until 1 September 1969. To understand the progress of the study fully, it seems worthwhile to describe in this first report what constitutes the data set and some of its limitations.

I. Sample Description

The study is concerned with the "pure" smoking experience of 15,908 British doctors during the period 1951-1961. "Pure" smoking is defined as cigarette smoking uncomplicated by pipes, cigars, or mixtures of the three types of smoking. Non-smokers and ex-smokers are also included in this sample: the non-smokers never smoked cigarettes, and the ex-smokers smoked only cigarettes and had stopped at least one year before 1951. The distribution of this sample by age and smoking group at the beginning of the study is shown in Table 1.

A natural question arises: how well does this sample of pure cigarette smokers, ex-smokers, and non-smokers compare with a) the total Doll-Hill population and b) the census population of British doctors? The latter group represents another sampling of the same population: Doll and Hill sampled the British physician population with questionnaires, while the Registrar General sampled it by census methods. Table 2 compares our present sample with these two populations. It can readily be seen that our sample has a greater concentration of younger physicians than either Doll-Hill's original sample or the Census. This is probably due to the exclusion of pipe and cigar smokers from the present sample, since these men tend to be older.

Dr. Doll did not include any smoking information on the subjects under 35 years of age (113 subjects) in the starting population that he sent us. He did, however, include these subjects in the 2,705 deaths attributed to the initial population.
Because of this omission, we cannot calculate mortality rates for the under-35 group, since the population at risk is unknown. We will be able to use these younger subjects once they enter the 35-44 age group and contribute person-years to that cohort.

Table 2 shows excellent agreement between the Doll-Hill and the Census populations with regard to the age distribution. The discrepancy between the two totals (34,494 and 39,507) may be due to the inclusion of radiologists in the Census population. If this is not the reason, we can offer no other at this time, and the question should probably be brought to Dr. Doll's attention.

II. Progress to Date

The data from Doll-Hill consist of the following items:

1) Identification number
2) Age of subject at the start of the study (1951)
3) Number of years subject was in study before his death
4) International Classification of Diseases (ICD) code for the cause of death
5) Smoking classification of subject, where
   NON = Non-smoker
   EX = Ex-smoker; smoker who stopped smoking cigarettes at least one year prior to the study
   C01 = Smoker of 1 to 14 cigarettes/day
   C15 = Smoker of 15 to 24 cigarettes/day
   C25 = Smoker of 25 or more cigarettes/day

There are 2,705 deaths from all causes over the ten-year span (1951-1961). Table 3 gives a breakdown of this total by age and smoking group.

Cross-classification frequency distributions have been generated from magnetic tape for the four variables: age at entry, ICD code, number of years in study, and smoking habits. These are shown in the Appendix. Frequency distributions for the major causes of death are being produced by computer programs and will be ready for the next progress report.

Computer programs completed as of this date are:

(1) Mantel-Haenszel Summary Chi-Square Procedure
(2) Program to compute the crude mortality rates by the migration and "standard" methods for the major causes of death.

The adjustment procedures for the crude mortality rates have been considered.
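Program (1) is named above but not described. For readers unfamiliar with the procedure, the following is a minimal Python sketch of a Mantel-Haenszel summary chi-square. It assumes each stratum (for instance, one age group) is a 2x2 table of deaths and survivors for one smoking group against a comparison group such as the non-smokers. The table layout, the function names, and the optional continuity correction are illustrative assumptions, not a description of the program listed above.

```python
import math
from typing import Iterable, Tuple

# One stratum (e.g. one age group) as a 2x2 table (hypothetical layout):
#   a = deaths among exposed,   b = survivors among exposed,
#   c = deaths among unexposed, d = survivors among unexposed.
Table = Tuple[float, float, float, float]


def mantel_haenszel_chi2(strata: Iterable[Table], correction: bool = True) -> float:
    """Mantel-Haenszel summary chi-square (1 df) pooled over 2x2 strata."""
    observed = expected = variance = 0.0
    for a, b, c, d in strata:
        n1, n2 = a + b, c + d          # row totals (exposed, unexposed)
        m1, m2 = a + c, b + d          # column totals (deaths, survivors)
        t = n1 + n2
        if t < 2:
            continue                   # stratum carries no information
        observed += a
        expected += n1 * m1 / t        # E(a) if exposure and death are unrelated
        variance += n1 * n2 * m1 * m2 / (t * t * (t - 1))  # hypergeometric Var(a)
    if variance == 0:
        raise ValueError("all strata are degenerate")
    diff = abs(observed - expected)
    if correction:
        diff = max(0.0, diff - 0.5)    # continuity correction, floored at zero
    return diff * diff / variance


def chi2_1df_p_value(chi2: float) -> float:
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


if __name__ == "__main__":
    # Hypothetical counts for two age groups: (C25 deaths, C25 survivors, NON deaths, NON survivors).
    chi2 = mantel_haenszel_chi2([(10, 90, 5, 95), (30, 170, 20, 180)])
    print(round(chi2, 3), round(chi2_1df_p_value(chi2), 4))
```

Stratifying by age in this way compares smokers with non-smokers inside each age group. The pooled statistic is therefore not distorted by differences in age composition between the smoking groups.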
Two methods of adjusting crude rates, the direct and the indirect, are available. The direct method, the one Doll used for his adjustment, relates the age-specific death rates of a given community (here the British male doctors) to some population taken as a standard (for example, the total population of England and Wales). For each age group, it finds the number of deaths that would be expected in the standard population if the age-specific mortality of the community applied to it. This is done by multiplying the specific rate for each age group by the population of the corresponding age group in the standard population. The direct adjusted death rate is then formed by adding the expected deaths over all age groups and dividing the sum by the total standard population. The indirect method multiplies the crude death rate of the community by an adjustment factor designed to take account of the peculiarities of the community's age composition. Since Doll used the direct method, and the results of the two methods do not differ appreciably, our analysis should use the direct method as well; the indirect method is mentioned for information only.

A more serious question concerns the adjustment procedure itself, namely the basic problem of relating the physicians' death rates to those of the general population. The physicians are a homogeneous group, probably selected for their availability and intelligence. Can the rates of such a specialized group justifiably be related to the heterogeneous general population? The purpose of adjustment is to make one community or one time period comparable with another. If smoking is detrimental to health, why shouldn't this fact show up within the physician population itself? It seems to us that the adjustment procedure is unnecessary and without merit for our purposes in this study.
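The two adjustment methods described above can be sketched in Python as follows. The function names and inputs are hypothetical. Rates are deaths per person (or per person-year) in each age group. The indirect adjustment factor is taken here as the standard population's crude rate divided by the crude rate the community would show under the standard's age-specific rates, which is one common form of the factor the text describes.

```python
from typing import Sequence


def direct_adjusted_rate(community_rates: Sequence[float],
                         standard_pop: Sequence[float]) -> float:
    """Direct method: apply the community's age-specific rates to a standard population."""
    if len(community_rates) != len(standard_pop):
        raise ValueError("need one rate per standard age group")
    # Expected deaths in each standard age group, summed over all age groups.
    expected = sum(rate * pop for rate, pop in zip(community_rates, standard_pop))
    return expected / sum(standard_pop)


def indirect_adjusted_rate(community_deaths: Sequence[float],
                           community_pop: Sequence[float],
                           standard_rates: Sequence[float],
                           standard_crude_rate: float) -> float:
    """Indirect method: the community's crude rate times an age-composition factor."""
    total_pop = sum(community_pop)
    crude = sum(community_deaths) / total_pop
    # Crude rate the community would have under the standard's age-specific rates.
    index_rate = sum(r * p for r, p in zip(standard_rates, community_pop)) / total_pop
    adjustment_factor = standard_crude_rate / index_rate
    return crude * adjustment_factor


if __name__ == "__main__":
    # Hypothetical rates and populations for two age groups.
    print(direct_adjusted_rate([0.01, 0.05], [1000, 3000]))
    print(indirect_adjusted_rate([20, 30], [1000, 500], [0.02, 0.04], 0.03))
```

Written this way, the indirect adjusted rate equals the standard crude rate multiplied by the ratio of observed to expected deaths in the community.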
Of course, we have an obligation to reproduce the Doll-Hill procedure using the "pure" smoking groups and to adjust using the male British physician population, but these reasons are empirical and unimportant in establishing relationships between smoking and health. The crux of the adjustment problem may lie in statistical nomenclature and in the historical concepts of Vital Statistics. That branch of statistics is always concerned with comparisons across time and place, and adjustments are therefore required to make such comparisons possible. In fact, its mortality rate estimates would be meaningless and useless unless they could be compared with some other set. However, the mortality rates of the male British physicians are calculated to determine whether a link exists between smoking and causes of death in this specific occupation. Statisticians consider this a fixed sample rather than a random one: as the Census figures show, almost the entire British medical population is included. Because it is hardly a random sample, inferences may be made concerning this population and only this population. Adjustment cannot remove this restriction, since the general population of British males was not sampled. Suffice it to say that the physicians and the general population differ in many ways, and to sample one and make inferences concerning the other seems quite erroneous and misleading.

Expected Deaths

Most smoking and health studies attempt to assess the total mortality experience of the population under study. Doll and Hill neglect to do this, explaining that the "figures for total mortality should not, therefore, be interpreted until the mortality of each of its principal disease components has been separately studied" (p. 1402, Reference 3). This premise can easily be argued, especially when other investigators (Himpang) have chosen not to perform such an analysis.
It should be noted that the calculation of expected deaths which the authors perform in the 1954 and 1956 papers (References 1 and 4) is not performed in the 1964 paper (Reference 3). The expected deaths are used in the chi-square test. The authors state, "The numbers of deaths in most of these categories are so large that tests of statistical significance are hardly necessary" (p. 1401, Reference 3). This is a definite inconsistency, because later in the 1964 paper (Reference 3) probability values are quoted for the various causes of death. From what were these probability levels derived, if not from statistical significance tests? We assume, then, that significance tests were performed, and in the same manner as in the 1954 and 1956 papers.

It seems in order to examine the total mortality experience in this report, even though Doll and Hill do not. The 1954 paper (p. 1452, Reference 4) details the method the authors used to conduct these significance tests, and we have performed the same test on our present sample, which contains only "pure" smokers. Table 4 is identical in form to Table I of Reference 4. The statistical significance of the differences between death rates can be assessed more easily from the actual numbers of deaths recorded, that is, by comparing them with the numbers that would have been expected in each smoking category if smoking were unrelated to the chance of dying. To calculate the expected number of deaths, multiply the percentages for a particular age group in Table 4 by the total number of deaths for that age group in Table 5. If mortality is unrelated to smoking, the total deaths in each age group (Table 5) should be distributed in the same proportions as the smoking groups of Table 4. This gives the expected numbers of deaths shown in Table 6. The observed numbers are the actual deaths from all causes, by age and smoking group, at the end of the study.
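The expected-death calculation and a Pearson chi-square on the smoking-group totals can be sketched in Python as below. The dictionary layout keyed by smoking code (NON, EX, C01, C15, C25) and the function names are assumptions for illustration. The usage figures are made up and are not taken from Tables 4 to 7. Because only proportions within each age group matter, the Table 4 percentages can be passed in place of head counts.

```python
from typing import Dict, List, Tuple


def expected_deaths(population: Dict[str, List[float]],
                    deaths_by_age: List[float]) -> Dict[str, float]:
    """Expected deaths per smoking group if smoking were unrelated to dying.

    population[g][i]: men (or percentage of men) of smoking group g in age group i.
    deaths_by_age[i]: total deaths from all causes in age group i.
    """
    expected = {g: 0.0 for g in population}
    for i, age_deaths in enumerate(deaths_by_age):
        age_total = sum(counts[i] for counts in population.values())
        for g, counts in population.items():
            # Share this age group's deaths in proportion to each smoking group's size.
            expected[g] += age_deaths * counts[i] / age_total
    return expected


def chi_square(observed: Dict[str, float],
               expected: Dict[str, float]) -> Tuple[float, Dict[str, float]]:
    """Pearson chi-square of observed vs expected deaths, with each group's contribution."""
    parts = {g: (observed[g] - e) ** 2 / e for g, e in expected.items()}
    return sum(parts.values()), parts


if __name__ == "__main__":
    # Hypothetical figures for two smoking groups and two age groups.
    pop = {"NON": [100, 50], "C25": [300, 50]}
    exp = expected_deaths(pop, [40, 10])          # {'NON': 15.0, 'C25': 35.0}
    total, parts = chi_square({"NON": 20, "C25": 30}, exp)
    print(exp, round(total, 3), parts)
```

The per-group contributions show which smoking groups drive a significant total. Since the expected totals are constrained to equal the observed grand total, the statistic would conventionally be referred to a chi-square distribution with one degree of freedom fewer than the number of smoking groups. If SciPy is available, `scipy.stats.chisquare(f_obs, f_exp)` returns the same statistic together with a p-value.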
The totals in Table 6 represent the total expected deaths and the total observed deaths in each smoking group. These are the values submitted to the chi-square test shown in Table 7. The results are very interesting and show the lack of any smoking gradient. The total chi-square is highly significant, but this is due to the large contribution of the C25 group. The non-smokers make the second-largest contribution, which is certainly contrary to previous findings reported in the literature. The machinery has been set up to perform this analysis for each of the major causes of death listed in the contract.

III. Interesting Avenues for Investigation

It seemed interesting to compare the Doll-Hill deaths with those in the Registrar General's Census for male British doctors (Reference 2). The Doll-Hill study covered a ten-year period (1951-1961), while the Census data cover a five-year period (1949-1953). We chose malignant neoplasms, all sites (ICD codes 140-205), to explore the comparison of what is seemingly the same population. To equalize the time spans, the deaths were converted to the number occurring per year before being compared. Since the time spans overlap but do not coincide, the Census deaths for the doctors should reflect better diagnostic procedures, which should lead to more treatment successes. This problem is far from resolved in our minds, but note the wide discrepancy in deaths per year for the last two age groups. This sort of exploration, that is, using material other than the Doll-Hill sample for comparative purposes, seems worthy of further investigation and may shed light on some sampling problems in the Doll-Hill report.

Another item that should be looked into is the calculation of mortality rates using individual ages rather than age groups. Our expanse investigates that the average in the age groups is twice the differences the other value until close to one of the bounds. It should not be difficult to calculate rates for each age with a computer program.
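Rates by individual year of age, as proposed above, could be computed along the following lines. This Python sketch assumes one record per man, survivors included, giving his age at entry, the years he was followed, and whether he died. It also assumes that each man entered the study on his birthday, so that every full year of follow-up is spent at a single age. The record layout and the function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record: (age at entry in whole years, years followed, died?)
Record = Tuple[int, float, bool]


def single_age_rates(records: Iterable[Record]) -> Dict[int, float]:
    """Deaths per person-year for each single year of age.

    Assumes each man entered on his birthday, so each full year of follow-up is
    credited to one age and any remaining fraction to the next age.
    """
    person_years: Dict[int, float] = defaultdict(float)
    deaths: Dict[int, int] = defaultdict(int)
    for entry_age, years, died in records:
        age, remaining, last_age = entry_age, years, entry_age
        while remaining > 0:
            chunk = min(1.0, remaining)
            person_years[age] += chunk
            last_age = age
            remaining -= chunk
            age += 1
        if died:
            deaths[last_age] += 1   # death counted at the age being lived when follow-up ended
    return {a: deaths[a] / py for a, py in sorted(person_years.items()) if py > 0}


if __name__ == "__main__":
    # Two hypothetical men entering at 40: one survives 2 years, one dies after 1.5 years.
    print(single_age_rates([(40, 2.0, False), (40, 1.5, True)]))
```

Working at single ages avoids assuming that the men in an age group are spread evenly across it, at the cost of smaller numbers of deaths per rate.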
We have also become aware of a paper by Sprigett which cautions against using long study intervals in the study of lung cancer, because mortality from this cause is changing rapidly. We intend to split the Doll-Hill sample into two five-year groups to see whether deaths from ICD cause 162 differ between the two periods.

Summary

The progress to date has been the production of frequency distributions of the cross-classified variables, the implementation of computer programs to produce the required plots and distributions by major causes of death, and the development of algorithms to calculate person-years by the migration and "standard" methods. The computer printouts are available for your inspection on request.
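The five-year split proposed above for lung cancer deaths (ICD cause 162) might be sketched in Python as follows. It assumes one record per man giving his years of follow-up from the start of the study, whether he died, and the ICD code of his death. The record layout, the 5-year boundary constant, and the rate base of 1,000 person-years are illustrative choices. Because the men entered together in 1951, follow-up years map roughly onto the calendar periods 1951-1956 and 1956-1961.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical record: (years followed from the start of the study, died?, ICD code of death or None)
Record = Tuple[float, bool, Optional[str]]

SPLIT_YEARS = 5.0  # boundary between the first and second five-year periods


def period_rates(records: Iterable[Record], icd: str = "162",
                 per: float = 1000.0) -> Tuple[float, float]:
    """Deaths from one ICD cause per `per` person-years, first vs second period."""
    person_years = [0.0, 0.0]
    deaths = [0, 0]
    for years, died, cause in records:
        # Split each man's follow-up at the period boundary.
        person_years[0] += min(years, SPLIT_YEARS)
        person_years[1] += max(0.0, years - SPLIT_YEARS)
        if died and cause == icd:
            # A death ending follow-up at or before the boundary falls in the first period.
            deaths[0 if years <= SPLIT_YEARS else 1] += 1
    first, second = (per * d / py if py > 0 else float("nan")
                     for d, py in zip(deaths, person_years))
    return first, second


if __name__ == "__main__":
    sample = [(10.0, False, None), (3.0, True, "162"), (7.0, True, "162"), (8.0, True, "410")]
    print(period_rates(sample))
```

Every surviving man is five years older in the second period, so a fuller comparison would also stratify these rates by age group before contrasting the two periods.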