In this chapter, we address issues related to the selection of the study sample and methods for collecting data on study participants. We begin with a discussion of methods for determining sample size and the process by which treatment and control/comparison groups may be selected. We then describe methods available to evaluators for collecting data on study participants, including surveys and program administrative data sources. We conclude the chapter with a discussion of the content and timing of baseline and follow-up data collection efforts.
All of the evaluation design alternatives call for the identification of volunteer subjects for the study from one or more target populations. These volunteers will constitute the sample for the evaluation. The volunteers will be assigned by either a random or non-random methodology (depending on the design) to treatment and control or comparison groups. In this section we discuss issues related to identification of the target populations, recruitment of volunteers, assignment to treatment or control/comparison group, enrollment in the program, and the number of volunteers (i.e., the sample size) needed to obtain estimates of program impacts that have reasonable statistical precision.
One issue that cuts across most of the issue areas considered below concerns the extent to which the program's "normal" process of recruitment and enrollment is maintained during the evaluation period. It seems inevitable that the process will be changed to some degree. Large changes, however, may make it difficult to generalize findings to fathers enrolled through the normal process. Hence, changes to the process that are made for purposes of the evaluation should be minimized, made in a way that is not likely to have an effect on the types of fathers enrolled in the program or the nature of the program itself during the evaluation period, and documented.
In the experimental and randomized outreach designs for a single program, the target population from which all study volunteers will be obtained will be the same for the treatment and control groups. In the non-experimental design, volunteers for the comparison group will come from a different, but similar, population from that for the treatment group -- the comparison target population. In a multi-site evaluation, volunteers will come from target populations at each site and, if a non-experimental design is used, from multiple comparison target populations. Below we first discuss issues related to the definition of the target population for an experimental or randomized outreach design, then consider issues concerning the selection of comparison target populations, and conclude with a discussion of the time frame for recruiting volunteers from the target population.
1. Target Population for Experimental or Randomized Outreach Design
The target population for obtaining study volunteers may be defined as the target population for the program being evaluated. The latter might be defined in many ways, such as all non-custodial fathers in some geographic area or as all non-custodial fathers who come in contact with the program's referral source(s). Using the program's target population as the target population for the study volunteers is important for assuring that the results of the evaluation are results for fathers with the characteristics that are "normally served" by the program.
There is, however, at least one important reason to consider going outside of the program's target population for the evaluation: to increase the number of volunteers by enough to ensure adequate numbers of treatment and control subjects, and to be sure that the program is not underutilized during the study period. If feasible, this should be done in a way that expands the population but does not materially alter the distribution of characteristics of fathers within the population. For instance, it might be possible to expand the program's target area into an adjacent area that has a similar population. Alternatively, if a program recruits participants through a hospital maternity ward, fathers contacted and recruited through other hospital maternity wards could be added to the target population for the evaluation.
2. Comparison Target Population (Non-Experimental Design)
To the extent possible, the comparison target population should be matched to the treatment target population on characteristics of fathers and characteristics of the environment. Thus, if the treatment target population is all non-custodial fathers in a specific area, the comparison population would best be all non-custodial fathers in another area that have characteristics similar to those of non-custodial fathers in the treatment target population. The economic and policy environments of the two areas should also be similar. Alternatively, if the treatment target population is non-custodial fathers contacted through a hospital's maternity ward, the comparison target population could be non-custodial fathers contacted through the maternity ward of a similar hospital located in an area with a similar economic and policy environment.
In comparing the economic and policy environments in two areas, it is important to consider the possibility of differences in changes to the environments of the two areas over the evaluation period. For instance, if the economy improves in one area relative to the other, it will increase employment and perhaps child support among fathers in that area relative to those in the other area. One way to guard against this is to make sure the areas from which the two target populations are drawn are geographically adjacent and in the same local jurisdiction (e.g., county). The advantages of such proximity should, however, be weighed against the possibility of spillover effects -- interactions among the fathers in the two populations that might have an effect on the outcomes for either group.
It is likely that any comparison target population will differ in some respects from the program's target population. One way to increase uniformity would be to use a screening mechanism that screens out fathers with certain characteristics that are found in the comparison group population but not the target group population. For instance, if the target population only includes fathers from a specific minority population, then the screen would exclude fathers who are not in that same minority. Screens for age, place of residence, employment, and other factors might also be appropriate.
3. The Time Frame for Recruiting Volunteers
Volunteers for the evaluation will be recruited during a specified time frame. Given the small sizes of existing programs, it is tempting to have a long recruitment period to increase the sample size. An extended recruitment would obviously slow down the evaluation. In addition, the longer the recruitment period, the greater would be the risk that a change in the program or the environment during the recruitment and evaluation period would compromise the evaluation. Hence, the evaluators should be wary of using a very long recruitment period (more than, say, one year) as a means to increase the sample size. Extending the recruitment period is less of a problem in an experimental or randomized outreach design than in a non-experimental design because both treatment and control group volunteers would be subject to the same environmental changes.(1)
The study subjects will be volunteers from one or more target populations. In this section we discuss issues concerning recruitment of the volunteers for the study.
We recommend that the evaluators identify and recruit volunteers for the study in the same way that the program normally identifies and recruits program participants. If the program advertises its services, then the evaluators would advertise for volunteers in the same general way. If an agency refers fathers to the program, then the agency, and perhaps similar agencies, should be asked to refer the same types of fathers to the evaluators.
When a potential volunteer is identified and contacted, the person should be told about the opportunity to participate in an "important research study about fathers who do not have custody of their children." If advertising is used to attract volunteers from the community, the ads should include some information about the study, including information on payments for volunteers who complete the study, and a toll-free number to call. If the father is identified by a referring agency, agency staff should briefly explain the opportunity and offer similar information in written form. In this case it would be desirable to have the agency make a telephone available to the father for the purpose of calling the evaluator. The father should also be told that volunteers will be paid a specified amount for completing an interview. The connection between the study and the program should not be mentioned, because volunteers who are not eligible to participate in the program might later be disappointed and upset. The role of agency staff should, in general, be kept to a minimum to avoid burdening them and to limit opportunities they might have to intentionally or unintentionally compromise the evaluation.
During the initial phone contact with the evaluator, the evaluator should:
It seems likely that many fathers who are initially identified as potential volunteers will not volunteer. Extensive efforts could be made to encourage volunteering, but they could ultimately be counterproductive because marginal volunteers might turn out to be very uncooperative study subjects and unlikely program participants. As described above, the process for obtaining volunteers allows fathers to "back out" without embarrassment or other immediate consequences by simply not contacting, or re-contacting, the evaluator.
Under the experimental and randomized outreach designs, volunteers from the program's target population will be randomly assigned to treatment or control groups. In this section we discuss the process of random assignment.
For the experimental design, we highly recommend that random assignment occur shortly following the baseline interview and also that the interviewer not be involved in the process. Knowledge of the opportunity to participate on the part of either the volunteer or the interviewer could have an effect on the quality and nature of the volunteer's answers, making answers for treatment group members less comparable to those for control group members. This could be accomplished by having the evaluator randomly assign the volunteer after being notified of the completion of the interview, but before reviewing the information obtained from the interview. When evaluation staff have information about a volunteer, they may be tempted to thwart random assignment so that especially "deserving" or "promising" subjects are assigned to treatment, or that undeserving or unpromising subjects are not. The process described above limits such opportunities.(2) Alternatively, study volunteers could be assigned to treatment and control groups based on their Social Security Number (SSN). For example, persons with SSNs ending with the numbers 0, 1, 3, 6, and 8 would be assigned to the treatment group, while those with SSNs ending with the numbers 2, 4, 5, 7, and 9 would be assigned to the control group.
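The assignment step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure of any particular evaluator: the function names, the volunteer identifiers, and the seed are hypothetical. The point of the sketch is that the assignment routine receives nothing but an identifier, so interview responses cannot influence the outcome, and that a recorded seed lets the assignment be audited later.

```python
import random

# Last SSN digits assigned to treatment in the example given in the text.
TREATMENT_DIGITS = frozenset("01368")


def assign_groups(volunteer_ids, p_treatment=0.5, seed=12345):
    """Randomly assign each volunteer to 'treatment' or 'control'.

    Only identifiers are passed in, never interview data. The seed is a
    hypothetical value; recording it makes the assignment reproducible.
    """
    rng = random.Random(seed)
    return {
        vid: "treatment" if rng.random() < p_treatment else "control"
        for vid in volunteer_ids
    }


def assign_by_ssn_last_digit(ssn):
    """Alternative rule from the text: assign on the last digit of the SSN."""
    last = ssn.strip()[-1]
    if not last.isdigit():
        raise ValueError("SSN must end in a digit")
    return "treatment" if last in TREATMENT_DIGITS else "control"


if __name__ == "__main__":
    # Hypothetical identifiers for volunteers whose interviews are complete.
    print(assign_groups(["V001", "V002", "V003", "V004"]))
    print(assign_by_ssn_last_digit("000-00-0003"))
```

In practice the evaluator would run the assignment as each interview completion is reported, rather than in a single batch, but the separation between interview data and assignment would be the same.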
When a volunteer is assigned to the treatment group, the evaluator should then take steps to implement the treatment. Under the experimental design, we recommend that the evaluator contact the program and give the program information needed to contact the volunteer. It would then be up to the program to recruit the father. Control group fathers would not be identified to the program, and would not be recruited.
It may be advisable to have the evaluator call the volunteers assigned to the treatment group before giving their contact information to the program to thank them for participating in the interview and ask them if they would like the opportunity to participate in a special program that helps non-custodial fathers and their children. Only fathers who reply affirmatively would be referred; others would presumably not participate in the program and would be part of the non-participant treatment group (see Chapter Three). If this is not done, some fathers who are contacted by the program following the interview may guess that the interviewer supplied their name to the program without the father's permission to do so.
A somewhat different process would be more appropriate for the randomized outreach design. In this case, the interviewer could ask the father if he wanted someone to contact him with information about a program to help non-custodial fathers and their children. All those who reply affirmatively would then be contacted by the evaluator's staff and provided with the information. The evaluator would also assign volunteers to treatment and control groups upon completion of the interview, but before examination of the interview data. The randomized outreach might be applied in one of two ways.
A simple way would be to have the evaluator provide information to program staff about treatment group volunteers, but not about control group volunteers. Then the program could conduct outreach activities to enroll the treatment group volunteers. A more costly and perhaps problematic way would be to have the evaluator conduct the outreach activities directly, with some outreach to control group volunteers and more intense outreach to treatment group volunteers. The latter method, including some outreach to control group cases, may yield more study participants. This would be important if the program would otherwise have excess capacity. The method has some distinct disadvantages, however, including: being more expensive; being susceptible to manipulation by evaluator staff; being different from the program's "normal" outreach efforts, and perhaps being less effective than comparable outreach that comes directly from the program. If increased program participation is desired, a better approach might be to offer more enrollment incentives to all volunteers upon completion of the baseline interview.
Under the non-experimental design, the only issue is enrolling treatment group volunteers in the program. As in the experimental design, the interviewer should not know to which group the father belongs.
One of the biggest challenges to an evaluation will be finding enough volunteers and eventual participants to obtain a reasonable level of statistical precision for impact estimates. In this section we discuss the relationship between the size of the sample, its division into participants and non-participants, and statistical precision.
The relationship between sample size and estimator precision depends on many factors that cannot be determined in advance. This is especially the case when multivariate analyses are required to obtain impact estimates, as we would recommend (see Chapter Eight). Analysis of this relationship in the following much simpler situation is, however, indicative of what the actual relationship is likely to be.
Suppose we could simply randomly assign some fathers to participate in the program -- without the option of not participating -- and others to not participate. Then the treatment group would be synonymous with the participants and the control group would be synonymous with non-participants. Suppose also that the outcome of interest was a simple qualitative one; for example, has the father established paternity at the time of the follow-up interview? The difference between the percent of treatment and control fathers establishing paternity is an unbiased estimate of the impact of the program on establishment of paternity.
Even if the estimated difference in percent were positive (larger percent for the treatment), as we would expect, it might be positive just because, by chance, we happened to assign a larger share of fathers who would eventually establish paternity to the treatment group. To be confident that the difference was not just due to chance, we would require the estimated difference to be at least as large as a "critical value" -- a value that has a small probability of being exceeded in a controlled experiment if the treatment does not have a positive effect. Formally, we would reject the null hypothesis of no impact (or possibly a negative impact) in favor of the alternative of a positive impact if the difference in percent is positive and larger than the critical value.
The critical value for this test depends on the sample sizes in the treatment and control group, the level of statistical significance desired, and the percent of fathers in the target population who do not establish paternity in the absence of program participation (the "population percent"). Increasing the sample size in either the control or treatment group reduces the critical value because it becomes less likely that a large difference is due to chance. The significance level is the chance that a difference will be greater than the critical value when the program has no effect; choosing a smaller significance level requires the critical value to be larger. The population percent is unknown, but it can be shown that for given sample sizes and significance level, the critical value is greatest when the population percent is 50. In the absence of other information about this percent, 50 percent is often used to determine what the highest critical value would be for a given sample size and significance level.
Critical values for various sample sizes are shown in Exhibit 6.1 for a five percent significance level (the most commonly used level) under the assumption that the population percent establishing paternity is 50. The difference in percent would have to be not only positive, but at least as large as the critical value to be significant at the five percent level (i.e., the chance of the difference exceeding the critical value if the program had no effect is just five percent).(3) In considering the numbers in the table, it is important to keep in mind that the sample sizes refer to volunteers who actually complete the study; numbers of initial volunteers needed to achieve a desired critical value may need to be substantially higher because of anticipated attrition.
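The adjustment for attrition can be illustrated with a short calculation. The sketch below is a simple approximation that assumes a single, known attrition rate applying equally to all volunteers; the function name and the rates used are hypothetical and are not estimates for any actual program.

```python
import math


def initial_volunteers_needed(completers_needed, attrition_rate):
    """Number of volunteers to recruit so that, after the assumed share
    drops out, about `completers_needed` remain at follow-up."""
    if not 0 <= attrition_rate < 1:
        raise ValueError("attrition_rate must be in [0, 1)")
    # The small offset guards against floating-point noise pushing an
    # exact answer up to the next whole number.
    return math.ceil(completers_needed / (1 - attrition_rate) - 1e-9)


if __name__ == "__main__":
    # Hypothetical attrition rates, for illustration only.
    for rate in (0.10, 0.25, 0.40):
        print(rate, initial_volunteers_needed(100, rate))
```

The required number of recruits rises faster than the attrition rate itself, which is one reason to weigh expected attrition when choosing the follow-up interval.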
For most of the fatherhood programs we are familiar with, it would be difficult to obtain as many as 100 participants and an equal number of non-participants for the evaluation from a single program over a reasonably short period of time. For those sample sizes, the difference in percent would have to be as high as 11.7 percentage points to conclude that it is statistically significant. Is that large enough? The answer partly depends on how large the true difference would have to be for policymakers, funders, and others to conclude that it is an "important" difference. If a difference is not considered important unless it is at least 20 percentage points, then this sample would be of adequate size, but if a difference of five percentage points is considered important it would not be.(4)
One program we visited had substantially more participants than the others. The Racine Goodwill Industries Fatherhood Program receives 55 to 65 court referrals per month, and enrollment for these fathers is mandatory. Presumably a six-month evaluation enrollment period would yield over 300 participants. If, say, one or more other counties were used as comparison counties, it would presumably be feasible to generate 300 comparison cases. With this sample size, a difference of 6.7 percentage points would be statistically significant.
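The critical values quoted above can be approximated with a short calculation. The sketch below assumes a one-sided test based on the normal approximation to the difference between two proportions; this is our assumption about the method, since the exact procedure behind Exhibit 6.1 is not reproduced here, and the function name is hypothetical. Under that assumption the calculation gives roughly 11.6 percentage points for 100 fathers per group and 6.7 for 300 per group. The small gap from the 11.7 quoted in the text is consistent with rounding the normal quantile to 1.65.

```python
import math
from statistics import NormalDist


def critical_value(n_treatment, n_control, population_share=0.5,
                   significance=0.05):
    """Approximate critical value, in percentage points, for a one-sided
    test of a positive difference between two proportions.

    Assumes the normal approximation; `population_share` is the assumed
    share with the outcome absent the program (0.5 is the worst case).
    """
    z = NormalDist().inv_cdf(1 - significance)  # about 1.645 at 5 percent
    variance = population_share * (1 - population_share)
    std_error = math.sqrt(variance * (1 / n_treatment + 1 / n_control))
    return 100 * z * std_error


if __name__ == "__main__":
    for n in (100, 300):
        print(n, round(critical_value(n, n), 1))
    # The critical value is largest when the population share is 0.5.
    for share in (0.3, 0.5, 0.7):
        print(share, round(critical_value(100, 100, share), 1))
```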
Before conducting an impact evaluation, it would be prudent to examine information that is indicative of the likely size of the program's effect on outcomes and compare that to the precision of the estimates for the likely sample sizes. As an example, for a non-experimental evaluation of the Racine program, cross-county differences in rates of compliance with court-ordered child support would be indicative of the possible size of the effect. If differences are only a few percentage points, significant effects are not likely to be found with samples of the size indicated. If, however, much larger differences are found, the evaluation may provide evidence that the program has a substantial impact.
The critical value can be lowered by increasing the size of either the treatment or control sample. Given a total sample size, the lowest critical value can be obtained by splitting the sample evenly between the two groups. In some situations it might be easier to increase the size of one group, but not the other, and there is no reason not to do this other than cost. For instance, if the program is filled to capacity, it would still be valuable to increase the sample size of the control group. In the randomized design it would be necessary to do this by changing the probability that a volunteer is assigned to treatment from 50 percent to some lower figure. This figure should be determined in advance, based on anticipated numbers of volunteers and known program capacity; filling the program first, then putting additional volunteers into the control group, would violate random assignment.
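The trade-offs described above can be illustrated numerically. The sketch below makes the same simplifying assumptions as the discussion in this section (a one-sided test at the five percent level, the normal approximation, and a population percent of 50). The function names, the program capacity, and the volunteer counts are hypothetical.

```python
import math
from statistics import NormalDist


def critical_value(n_treatment, n_control, population_share=0.5,
                   significance=0.05):
    """Approximate one-sided critical value in percentage points
    (normal approximation to a difference in proportions)."""
    z = NormalDist().inv_cdf(1 - significance)
    variance = population_share * (1 - population_share)
    return 100 * z * math.sqrt(variance * (1 / n_treatment + 1 / n_control))


def treatment_probability(program_capacity, anticipated_volunteers):
    """Assignment probability fixed in advance so that the expected number
    of treatment cases does not exceed capacity (never above one half)."""
    return min(0.5, program_capacity / anticipated_volunteers)


if __name__ == "__main__":
    # Even versus uneven split of the same total sample of 200.
    print(round(critical_value(100, 100), 1), round(critical_value(60, 140), 1))
    # Program capacity of 60: adding controls still lowers the critical value.
    print(round(critical_value(60, 60), 1), round(critical_value(60, 240), 1))
    # Probability to set before recruitment if 300 volunteers are anticipated.
    print(treatment_probability(60, 300))
```

Setting the probability before recruitment begins, rather than filling the program first, keeps every volunteer's chance of assignment to treatment the same regardless of when he volunteers.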
Different sample sizes for the two groups might be preferred for other reasons, too. One such reason is ethical objections to assigning fathers to the control group when the program is operating below capacity; in this case, the share of volunteers randomly assigned to treatment could be increased above 50 percent, but at the cost of reducing the precision of the estimates.
The primary sources of data available to those who want to evaluate responsible fatherhood interventions are surveys and program administrative data. We discuss issues associated with these two data sources in the sections below.
Because most of the data necessary to conduct an evaluation of a fatherhood intervention will not be available from an existing source, the evaluation will necessarily rely on data collected through surveys of fathers, mothers, and, when feasible, children. The use of surveys facilitates the collection of uniform data across all study participants and allows the evaluator to collect information that otherwise might not be available.
In order to evaluate the impact of a program on specific outcomes, data on outcomes and other explanatory variables must be collected for both the treatment and comparison/control group before and after implementation of the program intervention. Therefore, at least two surveys will be administered to the study participants: a baseline survey and at least one follow-up survey. We discuss the timing and content of baseline and follow-up surveys in Section IV below.
There are a number of issues related to the gathering of information from a survey of participants and their families; most are not unique to the evaluation of fatherhood programs. These include:
These issues are likely to be resolved based on cost and feasibility considerations, and based on the outcomes of primary interest to the evaluators. We recommend, however, the use of in-person interviews for collecting the survey data. In-person interviews have several advantages over telephone or self-administered surveys. First, the response rate for in-person interviews is better than for self-administered interviews, and the number of incomplete answers is likely to be smaller. Second, the baseline and follow-up surveys required for a responsible fatherhood evaluation are rather lengthy. It may be difficult and uncomfortable to keep a respondent on the telephone for an extended period of time. Third, in-person interviews allow for the use of visual aids (e.g., flashcards listing potential responses) to elicit uniform responses. Finally, if there is a payment associated with the respondent's effort in completing the survey, that payment may be made directly to the respondent once the survey is completed.
An issue associated with the use of in-person interviews is where to conduct the interview. It may be more convenient for respondents to have the interviewer come to their home to administer the survey. Conducting an in-home interview, however, may have several problems: the area or environment may not be safe for the interviewer; there is a greater possibility of interruptions during the interview (e.g., the telephone, the presence of other family members); and the presence of other persons in the house who may overhear the interview may affect responses. For these reasons, it may be desirable to designate an easily accessible location where all interviews can be conducted.
Costs associated with conducting the survey will also depend on the outcomes of primary interest. For example, information on outcomes such as earnings, child support, paternity establishment, hours of contact with one's child, subsequent children out-of-wedlock, and drug or alcohol use may be easily collected from self-reports made by both fathers and mothers. If more complex information on child well-being, father well-being, father/child relationships, and father/mother relationships is desired, however, greater resources will be required both in developing and administering an instrument to measure these concepts.
Once the variables of interest have been determined, the evaluator must develop the survey instrument. We have discussed potential measures for program outcomes and explanatory variables in Chapters Four and Five. As discussed in those chapters, it is best to rely on survey questions that have been used in previous studies. Many national surveys collect information on many of the same variables that will be of interest to a fatherhood program evaluation. These instruments may serve as guides to the evaluator when developing questions for the baseline and follow-up surveys. Once the surveys have been developed, they should be "pre-tested" -- administered to a small sample of subjects -- to learn about possible problems (e.g., ambiguous questions, new alternatives missing from response lists), and correct them when possible. Pre-testing usually proceeds from a slow "talk-through" with readily available respondents to interviews that are conducted with respondents from the target population as if they were participating in the actual survey. When the final instrument has been developed, the interviewers who will be administering the survey will need to be trained to ensure that the survey is administered correctly and uniformly across all respondents.
Another potential source of information that may be used to conduct an evaluation is program administrative data. Most programs collect and maintain some information on their participants. One program we visited collects a variety of information on the initial application forms including:
In addition, this program is currently developing a follow-up database that will track outcomes for participants in the areas of paternity establishment, child support, arrears, visitation, employment, job duration, wages, educational attainment, and criminal activity. Follow-up information will be collected on former participants every six months.
These types of administrative data can be useful for conducting an impact evaluation; however, they suffer from one critical flaw: they are available only for persons who actually enroll in the program. Unless similar data can be obtained for the control or comparison group, a rigorous impact evaluation cannot be conducted using program administrative data alone. Data on outcomes for participants obtained through administrative records can be useful for comparing the outcomes for participants at program completion to their outcomes as measured at follow-up (typically some months later). Such a comparison can illustrate temporary versus longer lasting program effects.
Another type of information often available through program administrative records that can be very useful to an evaluation is information on the types and levels of services that program participants receive. It is important for an impact evaluation that all persons in the treatment group receive the same treatment. If analysis of information on program inputs reveals that participants receive different types or levels of services, then there may be reason to believe that estimates of program impacts will be biased. Unfortunately, this type of bias is difficult or impossible to control for since differences in service levels across participants are unlikely to be random. If there are differences in service levels/types across participants, it is probably because participants have different needs. The information and sample sizes necessary to correct for bias stemming from the selection of participants into services will not typically be available to evaluators.
In this section we discuss issues related to the administration of the baseline and follow-up surveys. We begin with a brief discussion of the content of baseline and follow-up surveys, and then discuss timing issues associated with survey data collection efforts.
The initial or baseline survey should collect information on all explanatory and outcome variables of interest. We refer the reader to Chapter Five for a discussion of potential explanatory variables for inclusion in the survey instrument, and Chapter Four for a discussion of potential outcome variables. The baseline survey will be more comprehensive than the follow-up survey because it is not necessary to collect follow-up information on characteristics that do not change over the observation period (e.g., date of birth, race, sex, source of referral, employment history). For purposes of the impact evaluation, follow-up surveys need only focus on collecting information on the outcomes of interest. In addition, follow-up surveys might include questions about whether study participants received any services similar to those provided by the program being evaluated from any other source. If treatment or control group members received services outside the program, this should be accounted for in the impact analysis.
Programs may wish to collect other types of information on a follow-up survey that may not be directly useful to the impact evaluation. For example, information on the participant's experience in the program, such as which services he found most/least useful, can aid program staff in improving the effectiveness of their services.
Baseline Surveys: Ideally, the baseline survey should be conducted as soon after individuals are recruited into the study as possible, and before the individual has been assigned to the treatment or control group. This is for several reasons: First, it will ensure that interviewers do not know ("are blind to") the treatment status of the persons they are interviewing, and therefore will not introduce any unintended bias through the manner in which interviewers are administering the questionnaire. Second, it will ensure that interviewees' responses will not be influenced by referral to or subsequent contact with the program. Third, the more quickly the survey is implemented, the less likely individuals will be lost from the sample, either through subsequent lack of interest in participating in the study or because they can no longer be located.
Follow-Up Surveys: The length of time between conducting the baseline and follow-up surveys will depend on several factors: the length of time it takes a participant to complete the program; the particular outcomes of interest to the evaluation; and whether or not long-term impacts are of interest to the evaluation. In general, the follow-up survey should be conducted after a time interval that is sufficient for program participants to have completed the treatment, and for the treatment to have had an impact on the outcomes of interest.
Our site visits provided us with two contrasting examples. One site has a defined six-week curriculum that has, as one primary focus, the goal of improving the employment prospects of young fathers. For this program, a follow-up survey may be conducted at a rather short interval following program completion, as the job readiness skills taught in the program are likely to have an immediate impact on the outcome of interest (employment).
In contrast, another site relies on intensive case management that focuses on changing the attitudes of participants so that they find in themselves the ability to achieve whatever goals they wish to pursue. For example, assume that employment is an outcome of interest to the program. The program treatment helps participants to realize that if they want to be employed in a decent job, they have the knowledge inside themselves to discover the means to do so. The treatment does not directly teach them job readiness skills; rather, it induces them to go out and obtain job readiness training or awakens skills they already possess. A follow-up survey for participants in this program may need to be administered after a much longer interval because the treatment (learning self-awareness and self-empowerment) works indirectly on the outcomes of interest.
To conduct an evaluation of longer-term impacts of an intervention (for example, on paternity establishment, fathering of new children, interactions with children, educational attainment, employment, and substance abuse) one would want to obtain information on program participants for a three- to five-year period following participation (and possibly longer). Program administrators at the second program discussed above indicated that follow-up over a prolonged period could be a problem because participants are typically quite mobile and time-consuming to find. When outreach specialists were asked about tracking former participants for several years after participation, they indicated that for some it would be possible, but for others it would be very difficult. It should be noted that a previous study of this program met with mixed results in efforts to contact former participants.
Despite efforts to improve tracking procedures, attrition from the sample is still likely over a prolonged period, especially when tracking high-risk populations. Differential attrition in participant and comparison groups could result in attrition bias in outcome comparisons. If follow-up interviews are to be conducted after long intervals, heavy emphasis must be placed on methods to reduce attrition.
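One simple way to monitor the differential attrition described above is to compare the share of each group lost to follow-up. The sketch below is illustrative only: the function name `attrition_gap` and all counts are hypothetical, and the two-sided, two-proportion z-test with a normal approximation is one common choice rather than a method prescribed by this report.

```python
from math import sqrt
from statistics import NormalDist


def attrition_gap(enrolled_t, lost_t, enrolled_c, lost_c):
    """Compare attrition rates in the treatment (t) and comparison (c) groups.

    Returns (rate_t, rate_c, z, p_value), where p_value is two-sided and
    based on a pooled two-proportion z-test (normal approximation).
    """
    rate_t = lost_t / enrolled_t
    rate_c = lost_c / enrolled_c
    # Pooled attrition rate under the null hypothesis of equal attrition.
    pooled = (lost_t + lost_c) / (enrolled_t + enrolled_c)
    se = sqrt(pooled * (1 - pooled) * (1 / enrolled_t + 1 / enrolled_c))
    if se == 0:
        # Nobody (or everybody) was lost in both groups: no gap to test.
        return rate_t, rate_c, 0.0, 1.0
    z = (rate_t - rate_c) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return rate_t, rate_c, z, p_value


# Hypothetical counts, for illustration only.
rate_t, rate_c, z, p_value = attrition_gap(
    enrolled_t=200, lost_t=50, enrolled_c=200, lost_c=80
)
print(f"treatment attrition:  {rate_t:.1%}")
print(f"comparison attrition: {rate_c:.1%}")
print(f"z = {z:.2f}, two-sided p = {p_value:.4f}")
```

A small gap in attrition rates does not rule out attrition bias, since the two groups may lose different kinds of participants even when they lose similar numbers. Comparing the baseline characteristics of those retained in each group is a useful complement to this check.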
Success in reaching study participants for shorter- or longer-term follow-up can be enhanced by collecting more systematic contact information (e.g., information about friends and relatives) at the time of intake and at termination from the program. Offering a monetary incentive to individuals who respond to follow-up surveys, or even to those who simply contact the program at various intervals in the future, might enhance the ability of the program to track former participants over a prolonged period. Contact information as well as some useful follow-up outcome data may be obtainable through institutional records, though confidentiality requirements may be a difficult constraint to overcome. Unemployment insurance data, school records, criminal justice system data, and information maintained by welfare and child protective service agencies are examples of possible sources of information that might be used both to track participants and to collect independent outcome data. Tracking fathers through their children, who may be easier to find, is another alternative. Finding that a father is not in touch with his child also provides important information regarding certain outcomes of the fatherhood intervention.
1. The evaluator should consider including control variables for date of recruitment in the multivariate models.
2. This problem is illustrated by the pilot test of SSA's Project NetWork demonstration. At the time, Lewin staff were helping design the baseline survey and we had an opportunity to review the pilot study data. We discovered systematic differences between the characteristics of "randomly assigned" treatment and control subjects. Upon investigation, it was determined that the case managers had influenced the assignment process to assign those with the best rehabilitation prospects to the treatment group -- a problem that was fixed for the later evaluation.
3. The test discussed here is a "one-tailed test" -- the null hypothesis of "no impact" is being tested against the alternative of a "positive impact." The null hypothesis is only rejected if the realized difference is positive and sufficiently large. Of course, the program could have a negative impact, in which case the realized difference is likely to be negative. Given the way the test is constructed, any negative difference, no matter how large, would lead to a failure to reject the null hypothesis. This would be fine as long as the policy implications of a negative effect are the same as those for no effect. If they are different, the evaluator may want to use a two-tailed test. Use of a two-tailed test would increase each critical value in the table by almost 20 percent.
4. More precisely, if the true effect is an increase of 20 percentage points, then it is unlikely that this test would lead to a conclusion of "no difference," but if the true effect is only five percentage points we are likely to conclude that there is no difference. If effects as small as five percentage points are of little interest, then the conclusion of no difference in the latter case would have no serious consequences, while if a difference of five percentage points is considered important, such a mistake would be unfortunate.
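The arithmetic behind notes 3 and 4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the calculation used to build the report's table: it assumes a 5 percent significance level and a normal approximation, and the sample size of 100 per group, the 50 percent base rate, and the helper name `approx_power` are all hypothetical.

```python
from math import sqrt
from statistics import NormalDist

ALPHA = 0.05  # assumed significance level
std_normal = NormalDist()  # standard normal distribution (mean 0, sd 1)

# Note 3: one-tailed versus two-tailed critical values.
# One-tailed: all of alpha sits in the upper tail.
one_tailed = std_normal.inv_cdf(1 - ALPHA)
# Two-tailed: alpha is split evenly between the two tails.
two_tailed = std_normal.inv_cdf(1 - ALPHA / 2)
increase = two_tailed / one_tailed - 1

print(f"one-tailed critical z: {one_tailed:.3f}")  # 1.645
print(f"two-tailed critical z: {two_tailed:.3f}")  # 1.960
print(f"relative increase:     {increase:.1%}")    # 19.2%


def approx_power(true_effect, n_per_group, base_rate=0.5, alpha=ALPHA):
    """Approximate power of a one-tailed test for a difference in proportions.

    Simplification (an assumption of this sketch): the standard error is
    computed as if both groups had the same outcome rate, base_rate.
    """
    se = sqrt(2 * base_rate * (1 - base_rate) / n_per_group)
    critical = std_normal.inv_cdf(1 - alpha)
    # Probability that the estimated difference exceeds the critical value.
    return 1 - std_normal.cdf(critical - true_effect / se)


# Note 4: large true effects are rarely missed, small ones often are.
for effect in (0.20, 0.05):
    power = approx_power(effect, n_per_group=100)
    print(f"true effect {effect:.2f}: chance of detecting it = {power:.2f}")
```

Under these hypothetical inputs, a true effect of 20 percentage points is detected most of the time, while a true effect of five percentage points usually leads to a conclusion of no difference, which is the pattern note 4 describes.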