In this chapter we discuss the analyses of the evaluation data that will be necessary to estimate the impacts of responsible fatherhood programs. Given the preliminary and general nature of this evaluation design, the analysis methods discussed are intended to be illustrative of the methods that will be required. We provide a non-technical discussion of the methodology here; a technical presentation of the methodology appears in Appendix E.
In general, the impact evaluation will examine differences between outcomes for participants and non-participants. While the easiest way to conduct such an analysis is to compare means or percentages of outcome variables for the two groups, outcome differences may reflect factors other than the impact of the program, especially systematic differences due to the selection of study volunteers into the participant and non-participant groups. We recommend using more complex multivariate methods in order to address these issues.
The selection issue we focus on in this chapter is the selection of study volunteers into participant and non-participant groups. As discussed in Chapter Three, participants and treatment group subjects are not synonymous. In all three designs (experimental, non-experimental, and randomized outreach) some treatment group subjects will choose not to participate, and in the randomized outreach design some control group subjects will participate. The methodology must, then, explicitly recognize the difference between "treatment" and "participation."
There are two other selection issues that we do not consider, but that deserve mention. The first is self-selection of study volunteers from the target population. Outcomes for study volunteers are likely to differ systematically from outcomes for other fathers in the target population, regardless of participation, and the impacts of participation on study volunteers may also differ from those that might be achieved for non-volunteers were they to participate. Studying the selection of volunteers would provide information about the extent to which estimated impacts of the demonstration would generalize to other fathers, but such a study would be difficult and costly to perform. We recommend, instead, that scarce resources be used to obtain estimates of the impacts of participation on those who volunteer.
The second selection issue that we will not consider further is attrition of study volunteers. No matter how intense the effort to obtain follow-up data from all volunteers, some will inevitably be lost to the study. This issue would be essentially the same as the issue of self-selection of volunteers if attrition were unrelated to program participation; then, those who leave the sample could be viewed as non-volunteers. It is quite possible, however, that attrition will be related to program participation, with participants less likely to drop out than non-participants.(1) Further, attrition among participants could be related to outcomes, with fathers who have less favorable outcomes more likely to leave the sample. If attrition rates vary substantially across participants and non-participants, then some effort should be made to correct for possible attrition bias.(2)
We focus on the analysis of data collected under an experimental study design (see Chapter Three), but also discuss how the analysis would need to be modified under each of the two alternative designs (non-experimental and randomized outreach). Differences in methodologies for the three alternative designs are subtle, but important.
We first present a methodology for evaluating the impact of a single program at a single site (Section II). To simplify the presentation, we discuss how the analysis would proceed for a single outcome variable that is assumed to be a continuous variable with an unlimited range (e.g., a child's score on a psychological assessment of anxiety or depression). This model can be repeated for multiple continuous outcome variables. We also discuss extensions of the model to qualitative (e.g., paternity establishment) or limited dependent variables (e.g., level of child support payments).
After completing the discussion of the methodology for evaluating the impacts of a single program at a single site, we discuss a methodology for jointly evaluating the impacts of multiple programs and/or multiple sites of the same program (Section III).
1. Difference in Means Analysis
In an experimental evaluation, fathers are randomly selected for referral to the program. If all treatment fathers participated in the program and no control fathers did, then the impact of the program on a continuous outcome variable could be measured as the difference in means for the treatment and control groups. If the sample sizes are reasonably large, random assignment to the two groups makes it very likely that any substantial difference in means is due to the program and not due to random differences in the characteristics of fathers in the two groups, which are likely to be small.
Some fathers who are referred do not, however, participate in the program.(3) Presumably their outcomes would be more favorable if they did participate. If so, the difference in mean outcomes is likely to understate the program's impact on the average eligible father. The difference in mean outcomes will still be an unbiased estimate of the impact of referring fathers to the program, but funders and others are more likely to be interested in the impact on fathers who actually participate because it is only those fathers that make use of substantial program resources.(4)
One might be tempted, instead, to use the difference in mean outcomes for participants (i.e., for the subset of treatment group fathers who participated) and non-participants (i.e., for control group fathers plus non-participating treatment group fathers). This is likely to overstate the program's impact because participating fathers may be more motivated than non-participating fathers, and thus may achieve better outcomes than non-participating fathers even without participating in the program.
Instead, an unbiased estimate can be obtained by dividing the difference in mean outcomes for the treatment and control groups by the share of the treatment group that participates, as described in Chapter Three. This corrects for the fact that only a share of the fathers in the treatment group actually participate in the program. Because the share who participate is less than one, the resulting estimate will be larger than the difference in the mean outcomes for the treatment and control groups.(5)
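The three comparisons discussed in this subsection can be illustrated with simulated data. The following is a minimal sketch, not part of the evaluation design itself: the function names, the "motivation" variable, and all numeric values are hypothetical, chosen only so that more motivated fathers are both more likely to participate and more likely to have favorable outcomes.

```python
import numpy as np

def simulate_experiment(n=200_000, effect=3.0, seed=0):
    """Simulate random referral with incomplete take-up (illustrative values only)."""
    rng = np.random.default_rng(seed)
    treated = rng.random(n) < 0.5            # random referral to the program
    motivation = rng.normal(size=n)          # unobserved by the evaluator
    # Only referred fathers can participate; motivated fathers are more likely to.
    participates = treated & (motivation + rng.normal(size=n) > 0)
    outcome = 10 + 2 * motivation + effect * participates + rng.normal(size=n)
    return treated, participates, outcome

def impact_estimates(treated, participates, outcome):
    """Return the referral effect, the naive comparison, and the adjusted estimate."""
    # Treatment vs. control: the impact of referral, which understates
    # the impact of participation when some referred fathers do not participate.
    diff_referral = outcome[treated].mean() - outcome[~treated].mean()
    # Participants vs. everyone else: contaminated by self-selection.
    naive = outcome[participates].mean() - outcome[~participates].mean()
    # Divide the referral effect by the share of the treatment group that participates.
    share = participates[treated].mean()
    adjusted = diff_referral / share
    return diff_referral, naive, adjusted

treated, participates, outcome = simulate_experiment()
d_ref, d_naive, d_adj = impact_estimates(treated, participates, outcome)
print(f"referral effect: {d_ref:.2f}  naive: {d_naive:.2f}  adjusted: {d_adj:.2f}")
```

Under these assumptions the referral effect falls below the simulated participation effect, the participant versus non-participant comparison falls above it, and the adjusted estimate lies close to it.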
2. Regression Analysis
For reasons discussed in Chapter Five, the evaluator may want to control for the influence of other explanatory variables on the outcome variable in estimating the effect of participation. If all treatment fathers participate, this is accomplished through a regression analysis of the outcome variable. The regression model specifies that the outcome variable is a (linear) function of a set of explanatory variables, and the coefficient of each explanatory variable represents the effect of a change in that variable on the expected value of the outcome variable, holding all other explanatory variables constant. One of the explanatory variables would be a dummy variable to indicate whether the individual is in the treatment group; other explanatory variables would represent baseline characteristics thought to have an effect on the outcome variable. The estimated coefficient of the treatment dummy would be the estimate of the treatment effect.
If all referred fathers did participate, the expected value for this estimate of the program's impact on the outcome variable would be identical to the expected value of the difference in mean outcomes for the treatment and control groups. The only reason we would prefer this estimate to the difference in means estimate is that the standard error of the estimate would be lower -- provided that we had judiciously chosen as the other explanatory variables a set of variables that explained a substantial amount of the variation in outcomes across fathers.(6)
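The gain in precision from regression adjustment can be sketched as follows, assuming full participation by treatment fathers. This is an illustration only: the helper `ols`, the simulated "baseline" variable, and the coefficient values are hypothetical, and the standard errors are the conventional ones that assume constant error variance.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients and conventional (homoskedastic) standard errors."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta, np.sqrt(np.diag(cov))

def compare_precision(n=2000, effect=1.0, seed=0):
    """Estimate the treatment effect with and without a baseline covariate."""
    rng = np.random.default_rng(seed)
    treated = (rng.random(n) < 0.5).astype(float)
    baseline = rng.normal(size=n)            # e.g., the outcome measured at baseline
    y = 5 + 2 * baseline + effect * treated + rng.normal(size=n)
    const = np.ones(n)
    # Treatment dummy only: equivalent to the difference in means.
    b_unadj, se_unadj = ols(np.column_stack([const, treated]), y)
    # Treatment dummy plus a baseline characteristic that explains much of the variation.
    b_adj, se_adj = ols(np.column_stack([const, treated, baseline]), y)
    return (b_unadj[1], se_unadj[1]), (b_adj[1], se_adj[1])

(unadj, se_unadj), (adj, se_adj) = compare_precision()
print(f"unadjusted: {unadj:.3f} (se {se_unadj:.3f})")
print(f"adjusted:   {adj:.3f} (se {se_adj:.3f})")
```

Both estimates target the same treatment effect; the adjusted estimate has the smaller standard error here because the simulated baseline variable explains most of the variation in the outcome.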
Not all referred fathers will participate, however, and the coefficient of the treatment dummy from the regression will be too small (biased toward zero) as an estimate of the program's impact on participants, just as the difference in mean outcomes for treatment and control fathers would be. We could, instead, replace the treatment dummy with a participation dummy, which would be coded as one only for treatment fathers who participate, and zero for everyone else. The estimated coefficient of the participation dummy would likely be too large as an estimate of the impact of participation (biased away from zero) for the same reason that the difference between the mean outcomes for participants and non-participants is too large: those fathers who choose to participate are likely to be more highly motivated and to have better outcomes than those who choose not to participate, even in the absence of participation.
The solution to the bias problem in the regression approach is a mathematical extension of the solution used in the difference in means approach, although not obviously so. Instead of using either the participation dummy or the treatment dummy as an explanatory variable in the regression model, the analyst should use a "modified participation variable" that, like these two variables, is zero for all control group fathers, but that is equal to an estimated "participation probability" for treatment group fathers. Specifically, the value assigned to treatment group fathers, whether or not they actually participate in the program, should be the estimated conditional participation probability obtained from the participation analysis (Chapter Seven and Appendix E). Use of this value instead of a value of one for all treatment group fathers is analogous to dividing the difference in mean outcomes for the treatment and control fathers by the share of treatment fathers who participated in the program.
While it is relatively easy to obtain this "two-step" regression estimator of the participation effect, computation of standard errors is more problematic because correct standard errors need to take account of estimation errors in the participation probabilities. Further, use of a maximum likelihood estimator for the joint participation and outcome models, or some other joint estimator that is computationally simpler, may produce estimates with lower standard errors.
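The two-step estimator can be sketched as follows. This is a minimal illustration under stated assumptions, not the specification in Appendix E: a linear probability model fit by least squares stands in for the participation analysis of Chapter Seven, the data are simulated, and all function and variable names are hypothetical. The sketch returns point estimates only; it does not attempt the corrected standard errors discussed above.

```python
import numpy as np

def simulate(n=20_000, effect=3.0, seed=1):
    """Simulated experiment in which unobserved motivation drives participation and outcomes."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)                   # observed baseline characteristic
    motivation = rng.normal(size=n)          # unobserved
    treated = rng.random(n) < 0.5
    participates = treated & (0.5 * x + motivation + rng.normal(size=n) > 0)
    y = 1 + 2 * x + 1.5 * motivation + effect * participates + rng.normal(size=n)
    return y, x, treated, participates

def two_step_estimate(y, x, treated, participates):
    """Outcome regression on the 'modified participation variable'."""
    const = np.ones(len(y))
    # Step 1: participation model fit on treatment-group fathers only
    # (linear probability model used here as a simplifying assumption).
    X1 = np.column_stack([const[treated], x[treated]])
    gamma, *_ = np.linalg.lstsq(X1, participates[treated].astype(float), rcond=None)
    p_hat = np.column_stack([const, x]) @ gamma
    # Modified variable: estimated probability for all treatment fathers,
    # whether or not they participated, and zero for control fathers.
    modified = np.where(treated, p_hat, 0.0)
    # Step 2: outcome regression.
    X2 = np.column_stack([const, x, modified])
    beta, *_ = np.linalg.lstsq(X2, y, rcond=None)
    return beta[2]

def participation_dummy_estimate(y, x, participates):
    """Regression using the raw participation dummy (subject to selection bias)."""
    X = np.column_stack([np.ones(len(y)), x, participates.astype(float)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[2]

y, x, treated, participates = simulate()
print("two-step estimate:           ", round(two_step_estimate(y, x, treated, participates), 2))
print("participation-dummy estimate:", round(participation_dummy_estimate(y, x, participates), 2))
```

In this simulation the participation-dummy coefficient overstates the simulated effect because participants are more motivated, while the two-step estimate does not share that bias.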
Two features of the regression methodology deserve further attention before we turn to variants for alternative evaluation designs. First, the methodology can be used to estimate participation effects even if there is no control group other than self-selected non-participants, but is not likely to work well. In such a case, it would be essential that some elements of the characteristics that determine conditional participation probability not be included among the other explanatory variables included in the outcome equation. Otherwise, the conditional participation probabilities will be highly (multi-) collinear with these variables, resulting in a very imprecise estimate of the program impact. Strong candidates for variables to include in the participation equation, but not in the outcome equation -- variables that have a strong effect on the probability of participation but only a negligible direct effect on the outcome variable -- are hard to find. We will return to this issue in the discussion of the methodology for a randomized outreach evaluation, where it is more critical.
A second feature of this methodology is the implicit assumption that program participation has the same impact for all participating fathers. This seems unlikely. A much more general model would specify entirely different relationships between the outcome variable and father characteristics for participants and non-participants; that is, participation would be modeled as changing the entire relationship between the baseline characteristics and the outcome variable, rather than a "parallel shift" of the equation.(7) Under this model the impact of program participation would vary with baseline characteristics in a very nonrestrictive way.
The sample sizes that would be required to obtain reasonably precise estimates of such a general model are not likely to be achieved given the size of current responsible fatherhood programs. We recommend, instead, that the assessment of variation in impacts with baseline characteristics be limited to examining interactions between impacts and a very small number of key characteristics, assuming that the effects of other baseline characteristics on outcomes are invariant to participation. This can be done by including as explanatory variables in the regression equation variables that are products of the conditional participation probability and selected characteristics of fathers, as discussed further in Appendix E.
In the non-experimental design presented in Chapter Three there is a group of volunteers from the target population for the responsible fatherhood program being evaluated -- the treatment group -- who may or may not choose to participate in the program, and a comparison population of fathers -- the comparison group -- who do not have the option of participating in the program. Thus, volunteers are in the treatment or comparison group because they are drawn from two separate populations; in contrast, in the experimental design study volunteers come from the same population and are randomly assigned to one group or the other. The absence of random assignment means that treatment group fathers likely differ from comparison group fathers in their baseline characteristics. The difference in mean outcomes for the treatment and comparison group fathers would presumably reflect differences in baseline characteristics as well as the program participation of some treatment group fathers.
The regression methodology described for the experimental case can be used to solve, or at least reduce, the problem caused by non-random assignment. The application of that methodology, including the use of estimated participation probabilities in outcome regressions, would be just the same as in the experimental design. In the non-experimental design, however, the other explanatory variables in the model serve to control for differences in baseline characteristics of treatment and comparison fathers, as well as to reduce standard errors. Hence, it is especially important under this design to measure baseline characteristics that are important determinants of the outcome. Confidence that the estimated program effect reflects the impact of the program, and not systematic differences in the baseline characteristics of the two groups of study volunteers, will depend on how well the evaluators can perform this task.
In many situations, the best statistical predictors of human behavior in a given period are proven measures of the same or similar behavior in previous periods -- employment this year is a much better predictor of employment next year than such variables as education, age, race, ethnicity, and family characteristics, for example. Hence, the outcome variable measured at baseline ought to be high on the priority list for explanatory variables to include in the outcome regression model. Thus, if the outcome variable is the child's score on a psychological test, there would be substantial benefit in testing the child at baseline as well as at follow-up.
In the randomized outreach design (Chapter Three), study volunteers are randomly assigned to receive strong (treatment) or weak (control) outreach. Fathers in either group may decide to participate in the program, but the differences in outreach are expected to result in higher participation rates among fathers who receive the treatment outreach.
The regression methodology described for the experimental design can be applied to this type of design after making two modifications. First, as discussed in Chapter Seven, the conditional participation probabilities will be estimated from data for both the treatment and control subjects, and one or more variables representing the randomized outreach will be key determinants of those probabilities. Second, the definition of the "modified participation variable" needs to be changed for the control group fathers. Recall that in the experimental design this variable is equal to the estimated conditional participation probability from the participation analysis for all treatment group fathers and zero for all control group fathers. In the randomized outreach design, the variable is the conditional participation probability for all fathers. These probabilities will presumably be lower for control fathers than for treatment fathers, but they will not be zero, as in the experimental case.
In all other respects the model for the experimental design applies. With the modification in place, the estimated coefficient of the participation variable will be an unbiased estimate of the impact of the program on the outcome variable.
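The two modifications for the randomized outreach design can be sketched as follows. As before, this is an illustration under stated assumptions: a linear probability model stands in for the participation analysis, the data are simulated, and the names and numeric values are hypothetical. The outreach indicator enters the participation model but is left out of the outcome regression.

```python
import numpy as np

def simulate_outreach(n=40_000, effect=3.0, seed=2):
    """Fathers in either outreach group may participate; strong outreach raises take-up."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)                   # observed baseline characteristic
    motivation = rng.normal(size=n)          # unobserved
    strong = rng.random(n) < 0.5             # randomized strong (treatment) outreach
    latent = -0.5 + 1.2 * strong + 0.3 * x + motivation + rng.normal(size=n)
    participates = latent > 0
    y = 1 + 2 * x + 1.5 * motivation + effect * participates + rng.normal(size=n)
    return y, x, strong, participates

def outreach_two_step(y, x, strong, participates):
    """Two-step estimate with participation probabilities for all fathers."""
    const = np.ones(len(y))
    # Step 1: participation model fit on treatment AND control fathers,
    # with the outreach indicator as a determinant of participation.
    X1 = np.column_stack([const, x, strong.astype(float)])
    gamma, *_ = np.linalg.lstsq(X1, participates.astype(float), rcond=None)
    p_hat = X1 @ gamma                       # nonzero for control fathers as well
    # Step 2: outcome regression; the outreach indicator itself is excluded.
    X2 = np.column_stack([const, x, p_hat])
    beta, *_ = np.linalg.lstsq(X2, y, rcond=None)
    return beta[2], p_hat

y, x, strong, participates = simulate_outreach()
estimate, p_hat = outreach_two_step(y, x, strong, participates)
print("estimated participation effect:", round(estimate, 2))
print("mean probability, strong vs. weak outreach:",
      round(p_hat[strong].mean(), 2), round(p_hat[~strong].mean(), 2))
```

The precision of the estimate in this sketch depends on how far apart the two mean participation probabilities are, which is why the effectiveness of the outreach matters.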
The role and importance of effective treatment outreach becomes evident by recognizing that this model is formally equivalent to a model discussed at the end of Section II.A, above, in which all volunteers are self-selected into participant or non-participant groups. We criticized that model on the grounds that the participation probabilities would likely be highly collinear with other explanatory variables in the outcome equation. The randomized outreach serves to break up this collinearity; the outreach variable would presumably be a key determinant of the participation probability, but would not be included among the other explanatory variables in the outcome equation.
The role of randomized outreach in the estimation methodology implies that the outreach must satisfy two important criteria. First, it must be effective; if it does not have a substantial impact on the probability of participation it will do little to reduce the collinearity between participation probabilities and other explanatory variables in the outcome equation. Second, it should have a negligible direct effect on outcomes. Some outreach methods might have substantial direct effects: efforts by respected role models to persuade fathers to participate and promises of long-term financial or other material rewards for participating are examples. Such methods might also be very effective in increasing participation, so some care must be exercised to avoid using them if the objective of random outreach is to help the evaluator separate impact effects from selection effects.
To this point we have assumed in our model specifications that the outcome (dependent) variable is a continuous variable with unlimited range. It is likely, however, that many key outcome variables will not satisfy both of these conditions. Some will be categorical (e.g., whether the father and mother have married) while others will have a limited range (e.g., hours of child contact and level of child support cannot be negative). Further, among categorical variables there are likely to be two types: qualitative variables, which indicate which of two unranked categories a father is in, and ordinal variables, where the categories have a meaningful ranking from lowest to highest (e.g., responses to questions that require selection of a value on, say, a five-point scale).
Appropriate modifications to the regression model can be made to accommodate each of these types of dependent variables. Possibilities include:
The selection issue that is addressed in the context of regression analysis for the continuous unrestricted outcome variable assumed previously must also be addressed in these models. The approach to solving the problem is essentially the same as in the regression case. The evaluator could include an estimated participation probability variable, estimated from the participation analysis, as an explanatory variable in any one of these multivariate models. As in the regression case, however, the preferred estimation method is likely to involve joint estimation of the outcome and participation equations, by maximum likelihood or perhaps by some method that is less computationally intensive.
In this section we begin by extending the methodology discussed in Section II.A, for the estimation of impacts in an experimental evaluation of one program, to the joint evaluation of multiple programs (including multiple sites for a single program). We assume that volunteers at each site are randomly assigned to control and treatment groups, that some treatment subjects do not participate in the program at each site, and that no control subjects participate. We also assume there is no cross-site contamination (e.g., subjects at one site participating in the program at another site). We then turn to using the modified model in non-experimental and randomized outreach designs.
Assuming for the moment that all treatment group fathers at all sites participate in the program, only two modifications to the regression methodology for the experimental single-site evaluation are needed. First, a set of "site dummies," variables distinguishing each site, should be added as explanatory variables in the regression. These will control for differences in the demographic, economic, and policy environments across the sites that are not captured by baseline characteristics of fathers. If the number of sites is very large, these could be replaced by a smaller set of variables that describe the environmental factors. While this would allow the evaluator to assess the effects of specific environmental factors on outcomes, it is unrealistic to expect meaningful results from such an analysis unless the number of sites is very large and the key environmental differences can be captured in a small number of variables.
Second, instead of using a single dummy variable to indicate whether the father is in the treatment or control group, the evaluator will likely use a separate treatment dummy for each site. The coefficient of the treatment dummy for each site will be the estimate of the impact for treatment group fathers at that site. Estimates are likely to vary across sites because of variation in the way the programs are implemented, as well as for other reasons.
For any pair of sites, the difference in impacts can be estimated as the difference between the corresponding treatment dummy coefficients, and a statistical test of the null hypothesis of "no difference" can be easily performed. If the difference is not statistically significant, the evaluator may improve the precision of the estimates by constraining the estimated impacts for the pair of sites to be the same. This would be especially appealing for programs that are similar with respect to key program characteristics (e.g., multiple sites of a single program).
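The multi-site regression with site dummies, site-specific treatment dummies, and a test of equal impacts at two sites can be sketched as follows, maintaining the assumption that all treatment fathers participate. The simulated data, the three hypothetical site impacts, and the function names are illustrative only, and the test statistic uses conventional standard errors that assume constant error variance.

```python
import numpy as np

def simulate_sites(n_per_site=2000, impacts=(1.0, 1.0, 3.0), seed=3):
    """Random assignment within each site; impacts differ across sites."""
    rng = np.random.default_rng(seed)
    n_sites = len(impacts)
    site = np.repeat(np.arange(n_sites), n_per_site)
    n = site.size
    x = rng.normal(size=n)                          # baseline characteristic
    treated = rng.random(n) < 0.5
    site_level = rng.normal(size=n_sites)[site]     # site environment effect
    y = site_level + 2 * x + np.asarray(impacts)[site] * treated + rng.normal(size=n)
    return y, x, site, treated

def multisite_fit(y, x, site, treated, n_sites):
    """Return site-specific impact estimates and their covariance matrix."""
    site_d = (site[:, None] == np.arange(n_sites)).astype(float)   # site dummies
    treat_d = site_d * treated[:, None]              # one treatment dummy per site
    X = np.column_stack([site_d, x, treat_d])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    idx = slice(n_sites + 1, 2 * n_sites + 1)        # positions of treatment dummies
    return beta[idx], cov[idx, idx]

def difference_t_stat(impacts, cov, a, b):
    """t-statistic for the null hypothesis that sites a and b have equal impacts."""
    diff = impacts[a] - impacts[b]
    se = np.sqrt(cov[a, a] + cov[b, b] - 2 * cov[a, b])
    return diff / se

y, x, site, treated = simulate_sites()
impacts, cov = multisite_fit(y, x, site, treated, n_sites=3)
print("estimated site impacts:", np.round(impacts, 2))
print("t-stat, site 0 vs. site 2:", round(difference_t_stat(impacts, cov, 0, 2), 1))
```

When the test does not reject equality for a pair of sites, the constrained model would replace their two treatment dummies with a single dummy that equals one for treatment fathers at either site.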
Of course, not all treatment group fathers will participate, and the estimation procedure needs to be modified to take this into account. Analogous to the single-site case, the evaluator will need to replace the treatment dummy for each site by a modified participation variable. For each site this variable will have a value of zero for all control fathers at the site as well as for all fathers at other sites. For treatment group fathers at the site, the value of one should be replaced with an estimated participation probability, obtained from the participation analysis (see Chapter Seven).
In Chapter Three we indicated that evaluating multiple sites would be one way to address the problem of small samples likely to be encountered in a single site evaluation. The gains are greatest if the programs' impacts and effects of other variables on outcomes are the same at all sites. Then, adding new sites is equivalent to increasing the sample at the first site. If the sites are sufficiently disparate in their programs, target populations, and environments, then there is no gain over conducting separate, single-site evaluations. The reality of any multi-site evaluation is likely to be somewhere in between. In selecting sites for a multi-site evaluation, homogeneous sites should be preferred over heterogeneous sites, other things equal, if improving estimator precision is a priority.
As in the single-site case, the methodology developed for the experimental design can be reasonably applied to the non-experimental design if careful attention is paid to measuring baseline characteristics that are predictive of outcomes. We assume that there would be a comparison group for each site and that each comparison group site would be matched to its corresponding treatment site on environmental characteristics that are likely to have an impact on outcomes. Under this condition, the site dummies in the model would capture the environmental factors common to each site.
An alternative would be to have a different, perhaps smaller, number of comparison sites than treatment sites. In the absence of a match for each site, the site dummies would have to be dropped. They could be replaced with a set of variables that measure key aspects of the environment at each site, including the treatment sites (e.g., strength of the local labor market). The number of such variables would have to be small relative to the number of sites to obtain meaningful results.
Under the randomized outreach design the specification for the outcome equation would be the same as under the experimental design except that the participation variable for each site would be set equal to participation probabilities for all fathers at the site, whether treatment or control, and to zero for fathers at all other sites. As discussed in Chapter Seven, the participation analysis itself would use data from both treatment and control fathers at all sites and the explanatory variables for that analysis would include both site and treatment dummies.
1. It is also possible that attrition could be related to whether the subject is in the treatment or control group, regardless of participation, but this seems less likely.
2. The evaluator may want to estimate an attrition model analogous to that of the participation model for use in correcting possible attrition bias in the impact analysis. See Maddala, G.S. (1990) Limited Dependent and Qualitative Variables in Econometrics, Chapter 9, Cambridge: Cambridge University Press.
3. We assume the control fathers are not allowed to participate.
4. Referred fathers who do not participate may use some program resources in the recruiting process. While these may be small for each such father in comparison to resources used by participants, if there is a large number of such fathers, expenses incurred for their unsuccessful recruitment should not be neglected.
5. The appropriateness of this correction can be demonstrated mathematically as follows. Let $p$ be the share of referred fathers who participate, let $d$ be the mean effect of their participation (the quantity we are trying to estimate), let $o_p$ be what their mean outcome would be if they did not participate, let $o_n$ be the mean outcome for referred fathers who are non-participants, let $o_t$ be the mean outcome for all treatment group fathers combined, and let $o_c$ be the mean outcome for the control group. In the absence of the program, we would expect the control group and treatment group mean outcomes to be about the same (i.e., they would be the same except for random chance differences, which will almost certainly be small if the sample is reasonably large). The mean outcome for the treatment group would, in the absence of the program, be equal to:
$$p \, o_p + (1-p) \, o_n,$$
and this would approximately equal $o_c$. Because of the program, however, the mean outcome for the treatment group is:
$$o_t = p \, (o_p + d) + (1-p) \, o_n = p \, d + p \, o_p + (1-p) \, o_n \approx p \, d + o_c.$$
The last equality is approximate, based upon the expected relationship between the means for the treatment and control groups in the absence of the program. Subtracting $o_c$ from the left- and right-hand sides of this equation and dividing by $p$ yields the estimate described in the text:
$$(o_t - o_c)/p \approx d.$$
6. If the other explanatory variables explain little variation in outcomes, the standard error from the regression estimate may actually be higher than that for the difference in mean estimate, essentially because we have "wasted" information in our sample by trying to estimate the effects of some unimportant variables on outcomes.
7. See Maddala, op cit.
8. See Maddala, op cit.