Colleges and universities are under growing pressure to improve performance metrics such as the four-year graduation rate and the placement of graduates in jobs aligned with their training. At first glance, higher education would appear to be in a great position to understand and improve its own metrics. After all, colleges and universities are full of skilled data analysts who understand the ins and outs of education policy. College professors, graduate students, and staff are among the world’s top producers of education research.
Improving Performance Metrics
Unfortunately, improving performance metrics is not a simple matter of reading the right research articles. Education research commonly focuses on primary and secondary schools rather than higher education, and when higher education is studied the research is rarely tailored to the needs of administrators. Conversely, when a college or university rolls out a new student policy, it rarely does so in a way that facilitates evaluation and research.
There are several ways that the practices of researchers and administrators are out of alignment.
Colleges and universities are famously slow to change their policies, but when change does come it is often abrupt. The new policy is announced and the old policy is retired.
It is simply assumed that the new policy will lead to better results. There is rarely a plan for comparing the results of the new and old policy to see if progress is actually being made. There may in fact be no appetite for such a comparison, since if the new policy is not an improvement the administrators who promoted it may be embarrassed.
Professors do occasionally evaluate a higher-education policy, but the goals and timing of a professor’s evaluation study will rarely coincide with the needs of higher education administrators. A professor’s evaluation is typically carried out years after the policy change, and is often motivated by the professor’s research agenda rather than the concerns of a university administration.
The quality of any evaluation may be limited by the way that the policy was rolled out. The generality of an evaluation may be limited to the specific university that was studied. Administrators at another university may wonder whether a policy success can be replicated at their own institution.
None of these problems are new, and none are unique to higher education. In fact, many of these concerns were raised in 1969 when Donald Campbell described the challenges of evaluating government programs.
Colleges and universities need an approach that more tightly integrates policy changes with research and evaluation. Policies should be rolled out in a manner that facilitates evaluation, and timely evaluation should be carried out in a way that serves the needs of administrators.
Donald Campbell was one of the first observers to recognize the need for an integrated approach. In 1969 Campbell proposed that policy changes should be approached as “experiments.” Instead of ramming through a reform and assuming success, Campbell proposed that administrators should propose one or more policies, evaluate them on a trial basis, and expand them only if initial evaluation suggested they were effective. The design of these policy experiments should be as rigorous as a laboratory experiment. If possible, Campbell wrote, policy experiments should use random assignment to ensure that the groups subject to different policies were comparable.
The idea of using random assignment to compare different policies or strategies has become increasingly popular in recent years. The approach is known under different names in different fields. In medicine and social science, it is often called a randomized controlled trial; in advertising and software it is called an A/B test; and in finance and manufacturing it is known as a champion-challenger test. The basic idea of a champion-challenger test is simple. Whatever policy is currently in place is called the champion; any proposed change is called a challenger. The champion and the challengers are run side by side on comparable groups, and whichever performs best becomes the new champion.
Motivations for Champion-Challenger Testing
To understand the motivation for a champion-challenger test, consider the case of College X. College X enrolls a diverse student body that includes some very strong students as well as some students whose high school grades and ACT scores suggest a high risk of dropout. For years College X has required high-risk students to attend a week-long remedial “boot camp” in the summer before their first year. Many boot camp attendees later drop out anyway, and there is some doubt about whether the boot camp helps at all. On the other hand, it is conceivable that even more high-risk students would drop out if it weren’t for the boot camp. College X doesn’t really know.
College X has several options. It can continue the boot camp as is, on the assumption that it is better than nothing. It can modify the boot camp in some way; for example, it might “flip” the classroom so that exercises are done in class and lectures are viewed at home. Finally, it might discontinue the boot camp and reallocate the funds to some other service, such as a drop-in tutoring center.
None of these is necessarily a bad option. The problem is that College X doesn’t know which option is the best. If it simply makes a choice and hopes for good things, it may find itself revisiting the choice again in a few years. For example, it may find that some faculty are disappointed in the tutoring center and want to bring back the boot camp. What progress is possible if College X continues to make decisions in this way?
Instead of making a blind decision, College X decides to carry out a champion-challenger test. One third of the incoming class will participate in the traditional boot camp (the current champion); one third will participate in a "flipped" boot camp (challenger #1); and one third will not participate in either boot camp but will get priority in accessing the tutoring center (challenger #2). Students will be assigned to the camps or the tutoring center at random, in order to ensure that the students in each group are comparable with respect to background and motivation.
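The random-assignment step is straightforward to implement. The sketch below (a minimal illustration, with hypothetical student IDs and group names) shuffles the list of incoming high-risk students and deals them round-robin into the three groups, so that each group is a random, equal-sized sample of the class:

```python
import random

def assign_groups(student_ids, groups, seed=2024):
    """Randomly assign students to policy groups of (near-)equal size.

    Shuffling before splitting ensures each group is a random sample,
    so the groups are comparable in background and motivation.
    """
    rng = random.Random(seed)  # fixed seed makes the assignment reproducible and auditable
    shuffled = list(student_ids)
    rng.shuffle(shuffled)
    # Deal students round-robin into the groups
    return {student: groups[i % len(groups)] for i, student in enumerate(shuffled)}

# Hypothetical incoming class of 300 high-risk students
students = [f"S{n:03d}" for n in range(300)]
groups = ["boot_camp", "flipped_boot_camp", "tutoring_center"]
assignment = assign_groups(students, groups)
```

Recording the seed and the assignment table makes the trial auditable later: anyone can re-run the assignment and confirm that no group was cherry-picked.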
After the boot camp, College X will monitor each group's progress through its first academic year. How many students in each group drop out? What classes does each group of students take, and what grades do they receive? By the end of the academic year, College X will know which group of students has had the best performance, so it will know which program it should offer to the next incoming class. That option will be the new champion.
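The end-of-year comparison can be sketched in a few lines. The data below are hypothetical, invented purely for illustration; in practice, College X would also run a statistical test (for example, a test of differences in proportions) before declaring a winner, since small differences in dropout rates can arise by chance:

```python
def dropout_rate(outcomes):
    """Fraction of students in a group who dropped out (1 = dropped out, 0 = retained)."""
    return sum(outcomes) / len(outcomes)

# Hypothetical end-of-year outcomes for each group of 100 students
results = {
    "boot_camp":         [1] * 25 + [0] * 75,   # 25% dropout
    "flipped_boot_camp": [1] * 18 + [0] * 82,   # 18% dropout
    "tutoring_center":   [1] * 22 + [0] * 78,   # 22% dropout
}

rates = {group: dropout_rate(outcomes) for group, outcomes in results.items()}
new_champion = min(rates, key=rates.get)  # the group with the lowest dropout rate wins
```

With these made-up numbers, the flipped boot camp would become the new champion, and next year's test could pit it against fresh challengers.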
Whichever choice College X makes, it will be on its way to becoming not just an organization that teaches effectively, but an organization that learns effectively as well.
 Campbell, Donald T. 1969. “Reforms as Experiments.” American Psychologist 24(4):409–29.
Dr. Paul von Hippel
A professor in UT Austin's LBJ School of Public Affairs, Dr. von Hippel is an expert on research design and on statistical methods for missing data. Before his academic career he was a data scientist who developed fraud-detection scores for banks including JP Morgan Chase and Bank of America. His research interests include educational inequality and the relationship between schooling, health, and obesity.