Randomized controlled trials (RCTs) have been touted as the gold standard in causal inference for many years. But is that true? Is there room to leverage modern data analytics to go beyond RCTs?
Unfortunately, the answer is more nuanced. The emergence of big data and the processing capabilities that come with it has bred a complacent belief that big data analytics alone can come to the rescue. To begin the series, we investigate how RCTs compare to observational studies. In future parts of the series, we continue by investigating the synergistic roles data science and causal inference play in helping us uncover relationships in data and answer causal research or business questions.
From a methodological standpoint, let’s compare the relative merits and weaknesses of RCTs and observational studies, the latter of which can benefit from data-processing algorithms.
Randomized Controlled Trials (RCT)

Strengths:
- Can account for confounders, such as social and psychological factors
- Necessary for campaigns with highly restrictive inclusion/exclusion criteria, when pre-post matching is not possible
- Does not suffer from attrition effects

Weaknesses:
- More difficult and expensive to implement and execute
- Unanticipated consequences of non-double-blind random assignment
- Randomization can be complex with multiple nested units, such as faculty, section, course, and department
- Difficult to get at the nuances of why an intervention worked or did not

Observational – data-driven approach

Strengths:
- Simpler and more natural; intervention knowledge can be obtained through data analysis in certain situations
- With proper matching, can glean more nuanced intervention insights to improve operations
- Many relevant feature engineering and machine learning algorithms are available

Weaknesses:
- Cannot account for certain types of confounders unless they can be inferred from data
- Sensitive to attrition (can be mitigated via matching, followed by pairwise randomization)
- Dependence on machine learning algorithms and practitioners
- Baseline matching is not always possible, especially with highly restrictive inclusion/exclusion criteria
- Pre-post matching can be challenging, especially in constantly changing environments (student success initiatives, metric non-stationarity)
Now, let’s explore the strengths of RCTs and preview the role of signal processing in helping observational studies overcome their limitations in the areas where RCTs are superior.
In general, regular RCTs randomly assign a pool of students to pilot or control. Such random assignment can account for confounders that are difficult to measure with observational data. However, data science and digital signal processing algorithms with more granular time-series event data can help us glean insights into social, psychological, and behavioral factors of students. Let’s go through a few examples of how such algorithms can be leveraged to deal effectively with confounders.
For example, a student’s enrollment and course-taking patterns can be used to infer the student’s proactive or procrastinating nature in terms of starting a new term fully in charge or wandering around in relation to optimal completion pathways. LMS data can be useful in assessing the student’s level of engagement, responses under adversities, and consistency in preparing for assignments and exams. Natural language processing of student postings and replies, along with social network analytics, can provide clues on difficult-to-capture confounders. Collectively, these techniques are called feature engineering. Feature engineering has a powerful role in providing predictive and prescriptive insights as well as increasing the utility of student time-series data with student touch points in observational settings.
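As an illustration, here is a minimal feature-engineering sketch in Python. The data, function names, and feature definitions are all hypothetical: `enrollment_lead_days` stands in for a proactiveness proxy, and `login_consistency` for an LMS engagement-regularity proxy; a real pipeline would draw on far richer event streams.

```python
from datetime import date
from statistics import pstdev

def enrollment_lead_days(enroll_date, term_start):
    """Days before term start that the student enrolled
    (hypothetical proxy for a proactive vs. procrastinating nature)."""
    return (term_start - enroll_date).days

def login_consistency(login_days):
    """Spread of gaps between LMS login days: a lower standard deviation
    suggests more consistent engagement (illustrative feature)."""
    gaps = [b - a for a, b in zip(login_days, login_days[1:])]
    return pstdev(gaps) if len(gaps) > 1 else 0.0

# Hypothetical student: enrolled 30 days before term start,
# logged in on days 1, 3, 5, 8, 10, 12 of the term
lead = enrollment_lead_days(date(2024, 7, 27), date(2024, 8, 26))
consistency = login_consistency([1, 3, 5, 8, 10, 12])
```

Features like these become columns in the student-level dataset that matching and machine learning algorithms consume.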
In certain situations, A/B testing (or RCT) is highly preferred and recommended, for example when:
- Reaching out to all students based on highly restrictive inclusion and exclusion criteria, such as prediction scores, GPA, and credits attempted, where it is practically impossible to find matching students in the baseline (at the same time).
- Pre-post matching is not possible due to the enactment of major policy or curriculum changes, which substantially change student success rates before and after the enactment.
Randomization is not as easy as it sounds, especially when there are multiple nested units of randomization and multiple success metrics. In higher education, these units encompass courses, sections, departments, colleges, faculty, and advisors. For example, course success metrics may depend on courses, sections, faculty, etc. When nested randomization is absolutely necessary, here are some guidelines.
Select the randomization unit(s) carefully in relation to the success metric. N should be relatively large (> 50) at the selected randomization unit (RU), to which multiple students belong. Persistence is measured at the student level, while course grade is measured at the student-section level, with dependence on faculty. For large courses with multiple sections taught by multiple instructors under a first-come, first-enrolled policy, we need to be extra careful, since there can be selection bias across sections.
Once the random assignment has been made, evaluate key student or student-section features, predictions, and aggregate historical success metrics (or their probability distributions) by randomization unit to ensure that the assignment is truly random. That is, the effect size, the mean difference between the randomized pilot and control groups divided by the pooled standard deviation, should be < 0.05.
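This balance check can be sketched in a few lines of Python. The GPA values are invented for illustration; the effect-size formula (Cohen's d) and the 0.05 threshold follow the guideline above.

```python
from math import sqrt
from statistics import mean, stdev

def effect_size(pilot, control):
    """Cohen's d: mean difference between groups divided by the pooled
    standard deviation, used here as a randomization balance check."""
    n1, n2 = len(pilot), len(control)
    s1, s2 = stdev(pilot), stdev(control)
    pooled = sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    return (mean(pilot) - mean(control)) / pooled

# Hypothetical historical GPA values for randomized pilot and control students
pilot_gpa = [3.1, 2.8, 3.4, 3.0, 2.6, 3.2]
control_gpa = [3.0, 2.9, 3.3, 3.1, 2.7, 3.1]

d = effect_size(pilot_gpa, control_gpa)  # flag imbalance if |d| >= 0.05
```

In practice this check would be repeated for every key covariate and historical success metric, not just one.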
Even better, consider matching students first and then randomizing at the matched-pair level. This is called a fully blocked RCT, and it provides the highest statistical power. Matching in an observational setting follows the same principle. A compromise is so-called stratified randomization, where randomization is performed at the student-segment level, such as new vs. returning students, online vs. on-ground students, and STEM vs. non-STEM students. Fully blocked RCTs are not sensitive to attrition: if unknown outcomes associated with attriters are a problem, matched pairs can be dropped based on attrition patterns. For persistence as the outcome, attrition is not a problem for RCTs, since dropping out is itself part of the measured outcome.
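A minimal sketch of matched-pair (fully blocked) randomization, assuming students have already been reduced to a single matching score; this is a simplification, since real matching would use multiple covariates and a proper distance metric.

```python
import random

def blocked_randomize(students, key, seed=0):
    """Sort students by a matching score, pair adjacent students, then flip
    a coin within each pair to assign pilot vs. control (fully blocked sketch)."""
    rng = random.Random(seed)  # seeded for reproducibility
    ranked = sorted(students, key=key)
    pilot, control = [], []
    for a, b in zip(ranked[::2], ranked[1::2]):
        if rng.random() < 0.5:
            pilot.append(a)
            control.append(b)
        else:
            pilot.append(b)
            control.append(a)
    return pilot, control

# Hypothetical students matched on a single prediction score
students = [("s1", 0.91), ("s2", 0.88), ("s3", 0.55),
            ("s4", 0.52), ("s5", 0.23), ("s6", 0.27)]
pilot, control = blocked_randomize(students, key=lambda s: s[1])
```

Because each pilot student has a near-identical control partner, a pair can later be dropped as a unit if one member attrites.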
Disadvantages of RCT
Besides the usual ethical considerations associated with the “do-no-harm” assumption, the foremost challenge is that RCTs are difficult to execute properly and can be very expensive. We have seen RCTs in which the experiments were not truly randomized at the measurement level; this surfaced during our validation test, which checks that randomized pilot and control students are comparable in predictions, historical student success metrics, and key covariates.
The most glaring problem with RCTs is that, unless fully blocked, it is very hard to obtain drill-down intervention insights, such as impact estimates for various student subgroups or intervention strategies/modalities. These insights are needed to improve targeted engagement strategies and to identify opportunities for future learning, and they are of paramount importance in improving student success rates and estimating ROI for resource-allocation optimization.
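To make drill-down impact estimates concrete, here is a hedged sketch that computes per-subgroup lift (pilot mean minus control mean) from outcome records; the record layout and the numbers are invented for illustration, and a real analysis would also attach confidence intervals.

```python
from collections import defaultdict

def subgroup_lift(records):
    """records: (subgroup, arm, outcome) tuples, arm in {'pilot', 'control'}.
    Returns mean(pilot outcome) - mean(control outcome) per subgroup."""
    sums = defaultdict(lambda: {"pilot": [0.0, 0], "control": [0.0, 0]})
    for group, arm, outcome in records:
        s = sums[group][arm]
        s[0] += outcome  # running outcome total
        s[1] += 1        # running count
    return {g: v["pilot"][0] / v["pilot"][1] - v["control"][0] / v["control"][1]
            for g, v in sums.items()}

# Hypothetical persistence outcomes (1 = persisted) for two student subgroups
records = [
    ("new", "pilot", 1), ("new", "pilot", 1),
    ("new", "control", 0), ("new", "control", 1),
    ("returning", "pilot", 1), ("returning", "pilot", 0),
    ("returning", "control", 1), ("returning", "control", 1),
]
lift = subgroup_lift(records)
```

A breakdown like this is exactly what a fully blocked design, or a well-matched observational study, makes defensible at the subgroup level.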
In the next part of this series, we will investigate how modern signal processing and machine learning algorithms can overcome these methodological challenges.
Dave has more than 20 years of experience in building various analytics apps and solutions spanning nonlinear time-series analysis to predictive analytics, outcomes research, and user experience optimization. He and his team are working on (1) improving predictive algorithms to provide much more actionable insights, (2) adding new capabilities to automate ROI and outcomes analyses as part of action analytics, and (3) making the Civitas Learning analytics platform self-learning and more intelligent over time. He holds 13 U.S. patents, is the author of a book on pattern recognition and predictions, and has published a number of articles in journals. He currently serves as chief data scientist with Civitas Learning.