Good morning, my name is Jen Hession and I want to welcome you to the NIH Office of Disease Prevention’s Mind the Gap webinar series. This series explores research design, measurement, intervention, data analysis, and other methods of interest to prevention science. Our goal is to engage the prevention research community in thought-provoking discussions to promote the use of the best available methods and to support the development of better methods.

Before we begin, I have some housekeeping items. To submit questions during the webinar, there are two options. First, you may submit questions via WebEx by clicking on the question mark in the WebEx toolbar. Please address your questions to all panelists. Second, you may participate via Twitter and submit questions using the hashtag #NIHMtG. At the conclusion of today’s talk, we will open the floor to questions that have been submitted via WebEx and Twitter. Lastly, we would appreciate your feedback about today’s webinar. Upon closing the WebEx meeting, you will be directed to a website to complete an evaluation. We would appreciate your feedback, as it will help us improve this webinar series. At this time, I’d like to introduce Dr. David M. Murray, Associate Director for Prevention and Director of the Office of Disease Prevention.

Thank you, Jen. Today’s speaker is Dr. Monica Taljaard, a Senior Scientist in the Clinical Epidemiology Program at the Ottawa Hospital Research Institute and Associate Professor in the School of Epidemiology and Public Health at the University of Ottawa. She has a PhD in epidemiology and biostatistics from Western University, and her main research interests are the design, analysis, and ethics of cluster randomized trials. As a methodologist with the Ottawa Methods Centre, Dr.
Taljaard works with clinicians and researchers from a variety of disciplines and medical specialties on the design and analysis of cluster randomized trials, standard clinical trials, and observational studies. She also supervises master’s and doctoral students in epidemiology and teaches biostatistics courses at the University of Ottawa. It’s my pleasure to welcome Dr. Taljaard.

Hello everyone. Well, I’m absolutely thrilled to be presenting about stepped wedge cluster randomized trials today, and I’m especially thrilled to be doing it with Dr. David Murray on the line, although a little bit intimidated nevertheless. Dr. Murray’s green book on the design and analysis of group randomized trials has been a trusted resource throughout my career, so thank you very much for the kind invitation, Dr. Murray, and for the introduction.

So I think I’ll get right into my presentation here. What I want to do is start with a brief reminder about what cluster randomized trials are before I describe what the stepped wedge version of the cluster randomized design is.
I then want to highlight some key methodological considerations in stepped wedge trials. I’m going to describe the methods of analysis and sample size calculation for this design, and I’ll conclude with some advantages and disadvantages of this design.

So, I’m sure this audience likely needs no introduction to cluster randomized trials, or group randomized trials. But these are trials in which the units of randomization are intact groups, for example hospitals, medical practices, or even entire communities, but the outcomes are observed on multiple individuals within each cluster. A key characteristic of this design is that multiple observations from the same cluster are usually positively correlated, which of course reduces the effective sample size. Now, the strength of the correlation within clusters is usually measured by a parameter called the intracluster correlation coefficient, or ICC, a very important parameter which I will be referring to throughout my presentation. It’s well recognized that we have to account for the ICC in both the sample size calculation and the analysis. If we don’t account for the ICC during the sample size calculation, our sample size will be too small and we have an increased risk of type 2 error. On the other hand, suppose we don’t account for the ICC in the analysis.
Our sample size will be overstated and, in fact, we’ll be at risk of an increased type 1 error rate. So, given that the ICC is very important, I have provided here a very simple definition for the ICC. Let’s just assume the outcome is continuous and the variance is sigma squared. It’s very easy to show that the total variance can be decomposed into two components: the variance between cluster means, and the variance of individual responses around their own cluster means, the within-cluster variance. So we have the between-cluster variance, sigma squared B, and the within-cluster variance, sigma squared W. With this simple setup, the ICC can be defined as the between-cluster variance divided by the total variance, in other words the proportion of the total variance that’s between clusters, and here the ICC varies between 0 and 1.

Okay, so what is the impact of this clustering on inferences from a trial? Well, again, it’s easy to look at the variance of the mean. In a standard clinical trial where n independent individuals are randomized to each arm, I think most of us will recognize this expression for the variance of the mean: it’s simply sigma squared over n. However, if we have a cluster randomized trial where we have k clusters and m individuals per cluster, there is an extra piece attached to the variance of the mean, which I have highlighted here. This is called the variance inflation factor or the design effect. The design effect is simply 1 plus (m minus 1) times ρ, where m is the cluster size and ρ is the intracluster correlation coefficient.
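The ICC definition and the design effect described above can be sketched in a few lines of Python. This is only an illustration: the function names and the numeric inputs are hypothetical and do not come from the talk.

```python
import math


def icc(var_between: float, var_within: float) -> float:
    """ICC = between-cluster variance / total variance."""
    return var_between / (var_between + var_within)


def design_effect(m: float, rho: float) -> float:
    """Variance inflation factor 1 + (m - 1) * rho for clusters of size m."""
    return 1.0 + (m - 1.0) * rho


def inflate_sample_size(n_individual: float, m: float, rho: float) -> int:
    """Multiply the n needed under individual randomization by the design effect."""
    return math.ceil(n_individual * design_effect(m, rho))


if __name__ == "__main__":
    # Hypothetical values chosen only to show the mechanics.
    rho = icc(var_between=1.0, var_within=3.0)
    print("ICC:", rho)
    print("Design effect for m = 5:", design_effect(5, rho))
    print("Inflated n for 100 individuals:", inflate_sample_size(100, 5, rho))
```

With a cluster size of 1 the design effect is exactly 1, which recovers the individually randomized case.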
So that is very effective: it’s a very simple approach for calculating sample size for a cluster trial. We can proceed by just multiplying the n required under individual randomization by the design effect. Now, usually in a cluster trial there’s also the need to make a small-sample adjustment to account for the fact that we are using critical values from the normal distribution rather than the t distribution.

Right, so that’s all about cluster randomized trials. So what is the stepped wedge cluster randomized trial? This is a relatively new type of cluster randomized trial design, commonly used to evaluate public health, health system, and service delivery interventions. I think it’s not really a single design; it’s really a collection of many possible design variations, and when one reads published stepped wedge trials there are so many variations that it really becomes quite confusing. This design has seen a rapid increase in popularity, but I’m sorry to say that its use in practice has far outpaced methodological development, and I think we need substantial methodological expansion in methods for this trial over the next few years.

So let’s explain what the stepped wedge is. Here is a basic diagram. This is the most basic version of the stepped wedge design. Note that at the beginning of the trial all of the clusters are in the control condition. They then cross over sequentially, and in random sequence, to the intervention condition until, by the end of the trial, all of the clusters are exposed to the intervention condition. The outcomes are assessed repeatedly in each cluster over time. So that’s the basic layout of the stepped wedge cluster randomized design.

I just need to establish the terminology that I’m going to be using throughout my presentation. First of all, the rows of this design are referred to as sequences. In this design, there are four sequences.
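The basic layout just described can be sketched as a matrix with one row per sequence and one column per measurement period, where 0 marks the control condition and 1 the intervention condition. The function names and the seeded random allocation below are illustrative assumptions, not part of the talk.

```python
import random


def stepped_wedge_matrix(n_sequences: int) -> list[list[int]]:
    """Basic stepped wedge: n_sequences rows, n_sequences + 1 period columns.

    Every sequence starts in control (0); sequence s crosses over to the
    intervention (1) at period s + 1 and stays there until the end.
    """
    n_periods = n_sequences + 1
    return [
        [1 if period > seq else 0 for period in range(n_periods)]
        for seq in range(n_sequences)
    ]


def randomize_clusters(cluster_ids, n_sequences, seed=None):
    """Randomly allocate clusters to sequences in (near-)equal numbers."""
    rng = random.Random(seed)
    shuffled = list(cluster_ids)
    rng.shuffle(shuffled)
    return {cid: i % n_sequences for i, cid in enumerate(shuffled)}


if __name__ == "__main__":
    for row in stepped_wedge_matrix(4):
        print(row)
    # Hypothetical cluster labels 1..16 allocated to 4 sequences.
    print(randomize_clusters(range(1, 17), 4, seed=2024))
```

For four sequences this gives five periods: the first column is all control and the last column is all intervention.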
Clusters are randomized to different sequences of control and intervention conditions. I said that repeated measures are taken on each cluster, and these are taken within different periods, so the columns of this design are referred to as periods. The cells are the cluster-periods, and in fact the cluster-period sizes and the number of periods, as well as the number of clusters of course, are critical when it comes to power calculations. We use the word step to refer to the times at which the intervention is implemented, and the duration between the steps is referred to as the step length.

Okay, so that is the terminology that we’re going to use. Now, the stepped wedge assumes that measurements taken as soon as the cluster is switched over to the intervention condition reflect the effect of that intervention. But sometimes the effect of the intervention cannot be observed immediately. For example, it may take a while for the intervention to become actually implemented and embedded in the cluster, or there can be a natural lag before the intervention can even plausibly affect the outcomes. So if we do take observations during those periods, there could be a bias towards the null. It is therefore recommended to explicitly consider in your design whether or not you need a transition period, and I will explain later how we might handle such transition periods in the analysis. Now, this transition period can be just a small portion of the entire interval, or it could even be an entire step length. I don’t have too much time to elaborate on this, but generally it is preferable to have the transition period as a small fraction of the entire interval rather than eliminating an entire step length. The reason is that these cells on the diagonal actually contribute most to the power of the design, so it is preferable to choose the step length such that the transition period is just a small fraction of that step.

Now there are two
main types of stepped wedge designs, depending on how the outcomes are collected. In a repeated cross-sectional design, different individuals are measured each time. In a cohort design, on the other hand, the same individuals are measured each time. We can differentiate further between a closed cohort design, in which no new individuals may join the trial, and an open cohort design, in which some individuals may leave and others may join the cohort during the trial. So those are the main types of designs based on how the outcomes are collected.

I have two examples here which illustrate some of these features. The first one is a stepped wedge design to assess the cardiovascular health effects of a Managed Aquifer Recharge initiative. That’s an engineering system that was designed to reduce the salt content in drinking water supplies. The background is that, because of salt water intrusion, water supplies in coastal regions may become contaminated with salt, and this can lead to high blood pressure and all sorts of other bad health effects. So the investigators developed a technology called the Managed Aquifer Recharge system, which utilizes rain water to restore equilibrium and reduce the salt content of the drinking water. They wanted to evaluate this technology, so they designed a stepped wedge trial in 16 communities over five months. Here is a diagram of the design, which was presented in the manuscript. Notice that at the beginning of the trial all 16 communities are in the control condition, and then at monthly intervals a random selection of four new communities install the system, and so on until the end of the trial.
By then, all 16 communities will have received this technology. The outcome is blood pressure, measured on an average of 60 adults in each community, and these individuals were recruited before the communities were randomized. So they all agreed to participate in the trial, and the same individuals were measured each month from the beginning to the end of the trial. All of the individuals were exposed to both the control condition and the intervention condition, and so this clearly is an example of a closed cohort design.

The second example is a stepped wedge trial that was designed to evaluate the impact of a clinical decision rule called the HEART risk score in patients presenting to emergency departments with acute chest pain. The objective basically was to determine if the use of this scoring system, or this clinical decision rule, can be helpful in real-world clinical decision making: does it help to decide which patients can safely be sent home, which patients need to be admitted, and which patients need immediate intervention? The primary outcome in this trial was the incidence of major adverse cardiac events within six weeks from the initial visit. They designed the trial as a stepped wedge trial. They had 10 participating emergency departments. All emergency departments started in the control condition in the first month, and then on a monthly basis one new clinic adopted the HEART risk score to guide clinical decision-making. They expected, just coincidentally the same cluster-period size, an average of 60 patients to arrive at each emergency department per month with chest pain. But clearly they go away, so they have a very short exposure. They go away and at six weeks their outcomes are measured, but these are different individuals who are measured each month, so this is an example of a cross-sectional design.

Right. So stepped wedge designs have several key characteristics that really complicate their design and analysis. So I want to describe
what these characteristics are before I go into details of the analysis and the sample size calculation. Now, these characteristics do increase the risks of various forms of bias with this design, and I believe that we need to carefully consider whether this design is appropriate in each particular scenario. We also need to take special care in how we report the results of a stepped wedge trial, and I’ll just say that there is now a CONSORT extension for stepped wedge trials, led by Dr. Karla Hemming, which is in press at the BMJ.

So the first key characteristic is that in a stepped wedge trial the intervention effect is partially confounded with time. This is very important: by design, the intervention effect is confounded with time. The reason is that, as you can see, on average the observations in the control condition occur at an earlier calendar time than the observations in the intervention condition, and if there is a secular trend present, meaning that there’s a natural improvement or deterioration in the outcomes over time, the effects of the secular trend become confounded with the effects of the intervention. As a result, the analysis for the stepped wedge must always adjust for time, even if it’s not statistically significant.

The second issue we need to consider is that there is an increased risk of contamination with this design. Each cluster, as I said, is exposed to both the control condition and the intervention condition. Now, it could happen that some clusters that have to wait a very long time for the intervention decide to implement the intervention earlier than they should because they weren’t willing to wait. Or it could be that the actual time to deliver the intervention may take longer than expected in some of these clusters. So that could lead to contamination. Of course, as long as we observe and track the actual exposure in each cluster in each period, we can conduct an as-treated analysis, but we should note that this would deviate from a
true intent-to-treat approach.

The third key characteristic is that the effect of the intervention may vary over time. This could be according to calendar time: for example, the clusters implementing the intervention at an earlier calendar time may have a different response from the clusters implementing the intervention at a later calendar time. But the effect may also vary according to the time since the intervention was introduced. For example, the effect of the intervention may become stronger as the clusters become more experienced, but the opposite is also true: the effect of the intervention could actually decay, perhaps because there’s a decrease in adherence or because the training that was delivered at the beginning is forgotten. Either way, an analysis which assumes that the effect of the intervention is constant through time may be biased. We need to consider this.

A fourth characteristic is that the effect of the intervention may also vary across clusters. Remember, each cluster is exposed to both the control and the intervention condition, so we could estimate the effect of the intervention within each cluster, and this effect may vary. This could be because of differences in the way the intervention is actually implemented, or differences in adherence or other characteristics associated with these clusters. So again, an analysis which assumes that the intervention effect is the same across clusters may be biased, and we have to account for this in the design, because if we do account for this in the analysis it can reduce the power of the trial.

The fifth and final characteristic that I want to highlight is that intracluster correlations are much more complicated in a stepped wedge design. As I said, a stepped wedge takes repeated measures on the same cluster, and possibly also on the same participants, over time. So there are two types of correlation coefficients that arise. We have a within-period ICC, which is the correlation between two different individuals in the
same cluster and in the same period, but we also have a between-period ICC, which is the correlation between two different individuals in the same cluster but in different periods. One would naturally expect the correlation between individuals in different periods to be weaker than the correlation between two individuals measured in the same period. Now, in a cohort design it gets even more complicated because there is a third type of correlation, namely the correlation in repeated measures on the same individual. Anyway, bias can be introduced by misspecifying the correlation structure, so we have to be careful about how we handle intracluster correlation coefficients in this design.

So now I come to methods of analysis for the stepped wedge. There are many possible methods of analysis that have been considered. For example, there are some methods based on cluster-period summaries, so cluster-level methods of analysis, which tend to reduce the complexity of these correlations. But there are also individual-level methods, which is what I will be focusing on. Other approaches have looked at doing comparisons horizontally, meaning looking at a before-and-after comparison within each cluster.
We have to be careful that those comparisons, of course, have to somehow take account of the secular trend. And then there are other approaches that have looked at vertical time slices, in other words looking at between-cluster comparisons separately in each time slice. Of course these avoid problems with the secular trend because all of these comparisons are randomized, but we have to figure out a way of pooling all of these estimates across time. So there isn’t really a consensus on the best method, but the method that I want to focus on is probably the most commonly used, and that is based on pooling both within- and between-cluster information using a general or generalized linear mixed model, GLMM, for the analysis.

The simplest and most commonly used analytical approach for the stepped wedge is probably the model proposed in the seminal paper by Hussey and Hughes in 2007. Specifically for the cross-sectional stepped wedge design with a continuous outcome, they proposed a linear mixed effects regression model. This model has a fixed effect for time, as it should, because we already said that the intervention effect in a stepped wedge is confounded with time. So we have to account for time, and note that this model treats time as a categorical variable. Then we also have a fixed intervention effect.
So X here is a time-varying binary indicator of the condition of the ith cluster in the jth period, either intervention or control. The intervention effect that we are assuming here is a constant, and then we have a random intercept to account for the intracluster correlation coefficient. So, a very simple model with a random intercept which accounts for the intracluster correlation.

We have to consider carefully the implications of this model. First of all, let’s think about the fixed-effects assumptions that are imposed by this model. The blue line here represents the response in the control condition over time, so that’s the secular trend, and we have assumed a common secular trend across all the sequences and across all of the clusters. Because we are modeling time as a categorical variable, this is an arbitrary trend over time. We have not assumed a linear or some other parametric trend over time, although that of course is possible. The red line represents the response to the intervention, and please note that it is parallel to the blue line; in other words, the intervention effect is constant through time. It does not vary. These are the assumptions imposed for the fixed effects by this initial model that was proposed for the stepped wedge design. I should also note the assumptions regarding the random effects. As I said, there’s a single random intercept for each cluster, and this implies that the ICC is constant over time.
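Written out, the model described above has the following general form. The notation is an assumption made here for illustration, since the slide itself is not reproduced in the transcript: Y_{ijk} is the outcome for individual k in cluster i during period j, \beta_j are the categorical period effects, \theta is the constant intervention effect, X_{ij} is the 0/1 intervention indicator, and \alpha_i is the random cluster intercept.

```latex
Y_{ijk} = \mu + \beta_j + \theta X_{ij} + \alpha_i + e_{ijk},
\qquad \alpha_i \sim N(0, \sigma^2_{\alpha}),
\quad e_{ijk} \sim N(0, \sigma^2_{e})

% With a single random intercept, any two individuals in the same cluster
% share the same correlation, whether or not they are in the same period:
\rho = \frac{\sigma^2_{\alpha}}{\sigma^2_{\alpha} + \sigma^2_{e}}
```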
So no matter how far apart the observations are, these correlations are fixed over time; in other words, the within-period ICC is the same as the between-period ICC. This assumption is unlikely to hold in most practical situations, unless of course the time span is very short. Usually we would expect some decay in the strength of this correlation over time. So two alternative models have been proposed that allow these correlations to decay over time. I will refer to the original Hussey and Hughes model as model 1. Model 2, which I will describe, was proposed by Richard Hooper and colleagues at Queen Mary University of London, and allows for a between-period correlation, and then I’ll also describe model 3, which is a more advanced model proposed by Jessica Kasza and Andrew Forbes at Monash University.

So let me review each of these two models. The first one, the Hooper model, allows the between-period correlation to be different from the within-period correlation. But note that the between-period correlations themselves are fixed, so there is no further decay in the between-period correlations. While that assumption might be implausible as well, this model is very easy to fit. I provide some SAS code in a supplementary slide. It is also very easy to fit in Stata or R or any other software package, and we also have very simple design effects that we can use to calculate sample size for this model. The alternative model, model 3, the Kasza model, allows the between-period correlations to decay exponentially as a function of the distance in time between the observations, so the correlation between observations that are separated by one period is higher than the correlation between observations separated by three or four periods. So this model is probably more plausible. However, it’s more difficult to fit, and I understand that currently SAS is the only software that can fit this model.
Neither Stata nor R can fit this model, in my understanding, and also we have no simple design effects for it. I am, however, going to point you to an R Shiny app which can do sample size calculations under this model. But the complexity is there, and even in SAS this model can take quite long to converge.

So, to show you the details for the Hooper model: the only difference between the Hooper model and the original Hussey and Hughes model is the addition of this random term, a random cluster-by-period effect. That’s the only difference. We still have the fixed effect for time, which is discrete; we have the fixed intervention effect; we have a random intercept; and then we have this random cluster-by-period effect. So, straightforward. This model, as I said, defines a within-period ICC, which is defined in terms of the variance components for this model as I have here. I will denote the within-period ICC as rho 0. The between-period ICC, the correlation between two individuals in different periods, is denoted rho 1 here. Clearly, unless the variance of the random cluster-by-period interaction is zero, the between-period correlation will be less than the within-period correlation, which is exactly what we wanted. But the between-period correlation itself is not a function of time, so it is constant through time. Now, the ratio of the between-period to the within-period correlation is given a special name and special meaning: it’s called the cluster autocorrelation coefficient, CAC. The cluster autocorrelation coefficient essentially measures the extent of the correlation decay.
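The relationship between the variance components and the two ICCs can be sketched as follows. This is a minimal illustration that assumes the usual random-effects decomposition (cluster variance, cluster-by-period variance, residual variance); the function name and the numbers are hypothetical and are not taken from the slides.

```python
def hooper_correlations(var_cluster: float,
                        var_cluster_period: float,
                        var_residual: float) -> tuple[float, float, float]:
    """Return (rho0, rho1, cac) for a model with a random cluster intercept
    and a random cluster-by-period effect.

    rho0: within-period ICC  (same cluster, same period)
    rho1: between-period ICC (same cluster, different periods)
    cac : cluster autocorrelation coefficient, rho1 / rho0
    """
    total = var_cluster + var_cluster_period + var_residual
    if total <= 0 or (var_cluster + var_cluster_period) <= 0:
        raise ValueError("variance components must give a positive within-period ICC")
    rho0 = (var_cluster + var_cluster_period) / total
    rho1 = var_cluster / total
    return rho0, rho1, rho1 / rho0


if __name__ == "__main__":
    # Hypothetical variance components, for illustration only.
    print(hooper_correlations(4.0, 1.0, 95.0))
    # Setting the cluster-by-period variance to zero gives a CAC of 1,
    # i.e. equal within- and between-period ICCs.
    print(hooper_correlations(4.0, 0.0, 96.0))
```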
For example, if we have a CAC of 0.8, it means that there is a 20% decay in the strength of the correlation. The cluster autocorrelation coefficient turns out to be an important parameter in the sample size calculation method that I will be reviewing shortly. Please note that if we assume a cluster autocorrelation coefficient of 1, it implies that the between-period and within-period intracluster correlations are equal; in other words, we are back to the Hussey and Hughes model.

Right, so given that the more complex model which allows the correlations to decay is more difficult to fit, it’s important to investigate the implications of misspecifying the correlation structure. In other words, what are the implications if we fit the Hussey and Hughes model when the more complex model holds, or what are the implications if we fit the Hooper model when the more complex decaying-correlation model holds? In the supplementary material I point you to a nice lookup table developed by Jessica Kasza and Andrew Forbes, which allows you to look up the implications of misspecifying the correlation. To summarize: first of all, if we fit model 1, so if we fit a cluster autocorrelation coefficient of 1, meaning the Hussey and Hughes model where the correlations don’t decay, we will always underestimate the variance.
So that’s bad, because our p-values will be too small and our confidence intervals will be too narrow. Interestingly, when we assume model 2, meaning the Hooper model, where the between-period correlations don’t decay but there is a difference between the between- and within-period correlations, then we can sometimes get the opposite effect, but in most circumstances we will underestimate the variance as well. The impact actually depends on the strength of the correlation decay, the within-period ICC, and the cluster-period sizes. As I said, there is a lookup table where you can look up, for specific design scenarios, what the implications are. Generally, over-specification, in other words assuming a more complex model that allows for a correlation decay, does not lead to bias.

Right, so that concludes my analysis section. I will just briefly mention some additional things you should consider in the analysis. First of all, as I mentioned earlier, there could be cluster-by-treatment heterogeneity, so you may want to consider including a random cluster-by-treatment effect. As I said earlier as well, there could be a time-varying intervention effect, so you may want to consider including an intervention-by-time interaction. This could be either calendar time, assuming you’ve got enough clusters crossing over in each period, or you may want to model time-on-treatment effects, meaning looking for a difference with increasing duration of exposure to the intervention. It’s also important to consider how you will handle the lags or transition periods. There are some options there. For example, you can simply omit observations during the transition period. Of course, that will decrease your power, so it’s better to accommodate this in your design so that you don’t lose power. You can also consider analyzing the observations using something other than just a binary exposed-or-unexposed
variable: you may possibly be able to model the effect of the intervention where the intervention indicator is specified as a fractional variable. So for example, if you expect that during the transition period clusters are only 30% exposed, you may code your intervention indicator as 0, 0.5, and 1. It’s a little bit difficult to imagine how one might be able to fully pre-specify what the extent of exposure would be, so perhaps this matches more an as-treated type of analysis rather than intent-to-treat.

Right, sample size calculations. Note that power calculations for the stepped wedge can be based on simulation, on an analytical formula, or on design effects, as I described at the beginning for cluster randomized trials. I only have time to review the very simplest approach, which uses design effects based on the Hooper model. Note that there are no design effects available for the Kasza model, but there is a nice Shiny app in preparation, which I will point you to. So if we are able to use the design effect and we are willing to assume the Hooper model, then it is surprisingly easy to do the sample size calculation. The first step is to calculate the total required sample size assuming individual randomization. The next step is to multiply by the design effect due to clustering. This is the design effect that I introduced right at the beginning, except that I’m using the within-period ICC, not the total ICC over the duration of the study.
And m here is the cluster-period size, not the total cluster size. I then have to multiply by a second design effect to account for the fact that we have repeated measurements. I’ve got the formula for this repeated-measures design effect on the next slide. Once we have multiplied by both of these design effects, we divide by our cluster-period size, and that gives us the total required number of clusters for the stepped wedge. Now, we may need to round up to a multiple of the number of steps. That can be inconvenient, but it is the safest thing to do, because if you don’t do that your realized power could end up being less than what you expect: the allocation of any extra cluster, where the imbalance goes, whether it goes in the first, second, third, or fourth step, can affect your power.

So what does this design effect for the repeated measures look like? It’s really straightforward to calculate. It’s a function of the number of sequences, T, and this parameter r, where r is interpreted as the correlation between cluster means at two different times, the ordinary Pearson correlation between cluster means. It is calculated as a function of our cluster-period size, our within-period ICC, and our cluster autocorrelation coefficient. Remember, I said earlier that the cluster autocorrelation coefficient is an important parameter for sample size calculation. For a cohort design there is also an additional parameter, an individual autocorrelation coefficient, which needs to be specified. I have an example of the application of this methodology, but let me just say that of course our calculations critically depend on the assumed correlation structure, and we also need a very good estimate for these correlation coefficients.
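The steps just described can be sketched in Python for a cross-sectional design. The transcript does not reproduce the formula slide, so the expressions for r and for the repeated-measures design effect below are assumptions, written in the form usually attributed to Hooper and colleagues; they should be checked against the published source before use. All input values are hypothetical and are not the values from either example trial.

```python
import math
from statistics import NormalDist


def n_individual_total(delta, sd, alpha=0.05, power=0.80):
    """Total n (both arms) for a two-arm comparison of means, normal approximation."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return 4 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2


def de_clustering(m, rho0):
    """Design effect due to clustering, using the within-period ICC rho0
    and the cluster-period size m."""
    return 1 + (m - 1) * rho0


def cluster_mean_correlation(m, rho0, cac):
    """Correlation r between cluster means in two different periods
    (cross-sectional design; assumed form)."""
    return m * rho0 * cac / (1 + (m - 1) * rho0)


def de_repeated_stepped_wedge(n_sequences, r):
    """Assumed repeated-measures design effect for a basic stepped wedge
    with n_sequences sequences and n_sequences + 1 periods."""
    t = n_sequences
    return 3 * (1 - r) * (1 + t * r) / ((t - 1 / t) * (2 + t * r))


def round_up_to_multiple(x, k):
    """Round x up to the next multiple of k (e.g. the number of steps)."""
    return math.ceil(x / k) * k


def clusters_required(delta, sd, m, rho0, cac, n_sequences, alpha=0.05, power=0.80):
    n = n_individual_total(delta, sd, alpha, power)
    r = cluster_mean_correlation(m, rho0, cac)
    raw = n * de_clustering(m, rho0) * de_repeated_stepped_wedge(n_sequences, r) / m
    return raw, round_up_to_multiple(raw, n_sequences)


if __name__ == "__main__":
    # Sensitivity to the CAC over a range of hypothetical values.
    for cac in (0.6, 0.8, 1.0):
        print(cac, clusters_required(delta=0.3, sd=1.0, m=60, rho0=0.05,
                                     cac=cac, n_sequences=4))
```

With a CAC of 1 the product of the two design effects reduces algebraically to the single stepped wedge design effect that follows from the Hussey and Hughes model, which is a useful consistency check on the assumed formulas.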
So this can be challenging. Of course, for a standard cluster randomized trial we also need an estimate for the ICC, so that’s no different. But here, for the stepped wedge, we additionally need an estimate for the cluster autocorrelation coefficient, and that can be challenging. What I have done in previous situations is that I was able to obtain longitudinal data from routinely collected data sources and calculate these correlation coefficients using an analysis of variance approach. It’s easier for continuous outcomes; for binary outcomes you may have to use a linear mixed model or some other method, and it’s more complicated in that case. But you have to take care, of course, that the period length is appropriate. If you have absolutely no information about the cluster autocorrelation coefficient, some rules of thumb have been suggested, for example a CAC between 0.6 and 0.8. Either way, it’s essential to examine sensitivity to a range of alternative possible assumptions.

So here is a worked example for the trial in Bangladesh. I don’t really have time to go through all of the details. These are more or less the assumptions that they present in the manuscript.
They assume the Hussey & Hughes model, so they did not account for the cluster autocorrelation coefficient, but I have assumed some values here, and the calculation works out according to what they present in the trial: sixteen clusters. I had to round up from 14.8 to 16, so this can sometimes be a little bit inconvenient. For example, if you have 10 steps and your answer comes out to be 11.5, you might have no choice but to round up to 20, unless you have a method that can accommodate the imbalance in the design.

And so this diagram just shows, for interest, the comparison between the stepped wedge design, which as I said requires 16 communities, a total of 960 individuals, and some alternative designs. With a design over one month, of course, the trial would end a lot sooner, but they would require a substantially higher number of communities, and this design would probably not be feasible. With a before-and-after design, they can dramatically reduce the required number of communities. But of course it's not surprising that this design requires more communities, because it has fewer measurements. So what if they did it as a parallel-arm design with the same number of measurements? Even here the required number of communities is still more than for the stepped wedge design. So usually, in most circumstances, the stepped wedge requires a smaller number of clusters than parallel-arm designs. So, how can you do the sample size calculations?
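The rounding rule used in the worked example above, rounding the required number of clusters up to a multiple of the number of steps, can be written as a short helper. This is a minimal sketch; the function name is hypothetical, and the two calls reuse the numbers quoted in the talk.

```python
import math


def round_up_to_steps(raw_clusters, n_steps):
    """Round the required number of clusters up to the next multiple of n_steps."""
    return math.ceil(raw_clusters / n_steps) * n_steps


print(round_up_to_steps(14.8, 4))   # 16, as in the Bangladesh example
print(round_up_to_steps(11.5, 10))  # 20, the 10-step example
```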
Well, there are some resources here. This is an R package developed by Jim Hughes, which does not allow for correlation decay. And there is the R Shiny app, which is a beta version and which has great flexibility for a wide variety of possible designs. It does permit allowing for correlation decay according to the Kasza model. It also allows for treatment effect heterogeneity. I should just mention that all of the methodology that is currently implemented here assumes that you have a large number of clusters; it may not be appropriate for a small number of clusters.

Right, so I will conclude now with just some advantages and disadvantages. First of all, advantages. In my experience here at the OHRI, there is a lot of enthusiasm for the stepped wedge design among clinicians, and I think the main reason it is so popular is that all clusters have the opportunity to receive the intervention during the study. So it's a lot easier to recruit clusters, family practices or hospitals or communities, if everybody will receive something during the trial. It may even be a requirement of, you know, a health system stakeholder or a funder that everybody has to get something that's supposed to improve outcomes, and in that case at least the stepped wedge is stronger than the alternative, which might have been an uncontrolled before-and-after design. Also, as I mentioned, the stepped wedge usually reduces the number of clusters that are needed for the trial. And a third reason why it's often used is that people say it simplifies the logistics: the intervention can be implemented in only one cluster at a time. But of course that reason doesn't really hold water, because we can do the same thing with a parallel design as well.

Now, there are many disadvantages, and I think that's the reason why we have to be a little bit cautious and careful about adopting this design. First of all, as I said, by design the intervention effect is confounded with the secular
trend, so it can be difficult to adequately model the effects of the secular trend. Secondly, even though people think that it's logistically simpler, actually it's logistically more complicated. All of the participating clusters have to be recruited at the beginning so that they can be randomized, so all of the ethics approvals and all of the preparation have to be done upfront, because you don't know which of the clusters has to implement the intervention first. It can also be logistically challenging to ensure that all of the clusters are ready to implement the intervention when they are supposed to implement it. Unless you have routinely collected outcomes available, this design can increase the data collection burden. It can also take a lot longer to complete, simply because it is a longitudinal design, and that in itself can increase the risk of clusters dropping out. It may increase the risk of contamination or of various external events influencing the outcomes. And then finally, this design is a lot more complicated to analyze and interpret; there are many assumptions, as I mentioned earlier.

So in conclusion, I will say the stepped wedge is rapidly increasing in popularity, but it has many methodological challenges. There isn't really consensus on its design and analysis at the moment. We need a lot more methodological development, and we probably have to carefully consider whether adoption of this design is justified. I have a special concern about the use of this design for a very small number of clusters, because that complicates the interpretation and also doesn't easily allow investigating whether the model assumptions are satisfied. For that reason, I think it's prudent to be cautious about adopting this design. So I have a few key references and some supplementary material, which you may access, that shows all the code and some additional information. And that is all I have, so thank you very much, and I'm happy to take questions.

Thank you, Dr.
Taljaard. Very interesting, and we've gotten questions from our audience, and certainly I have questions of my own to ask about the material you presented. Let's start with a question about, you know, should you use a stepped wedge? Should you use a parallel group cluster randomized trial design? That's the first question that many people consider. At the end of your presentation you identified the strengths and weaknesses of the stepped wedge, often in relation to the parallel group randomized trial. The factors that people usually talk about when they're thinking about the trade-offs are political considerations. You know, often it's difficult to sell the stakeholders on a design if they know that half of the groups or communities are going to have to be in a control condition for the whole study, and the stepped wedge has an advantage there because everybody gets the intervention before it's over. Another issue is sample size, and you've presented the case that we often can get away with a smaller study using a stepped wedge. I will certainly agree with that. Sometimes the differences are dramatic, as in the example that you showed.
Sometimes the differences are far less dramatic, where the savings might be a few clusters rather than, you know, half. Another factor that's often discussed is the time that's required for the study. Often you can get the answer faster with the traditional parallel group randomized trial than with a stepped wedge. Another factor that's discussed is cost. There's certainly a lot more data collection in a stepped wedge than in the traditional parallel group randomized trial. And then the last one that we've encountered at NIH in some of the studies that we're supporting is the risk of history effects, that is, something happening during the course of the study that could greatly affect the outcomes. That certainly can happen in a parallel group randomized trial, but it seems to be particularly problematic in a stepped wedge because those studies often take longer, and so the risk is greater. Any comments that you'd like to make about any of those issues beyond what you've said already?

Well, I think that was a really great summary of the strengths and weaknesses. I think the only thing to add, which I find people struggle with, is just the logistical complexity of getting everything organized on time and on schedule, right at the beginning, because all of the clusters have to stick to the randomized timing of the intervention.

I would agree that that is an additional complication. A question that comes up often as people are thinking about planning these studies is usually these two questions: How many clusters do I need? How many steps do I need?
Can you offer any general advice about the advantages and disadvantages of more steps or more clusters?

Yes, I do have some recommendations regarding the choice of step length right at the very end. So for the step length, I think the first consideration usually is logistical, because it depends on the funding for the study and the total trial duration, so that usually plays an important role. The recommendation is to try and maximize the number of steps. You get the most power with this design when each cluster has its own step. So in other words, if you have 10 clusters, you ideally want to have 10 steps. However, there is kind of a diminishing return, in that you get the most benefit from increasing from three to four to five steps, and then it kind of tends to level off a little bit. The step length, of course, also needs to be chosen with due consideration for how long it actually takes for the outcomes to show an effect of the intervention, in other words, that lag time that I spoke of earlier. So if there is a lag before the intervention effect is realized, you will have to make sure that your step length can take in that lag and that it's longer than the lag, because the very first time point after the cluster has switched over to the intervention has to reflect the effect of the intervention, because of this assumption that the effect of the intervention is constant through time.
So from the very beginning the effect is immediate and sustained through time, and your lag has to be accommodated within that. So I find that it usually requires a lot of discussion with clinicians and the principal investigator in terms of logistics, in terms of power calculations, and just the practicalities of how long it takes before the outcomes actually show an effect of the intervention. And then there are power considerations as well.

The comment about being sure to include the transition period within the step: does that apply if I'm fitting a model that allows for a gradually increasing or decreasing intervention effect, in other words, a time-by-treatment interaction?

So they are unrelated issues, in my understanding. When you have a transition period, which you decide to accommodate with a fractional treatment indicator, well, that's just a way of dealing with that one time period where the effect of the exposure is partial. But you still need an immediate effect once the cluster is really exposed to the intervention, right? It's not as if we are now fitting a time-varying effect. It simply deals with the fact that from unexposed to exposed there is a gradient of effect. But once the cluster is fully exposed, that is your single estimate of the intervention effect, and again, it's constant through time. So we have no methodology currently that accommodates a time-varying effect in the sample size calculation, although we do know how to analyze that.

We are being overwhelmed with questions, Dr. Taljaard.
We have a very interested audience. One of our viewers has asked: what about survival outcomes, time-to-event outcomes, where the intraclass correlation doesn't really exist and so a design effect doesn't really apply? How do you do sample size calculations for a stepped wedge where you're going to have survival outcomes?

Yes, so this is one of those areas where we need more work. I personally have not had experience with time-to-event outcomes in stepped wedge trials. There are some published examples, however. There is the THRio study, where I believe Larry Moulton was involved, and there is some methodology that was described for dealing with time-to-event outcomes. But the sample size methodology for survival outcomes is not very well developed, and I'm unable to offer further guidance about that. Sorry. Simulation methods might be helpful in that case.

Another question from one of our listeners, and I've encountered this question in parallel group randomized trials as well: how do you feel about creating what you might call artificial clusters? That is, in some cases your cluster is a hospital, but some of those hospitals might be quite small. Is it ever appropriate to put some of those together into a single cluster, where all of the small hospitals in that cluster are treated in the same way and analyzed in the same way, as though it's one cluster instead of several small hospitals? So in other words, treating a group of hospitals as a single cluster rather than as different clusters.

Oh, yeah, it would make more sense if those hospitals are all affiliated, so they might be different campuses that are all part of the same organization, and often there is contamination between the different sites, in which case it would make more sense to treat them as a single cluster. Now, how to deal with that in the analysis? There are methods for doing multi-level modeling in stepped wedge trials, so that wouldn't be too difficult.
For example, you could consider adding an additional random intercept for a lower-level unit. But I would imagine that some of those models might be more complicated to fit if you only have a few clusters that are grouped together like that. Yeah, I don't think I have any further insights about that.

Okay, you mentioned that you have particular concerns about stepped wedge designs with a small number of clusters. What would you consider to be a small number?

Yes, I often get the question, what is the minimum number of clusters that you would consider for a stepped wedge trial? I think Peter Morrow has a toolkit for field trials in which there is actually a published recommendation that the minimum number is six for a stepped wedge trial. I don't know. I think close to a third of published stepped wedge trials thus far have fewer than 10 clusters. Personally, I don't really feel comfortable doing a stepped wedge with fewer than 10 clusters. But, you know, the problem is that the sample size calculation also tells you that you need a very small number, especially when the effect size is quite large, and so it's really difficult to convince people, once they see the power calculation, that there are additional considerations that you have to take into account. There are additional concerns with having a small number of clusters. It's not just about, you know, the power considerations; it raises challenges in the analysis as well. So I feel that 10 probably would be a minimum that would be reasonable, but there are many examples with fewer than 10.

And I think you would agree, Dr.
Taljaard, and you suggested it in your presentation, that a power calculation is just an exercise with the calculator, and everything depends on the quality of the parameter estimates used in the power calculation. The investigator would be well advised to do sensitivity analyses associated with the power calculations, varying those parameter estimates over a reasonable range, and then being rather informed about selecting the ultimate design and sample size.

Yes, exactly, and especially for the stepped wedge. I think it's very important because we are unlikely to have the necessary information to inform these power calculations, especially for the correlation coefficients.

All right, I wish we had more time, but we're going to start losing our audience if we don't wrap up. Lots of interest in this material. I want to thank you very much, Dr. Taljaard, for your wonderful presentation today, and I'll turn it back to Jen.

Thank you, Dr. Murray, and thank you to everyone who participated in today's webinar. On the Mind the Gap website, prevention.nih.gov/mindthegap, you will find several resources for this talk, including slides and a list of references. We will also be posting a recording of today's webinar on our website next week. You'll receive an email with a link to the recording when it is available. Thank you.