PART III — MULTIPLE OBJECT DESIGNS and PART IV — SINGLE OBJECT STUDIES; DESIGN VARIATIONS

We’ll move on to the last section, which is
going to include parts three and four. Part three involves multiple object designs. We’re going to be looking at how to use a
design check sheet to ensure that you thought of the most important aspects of your design
and you’ve gotten them written down. This is definitely going to help you a lot
when you go back to writing the final paper. We’re also going to talk about how sketching
out the design or creating some kind of flowchart or diagram can be very useful as a summary
tool for looking at your design and identifying whether or not you have any holes in that
design. Starting with the design check sheet. These are the aspects I like to have on my
check sheets. The very first thing is to record the observation
that led you to undertake the research in the first place. This section could also include a brief summary
of the background and basic facts. Then comes the research problem or question. What is it that you really want to know? It’s important that you don’t try to test
too many things in one experiment. Here’s the place where you can prioritize
what it is you most want to know or need to resolve. Then your hypotheses, or the possible answers
or solutions that you’ve come up with through your brainstorming process, however you carried
that out. The rationale for each hypothesis would explain
why this is a potential answer to the research question. The implications of those hypotheses are going
to state: If this hypothesis is true, what should you expect to have happen in an experiment? What would you expect to see? If the hypothesis is not supported, what would
you expect to find instead? Again, a rationale would clearly explain why
you expect those results. Then come the details of the object protocol. First, a definition of what constitutes an
experimental unit is needed; are they real objects or facsimiles? Are your experimental units entire sheets
of paper covered with adhesive? Or did you cut up an entire sheet of adhesive-covered paper into ten squares, with each square then counted as a separate experimental unit? Your final interpretations will depend on
how you defined an experimental unit, and so will the design of your statistical analysis;
or the decision as to whether or not you can even do any statistical analysis. The number of replicates is another issue you need to be clear about from the start. How are you going to define replicates? Where do you expect variation might occur? Have you defined your replicates in such a
way as to be able to capture that expected variation? The measurement protocol will include the
types and methods of outcome measures, such as visual evaluation, measurement of color,
gloss change, etc. These should all relate back to the hypotheses and their implications. Clarifying this at the outset is helpful
to ensure that you’re testing for the right things so that you can clearly tell if a hypothesis
is eliminated or supported; that you haven’t missed something crucial for making that decision. At the same time, you may find that some of
your tests don’t address any of your hypotheses and would be a waste of your time. Consider whether you can randomize a measurement
order so that all replicates of one group are not measured one after the other. For the treatment protocol, it’s important
to clearly define your groups and your goals. For example, if you’re comparing the performance of two coatings for protection of silver in museum collections – say, Agateen and B-72 – and you have two different application methods, spraying and brushing, you’d likely then define four groups: Agateen brushed, Agateen sprayed, B-72 brushed, B-72 sprayed. If you weren’t clear about that and were thinking that you only need two replicates of Agateen – one brushed, one sprayed – and two of B-72 – one brushed, one sprayed – you wouldn’t be able to look for differences between spraying and brushing, because you wouldn’t have those application methods replicated within a coating type. To analyze for that factor, you need at least two of each.
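As a small sketch, the four groups and their replicates can be enumerated in a few lines of code; the coating names and the replicate count of two are just this example’s assumptions:

```python
import itertools

# Hypothetical 2 x 2 factorial layout for the silver-coating example.
coatings = ["Agateen", "B-72"]
methods = ["brushed", "sprayed"]
replicates_per_group = 2  # at least two, so each method is replicated within a coating

units = [
    (coating, method, rep)
    for coating, method in itertools.product(coatings, methods)
    for rep in range(1, replicates_per_group + 1)
]

# 2 coatings x 2 methods x 2 replicates = 8 experimental units in 4 groups
print(len(units))
```

Listing the units this way makes it obvious whether each factor combination is actually replicated before any samples are prepared.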
Having uncoated controls here probably wouldn’t be very useful, since you can predict that silver will of course tarnish if it’s exposed to pollutants with no protection, and your main research question under study is which of these two coatings works better. Defining your test factors means clarifying and listing the types of things you’re testing: for example, coating type, application method, metal type, etc. These are the general categories that you’re
going to need to be able to be crystal clear about when setting up any statistical analyses
and making interpretations. Finally, your randomization method should
be planned out. In your final paper, it’s helpful to describe what randomization method you used, because then it will be clear to readers that you actually used a randomization method, versus the more common haphazard selection or picking of things that’s sometimes called randomizing.
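Here is a minimal sketch of what a documented randomization method could look like, with a recorded seed so the assignment is reproducible; all unit labels, group names, and the seed itself are invented:

```python
import random

# Reproducible randomization: record the seed in your notes so the
# assignment can be reconstructed later. All names here are hypothetical.
rng = random.Random(2024)

units = [f"coupon-{i:02d}" for i in range(1, 9)]              # 8 experimental units
groups = ["A-brushed", "A-sprayed", "B-brushed", "B-sprayed"] * 2

rng.shuffle(units)                      # a true random ordering, not haphazard picking
assignment = dict(zip(units, groups))   # unit -> treatment group
for unit in sorted(assignment):
    print(unit, "->", assignment[unit])
```

Reporting the seed and the method (“units shuffled with a seeded pseudorandom generator, then paired with groups in order”) is exactly the kind of description that lets readers verify real randomization was done.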
Then, what methods of interpretation or analysis do you have planned? How will you determine whether the data produced by your measurements have supported or refuted your hypotheses? You may very well need statistical analysis to get a solid interpretation. If so, do you have a statistician to collaborate
with. If you do, I can guarantee you he or she will be quite happy to see this experimental design check sheet, because it will make selecting the statistical test and setting it up very fast and very easy. If you don’t think you need statistical analysis,
what other methods are you going to use to interpret the results of your study? Finally, sketching out the experimental design
or creating some sort of research flowchart can help you clarify the logic in your plan
and identify holes in your design. If you find it difficult to sketch out the
design, it may be because your design is unclear or it’s too convoluted. Drawing helps you find that out before you
get started on any actual experimentation. Sketching out the design of an experiment you’re reading about in the published literature is, I find, also helpful, because it will help you better understand exactly what was done and how the work was carried out. For example, here’s a sketch for a simple
experiment that was comparing the aging performance of four adhesives. Each experimental unit here is defined as a sheet of paper with adhesive applied. Four replicates were required for each adhesive. Color measurements are to be taken before and after accelerated aging, with color change and yellowing quantified. In addition, peel strength is to be measured. Because this test is destructive, an additional four replicates are going to be prepared for the initial, before-aging tests. The final, after-aging tests can be done on the original four replicates after the non-destructive color change and yellowing measurements are completed.
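A quick count of the sheets this sketch implies; the assumption that the four extra destructive-test replicates are per adhesive is mine:

```python
# Sheet counts implied by the adhesive-aging sketch (numbers from the example).
adhesives = 4
aging_replicates = 4   # measured non-destructively before and after aging
peel_replicates = 4    # destroyed in the initial, before-aging peel tests

sheets_per_adhesive = aging_replicates + peel_replicates
total_sheets = adhesives * sheets_per_adhesive
print(total_sheets)
```

Working the arithmetic out on the check sheet, before preparing samples, is how you catch under-provisioning for destructive tests.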
We could describe this particular experiment as having one test factor – type of adhesive – with four treatment groups, none of them being a control, because paper alone without any adhesive isn’t likely to be informative. Instead, we are probably including at least one adhesive that’s already routinely used in conservation and comparing it with the performance of at least one newer adhesive being proposed for similar uses. Even a very complicated multi-year project
benefits from a research design sketch. You don’t have to try to read all the detail
here. This is a sketch to illustrate a large project
I planned out and submitted for funding. Many of the proposal reviewers mentioned that
there was a lack of clarity in the research design. Naturally, my first instinct was defensiveness: “Of course the research design was well thought out and clear” – but obviously it wasn’t clear to the readers. I realized that I had violated my own rule: I hadn’t sketched out the research plan. I figured, “I’ll sketch it out, and that will help readers see more clearly my brilliant research plan.” In attempting to sketch it out, I realized the readers were actually right: there were a lot of holes in the research plan. After a lot of revision, rethinking, and re-planning,
I was able to sketch out a research design that clearly shows a central research question,
a central hypothesis, nine sub-hypotheses, and all of the tests we’d use to see if these
hypotheses are negated or supported. Once I had written out this sketch, it was
much easier to describe the research plan in a way that was much more clear to reviewers. Many experimental designs that we use originated
in agriculture, where a lot of research has been done testing variables such as: seed
varieties, pesticides, watering practices. A typical design used in agricultural research
is called the split-plot block design, where multiple agricultural fields are used for testing, with each field divided into sections so that each factor is tested both within
a single field and between fields. This makes for very robust and reliable comparison. This is a very appropriate design for conservation
research, and it’s sometimes seen in the literature. A very good description of the experimental
design elements for this type of design is given in a paper by Green and Leese that
appeared in Studies in Conservation. We’ll go through exactly how this project
was set up. They started with the initial observation
that paper often becomes acidic over time, and that this is caused by multiple factors; hence, de-acidification is carried out to stabilize the paper. Often, an aqueous solution can be used, but if you have water-fugitive media present, a non-aqueous solution may be needed. At that time, the British Museum, where they worked, was using a solution of barium hydroxide in methanol. This wasn’t considered satisfactory because it was toxic, and the treatment produced an initially high pH within the paper, which they were afraid could cause hydrolysis and damage the cellulose chains, although the pH did gradually decrease on
exposure to air as the hydroxide was converted to carbonate. The research problem here was that additional
non-aqueous solutions were needed. The main hypothesis to be supported or refuted
was that methyl magnesium carbonate could be a successful alternative non-aqueous de-acidification
agent. The main rationale was that it’s non-toxic. Also, the decomposition of methyl magnesium carbonate occurs rapidly on exposure to air, so they thought the paper would not have this initial high pH, and that this should hold up over time. An implication would be that the treatment
would not show adverse effects on tensile strength, either immediately after treatment or after accelerated aging. Experiments were done on two types of paper – Whatman filter paper and 19th-century printed book pages. Here, the experimental units were defined as including both facsimiles and real objects. The larger papers were then divided into sections for treatments; these are the “plots” that make the design analogous to agriculture. There were three replicates for each combination
of factors, paper type and treatment type. There were seven treatment groups: commercial methyl magnesium carbonate in methanol; methyl magnesium carbonate prepared in the lab, in methanol; barium hydroxide in methanol; calcium hydroxide in distilled water (included as another comparative standard treatment, an aqueous one); a control treated with only distilled water; another control treated with methanol alone; and an untreated control of just paper. There were two test factors: paper type and
de-acidification method. Here, the randomization method was clearly
and explicitly described. Measurements included pH, measured by surface electrode, and tensile tests to check for paper damage. Both of these measurements were done before and after a period of accelerated aging. Interpretation of the data was by statistical analysis using analysis of variance.
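To give a feel for what the analysis of variance computes, here is a minimal one-way ANOVA F-statistic worked by hand on invented numbers (not data from the paper):

```python
# Minimal one-way ANOVA: F = between-group variance / within-group variance.
# The data below are invented for illustration only.

def anova_f(groups):
    k = len(groups)                                 # number of groups
    n = sum(len(g) for g in groups)                 # total observations
    grand = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ssb = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ssb / (k - 1)) / (ssw / (n - k))

# e.g. hypothetical tensile-strength readings for three treatment groups
f = anova_f([[10.1, 9.8, 10.3], [8.2, 8.5, 8.1], [10.0, 9.9, 10.2]])
print(round(f, 1))
```

A large F means the differences between treatment-group means are big relative to the scatter within groups, which is what lets you conclude a treatment had a real effect.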
They were able to conclude that the effects of methyl magnesium carbonate on loss of tensile strength and on pH were similar to those produced by barium hydroxide. However, the new treatment has the advantage of much less toxicity. It turned out it could be a very good alternative
treatment. This last design included two factors: paper type and treatment type. We can increase the number of factors to some extent, provided there’s good reason – which means giving clear rationales for every factor we’re adding in – and as long as we don’t let it get too complicated. As one example: I had a recent project with
a goal of developing protocols using software for image analysis for ceramic thin sections
that could reliably and reproducibly measure the area percentage of sand-sized particles in the fired clay. Rather than start with unknowns, which would’ve
been real, archaeological ceramics, we prepared test tiles in the lab using known proportions
of clay and sand, and with purchased sand that had a known particle size. Since image analysis tends to vary with differences in basic appearance, we used two typical clay types: one that fired to a red color, and one that fired to a white color. Since we thought protocols might show some
variation, depending on particle size, we used three sand sizes: fine, which was 70
mesh; medium, 30 mesh; and coarse, 16 mesh. The sands were primarily quartz. Each of these were added to the clay in three
different target proportions: 10%, 25%, 40% by volume of loose sand mixed into the wet
clay, in the same manner that a traditional potter might add in temper material. We prepared five replicates of each combination of clay type, sand size, and sand amount, because we felt there was likely to be a lot of variation. There were also three replicates of each clay type with no sand added, as our controls, to check for any natural sand that might be present as a background.
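As a sanity check, the tile count can be verified from the factor levels just described:

```python
import itertools

# Tile counts for the thin-section project (factor levels from the example).
clays = ["red", "white"]
sand_sizes = ["fine (70 mesh)", "medium (30 mesh)", "coarse (16 mesh)"]
proportions = [0.10, 0.25, 0.40]
reps = 5

sand_tiles = len(list(itertools.product(clays, sand_sizes, proportions))) * reps
control_tiles = len(clays) * 3   # three no-sand controls per clay type
print(sand_tiles + control_tiles)
```

Two clays times three sizes times three proportions times five replicates gives 90 tiles, and the six controls bring the total to 96.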
After drying and firing the tiles, small slices were cut from each, and a thin section was made for each of the 96 tiles. These were mounted on glass slides with blue-dyed epoxy so that quartz grains could be easily separated from pores in the image analysis; in plane-polarized light, both would appear clear if we used a clear epoxy. The thin sections were scanned in a high-resolution
film scanner at seven microns per pixel, and then the images were used for image analysis. We used a software package called Image-Pro Premier by Media Cybernetics. The protocol that proved to be fastest and
easiest to accomplish while giving accurate and reproducible results was then applied to real archaeological samples. Now, since the amount of sand added to the
tiles needed to be known for our protocol development, one way to incorporate blinding
and randomization into the process was to check on the results once the protocol was
finalized. Then samples could be randomly re-analyzed without knowing the added sand amounts, which ones were replicates of each other, or what any of the previous analyses were – randomization plus blinding. These results were compared back to the original
results. When excellent comparability was found, that
increased our confidence that the protocol was actually ready to apply to real archaeological
samples of unknown sand contents. This is one way we can sketch out the experimental
design with this whole combination of factors. We have the white clay and the red clay. We have 10%, 25%, and 40% added sand shown using different symbols, and the fine, medium, and coarse sand shown by different sizes of those symbols. If we try to get too complicated with too
many factors, sketching, even planning for and keeping track of the many combinations
would get extremely difficult. For example, if we try to run an experiment testing coatings on metal, and we want to test on several metal types – sterling silver, fine silver, copper, lead, bronze, and brass, perhaps of various surface shapes, flat and curved – and then we also want to test different methods of surface preparation, several coating application methods, and perhaps several different thicknesses as well, all for nine coatings or
so, things are quickly going to get very complicated to try to interpret. It’s likely we’re going to find at the end
when we’re trying to do our analysis that we actually didn’t keep all of these variables
in mind, so we didn’t get good replication of each. Even then, trying to discern which variables
are important for the final results is going to be difficult. We’d be better off doing one experiment with
the most crucial variables, such as metal type and coating type. Then, for those coatings that appear promising for specific metals, we could do another experiment to examine the effects of surface preparation, application method, and coating thickness. One major problem with maintaining silver
objects in historic houses or in museum storage and exhibit spaces is that many construction
or decorative materials that might be used somewhere in the vicinity of silver off gas
hydrogen sulfide. Even very small concentrations of the pollutant will begin the tarnishing process and can quickly turn silver black. Regular cleaning of your silver artifacts is not a good solution, because that’s very staff- and time-intensive. Each cleaning is going to remove some silver, and will eventually damage the surface. An example of an elegant, well-designed, well-described
experiment in conservation that was intended to address this problem can be found in a
paper by Grissom et al., published in the Journal of the American Institute for Conservation in 2013. It focuses on the evaluation of coatings for
the protection of silver exposed to hydrogen sulfide. One thing that makes this an elegant experiment
is that they kept the focus on the most important factor for preventive conservation of historic
silver collections: the ability of coatings to protect silver from hydrogen sulfide exposure,
which is by far the most problematic pollutant for silver. They didn’t try to also test the effects of
other pollutants. They kept things simple so that the most important
factor could be clearly tested for. They used only sterling silver coupons of
one size and shape. They didn’t also try to test fine silver, copper, or lead additions, or any other geometries. They tested 12 coatings, all of them with a stated rationale
as to why that coating was chosen for testing. They selected one standardized application
method. Each coating had four silver coupon replicates. The thickness of the coatings was measured
with repeated measurements around the coupon. Coupons were exposed to hydrogen sulfide for
125 days with periodic removal for continuous evaluation because they wanted to capture
when change initiated for each coating. Randomization was performed wherever it was
appropriate, such as in the selection of spots for testing and in the order of coupon placement
within the hydrogen sulfide chamber. Coatings were evaluated by visual observation, image analysis of digital photos, gloss measurements, and colorimetry. Each step of the experimental design is very
clearly described with all rationales, implications, measurement and treatment steps described
with enough detail that there’s no need to guess as to what was done and why. As a result, some clear and useful results
were obtained that are going to help move conservation practice forward. It would also be easy for anyone else to replicate
the study or to carry it further. While there are many possible variations in
experimental designs, what’s most important to keep in mind are the basic ideas of aiming
for a simple, elegant experiment that will provide clear results, rather than trying
to do too much in one experiment and having very confusing results. That means keeping focused on the original observation that led to the research; having a clearly defined research question or problem, with clear hypotheses, implications, and rationales stated; defining the experimental units and having appropriate replication; keeping clear what the test factors are, and restricting these to the most important; carrying out randomization so that uncontrolled factors are spread around and don’t affect interpretation of results; and then using statistical analysis where needed to aid interpretation. I recommend this paper as a good model for
an example where all of these issues are clearly stated. We’ll move on to the last part here where
we’re looking at first, briefly, single object studies, then at some variations in experimental
designs. Replication is usually important to identify
variability, and ensure that results of the experiment or study are generalizable to some sort of larger population. Sometimes you may have only a single object
available for study. Generally this is going to be the case when you have a real cultural heritage object you’re working with, since experiments with facsimiles can always be replicated. With a single object study, you may primarily be interested in making conclusions that bear on the treatment of that one object. Here, the generalizability to other objects is of secondary interest. The most important tool for being able to design a statistically valid study using only one object is to find a way of randomizing treatment and control areas within that one object. In essence, you’re making an object its own
control by treating it more than once using a formal experimental design. You set up a design whereby there are several times or places at which one might apply the treatment; then random assignment of treatment to those times or places gives an experimental design that can be used for statistical analysis, because it meets the probability assumption. Especially if you can include blinding, you have a really good study. This is simply an extension of the idea of test patches, but instead of haphazardly testing a few patches on an object, you randomize which patches receive one or more test treatments and which ones remain as controls. This allows you to reliably compare the results.
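A hypothetical sketch of such a randomized, blinded patch layout; the patch count, seed, and labels are all invented:

```python
import random

# Single-object patch test: 8 patch locations, randomized assignment of
# treatment vs. control, with a key withheld from the assessor for blinding.
rng = random.Random(7)   # recorded seed, illustrative

patches = [f"patch-{i}" for i in range(1, 9)]
conditions = ["treatment"] * 4 + ["control"] * 4
rng.shuffle(conditions)

# The assessor sees only the coded patch labels; this key stays with the
# experimenter until assessment is complete.
blinding_key = dict(zip(patches, conditions))
for patch in patches:
    print(patch, "->", blinding_key[patch])
```

Because which patches are controls is decided by the generator rather than by the experimenter’s eye, the comparison between treated and untreated areas stays unbiased.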
For qualitative assessments, it’s even better if the assessment can be done by someone who’s knowledgeable enough to make the judgement, but who doesn’t know which patches received which treatments and which ones are the controls. For quantitative results, you now have the statistical probability assumption met, so you can reliably do statistical analysis for comparison. An analogy is seen in medicine in the idea of test patches for testing a
person for allergic reactions. While there are also larger scale experiments
about general causes of allergies, a number of people in the population tend to have allergic reactions to specific food items or other specific exposures. When an individual person is having a problem, they see a physician to receive tests for their own very specific allergic reactions. Test patches might be defined on their arm or back, with the patches numbered; the numbers used on a coding sheet then give information about which possible allergens were applied to which patches. Placement is often randomized, with replication, and then the results are assessed by the patient or physician blindly with respect to which allergens were used on which patches. A case study of a single object in conservation
that followed this formal design is a paper by Eric Hanson and Stan [Derillian 00:28:43]. It focused on an 18th century tapestry woven
in wool and silk that was installed in the Huntington Library between 1911 and 1920. Their observations included that it was very
dirty and needed to be cleaned, but the silk portion was dry and brittle to the touch. While the effects of aqueous solutions or
dry cleaning solvents on the physical and chemical stability of silk hadn’t yet been
well investigated, there had been some reports of negative effects, especially of repeated
aqueous treatments. Another tapestry in the group had been recently cleaned using a standard wet-cleaning technique with a non-ionic surfactant, and for that one they found that after washing and drying, the silk appeared very different. It actually seemed to be more supple, with less breakage and less damage on handling. Their research question then was: Does cleaning
cause a significant difference in the tensile properties of silk? The main hypothesis was that silk threads could be more resistant to fracture after washing. The rationales for the hypothesis were: First, the tensile properties of silk change with pH. Second, reintroduction of water to dehydrated silk fibroin might affect tensile properties. Third, silk may undergo physical aging, but
those changes might be reversible if it’s plasticized by water to the extent that the
glass transition temperature is depressed near room temperature. Here, the experimental units were defined as threads from one real 18th-century woven tapestry that had never been treated before, using multiple threads taken from within this single tapestry. A grid was superimposed over the tapestry, then a number was assigned to each coordinate on the grid. Using a random number table, threads were randomly selected for sampling.
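The grid-and-random-selection step can be sketched like this, with a software random number generator standing in for the paper’s random number table; the grid size is an invented example:

```python
import itertools
import random

# Grid-based random sampling over a single object. The 10 x 15 grid and
# the seed are hypothetical; the paper used a printed random number table.
rng = random.Random(42)   # recorded seed

rows, cols = 10, 15
coordinates = list(itertools.product(range(rows), range(cols)))

sample_points = rng.sample(coordinates, 20)   # 20 sampling locations, no repeats
print(sample_points[:3])
```

Every grid cell has the same chance of selection, so the sampled threads are representative of the whole tapestry rather than of the spots easiest to reach.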
Replicate sampling was done before washing, where part of the thread was sampled. After the tapestry was washed, they returned to take more sample from the same thread or from an adjacent thread. In sampling, single threads of two-ply [inaudible 00:31:18] were unwound at the random locations and about five millimeters was removed. They ended up with 20 paired samples of before
and after treatment. There were two treatment groups defined: Treatments
and controls. The controls were the samples taken before
any washing occurred. The treated samples were those that were taken
after the entire tapestry had been washed. The test factor was: washed or not. Measurement for effects was done by tensile testing, and then interpretation by statistical analysis using analysis of variance. The results showed that the threads were actually significantly stronger following the washing procedure. The results pointed toward the need for more basic research on the effects of washing silk. They thought this could be especially relevant to the handling of archaeological textiles in situ. They did recommend that there should be a
further study on long-term strength, or other potential negative effects of washing other
than the ones they tested for, and the effects of washing on other factors such as dyes. However, because they kept their experiment simple, without trying to pull all of these potential factors into one single experiment, they were able to get clear and useful results. In preservation, single object studies can
also include a survey or assessment of the potential degree and type of problems found
in a library or a library system. In this case, the library itself is conceived
of as the single object. The results are intended to primarily provide
conclusions needed for making treatment plans within that one library. We’re not trying to generalize to other libraries
in general. Here, the observation might be that deterioration
is a crucial problem in large libraries where there’s a mix of age and type of collection
items. The research question might be: What are the preservation and conservation needs of this particular library? The experimental units would be real books
from that collection. The number of replicates you would need is
likely to be pretty large, because you can expect a lot of variation. If you think about all the different potential
types of items in the collection: bound journals, older deteriorated books, new books, lots
of different sorts of things. In order to capture the full range of variation,
you’re going to do a lot of replication. You could use a random number generator with the already existing call numbers on file. If there are different types of collections, you could do stratified random sampling. Some parts of the library, such as different departments, might be under different environmental conditions; again, you could do stratified random sampling amongst those.
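A sketch of stratified random sampling from existing call numbers; the collection names, sizes, and sampling fraction are invented for illustration:

```python
import random

# Stratified random sampling: call numbers grouped by collection (strata),
# then a random sample drawn within each stratum. All values are hypothetical.
rng = random.Random(2023)   # recorded seed

strata = {
    "bound journals": [f"J-{i:04d}" for i in range(2000)],
    "general stacks": [f"G-{i:04d}" for i in range(6000)],
    "special collections": [f"S-{i:04d}" for i in range(500)],
}

# Sample roughly 5% of each stratum (at least one book per stratum)
survey = {
    name: rng.sample(call_numbers, max(1, len(call_numbers) // 20))
    for name, call_numbers in strata.items()
}
for name, books in survey.items():
    print(name, len(books))
```

Sampling within each stratum guarantees that even a small special collection is represented, which a single pooled random draw might miss.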
This design could be used for observational types of surveys where you’re qualitatively assessing, for each sample book, certain things such as: Is the primary protection intact or not? Are the printed pages intact or not? Does there appear to be environmental damage such as fading or water damage, or not? Is immediate treatment needed, and if so, what type? The aim is to get an overall profile of the deterioration
present and the preservation needs of this library. It could include some basic quantitative data
such as pH, but probably basic descriptive statistics such as calculating percentages
of each category of deterioration is likely to be sufficient for your analysis. The main thing that would be needed for reliable
results in this case is to be sure that all of the assessors receive joint training so
that you’re going to have consistency in the qualitative judgments. The use of randomized selection and high replication for each potential type of library material also means that you’re going to have results that are likely to give you an accurate picture of the preservation and treatment needs of that library. A haphazard selection of books is never likely
to give you accurate results, because there are a lot of things that can go wrong. For example, if you were to assign the staff
to select, haphazardly, 1,000 books, someone might walk along the aisle and pick things
that are easily accessible. Others might be drawn to things that have
more attractive covers. Somebody else might think they should be selecting
the older, worn out looking books. Others might be attracted to selecting the
newer books. Either way, you cannot assume this selection
is going to be truly representative of the amount and type of problems found in the collection. Now, a conservator is not likely to want to design a true single object study for every object that’s treated, but sometimes you’re going to be working on an important object for which there are at least two treatment options, and you’re going to want to be able to reliably choose between them and have
confidence in your results. You may want to feel that you can prove to
the owner or curator that, in fact, the best option was indeed chosen. In this case, it could be worth spending the
time to define multiple patches, do randomization, and have a more formal test. Strictly speaking, the results of a single
object study are only applicable to that object. However, more reliable conclusions can give you a more solid foundation for planning a treatment for that object. Another advantage is that, over time, a series
of completed single object studies can lead to a more general conclusion. Screening experiments are often used in medical
research, for example, to do initial screening of a lot of different compounds that might
be useful in fighting cancer. Those that appear promising are pulled aside
for more intensive, in-depth research. These are very quick tests; you generally don’t do replication in screening tests. They’re useful for forming hypotheses and identifying possible treatments, but they always require follow-up testing with replication before going forward to anything to do with treatments. I’ve tried out a rapid corrosion test that
was originally developed in industry as a quick screening test for potential corrosion
inhibitors. Here, a small amount of water with an inhibitor
is held against the surface of the metal for 24 hours. The ability of the inhibitor to prevent corrosion is assessed visually or quantified by image analysis; anything that looks promising in this screening can then be taken forward for more in-depth testing and evaluation by a variety
of other analytical tests that would take a lot more time and a lot more expense. A somewhat different goal for a screening
test is described in a recent Studies in Conservation article written by two conservators. They developed a rapid, simple test for detecting chlorides that can be used during conservation surveys of collections that contain a lot of porous material such as stone, ceramics, or un-fired clay objects such as the [inaudible 00:39:35] forming tablet we see here. These are situations where a large number of objects have to be evaluated very quickly and at low cost. Their quick test uses a soft brush: a small amount of surface dust is transferred from the object onto weighing paper, and then into a vial containing a silver nitrate solution. A white precipitate, best seen using your smartphone light, will form if chlorides are present. This test is intended to quickly confirm the suspicions of an experienced conservator, and to detect surface chlorides that sometimes might
otherwise be overlooked. More precise quantitative tests that take
a lot more time and money could be added in for a selected smaller number of objects for
a backup confirmation test. Having the simple, low-cost rapid screening
test available can improve the conservator’s confidence in their observation or results,
and improve decision making in setting treatment priorities. A treatment trial is another type of study,
and the last one we'll talk about. This one was developed in medicine. Medical and conservation research share the problem that the usefulness of laboratory experimentation is limited. Eventually, treatments have to be carefully tested in the context of real practice: on human patients for doctors, and on cultural materials for conservators. This will almost always mean the introduction of additional, uncontrollable variables, but the results of careful experimentation with real subjects are usually much more applicable to treatment practice in general than the results of less realistic laboratory tests alone. In medicine, the physician is the PI, or at
least a co-PI, rather than or in addition to a scientist, since only the physician can
determine the best treatment for a person, and only the physician can actually perform
the treatment on that person. In our case, the person who can best make
the treatment decision for an object and carry out the treatment is a conservator; the scientists
can’t do that. The medical profession has put a lot of effort
into perfecting and refining the optimum treatment trial procedures, and I think we ought to
be able to benefit from that experience. Treatment trials are really a response to
ethical concerns of applying treatments. The author of a book on treatment trials in
medicine, Clifford [Meinart 00:42:31] – his book is on your reading list – notes that
the history of medicine is filled with drugs, devices and other treatments that were originally
announced as great advances, but were later shown through treatment trials to be useless
or even harmful. These mistakes include thousands of prescription drugs reviewed by the FDA, about one-third of which were found through treatment trials to be ineffective even though they looked very promising in laboratory and animal studies. For this reason, the medical profession has
devoted a lot of effort to developing and refining the concept of the treatment trial. I think that’s really the basis of modern
medicine. Today, no drug can be approved by the FDA
until it has been documented to be effective and safe in at least two separate clinical
trials. A clinical trial doesn't just mean the haphazard application of treatments to see what happens. There are certain protocols that must be present
for a valid clinical trial. It’s a planned experiment that takes place
in a clinical setting rather than a laboratory setting, using multiple patients exhibiting
the same medical problem. An important aspect of the trial is that both groups, your treatment and your control, are treated and followed over the same time period. That takes away a lot of uncontrolled factors that might show up if you try to use historical studies. The control can either be no treatment or
a standard treatment that’s already known to be effective and safe. A trial has to include enough patients so
you can adequately evaluate the results. Most clinical trials compare two treatments, with an average of about 25 patients in each group. This number is both generally practical and
doable in terms of time and resources. It’s also very important to clearly define
the class of patients, or in our case it would be objects, eligible for the study so that
other researchers can assess whether or not the results apply to their patients or objects. If you restrict eligibility too much, for
example, instead of saying, “We’re going to test photographs with mold,” if you restrict
it and say, “We’re going to test albumen prints produced in a given 20-year period with a
particular species of mold,” it’s going to get more difficult to come up with enough
objects in a reasonable period of time, and that might end up requiring a multi-institution
trial. The important components of a treatment trial are things that should all look familiar to you now, because they are the same things that are important components of any other sort of experimental design. You have randomization, blinding, and controls. You've described your treatment protocols in detail. You have good measurement protocols that are relevant to testing your treatments. You have adequate sample size and replication. And you describe your implications and rationales for these at every stage.
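As a purely illustrative sketch (the object IDs, group sizes, and code labels below are hypothetical, not from the webinar), the randomization and blinding components of such a trial might look like this in Python:

```python
import random

# Hypothetical sketch: allocate 50 eligible objects to treatment vs. control
# (25 each), and give every object a coded label so the evaluator who scores
# the outcomes doesn't know which group an object belongs to (blinding).
rng = random.Random(2024)  # fixed seed: the allocation can be audited later

objects = [f"OBJ-{i:03d}" for i in range(1, 51)]
groups = ["treatment"] * 25 + ["control"] * 25
rng.shuffle(groups)

# Kept by the trial coordinator only: which object got which assignment.
allocation = dict(zip(objects, groups))

# What the blinded evaluator sees: anonymous codes in a scrambled order.
blind_codes = {obj: f"CODE-{i:03d}"
               for i, obj in enumerate(rng.sample(objects, len(objects)), start=1)}

# The evaluator records outcomes against blind codes; the key linking codes
# back to groups is held separately until the analysis stage.
```

The design choice worth noticing is the separation of roles: the assignment is generated by a reproducible random process, and the person judging outcomes works only with the coded labels until the analysis is unblinded.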
How the treatment trial is formulated and evaluated is really interesting. It's constantly reassessed and reevaluated through specialized journals on the methodology of treatment trials in medicine. There are also thick books focused only on clinical trials, and they provide discussion of even the smallest detail of a clinical trial
design. There’s also the Society for Clinical Trials
that helps to provide constant evaluation and improvement of methodologies. For example, some issues they consider include:
additional design thought that has to go into the process if you have multi-center trials
versus single-center trials; if these are needed to increase sample size. With multiple centers and a lot of different
people involved, it becomes extremely important to have detailed forms for patient evaluation
with training sessions and pilot studies to make sure everyone is following the same consistent
protocol before beginning the larger, more expensive clinical trial. If you think back to that study of milk with
schoolchildren that I talked about, if they had done a pilot study before they launched
into their study of 10,000 treated children and 10,000 control children, they would’ve
very quickly found that there was a major problem with allowing teachers to undo the
randomization and make their own assignments. They could’ve found that out before doing
the entire 20,000-child study. Books such as these also note that it's crucial
to avoid having to change your forms and procedures midway through the trial. Again, a pilot study will help you refine
what your procedures are. If you didn't do a pilot study and you're partway through, it's better to continue the way you started; if you change your mind about what should be done, you can always run a followup trial in a different way. They also talk about the need to avoid undisciplined
data collection, the idea that, “Oh, while we’re at it, why don’t we record everything
we can think of to say about each patient or object? Because someday we might want to ask additional
questions about the data.” You shouldn’t have too many secondary projects
going on. What often happens with clinical trials is that the main question, say a trial of cancer treatments, gets crowded: one of the investigators is also interested in using that same set of patients to study the relationship between smoking and baldness, and another wants to look at coffee drinking and high blood pressure. This very quickly becomes too complex, and
you lose track of the original questions and data that you were trying to collect. You also don't need to worry too much about whether you have sophisticated analysis techniques. If the research questions can be answered
through simple observations and simple measuring techniques, that’s fine. What’s important is good design, appropriate
treatment protocols and outcome measures, and clear analysis and results. Basically, we're looking at the same sorts
of issues that are needed in any experiment. A simple, elegant experiment that gives a clear result is always best; an overcomplicated one that tries to do too much tends to produce unclear results. This just becomes more important when you
have multi-center treatment trials where many researchers may be involved. Now, there are obstacles in applying this
to conservation. In fact, I’m not personally aware of any true
treatment trials having been carried out in conservation and preservation. I've seen a few studies labeled as treatment trials,
but these have not contained the research design elements that I just mentioned. One problem may be funding, because true treatment
trials usually are going to be multi-year projects. If they require participation and coordination
of multiple centers so that you can enroll enough objects into the project, that requires
even more funding. This level of funding is harder to come by
in conservation than it is in medical research. The NIH and drug companies will fund large-scale, multi-year treatment trials, but that kind of funding is difficult to find in conservation. I think treatment trials could be a very useful tool to
add to the experimental design arsenal of cultural heritage conservation and preservation. This concludes our survey of scientific method
and experimental design in preservation and conservation research. Thank you for your attention. I hope to be reading about many more wonderful
research designs in the future, coming from you. I also want to thank NCPTT again for hosting
this webinar and making it available. If there are any additional questions-