Work Product Descriptor (Artifact): Controlled Experiment Protocol

According to Wohlin et al., “experimentation is not something simple, we need to prepare, conduct and analyze experiments properly” [Wohlin:2012].

The experimental protocol presented in this process is therefore intended to help the evaluator build the usability evaluation protocol and carry out a successful experiment.

The protocol follows the steps presented by Wohlin et al. and Jedlitschka et al. with the addition of some information regarding the evaluation of usability.

1. Experimental Scoping

1.1. Experimental Goals

Study Object (Activity E4)
HCI Evaluation: describe how the usability evaluation was executed (Activity P5)

The scoping follows the framework by Wohlin:

Object of Study (What is studied?) (DSL)

Purpose (What is the intention?) (Evaluate)

Quality focus (Which effect is studied?) (Activity P6)

Perspective (Whose view?) (Activity P1)

Context (where is the study conducted?) (Activity P9)

The scope of the experiment is set by defining its goals. Defining a goal ensures that the important aspects of an experiment are settled before planning and execution take place. The goal of the experiment is defined according to the template below [Wohlin:2012]:

         Analyze <Object(s) of study>

         for the purpose of <Purpose>

         with respect to their <Quality focus>

         from the point of view of the <Perspective>

         in the context of <Context>
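The template above can be filled in mechanically once the five slots are known. The following is a minimal sketch in Python; the class name ExperimentGoal, its field names and the example values are assumptions made for illustration and are not part of the Usa-DSL Process.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentGoal:
    """One field per slot of the goal definition template."""
    object_of_study: str
    purpose: str
    quality_focus: str
    perspective: str
    context: str

    def render(self) -> str:
        # Fill the five template slots in the order used by the template.
        return (
            f"Analyze {self.object_of_study} "
            f"for the purpose of {self.purpose} "
            f"with respect to their {self.quality_focus} "
            f"from the point of view of the {self.perspective} "
            f"in the context of {self.context}."
        )


# Hypothetical values, for illustration only.
goal = ExperimentGoal(
    object_of_study="two versions of a DSL",
    purpose="evaluation",
    quality_focus="usability",
    perspective="domain engineers",
    context="a university modeling course",
)
print(goal.render())
```

Keeping each slot as a separate field makes it easy to check that none of the five aspects was left undefined before planning starts.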

Objective of the Measurement (Activity E4)

What do you want to measure? (Activity P6)

Research Questions (Activity E4)

Questions about the research.

Which metrics do you want to analyze?

2. Experiment Planning

2.1. Context Selection (Activity P9)

Off-line vs. On-line

Student vs. Professional

Toy vs. Real Problems

Specific vs. General

2.2. Hypothesis Formulation (Activity E4)

The experiment definition is formalized into hypotheses. Two hypotheses have to be formulated:

Null Hypothesis - A null hypothesis is represented by H0. It states that there are no real underlying trends or patterns in the experiment setting, and that the only reasons for the differences found in the observations are coincidental.

The null hypothesis is the one that the experimenter wants to reject with as high significance as possible. An example is the hypothesis that the average number of faults found with a new DSL is the same as with the DSL it is being compared against, i.e.

$H_0: \mu_{N_{old}} = \mu_{N_{new}}$, where $\mu$ denotes the mean and $N$ is the number of faults found [Wohlin:2012].

Alternative Hypothesis - Represented by Ha, H1, etc., this is the hypothesis that is accepted when the null hypothesis is rejected. For example, when a new DSL, or a new version of a given DSL, is evaluated and the average number of faults found is lower than for the old DSL or the old version, the null hypothesis is rejected and the alternative hypothesis is accepted, i.e. $H_1: \mu_{N_{new}} < \mu_{N_{old}}$.

2.3. Variables Selection (Activity E4)

Before any design can start we have to choose the dependent and independent variables [Wohlin:2012].

Dependent variables - refer to the output of the experiment process and are identified from the results. For example: the number of errors found when executing the DSL and the number of faults found per time unit.

Independent variables - refer to the input of the experiment process and represent the cause that affects the result; the values they take are called treatments. For example: the two DSLs under test.

2.4. Selection of Subjects (Activity P1)

Determined by Activity P1 of the Usa-DSL Process.

2.5. Experimental Design (Activity E4)

2.5.1. General Design Principles

The general design principles should be considered when designing the experiment. These principles are randomization, blocking and balancing, which may also be combined.

Randomization - “The randomization applies on the allocation of the objects, subjects and in which order the tests are performed. Randomization is also used to select subjects that are representative of the population of interest” [Wohlin:2012].

Blocking - Used to systematically eliminate an undesired effect in the comparison among the treatments (DSLs). This technique increases the precision of the experiment. For example, when the participants differ in experience, experience should be used as a blocking factor: some participants may have experience with DSLs in a particular domain and some may not. To minimize the effect of experience, the participants are grouped into two blocks (groups), one with DSL experience and one without.

Balancing - The treatments are balanced when each treatment is assigned the same number of subjects. This happens when the Process Executor distributes the participants equally among the groups. “Balancing is desirable because it both simplifies and strengthens the statistical analysis of the data, but is not necessary” [Wohlin:2012].
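The three principles can be combined in a single assignment step. The sketch below, in Python, is one possible way to do this and is not prescribed by the process: participants are grouped by a blocking factor, shuffled within each block (randomization), and dealt to the treatments in turn (balancing). The function name assign_treatments, the block labels and the participant identifiers are hypothetical.

```python
import random
from collections import defaultdict


def assign_treatments(participants, treatments, seed=None):
    """Randomized, blocked and balanced assignment.

    participants: list of (participant_id, block_label) pairs.
    treatments: list of treatment names.
    Returns a dict mapping participant_id to a treatment.
    """
    rng = random.Random(seed)

    # Blocking: group participants by the blocking factor.
    blocks = defaultdict(list)
    for pid, block in participants:
        blocks[block].append(pid)

    assignment = {}
    for block in sorted(blocks):
        members = blocks[block]
        # Randomization: random order within each block.
        rng.shuffle(members)
        # Random starting treatment, so that leftover participants in
        # odd-sized blocks do not always go to the first treatment.
        offset = rng.randrange(len(treatments))
        # Balancing: deal treatments in turn, so group sizes within a
        # block differ by at most one.
        for i, pid in enumerate(members):
            assignment[pid] = treatments[(i + offset) % len(treatments)]
    return assignment


# Hypothetical participants, blocked by DSL experience.
participants = [
    ("p1", "experienced"), ("p2", "experienced"),
    ("p3", "novice"), ("p4", "novice"), ("p5", "novice"), ("p6", "novice"),
]
plan = assign_treatments(participants, ["old DSL", "new DSL"], seed=42)
for pid, block in participants:
    print(pid, block, plan[pid])
```

Fixing the seed makes the allocation reproducible, which helps when the assignment has to be documented in the experiment package.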

2.5.2. Standard Design Types (Activity E4)

One factor with two treatments - In this design, the two treatments are compared against each other. The treatments are the new DSL and the old DSL; the factor is the DSL.

One factor with more than two treatments - In this design, all the treatments are compared with one another, and the participants are assigned randomly to the treatments. The factor is the DSL.

2.6. Instrumentation

Instruments of the experiment (Activity P7 and P8 and P3)

Profile data (Activity P1 and E1)

2.7. Validity Evaluation (Activity E4)

A fundamental question concerning the results of an experiment is how valid they are. Four types of threats to validity are considered: conclusion, internal, construct and external.

Conclusion Validity: this validity concerns issues that affect the ability to draw a correct conclusion about the relationship between the treatments (DSLs) and the results of the study.

Internal Validity: threats to internal validity are influences that can affect the factors under study. If a relationship is observed between the treatment and the results, it is important to make sure that this relationship is causal and not the result of a factor that we do not control or have not measured.

Construct Validity: this validity concerns the relationship between theory and observation. That is, if there is a causal relationship between cause and effect, we must ensure that the treatment (DSL) reflects the cause construct and that the results of the study reflect the effect construct. In this way we try to minimize measurement bias.

External Validity: this validity concerns generalization, that is, the ability to generalize the results of this experiment outside the scenario in which it was applied. It can be affected by choices such as the design of the experiment, the selection of unsuitable participants, an unsuitable environment, or timing that can affect the results.

3. Operation

The operation section presents the stages of execution of the experiment. It describes the preparation and execution of the experiment, as well as the validation of the data obtained for analysis.

3.1. Preparation

In the preparation we must consider two important aspects: selecting and informing the participants, and preparing the material (guides, questionnaires, etc.).

3.1.1. Commit Participants

Invite participants and convince them to participate (Activity P1)

Check the availability of Participants (Activity E5)

Confirm Evaluation Date and Time (Activity E5)

Confirm Receipt of Online Questionnaires (Activity E5)

3.1.2. Instrumentation

Select the Data Gathering Instruments (Activity P7)

Select the Training Instruments (Activity P8)

Select the Informed Consent Term (Activity P2)

Evaluation Place Definition (Activity P9)

Organize the Instruments and Equipment for Evaluation (Activity E5)

Choose the Data Storage (Activity P10)

Analyze the documentation to be used in the evaluation (Activity A11)

Review the Evaluation Study Protocol (Activity E4)

3.2. Execution

In this section, the previously planned activities and those listed below are executed.

3.2.1. Data Collection

Introduce Consent Term or Introduce Data Access Agreement (Activity E2)

Collect Signature of Subject (Activity E2)

Apply Profile Questionnaire (Activity E1)

Complete Pre-Evaluation Questionnaire (Activity E1)

Deliver the DSL Guide (Activity E8)

Deliver the Use Scenario (Activity E8)

Conduct training on the DSL (Activity E8)

Perform Modeling of the Use Scenario (Activity E9)

Complete Post-Evaluation Questionnaire (Activity E9)

4. Analysis and Interpretation

In this section, all the data collected are analyzed: the profile of the participants, the tasks performed, the post-evaluation questionnaire, among others. The data should be tabulated so that conclusions about the results of the study can be drawn later and the knowledge of the participants can be correlated with the data collected.

First, descriptive statistics are used to describe and visualize the data in terms of central tendency, dispersion, etc. Second, the data set is reduced where necessary by checking for abnormal or false data points. Finally, hypothesis testing is carried out to determine whether the null hypothesis can be rejected [Wohlin:2012].
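The three steps described above (descriptive statistics, screening for abnormal data points, hypothesis testing) can be sketched with the Python standard library alone. This is an illustration under stated assumptions, not the analysis prescribed by the process: the interquartile-range rule for flagging outliers and the permutation test are choices made here for simplicity, the test is one-sided (the new DSL is expected to yield a lower mean than the old one), and the fault counts are invented.

```python
import random
import statistics


def describe(values):
    """Central tendency and dispersion for one treatment group."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


def flag_outliers(values, k=1.5):
    """Return points outside [Q1 - k*IQR, Q3 + k*IQR] for manual review."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def permutation_p_value(old, new, rounds=10_000, seed=0):
    """Estimated one-sided p-value for H1: mean(new) < mean(old)."""
    rng = random.Random(seed)
    observed = statistics.mean(old) - statistics.mean(new)
    pooled = list(old) + list(new)
    hits = 0
    for _ in range(rounds):
        # Under H0 the group labels are exchangeable, so reshuffle them.
        rng.shuffle(pooled)
        diff = (statistics.mean(pooled[:len(old)])
                - statistics.mean(pooled[len(old):]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (rounds + 1)


# Hypothetical fault counts per participant, for illustration only.
old_dsl = [9, 8, 10, 7, 9, 11, 8, 10]
new_dsl = [5, 6, 4, 7, 5, 6, 5, 4]

print(describe(old_dsl))
print(flag_outliers(old_dsl + [40]))
p = permutation_p_value(old_dsl, new_dsl)
print(f"p = {p:.4f}; reject H0 at alpha = 0.05: {p < 0.05}")
```

Flagged points are only candidates for removal: whether a value is abnormal or false should be decided by inspecting the corresponding session, not by the rule alone.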

Analyze the Profile of the Subjects (Activity A1)

Analyze the data collected during the evaluation (Activity A7)

Analyze the images and logs (Activity A7)

Analyze the developed models (Activity A9)

Verify the error rate of the participants (Activity A9)

Verify the incomplete tasks (Activity A9)

Perform data standardization (Activity A7)
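As an illustration of the error rate and incomplete task checks listed above, the following Python sketch tabulates both per participant. The record layout (TaskResult), the definition of error rate as errors divided by attempted actions, and the sample data are assumptions made for this example; the process itself does not fix them.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    participant: str
    task: str
    completed: bool
    errors: int
    actions: int  # modeling actions attempted (hypothetical measure)


def summarize(results):
    """Per-participant error rate and number of incomplete tasks."""
    summary = {}
    for r in results:
        s = summary.setdefault(
            r.participant, {"errors": 0, "actions": 0, "incomplete": 0}
        )
        s["errors"] += r.errors
        s["actions"] += r.actions
        s["incomplete"] += 0 if r.completed else 1
    for s in summary.values():
        # Assumed definition: errors per attempted action.
        s["error_rate"] = s["errors"] / s["actions"] if s["actions"] else 0.0
    return summary


# Hypothetical observations, for illustration only.
results = [
    TaskResult("p1", "T1", completed=True, errors=2, actions=20),
    TaskResult("p1", "T2", completed=False, errors=3, actions=30),
    TaskResult("p2", "T1", completed=True, errors=0, actions=25),
]
report = summarize(results)
for participant, row in report.items():
    print(participant, row)
```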

5. Presentation and Package

After the experiment is completed, everything produced during the process must be stored, such as the protocol, guidelines, artifacts and data, among others. The results should also be presented to different audiences: as a conference or journal paper, as a report for decision-makers, or as educational material.

This process suggests the structure proposed by Jedlitschka and Pfahl for reporting experiments, which can be seen in the Experiment Report.

Present the evaluation according to the scientific paper template (Activity R11)

Present the evaluation according to the report template (Activity R11)