The Road to Validity #2 (Assessment Series)

Validity Techniques

In my previous post, I highlighted some of the literature on pre- and post-test assessment. In this post I will discuss the challenges of creating an effective online questionnaire. One thing I have learned over the years is that acquiring quality data requires creating and using valid and reliable assessment tools. For online questionnaires, poorly written questions will skew the results and make analysis almost impossible.

The reason we create and administer questionnaires is to help us find answers to broad overarching questions. If a library, a team, or an individual puts forth effort to gather data and potentially publish the results, it is far better to take the time up front to develop well-crafted questions rather than to find out after the fact that the data gathered cannot be used because the original survey questions were not the appropriate questions.

Validity of questions, and ultimately of an assessment tool, ensures that the data gathered represents the stated purpose and goal. The questionnaire, while having many benefits as a research format, has several disadvantages. One of the most significant is that questionnaires often contain poorly formed questions, precisely because these instruments are so easy to develop.[1] Valid questions are clearly written and eliminate any possibility that the individual taking the assessment could misinterpret or be confused by the question.[2] Alreck and Settle state that survey questions should have focus, brevity, and clarity.[3] Multiple-choice questions are particularly prone to being invalid when question writers are uninformed about what makes a question valid or invalid.

Developing overarching questions before gathering data is imperative and a good technique for writing appropriate questions. For instance, if an overarching question is whether students’ skills improved after an instruction session, asking students whether they liked the session and the instructor’s teaching style cannot answer that question. When I started at my current library, one of our first attempts at instructional assessment was to build a questionnaire from pre-existing tutorial quiz questions, because we thought this would save time. Even though, generally, we knew we wanted to know whether students’ knowledge had improved after a library session, we didn’t take the time to create overarching questions and identify what it was we really wanted to know. Not surprisingly, once we started looking at the data, all we could conclude was the number of questions students got right and wrong. The tutorial quiz questions, as written, were out of context, and the library sessions didn’t specifically address many of the skills or competencies linked to the questions. Also, in discussing the data, it became clear that everyone on the development team had a different opinion about what we were supposed to be measuring.

When embarking on the development of an assessment questionnaire it is important to be aware of the different levels of assessment: Classroom Assessment, Programmatic Assessment, and Institutional Assessment.[4] The data gathered for each of the assessment levels tells a different story. A mismatch between the assessment level and what questions need answering has numerous consequences but primarily produces invalid data and the inability to conduct proper analyses.

Administration options within classroom assessment are quite numerous. Radcliff et al. point out that these can be categorized in the following ways: informal assessment, such as observations or self-reflection; classroom assessment techniques (CATs); surveys; interviews; knowledge tests; concept maps; performance and product assessments; and portfolios.[5] Each of these has advantages and disadvantages. Informal assessments and CATs, often used within library instruction settings, fit well into a one-time guest lecture scenario because they are quick and easy to administer and analyze; the drawback is the difficulty of gaining a well-rounded picture of students’ skill sets and transference. Assessments such as interviews and portfolios, while providing the most in-depth data, require significant amounts of time for data gathering and analysis. Other types of assessment, like surveys and knowledge tests, can address the time factor and often provide more information than informal assessments or CATs. When administered as a pre-/post-test, they can track the acquisition or improvement of skill sets.[6] However, depending on administration and data analysis, these assessments may or may not address the question of transference to other courses or to real-life scenarios.

One Example of a Validity Process

Being aware of all of the different assessment options just for classroom assessment was very important for the instruction department at my library. When developing a questionnaire, understanding the strengths of each helped bring into context what questions we could realistically ask and answer. Since an online questionnaire was our only administration option, we concluded that the questions needed to take the form of a knowledge test given as a pre-/post-test.

We chose a knowledge test because the questions in this type of assessment do not focus on students’ self-reporting of skills or self-efficacy, nor on the effectiveness of an instructor. Instead, they focus on specific knowledge, competencies, and skill sets. For administration, a time series design was selected, which involves giving several post-tests over a designated period after the pre-test and the library instruction sessions.[7] This would provide the opportunity to gather data on student knowledge at specific points in the academic year and, if developed well, would involve a minimal time commitment on the part of the course instructors and students.

However, we didn’t just want multiple-choice questions, as we were also interested in gaining some insight into how students approached researching a specific topic. We decided to create a two-part questionnaire: Part A included multiple-choice questions, and Part B gave students a scenario and open-ended questions asking them to describe how they would research it. To avoid once again gathering invalid data, we engaged in two activities: applying a validity chart to the multiple-choice questions and mapping the questions to our established learning outcomes.

After our first attempt, we did rewrite several questions, but there was uncertainty about whether they were written appropriately. One statistical reliability analysis often used on questionnaires is Cronbach’s α (alpha). However, we wanted to keep the number of multiple-choice questions to ten, which is too small a set for Cronbach’s α or other statistical analyses. The team instead used a slightly modified validity chart from Radcliff et al.[8] The validity chart is a yes/no checklist that clarifies how questions should be constructed and what types of answer options should be present (see Figure 1). Any ‘no’ indicates that the question is invalid and should be rewritten until it generates all yeses.
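Although ten items is too small a set for it to be meaningful here, the calculation behind Cronbach’s α itself is simple. A self-contained sketch, using a hypothetical matrix of scored responses (rows are respondents, columns are items, 1 = correct, 0 = incorrect):

```python
# Cronbach's alpha: (k / (k - 1)) * (1 - sum of item variances / variance of totals),
# where k is the number of items. Higher values suggest the items hang
# together as a scale; small item sets make the estimate unstable.

def cronbach_alpha(scores):
    """Compute Cronbach's alpha from a list of respondent score rows."""
    k = len(scores[0])  # number of items

    def variance(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical data: six respondents answering four scored items.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [1, 0, 1, 1],
    [0, 0, 0, 1],
]
print(round(cronbach_alpha(responses), 3))  # 0.615
```

With real data, a dedicated statistics package would be preferable; this sketch only shows why the statistic needs a reasonably large item set to say anything reliable.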

FIGURE 1—Example of the modified validity chart checklist using one of the original assessment questions[8]

[Image: modified validity chart]

Applying the validity chart to all of the multiple-choice questions revealed that none of them (even the rewritten ones) were valid. It was a great exercise in revealing our assumptions as librarians and forcing us to clarify what we really wanted to ask. All of the questions were rewritten until they generated a check in every Yes box. Once the validity chart showed that all of the questions were valid, we still made some wording revisions to increase reader comprehension. During this process we discovered that one question, valid when first revised, was later made invalid by advances in library database search algorithms: two answer options became correct instead of just one. This reinforced the need to regularly check the questions and answer options to make sure they are in line with current tools and services. Below is an example of how one question was revised from its original version to its final version (Figure 2).

FIGURE 2—Example of how one question was revised over the course of the tool development

[Image: outcomes map]

The other activity was to make sure the questions were linked to our learning outcomes. To verify this, each question was mapped to one or more learning outcomes. Our initial mapping revealed that the first set of questions was almost entirely grouped with the outcomes concentrating on finding and searching for information. Other outcomes, such as those addressing source types and plagiarism, were completely omitted from the question set. The question set was revised to encompass all of the outcomes. After a second review, slight adjustments were made to the questions to create an even stronger alignment between the questions and the outcomes.
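This coverage check can be sketched in a few lines; the outcome names and question mappings below are hypothetical, chosen only to illustrate how an omitted outcome surfaces:

```python
# Map each question to the learning outcomes it addresses, then flag any
# outcome that no question covers.

outcomes = {"search-strategies", "source-types", "plagiarism", "evaluating-sources"}

question_map = {
    "Q1": {"search-strategies"},
    "Q2": {"search-strategies", "evaluating-sources"},
    "Q3": {"source-types"},
}

covered = set().union(*question_map.values())
uncovered = outcomes - covered
print(sorted(uncovered))  # ['plagiarism'] -- these outcomes need new questions
```

Our first mapping looked much like this: heavy clustering on searching, with whole outcomes left uncovered until the question set was revised.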

Even though we did not utilize extensive reliability and validity testing of this assessment instrument, the processes we used served our needs at the time. Should there come a time when the University wants a standardized assessment instrument for research skills, this process will better position the library to evaluate or develop such an instrument. In consulting the literature during our validation process, I did come across some good articles (listed below) that articulate more rigorous validation and reliability processes.

Recommended Articles on Validity Testing

Ondrusek, Anita, Valeda F. Dent, Ingrid Bonadie-Joseph, and Clay Williams. “A Longitudinal Study of the Development and Evaluation of an Information Literacy Test.” Reference Services Review 33, no. 4 (2005): 388-417. doi: 10.1108/00907320510631544

Ondrusek et al. discussed the development of an online quiz associated with a group of online tutorials that form part of their university’s first-year orientation seminars. The authors highlighted how the quiz went through multiple iterations and testing to develop valid questions, in addition to using statistical analyses such as score summaries, standard deviation, and item analysis to establish test reliability. This extended and thorough development process helped establish the assessment within the university curriculum.

Mery, Yvonne, Jill Newby, and Ke Peng. “Assessing the Reliability and Validity of Locally Developed Information Literacy Test Items.” Reference Services Review 39, no. 1 (2011): 98-122.

Mery, Newby, and Peng described the methodology used in the development of an information literacy test associated with an online credit course. To determine validity and reliability, they used classical test theory and item response theory, correlating their items with SAILS test items. The data were gathered over two semesters, with the test administered as a pre- and post-test to students enrolled in the course.

Cameron, Lynn, Steven L. Wise, and Susan M. Lottridge. “The Development and Validation of the Information Literacy Test.” College & Research Libraries 68, no. 3 (2007): 229-36. doi: 10.5860/crl.68.3.229

Cameron, Wise, and Lottridge reported on the development of the James Madison University Information Literacy Test (ILT) and the methods used to create a reliable and valid instrument. The questions were based on the original ACRL Information Literacy Competency Standards. Their statistical analysis included content validity and construct validity. Additionally, they used standard-setting methods to determine expected proficiency levels and performance standards so the test could be administered across a variety of student cohorts.

Mulherrin, Elizabeth, and Husein Abdul-Hamid. “The Evolution of a Testing Tool for Measuring Undergraduate Information Literacy Skills in the Online Environment.” Communications in Information Literacy 3, no. 2 (2009): 204-15.

Mulherrin and Abdul-Hamid provided an overview of the processes taken to develop a valid and reliable final exam for an information literacy credit course offered as part of the general education curriculum. As with similar articles, the authors discussed the use of content and construct validity, item difficulty and discrimination, Cronbach’s α (alpha), and item characteristic curve (ICC) analysis. A clear and ongoing theme in these articles is the importance of using reliable and valid instruments when conducting large-scale assessment.


  1. Bill Gillham, Developing a Questionnaire (London: Continuum, 2000).
  2. Linda A. Suskie, ed., Assessing Student Learning: A Common Sense Guide, 2nd ed. (San Francisco, CA: Jossey-Bass, 2009).
  3. Pamela L. Alreck and Robert B. Settle, The Survey Research Handbook, 3rd ed. (Boston: McGraw-Hill/Irwin, 2004).
  4. Elizabeth Fuseler Avery, “Assessing Information Literacy Instruction,” in Assessing Student Learning Outcomes for Information Literacy Instruction in Academic Institutions, ed. Elizabeth Fuseler Avery (Chicago: Association of College and Research Libraries, 2003).
  5. Carolyn J. Radcliff et al., A Practical Guide to Information Literacy Assessment for Academic Librarians (Westport, CT: Libraries Unlimited, 2007).
  6. Carol McCulley, “Mixing and Matching: Assessing Information Literacy,” Communications in Information Literacy 3, no. 2 (2009).
  7. Alreck and Settle, The Survey Research Handbook, 414.
  8. Radcliff et al., A Practical Guide to Information Literacy Assessment for Academic Librarians, 94-95.
