Hutchins Library


Bibliographic Instruction Program Evaluation





BIBLIOGRAPHY EVALUATIONS
for
SENIOR REQUIREMENT PAPERS


The Process


Initially, the papers for the bibliography evaluation section of this overall evaluation process were to have been gathered as part of the focus group interview process (see the paragraph following Question 12 in the script, Appendix D.1). Because too few papers were gathered through that process, the decision was made instead to obtain the final papers from the five fall 1992 sections of the Senior Requirement course. Discussion of the process to be used in evaluating the bibliographies and papers began in the spring of 1992 and continued throughout the fall of 1992.

Early in the spring semester of 1993, the criteria to be used in the evaluation of the papers and bibliographies were finalized (see Appendix E.1) and an assessment tool was developed for use in the evaluation (see Appendix E.2). Each paper was assigned a number, and all identifying marks (i.e., name, section number, etc.) were removed. Student reference assistants completed the top section of each evaluation checklist, allowing the readers (instruction librarians) to concentrate on the evaluation itself. Each of the four readers received one-fourth of the papers in the initial round. As each reader returned a paper with the completed checklist, the data were recorded in a database, a second checklist was attached, and the paper was sent to a second reader. Each reader's initial group of papers was divided equally among the remaining three readers as second readers, so that all papers were distributed evenly. Papers from each of the five sections were also divided evenly among readers.

After both readers' scores were entered into the database, any paper whose two scores on a scale differed by 2 or more points was sent to a third reader for evaluation on the affected scale(s) only. Twenty-five papers were read by two readers and 36 by three. In either case, the average score for each scale on each paper was calculated and recorded in the database.
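The scoring rule described above can be sketched in a few lines of Python. This is an illustrative helper, not the evaluators' actual tooling; the function name and interface are assumptions.

```python
def final_scale_score(reader1, reader2, third_reader=None):
    """Average a paper's score on one scale, adding a third reading
    when the first two readers differ by 2 or more points."""
    scores = [reader1, reader2]
    if abs(reader1 - reader2) >= 2:
        # Discrepancy: the paper goes to a third reader for this scale only.
        if third_reader is None:
            raise ValueError("third reading required for a 2+ point difference")
        scores.append(third_reader)
    return sum(scores) / len(scores)

# Two readers agree closely: the two scores are simply averaged.
print(final_scale_score(3, 4))     # 3.5
# Readers differ by 2 points: a third reading is averaged in.
print(final_scale_score(2, 4, 3))  # 3.0
```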

Of the 74 papers received, 13 were removed because they were judged to be reflection papers and not appropriate for this evaluation process. The remaining 61 papers each had six scales, for a total of 366 scales. Of that total, 51 scales (14%) had scores with a difference of two or more points. The following table shows the figures for the scales where differences occurred.



Differences of two or more

      No. of scales      Criteria scale                     Percentage

           16            Integration of citations               31%
           12            Appropriateness of sources             24%
           10            Appropriate form of citations          20%
            8            Currency of sources                    16%
            3            Appropriate number of sources           6%
            2            Variety of types of sources             4%



In several cases, the scores given on a particular scale by the two initial readers differed by three points. Those occurrences are detailed below.



Differences of three

         Scale                              No. of occurrences

         Appropriate form of citations              5
         Integration of citations                   4
         Appropriateness of sources                 1


Two particular concerns in conducting this segment of the evaluation were that the criteria developed for the bibliography assessment accurately reflect the quality of the bibliography, and that the criteria be applied evenly and consistently by all readers. The individual graphs for each scale show well-balanced bell-shaped curves, with the exception of the first scale, appropriateness of sources (see Appendix E.3; this appendix is not available as a WWW document, so please contact the authors for copies). The curve for the "appropriateness of sources" scale shows a mean of 3.598, a standard deviation of .615, and a mode of 3.5, meaning the majority of papers scored above the average score of 3, or "appropriate," on this scale. There are two possible explanations for this trend: either the papers were all above average as measured by this scale and the readers assigned scores accurately, or the readers were less comfortable assessing this aspect of the papers and consequently the scores for this scale were inflated. With the exception of this one scale, the statistics indicate that the readers were fairly successful in applying the criteria in a consistent and even manner.
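Summary statistics like the mean, standard deviation, and mode reported above can be computed directly with Python's standard library. The scores below are illustrative, not the study's data:

```python
from statistics import mean, mode, stdev

# Illustrative averaged scores on one scale (not the actual study data).
scores = [3.5, 4.0, 3.5, 3.0, 4.5, 3.5, 4.0, 3.0]

print(mean(scores))   # 3.625
print(mode(scores))   # 3.5 (the most frequent score)
print(stdev(scores))  # sample standard deviation
```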


Results and Analysis


For each of the six scales, several statistics were calculated, including the mean, median, mode, standard deviation, and variance. The scale with the highest mean score was appropriateness of sources (3.598), while the lowest mean score was recorded for the scale measuring variety of types of sources (2.303). In addition, correlation coefficients were calculated between all pairs of scales. The detailed charts are included in Appendix E.3. (This appendix is not available as a WWW document; please contact the authors for copies.) The following chart summarizes the correlation coefficient statistics.


Correlation Coefficient Statistics
              APP NBR    FORM     INTEGR.    CURR.    VARIETY

APP SRCS        .67      .02       .636      .399      .386
APP NBR                  .001      .526      .542      .626
FORM                               .028      .24      -.071
INTEGRATION                                  .26       .448
CURRENCY                                               .434



There appears to be a high positive correlation between scores on three pairs of scales in particular: 1) appropriateness of sources and appropriate number of sources, 2) appropriateness of sources and integration of citations, and 3) appropriate number of sources and variety of types of sources. A high score on the "appropriateness of sources" scale seems to predict a high score on both the "appropriate number of sources" scale and the "integration of citations" scale.

A moderate correlation was also seen between scores on the "appropriate number of sources" scale and scores on both the "integration of citations" scale (.526) and the "currency" scale (.542).

The weakest correlations all involved the form of citations scale. Its correlation with both the "appropriateness of sources" scale and the "appropriate number of sources" scale was very low (.02 and .001, respectively), and its correlation with every other scale was weak as well. The score received on this scale appears to have no direct correlation with any other scale.


Conclusions


While the students appeared to do a slightly better than average job of selecting appropriate sources for their papers, the readers were disappointed with the overall level of scholarship evidenced in these papers. Higher-level skills, such as the integration and solid support of ideas through documentation of sources, occurred too infrequently. Use of a consistent form of citations, which would seem to be the most concrete or basic of the skills measured by this segment of the evaluation, was also disappointingly lacking in too many papers; although the mean score of 3.275 (on a 5-point scale) does not indicate complete failure, it is lower than the readers had expected.

Along with the disappointing scores on the form of citations scale, a low mean score was also recorded on the scale measuring variety of types of sources used. Both the mean score of 2.303 and the mode (most frequent score) of 2 reflect use of only two types of sources, most commonly books and periodical articles. Other types of sources one might expect include government documents, parts of reference sources, essays in books, etc.

While improvement should be encouraged in each of the areas measured in this section of the evaluation, these two parts of the research process, citation format and variety of sources, should be the first to be addressed. The library staff, working closely with the teaching faculty, should stress the importance of consistency in citation format. Enhancing students' awareness and knowledge of the variety of sources available as reference and research tools should also be a focus of future library instruction.



Susan_Henthorn@berea.edu
mroyse@utk.edu




Updated 5/23/17
Mail comments or questions to susan_henthorn@berea.edu