Firefighter Physical Ability Testing

Validation of Firefighter Tests

There are two basic approaches to testing firefighters. The first is the work sample test, which is often misnamed a "physical agility test." The second uses measures of human physical ability, such as sit-ups, push-ups, pull-ups, grip strength, the 1.5-mile run, and the one-mile walk, to assess the abilities related to the job. Many ability tests are available that measure known aspects of human performance. Both methods have been used successfully for employment selection.

Each type of test has advantages and disadvantages. Work samples can be criticized because they expect job applicants to be physically capable of performing firefighter functions before a single day of training. On the other hand, because these tests "look like the job," they are challenged less frequently. Tests of human ability such as grip strength or the one-mile walk are safer than work samples but require additional research steps to link them to the job.

Using both types of tests in a validation study gives the fire department the advantages of each. If a set of work samples were developed from the job analysis, and the physical abilities needed to perform those work samples were identified, a set of ability tests could be developed simultaneously. For example, a common firefighter work sample is to quickly ascend and descend several flights of stairs while carrying a hose pack. Scores on this work sample should correlate highly with a treadmill test, the one-mile walk, and the 1.5-mile run. Another common work sample is a dummy drag. Scores on the dummy drag should correlate highly with performance on tests such as the bench press, grip strength, and isometric strength measurements. Work samples and ability tests can be validated at the same time. A group of incumbents would take both the work sample tests and the ability tests.
Scores could be compared to determine the relationship between the ability tests and the work sample tests. The work sample tests could be used for selection, and the ability tests could be used in training and fitness program development. When a given firefighter has a poor score on a bench press or grip strength test, a clear training regimen can be developed. The training prescription might include weight training, with progress assessed periodically. Another benefit of validating ability tests involves the treatment of injured workers. It is well known that physicians have a difficult time evaluating the functional capacity of firefighters. How much can they lift? Can they do the job? By administering simple ability tests, the functional ability of the firefighter can be predicted, which supports a decision either to return to work or to delay it. This is another example of how the relationship between ability test scores and work samples can be useful: scores on safe ability tests can be used to predict performance on less safe firefighter work samples.

Adverse Impact

Because most females do not perform at the same level as most males, almost any test of physical ability is vulnerable to charges of sex discrimination. To survive such challenges, employment tests require validation studies. It is no exaggeration to state that without a validation study, no physical ability test could survive a competent court challenge. Judges rely primarily on three sets of standards in assessing the validity of tests: the Uniform Guidelines on Employee Selection Procedures and the professional testing standards promulgated by the APA and SIOP. The Guidelines prohibit employers from testing for skills that are easily trainable. Thus it is important to test only for skills and abilities that firefighter applicants must possess on the first day on the job. If firefighters are trained on a task after, say, six months on the job, that task would be a poor item to include in a selection test given to persons who are unfamiliar with firefighting. Any validation study must address:
  • Job-Relatedness. Is there a documented linkage between the results of the job analysis and the content of the work sample test?

  • Reliability. Will the scores of individual test takers differ significantly if they retake the test? Tests that show high variation in scores for the same individuals across repeated administrations are unreliable and therefore not valid. Work samples can be highly susceptible to learning effects, and an intra-trial reliability analysis can assess each element of the work sample on this critical dimension.

  • Validity. To what extent does the test measure behaviors that have been shown to be critical for successful job performance?

  • Efficiency. Can the test be easily administered to large numbers of job applicants with as few proctors as possible?

  • Safety. Is the test likely to injure job applicants? Are there less risky alternatives?

  • Scoring. Is the test scored unambiguously? Successful performance on a test should be measured objectively, not judged on the basis of how an applicant appears during the test.

  • Simplicity. Tests that have clear and simple directions are preferred over tests with complicated or convoluted instructions.

  • Training and Techniques. Tests that measure job behaviors that can be improved with minimal training must be avoided, since not all applicants will have equal access to such training. In addition, techniques that will be learned during formal training are not suitable for employment testing, since the employer intends to train new hires in those very skills.

MED-TOX can validate work sample tests or physical ability tests. For firefighters, a content-valid work sample test is the least expensive type of test to develop. Work sample tests are also preferable because they allow individuals to be tested while wearing turnout gear and SCBA.

CPAT and the Importation of Tests from Other Agencies

Tests imported from other jurisdictions are probably more difficult to justify than tests supported by the employer's own study. This is because the Uniform Guidelines on Employee Selection Procedures stipulate three conditions for transporting selection standards from one job to another job not included in the initial validation study. These three requirements are: 1) validity evidence - Does the available study demonstrate that the selection procedure is valid?; 2) job similarity - Does the job analysis information from both jobs show similar content?; 3) fairness evidence - Does the study address fairness for each race and gender group? Meeting these requirements takes additional work to match the firefighter job in the new jurisdiction to the job in the original study.

Several fire departments have adopted the CPAT. Fire departments should consider both the advantages and disadvantages of transporting the CPAT.

Advantages include:
  • In some regions, fire departments and local community colleges have teamed up to permit the local college to perform the testing and certify that applicants have passed the test. In other areas, regional testing centers exist whereby the fire department can contract out the testing for its applicants.

  • A wealth of internet resources provides training materials on how to prepare for the test.

Disadvantages include:
  • The test is the most expensive ever developed, at nearly one million dollars. Equipment costs for the local department are also extremely high and can run from $25,000 to $28,000.

  • One of the CPAT's greatest weaknesses is that the task analysis included only 23 physically demanding tasks to cover the entire firefighter occupation. This is one of the smallest sets of firefighter tasks ever developed in a validation study. Matching this limited number of tasks to a local fire department can be difficult, especially if that department has tasks that are even more physically demanding than the CPAT's 23 tasks.

  • The test is used by many departments. If one of these departments loses a court case, the loss can mar the test for every other user by casting doubt on its validity. If a court aggressively attacked the methodology, the test could fall and the individual departments relying on it could fall with it, like a row of dominoes.

  • The CPAT is based on a very small sample of roughly 60 individuals, including fewer than ten females. For a large, expensive test, the number of subjects was extremely small.

  • The test was never assessed for reliability. Work sample tests are notoriously susceptible to training effects. This means that an applicant who takes the test one day may perform far better on a second trial simply by learning something about the test events. This can be a serious problem since the scores rendered by the test are unstable.

  • Departments that use the test report that it has adverse impact against females. This means that the burden of proof in defending the validity of the test falls on the local department, which casts doubt on the original reasoning behind transporting the test in the first place.

  • Administration can be complicated. There are many rules and the test equipment must be identical to that specified in the validation study. If a fire department does not use the exact same equipment, problems can arise in justifying transportability.

  • The jurisdiction must apply for a license from the IAFF to use the test.
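
The adverse impact mentioned in the list above is commonly screened with the four-fifths rule of thumb from the Uniform Guidelines: a selection rate for any group that is less than four-fifths of the rate for the group with the highest rate is generally regarded as evidence of adverse impact. The sketch below shows the arithmetic only. The applicant counts are invented, the function names are hypothetical, and a real analysis would also consider sample size and statistical significance.

```python
def selection_rate(passed: int, tested: int) -> float:
    """Proportion of a group's applicants who passed the test."""
    if tested <= 0:
        raise ValueError("tested must be a positive number of applicants")
    return passed / tested


def impact_ratios(rates: dict[str, float]) -> dict[str, float]:
    """Each group's selection rate divided by the highest group's rate."""
    highest = max(rates.values())
    return {group: rate / highest for group, rate in rates.items()}


# Hypothetical pass counts for one test administration (illustrative only).
rates = {
    "male": selection_rate(passed=80, tested=100),
    "female": selection_rate(passed=12, tested=30),
}

ratios = impact_ratios(rates)

# Groups whose ratio falls below four-fifths (0.8) are flagged for review.
flagged = [group for group, ratio in ratios.items() if ratio < 0.8]

for group, ratio in ratios.items():
    print(f"{group}: selection rate {rates[group]:.2f}, impact ratio {ratio:.2f}")
print("flagged:", flagged)
```

A flag of this kind does not by itself make a test unlawful. It is the point at which the employer is expected to show that the test is valid for the job, which is why the quality of the underlying validation study matters so much.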

Navigating physical ability testing can be difficult for jurisdictions of any size.

Determining the Passing Point

The passing point should be set in a manner consistent with professional recommendations. The MED-TOX methodology for setting the passing level is based on multiple forms of evidence collected in the job analysis. One important aspect of setting pass points on all public safety physical ability tests is the need to link the test scores to the requirement that tasks be completed in a certain length of time. Most job analysis methods fail to assess this important attribute. MED-TOX has designed a formal job analysis approach that ensures that speed and proficiency are linked to the test pass point. The form here shows how incumbent data can be collected in the job analysis to assess this critical aspect of the job.

