Overview of Physical Ability Testing


The need for physical testing of workers in manual materials handling jobs has been increasingly recognized over the past few years by risk managers, personnel specialists, physiologists, occupational physicians, and ergonomists. Each of these groups has independently come to recognize the benefits to both the individuals being tested and the organizations themselves. Risk managers have an interest in job safety and in reducing workers' compensation costs. Personnel specialists seek to hire the most qualified individual available and to reduce absenteeism and sick leave, while still complying with state and federal EEO mandates. Physiologists and physicians seek to reduce unnecessary injuries and to better predict which workers are most likely to become injured. Ergonomists study individual jobs and seek ways, through either job redesign or job selection systems, to better match the worker to the work.

Cognizant of these issues, MED-TOX Health Services has developed an approach to assist employers in the validation of physical ability tests for new hires. Since overexertion injuries account for a large share of work-related back injuries, it makes sense to reduce the potential for overexertion. Hiring workers with adequate strength to perform the job is one way of reducing these injuries. A valid strength test, therefore, can reduce injuries in jobs for which high levels of strength are required. The MED-TOX approach has two goals:

    • Provide employers with a valid and legally defensible job analysis of the critical and essential, frequently performed, and physically demanding tasks associated with the occupation.

    • Provide employers with a physical ability test that is job-related, valid, and reliable that can confidently be used in the selection of individuals for physically demanding jobs.

The MED-TOX approach was chosen on the basis of safety, reliability, and validity. Validity and reliability are discussed below. Ability tests are safer than work sample tests because they determine how much weight an applicant can lift rather than asking the applicant to lift a heavy weight. An applicant who lacks the strength to lift the weight can be injured during a work sample test. Using an ability test allows the employer to determine whether the applicant can lift only 20 lbs. or as much as 200 lbs. in a safe, efficient, and standardized manner.

The MED-TOX approach presented here is a criterion-related validation methodology. The approach is designed to ensure that the ability test (selection device) is empirically demonstrated to be related to the job. The empirical linkage between a given job and a given test is not insignificant. Often testing (physical or otherwise) is based on "face validity." Face validity is not recognized by the courts and has been described by experts as a claim made in the absence of meaningful data.

Because physical ability tests can be subjected to a high standard of legal and administrative review, empirical evidence is usually necessary to show job-relatedness. A high standard of evidence is also necessary since all tests of strength show adverse impact against females. Since a showing of adverse impact requires the employer to demonstrate job-relatedness, documentation confirming such a relationship becomes crucial.

Job Analysis

Once an employer's job has been selected for study, it is necessary to conduct structured group interviews with workers from that job. The job analysis inquiry is directed at collecting tasks that require static strength. Static strength involves the continuous exertion of maximum muscle force for a brief period of time. Tasks that involve the lifting, pulling, pushing, or carrying of objects and materials require static strength.

Following the structured group interview, workers and the MED-TOX representative go to warehouses, storerooms, and other work areas to directly examine the tools, equipment, and materials that workers described during the meeting. An industrial scale and/or a force gauge is used to directly weigh as many of the relevant objects as possible. If additional materials or tools are found that are also lifted, these objects are weighed, the weights recorded, and the lifting tasks added to the task listing. Multiple job analysis meetings may be necessary if there are several different geographical locations or significant differences in facilities or number of employees at each location. Once these meetings have been conducted, a task inventory is produced. A task inventory is a listing of all the tasks collected. Task inventories are generally subject to several review phases prior to worker surveys.

In order to measure a job, one needs a measuring tool. Rating scales are the most useful measuring tools when performing job analysis activities with task inventories. MED-TOX rating scales can have a number of customized features depending on the job and specific organizational needs. To validate a strength test, however, it is important at a minimum to elicit from workers:

    • Whether or not the task is performed;

    • How far the object is carried;

    • How often the task is performed;

    • How important the task is to the job; and

    • Whether persons who efficiently perform the task are more capable than workers who have a difficult time performing the task.

A random sample of workers completes the task inventory. The responses are entered into a statistical software program for data analysis. The first step in data analysis is the computation of the percentage of workers that perform each individual task. Tasks performed by less than 50% of the workers are eliminated from further consideration. Means are next computed for each dimension (How Far, Frequency, Importance and Proficiency). Next, a criticality index is calculated. This index is the product of the mean Importance rating and the mean Frequency rating. This product is then multiplied by the percentage of workers who indicate they performed the task and divided by 100. Thus, greater weight is given to tasks performed by all workers and less weight is assigned to tasks performed by fewer workers. Tasks performed by all workers are more likely to be critical and essential core job tasks than those performed by only some workers. Tasks with high criticality ratings are identified as essential job functions.
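
The criticality index described above can be sketched in a few lines of Python. This is only an illustration, not MED-TOX's software: the function name, the record layout, and the sample ratings are hypothetical, and the sketch assumes the mean ratings are computed over the workers who report performing the task.

```python
from statistics import mean

def criticality_index(responses, min_percent_performing=50.0):
    """Return the criticality index for one task, or None if the task is eliminated.

    `responses` holds one dict per surveyed worker (hypothetical layout):
    {"performs": bool, "importance": number, "frequency": number}.
    Assumption: mean ratings are taken over workers who perform the task.
    """
    if not responses:
        raise ValueError("at least one survey response is required")
    performers = [r for r in responses if r["performs"]]
    percent_performing = 100.0 * len(performers) / len(responses)
    # Tasks performed by less than 50% of workers are dropped from consideration.
    if percent_performing < min_percent_performing:
        return None
    mean_importance = mean(r["importance"] for r in performers)
    mean_frequency = mean(r["frequency"] for r in performers)
    # Importance x Frequency, weighted by the percentage performing, divided by 100.
    return mean_importance * mean_frequency * percent_performing / 100.0

# Hypothetical survey data for a single task: three of four workers perform it.
survey = [
    {"performs": True, "importance": 4, "frequency": 2},
    {"performs": True, "importance": 5, "frequency": 4},
    {"performs": True, "importance": 3, "frequency": 3},
    {"performs": False, "importance": 0, "frequency": 0},
]
print(criticality_index(survey))  # mean importance 4 x mean frequency 3 x 75 / 100 = 9.0
```

With these made-up ratings, the same task performed by all four workers would score 12.0 instead of 9.0, which shows how the percentage weighting favors tasks common to the whole workforce.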


Work Sample Development

Having determined which static strength tasks are critical for the job, it is next necessary to determine which tasks are suitable for utilization as work samples. Ideally, the tasks selected should be among the most demanding tasks workers are expected to perform. Additionally, other criteria should be considered including:

    • Safety to incumbents. Tasks selected should be safe to perform in a testing situation. Some tasks might not be dangerous to experienced workers, but could be to a novice.

    • Reasonable time to administer. The tasks selected for work sample development should be those which can be completed in a reasonable amount of time.

    • Unambiguous scoring and clarity of results. Tasks selected should be amenable to an unambiguous scoring or rating system. There should be no disagreement as to what constitutes various levels of performance. Subjective ratings on "style of lifting" or "ease of lifting" are less suitable when objective measures are possible.

    • Regional availability and low cost of equipment. The materials necessary for task performance should be readily available and inexpensive.

    • Simplicity. The tasks selected should be as simple as possible from both the point of view of instruction to incumbents and administration of the work sample.

    • Commonality. The tasks selected should be commonly performed by as many workers as possible.

Critical tasks that meet the criteria can be categorized in a variety of ways. For example, all tasks involving the use of a wheelbarrow might form a group or task category. Alternatively, all tasks that involve work at a particular work site, or all tasks performed while unloading a box car or repairing heavy equipment, could form other groups. The nature of the job and the tasks performed typically suggest appropriate task categories. Task categories are important because they help the analyst organize the work and ensure that a variety of lifting tasks can be used to construct work samples. An example of a category might be:

Five Gallon Container (Paint, Joint Compound, Floor Sealer) Tasks
    • Lift/carry a five gallon can of floor sealer (approx. weight 46.3 lbs.).

    • Carry a five gallon bucket of paint (approx. weight 55.4 lbs.).

    • Lift/handle a five gallon bucket of joint compound (approx. weight 51 lbs.).

    • Lift a five gallon bucket of paint into the back of a vehicle (approx. weight 55.4 lbs.).

    • Lift a five gallon bucket of paint up onto a stack of other five gallon paint buckets (approx. weight 55.4 lbs.).

Work samples may then be developed from these categories of common critical tasks. For example, a work sample built from these tasks might read:

Five Gallon Bucket Stack

Approach a row of four five-gallon buckets of material. Stack three of the buckets on top of one of the buckets of paint. Take the top bucket of paint off the stack and carry it to the truck bed. Set it down and release grip. Regrip paint can and return to the stack of three. Place the can beside the stack and replace the two remaining cans on the ground in a row, as they were initially.

Selecting Appropriate Static Strength Tests

MED-TOX has used the Jackson Strength Evaluation System (JSES) in several projects and has found it to be a valid and reliable predictor of the ability of individuals to perform lifting, pushing, pulling, and carrying tasks.

The JSES was developed by Dr. Andrew S. Jackson of the University of Houston. It features an electronic load cell to ensure accurate and reproducible readings of isometric strength. Large readouts allow determination of both peak and average strength in pounds. The system includes the control and load cell, a hand dynamometer fixture for the measurement of grip strength, and a heavy-duty lifting platform, bar, and chain. The manufacturer reports that the JSES is widely used to measure static strength using the National Institute for Occupational Safety and Health (NIOSH) protocol.

The JSES has three qualities that make it ideal for employment testing. It has been shown to be safe, reliable (r = .90), and practical. Results should be obtainable within 15 minutes. The JSES is widely recognized as a reliable and valid indicator of the amount of static strength possessed by individuals. At the present time many industrial medical clinics and employers are obtaining the JSES. The system is also relatively inexpensive (it can be obtained for less than $4,500) and portable. Normative data for the JSES are also available.

In 1995, the EEOC issued guidelines which attempted to clarify the difference between a medical test and a physical ability test. According to the EEOC a medical test was more likely to measure an individual's "physiological response" to performing a task whereas a physical ability test measured task performance directly. MED-TOX questioned the EEOC as to whether the JSES would be considered a medical or physical test since it measured an individual's strength (a physiologic response) and was not a direct measure of a particular task but could predict performance on a variety of lifting tasks. In a December 7, 1995 letter to MED-TOX, the EEOC stated:

The answer to your inquiry concerning the Jackson Strength Evaluation System (JSES) depends on the context in which this test is given. For example, if the JSES is used simply to determine whether a person is capable of lifting a thirty pound box and carrying it twenty feet, the test would not be considered "medical" and could be administered pre-offer. In this context, it would not be dispositive that the test is used or interpreted by a health professional. Similarly, the score one achieves on the test -- provided the only thing being measured is the amount of weight the person can lift -- does not render the test a medical one even though, as you put it, "strength" is being measured. As the Enforcement Guidance illustrates at pages 14 and 15, an assessment of whether a person can lift a fifty pound box is a physical ability test as distinguished from a medical exam. If you were to measure the person's heart rate after the act of lifting, you would then be engaging in a medical examination. While the distinction may appear subtle, it is legally significant and constitutes the difference between what can and cannot be done under the ADA before a conditional offer of employment [EEOC, personal communication to MED-TOX, December 7, 1995].

In other words, the EEOC would permit the use of the JSES (in the pre-employment context) to the extent that test performance was related to the ability of an individual to perform a specific task or group of tasks. In order to establish a relationship between the JSES and work performance, it is necessary to conduct field testing with workers (participants).

Field Testing and Data Analysis

A stratified random sample of experienced workers is typically chosen for testing. The sample should consist of individuals of various ages and racial groups and of both genders. Of course, many organizations will not have a significant number of females available for testing, nor will they employ individuals who cannot perform the job. Without representatives from these groups, it is more difficult to set a defensible cut-off score. Therefore, we suggest that administrative and clerical workers participate in field testing as well.

Field testing consists of a brief medical screening, informed consent, an explanation of the testing, and height and weight measurements. Next, the participants are administered the JSES. In accordance with the manufacturer's instructions, participants exert a constant force for three seconds on each of the four tests that use the lifting bar and for three seconds on the Jamar hand dynamometer.

The electronic monitor connected to each load cell records the force exerted in pounds. Peak and average force are recorded.

For the Grip Strength test, participants squeeze on the hand dynamometer first with the dominant hand and then with the nondominant hand. For this test, peak grip strength is recorded.

During the Arm Lift, participants stand erect with palms up, their elbows at the side, and forearms at a 90 degree angle to pull up on the lifting bar.


The Shoulder Lift also requires the participants to stand erect but with their palms down. The participants then pull up on the bar as if lifting a jackhammer.


The Torso Pull requires the participants to sit on the ground with their legs extended and their feet flush against the lifting platform which is placed against a wall. Participants pull back with their arms and legs extended.

The Leg Lift test requires the participant to squat with the arms extended downward. The lifting motion is entirely in the legs as they are straightened.

Three trials are conducted for each participant on each of the five tests, with the average of the last two trials used as the score. Scores are recorded for each trial.
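
The scoring rule above is simple arithmetic, sketched here in Python. The function name and the sample forces are hypothetical, and treating the first trial as unscored practice is an interpretation of "the average of the last two trials."

```python
def jses_test_score(trial_forces_lbs):
    """Score one JSES test from its three trials (forces in pounds).

    All three trials are recorded; only the last two are averaged.
    Assumption: the first trial is treated as an unscored practice trial.
    """
    if len(trial_forces_lbs) != 3:
        raise ValueError("exactly three trials are expected per test")
    _, second, third = trial_forces_lbs
    return (second + third) / 2

# Hypothetical arm-lift trials for one participant.
print(jses_test_score([98.0, 104.0, 110.0]))  # (104 + 110) / 2 = 107.0
```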

Next, the simulations are performed by the participants. The simulations consist of actual work samples of the job. Several events, such as the Five Gallon Bucket Stack described above, will have been constructed. Participants are given ample time to rest between events and may decline testing at any time. Two timers use stopwatches to record the time it takes each participant to complete each work sample. The two stopwatch times are averaged and recorded as the score.

Participants are instructed not to run or to perform the work at an unnatural pace. Participants are asked to envision a day on which they have many different tasks to perform, so that when one task is completed, other important tasks follow. Participants are instructed to work at what might be considered a heavier than average pace, but not one that is unrealistic or unrepresentative of the pace at which they might work on a busy day.

Following testing, participants estimate their personal fitness level, the minimum level of performance that they would consider acceptable for each work sample, how realistic each work sample is, and additional questions that are utilized to assist in setting the cut-off score.

Statistical Analysis

Reliability of the JSES is assessed by comparing the scores of the two recorded trials on each test. Reliability typically varies from a low of .94 to a high of .97.
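
One common way to compare two sets of trial scores is a Pearson correlation across participants. The text does not name the statistic used, so the sketch below is an assumption, as is the choice of the second and third trials as the "two recorded trials"; the function name and the data are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical leg-lift scores (lbs) for five participants on trials 2 and 3.
trial_2 = [100, 120, 90, 150, 110]
trial_3 = [104, 118, 93, 149, 113]
reliability = pearson_r(trial_2, trial_3)
print(round(reliability, 2))
```

A coefficient near 1.0 means participants keep nearly the same rank order and spacing from one trial to the next, which is the sense in which the test is called reliable.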

Correlation coefficients are computed for all tests to determine how strongly they are related to one another. Multiple regression analysis is used to derive equations that predict work sample performance for individuals who have taken only the JSES.

Validity is assessed by statistical analysis of how well each regression equation predicts work sample performance. A perfectly predictive equation would have an R-squared of 1.0, whereas an R-squared of 0.0 would indicate that the equation had no ability to predict job performance at all.
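
To make the R-squared idea concrete, the sketch below fits a least-squares line with a single predictor and computes R-squared for it. This is a deliberate simplification: the approach described here uses multiple regression with several JSES scores, and the function names and data are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x (one predictor)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least two points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def r_squared(x, y, slope, intercept):
    """Proportion of the variance in y accounted for by the fitted line."""
    mean_y = sum(y) / len(y)
    ss_residual = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_total = sum((b - mean_y) ** 2 for b in y)
    return 1.0 - ss_residual / ss_total

# Hypothetical data: a combined strength score (lbs) and the time (seconds)
# each participant needed to complete one work sample.
strength = [80, 100, 120, 140, 160]
seconds = [62, 55, 50, 41, 37]
slope, intercept = fit_line(strength, seconds)
fit_quality = r_squared(strength, seconds, slope, intercept)
print(round(slope, 2), round(fit_quality, 2))
```

In this made-up sample the slope is negative because stronger participants finish the work sample faster, and the R-squared is close to 1.0 because the invented points lie almost on a straight line. Real field data would be expected to show more scatter.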

Passing Levels (Cut-off scores)

Setting cut-off scores is a particularly complex area of test construction. MED-TOX utilizes multiple forms of evidence to arrive at a cut-off level that is consistent with business necessity. The cut-off scores permit the selection of qualified workers and are based on the results of the task analysis, on the performance of currently employed workers, and on those workers' judgments as to what constitutes acceptable performance. As each test validation situation is unique, no perfect formula can be offered in advance here.


MED-TOX offers services in the criterion-related validation of physical ability tests. The tests are based on a comprehensive job analysis and on field testing in which workers perform work samples and are scored on the JSES. The tests permit the selection of individuals most likely to be able to perform the tasks without undue risk of injury to themselves and the screening out of persons who do not possess sufficient physical ability to adequately perform the job.