This rather lengthy description of two related psychological tests is a sub-page in the PSSPQ- and
PHI-content-focused subsections of Dr. LeRoy A. Stone's Web Site.  The Internet address for Dr. Stone's
Web Site's Home/Index page is:  http://www.home.earthlink.net/~lastone2/home.html; the address for the
PSSPQ Test's Home/Index page is:  http://www.home.earthlink.net/~lastone2/psspq.html
Integrity Tests:  An Argument for the PSSPQ and PHI Tests of Trustworthiness

LeRoy A. Stone, Ph.D., (Forensic Dip.) ABPP
Harpers Ferry, West Virginia

Note – Since the PSSPQ test has recently been the subject of a good deal of interest, to the degree that it can now be considered a commercially viable product, it seems timely to introduce the PHI test to those who "click on to" Dr. Stone's Web Site (http://www.home.earthlink.net/~lastone2/home.html) and who have read or otherwise become familiar with his Web Site sub-pages focused on the PSSPQ test (i.e., http://www.home.earthlink.net/~lastone2/psspq.html).  The major purpose of the following presentation is to introduce the PHI test and also to provide some detailed description of the PSSPQ, as it is the test from which the PHI was derived.  However, it is important to note that the PHI test was designed for an entirely different purpose than was the PSSPQ.  Also, it is hoped that the PHI test will be seen as unique among integrity or trustworthiness type tests, as it has been normed on a very unusual group of persons who, it can easily be argued, possess a high degree of trustworthiness.

Introduction

     Diogenes (412?-323 BC), the tale goes, devoted almost his entire life to traveling almost constantly, carrying some kind of lamp, in search of an honest man.  This story has apparently been told for perhaps a couple thousand years.  If my memory serves me right, Diogenes was never successful in the task that consumed almost his entire life.  Is this tale some type of metaphor for the situation in real life?  Is there such a thing as an honest man, and if there is, is there any way to identify him (or her)?  Throughout history, and around the world, many different means have been described for detecting lying and other impediments to determining whether an individual is honest.  Horrible procedures have been used to determine whether persons are guilty or not, whether they are good or evil (such as being a witch or warlock), whether they are truthful or not, as well as other dichotomies regarding the integrity, honesty, goodness, trustworthiness, etc. of mankind.

     Presently, we (in our Western society) also employ a wide variety of techniques to help us identify persons whom we can trust when evaluating people for employment, for bonding, for security clearances, etc.  What quickly comes to mind is polygraphing, psychological testing/interviewing, complicated vetting techniques, various levels of background investigation, recommendations and evaluative impressions from other people, etc.  Interestingly enough, all of these techniques/procedures are used by the U.S. Government when evaluating its citizens for high-level security clearances.  For lower level clearances, generally only one or two of the listed procedures are employed.

What Type of Psychologists?

     Where persons are being processed for high-level security clearances, the involvement of psychologists in evaluating them has usually (in the past) been limited to clinical or counseling psychologists.  These psychologists are typically trained to work with (i.e., diagnose and treat) persons having mental health problems.  When such psychologists are employed by the U.S. Government (as Federal employees or on contract) to evaluate persons being processed for employment and/or possible security clearance status, it is not at all surprising that they have used the psychological tests they are most familiar with, i.e., instruments designed to evaluate mental health status.  Almost none of these tests were ever explicitly designed to discover who is honest, can be trusted, has high integrity, is a good person, etc.  Yes, some of the tests that clinical/counseling psychologists typically use do contain sub-scales developed to spot a person who is attempting to "fake-good" or "fake-bad" his/her responses to the test items.  The latter ("fake-bad") direction of deception is not really a concern when evaluating persons who are seeking employment or security clearances.

     One problem with most of these "fake-good" scales (sometimes also known as "lie scales") is that they usually do not work very well.  Persons of average or lower intellect are the ones generally 'caught'; high-intellect individuals can normally see through the purpose of many of the items in these so-called lie scales and so are seldom caught by them.

     Psychologists in the specialty area of industrial and organizational psychology (who are generally regarded as trained quite differently from clinical/counseling psychologists) have, in the past decade or so, assisted in the development of what are generally known as integrity tests.  Over that decade, these integrity tests have been quite successfully marketed, especially to human resources and security offices in private business and industry.  However, in the past several years most users of these tests have become disenchanted and disappointed with the results obtained.  With some of those who have been administered these integrity tests, the results are generally believed to be somewhat correct and valid.  However, depending on the characteristics of the populations from which the examinees came, many potentially "bad" or unsatisfactory employees simply are not identified by their test performances.  In other words, many people (e.g., job applicants or others being evaluated for some type of position of trust) were found to be able to 'beat' these integrity tests.  Another, not insignificant, problem was that use of the integrity tests could sometimes be challenged under the Americans with Disabilities Act of 1990.  This law prohibits asking a job applicant about certain personal background information prior to making a job offer.  As a consequence, use of integrity tests today (i.e., in 2003) has declined significantly, and it appears that this loss of interest and use will continue in the coming years.

     Does this mean that the use of psychologists (of any kind) to evaluate persons for honesty, trustworthiness, integrity, etc. should be done away with or minimized?  No, not at all – if mental health can affect whether a person can be trusted, then clinical/counseling psychologists, with their tests for detecting mental health/illness, can prove to be of great value in certain respects.  However, they should be limited to evaluating mental health and not trustworthiness in its totality.  Mental health may be one component of trustworthiness, but the concept is more complex than a mental health evaluation might reveal.  As for the so-called integrity tests, unless they are used to evaluate the integrity of persons of quite low intellect, they leave a great deal to be desired.  When more intelligent persons respond to them, these tests simply do not have sufficient validity for use in the real world.  Psychological research published in bona fide psychology journals, rather than in law enforcement, police, or security oriented magazines, has not been very supportive of the use of integrity tests.

Validity Evaluations

     One of the major problems with the integrity tests is that they have had little validity evidence supporting them other than face or content validity.  If an individual responds to an item asking about past thefts from employers by indicating that he has stolen something of value from a past employer, we then decide that this is true without actually knowing whether it happened.  A number of different causes might explain why someone would admit to thievery when no such behavior had actually taken place: lack of the required reading skills, eyesight problems, a simple misunderstanding of how the item was written, English being the individual's second language, mental health difficulties, etc.  All of these are encountered very frequently and can cause someone to respond to a question or statement in an invalid fashion.  This kind of invalid response suggesting a guilty background is frequently encountered in the forensic psychology field.  For all kinds of reasons, some people admit to past guilty behavior that never occurred.  More and more cases are coming to light in which people were sentenced to long prison terms and it was later found that their false confessions had been the major evidence for their conviction.

     However, the greatest difficulty with integrity tests is that, so far, it has proved very hard to demonstrate their empirical or predictive validity.  A major obstacle has been defining which group of people could represent an honest man, i.e., one who unquestionably meets an adequate standard of good integrity.  Police officers come to mind, but they can very quickly be ruled out because of the widespread awareness, in just about all cultures, of the immorality, dishonesty, and law-breaking within this occupational group that is reported almost daily in the popular press.  Unfortunately, the "bad cop" is a widely held concept.  The same can be said of the clergy, especially recently, given the sexual-offense stories that seem to recur constantly in our popular media.  Just about any occupational or professional group that can be named or 'nominated' simply has to be rejected as a validation standard for the "honest [or trustworthy] man" concept.

    One of the major problems in creating an honesty or integrity test is establishing definitional criteria for what constitutes an honest individual.  As suggested in the previous paragraph, one could merely assume that certain occupational and social groups are made up mainly of honest and trustworthy people.  However, this is a rather dangerous and easily challenged position to take.  Which defined group of people can generally be believed to be honest and trustworthy, so that it could then be employed as a criterion-defining sample for such a construct?

A New and More Narrow Definition for Trustworthiness

     Dr. LeRoy A. Stone may have found the solution to the problem described in the preceding paragraph.  He was employed for a decade and a half as a senior clinical psychologist in this country's largest intelligence agency, and in his last eight years there he held the position of Chief Research Psychologist.  During his almost 24 years with the agency, he came to believe that one group of Government employees, to whom he had access, could acceptably be regarded, in the real world, as trustworthy.

     In our country there is a group of individuals, quite small in number, who have been the focus of the most stringent initial and then ongoing investigation and monitoring program imaginable, concerned with the human characteristics of honesty and trustworthiness.  Most Americans are very familiar with the terms "secret" and "top secret" as descriptions of security clearance levels, but very few have ever encountered or heard of "Sensitive Compartmented Information" (or SCI) in association with the Top Secret concept.  Top Secret – Sensitive Compartmented Information (or TS-SCI) is just about the highest security clearance access level granted by the USA Government.  It is difficult to obtain the current number of USA citizens who hold TS-SCI access, but back in the 1980s the Washington Post (June 8, 1986 issue, Section A, page 18) reported that, as of March 1985, there were only 98,715 persons within the Department of Defense (both civilian and military) and another 9,576 in industry who held TS-SCI access clearances.  It should be remembered that the mid-1980s were a time when the Cold War was at a very high level.  It can be believed that now, even though we are involved in the so-called "War on Terror," the number of those who hold TS-SCI status is probably smaller than it was in the mid-1980s.  Although there may be a very few security clearance levels that can be considered even higher than the TS-SCI level, only a very few individuals are believed to hold such elevated clearances.  It is safe to say that the TS-SCI access level is the highest major security clearance granted by the USA Government.

     The initial process of obtaining TS-SCI access for any given individual is rather long and expensive.  Actually, only a small fraction of those initially considered for employment that requires this particular clearance ever succeed in obtaining the employment.  It is not at all unusual for the entire investigation process to take two years or so, and in some cases almost three years were required before the Government could make a final decision to grant or not grant TS-SCI access to the individual(s) involved.  The process involves extensive psychological testing, including a psychological interview; polygraph examinations (sometimes several separate, repeated examinations); and elaborate background investigations in which friends, associates, neighbors, past teachers, physicians, employers, etc. are personally interviewed face to face.  In addition, complete record checks are carried out with local police, with national agencies (e.g., the FBI and other Federal law enforcement agencies), and of military records.  There is a widely held belief that this level of investigation, directed mainly toward matters of honesty and trustworthiness, is about as extensive and complete as can be imagined.  As might be expected, the financial costs of an initial TS-SCI investigation can be staggering.  A number of years ago, the author of this presentation heard a rumor (which was, and still is, very believable) that about $10,000 was spent on every completed TS-SCI clearance adjudication.  That estimate was in dollars worth far more than today's.  Today, one frequently hears that many completed TS-SCI clearance adjudications each cost the Government around $25,000.

     Individuals who have been granted TS-SCI access are reinvestigated every five years.  Reinvestigations involve polygraph examinations, security interviews, and complete, updated background investigations that require investigating agents to interview neighbors, work associates, and other possibly significant persons in the employee's life face to face.  If an individual holding TS-SCI access has lived in various locations during the previous five-year period, interviews with pertinent persons are carried out for all of those locations.  This is, of course, also true of the initial (i.e., pre-employment) interviews carried out before TS-SCI access was granted: if an individual has lived in a number of states (or even in foreign countries), persons associated with each of those locations are interviewed regarding the subject individual.  As part of the reinvestigation process, an updated agency records check is also carried out; much of it involves financial credit status and law enforcement records.

     If an individual obtains TS-SCI access and successfully maintains this clearance status, it is safe to conclude that the subject individual is, for all practical purposes, an honest and trustworthy person.  There really is no other clearly defined group of USA citizens in our society who have undergone such extensive and expensive scrutiny (which is constantly ongoing) regarding matters of honesty and trustworthiness as those who have been granted TS-SCI access status.

    For the very first time in the construction of a trustworthiness-focused psychological test, a criterion sample representing the possession of high levels of honesty and trustworthiness has been employed.  The possession of high levels of honesty and trustworthiness has been operationally defined by the fact that the sample(s) employed held TS-SCI access status (as granted by the Government of the USA).  In other words, the Diogenesian tool used to find "an honest man" was whether he or she held TS-SCI access status.  Of course, we are aware that even TS-SCI access status does not totally or absolutely guarantee complete honesty or trustworthiness.  In fact, during the past couple of decades, several dozen USA citizens have been caught as traitorous spies, and a couple of them held (or had held in the past) TS-SCI access.  Still, TS-SCI access status probably represents the very best that an open society can do to designate the possession of a high level of honesty and trustworthiness.  No other honesty/trustworthiness measuring psychological test currently on the market has had honesty and trustworthiness operationally defined in such a fashion, i.e., by the possession of TS-SCI access status.

Establishment of Trustworthiness Norms for the PHI

     The male (N = 114) and female (N = 92) samples were obtained over a period of a couple of years.  All were civilian employees who held TS-SCI access, and all volunteered to respond to the PHI anonymously.  Anonymity was promised for a couple of reasons.  One reason was that it facilitated volunteering for this project.  The major reason, though, was that anonymity promoted more honest and candid responses to the PHI items.  Upon initial contact, these employees were asked to complete the answer sheet on their own time, not while at work, and then to return the answer sheet and the list of 50 items.

     The mean age for the male sample was 37.07 years (SD = 9.35); the female sample was younger, with a mean age of 29.65 (SD = 6.78).  Ages ranged from 20 to 55 years for males; for females the range was more restricted, from 21 to 43 years.  The mean number of years of formal education for males was 15.91 (SD = 2.72); for females it was almost the same, 15.8 years (SD = 2.17).  Years of education ranged from 12 to 25 for males and from 12 to 21 for females.  The mean number of years for which TS-SCI access had been held was 11.88 (SD = 9.49) for males and 6.55 (SD = 5.45) for females.  The range of years for which TS-SCI access had been held was from one to about 33 years for males and from one to about 20 years for females.  In general, the male sample was several years (on average, about seven) older than the female sample, while their lengths of education were quite similar.  Males had held TS-SCI access for longer periods than females, which might be expected given that the male sample was older.

     As noted above, each subject responded to the PHI individually rather than in a group setting; it is believed that this 'private' responding to the PHI items also helped guarantee anonymity.  In general, almost none of the subject employees asked any clarifying questions about the PHI items.  The great majority of the subjects, when asked, indicated that it took them no more than about 10 minutes to respond to the 50 items.  In just about all cases, as per the instructions they received, no items were left unanswered.

     The PHI item responses from the 206 subjects (males = 114, females = 92) were scored to obtain scale scores for the seven PHI sub-scales and the Total Score.  For each of these scores, a t-test was computed to examine the difference between the male and female mean values.  The computed t values, along with the degrees of freedom and the associated probability values, are shown in Table 1.  [Note – the male and female mean values, along with the standard deviations, for each of these PHI comparisons are purposely omitted here, as they are regarded as proprietary information having commercial value.]

 _______________________________________________________________________
 


Table 1

Degrees of Freedom and Associated t Values for Differences
Between Scale Means for Gender Groups

     Scale                                     t       df     p
a.   Undesirable Character                   2.30     204     < .05, > .02
b.   Financial Irregularities                1.94     204     > .05
c.   Alcohol Abuse                           0.95     204     > .10
d.   Drug Abuse                              0.22     204     > .10
e.   Law Violations                          3.93     204     < .001
f.   Security/Confid. Violations             1.47     204     > .10
g.   LIE                                     0.07     204     > .10
     Total (sum of Scales a. through f.)     2.54     204     < .02, > .01
__________________________________________________________________
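Every comparison in Table 1 has df = 204, which is n1 + n2 - 2 = 114 + 92 - 2, the degrees of freedom of Student's pooled-variance two-sample t-test. The presentation does not name the test variant, so the sketch below assumes that pooled, equal-variance form. Because the sub-scale means and SDs are withheld, the functions take raw score lists; the p-value helper is an assumption of convenience that uses a normal approximation to the t distribution (close at df = 204, but not exact). The function names are illustrative only.

```python
import math
from statistics import mean, variance


def pooled_t_test(x, y):
    """Student's two-sample t-test assuming equal variances; returns (t, df)."""
    n1, n2 = len(x), len(y)
    df = n1 + n2 - 2
    # Pooled variance: df-weighted average of the two sample variances (n - 1 denominators).
    pooled_var = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / df
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(x) - mean(y)) / std_err, df


def approx_two_tailed_p(t):
    """Two-tailed p from the standard normal; an approximation that is close for large df."""
    return math.erfc(abs(t) / math.sqrt(2))


# Compare some Table 1 t values against the reported significance bands (approximate p).
for scale, t in [("a", 2.30), ("b", 1.94), ("e", 3.93), ("Total", 2.54)]:
    print(scale, round(approx_two_tailed_p(t), 4))
```

With this approximation, Scales a. and e. and the Total Score fall below .05 while Scale b. does not, which agrees with the three gender-specific scorings identified in the text.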
 

     These gender difference results suggest that it would be unwise to combine the male and female PHI data for at least three of the PHI scores, namely Scales a. and e. and the Total (summation of Scales a. through f.) Score.  However, nothing here suggests any problem with combining the male and female normative data for the remaining PHI sub-scales:  b., c., d., f., and g.  As a result, an individual's eight PHI scores (Scales a. through g. plus the Total Score) are transformed and reported as eight T-scores (having a mean of 50 and an SD of 10).  For Scales a. and e. and for the Total Score, T-scores are computed using separate, gender-based norms.  Actually, the PHI Total Score will prove to be of very little value, as the equivalent Total Score of the PSSPQ has been shown to possess little diagnostic or predictive value.  T-scores for the other five PHI scales are computed using combined male/female normative data.  A raw score conversion table (i.e., from PHI sub-scale scores to T-scores) has been developed; because it is considered proprietary information, it is not presented here.
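The norm-selection rule above can be sketched as a linear T-score, T = 50 + 10(X - M)/SD, where the norm group is chosen per scale: gender-specific norms for Scales a. and e. and the Total Score, combined norms for b., c., d., f., and g. The actual PHI conversion table is proprietary and may not be a simple linear transform, so both the linear form and every norm value below are hypothetical placeholders, not PHI data.

```python
# Scales whose T-scores use separate male/female norms (per the gender-difference results).
GENDER_SPECIFIC_SCALES = {"a", "e", "total"}

# HYPOTHETICAL norms as (mean, SD); the real PHI normative values are not published.
HYPOTHETICAL_NORMS = {
    "a": {"male": (4.0, 2.0), "female": (3.0, 1.5)},
    "b": {"combined": (1.0, 0.5)},
}


def linear_t_score(raw, norm_mean, norm_sd):
    """Convert a raw score to a T-score with mean 50 and SD 10 in the norm group."""
    return 50 + 10 * (raw - norm_mean) / norm_sd


def phi_t_score(scale, gender, raw, norms=HYPOTHETICAL_NORMS):
    """Pick gender-specific or combined norms for the scale, then convert (illustrative)."""
    group = gender if scale in GENDER_SPECIFIC_SCALES else "combined"
    norm_mean, norm_sd = norms[scale][group]
    return linear_t_score(raw, norm_mean, norm_sd)


print(phi_t_score("a", "female", 4.5))  # female norms for Scale a.: 60.0
print(phi_t_score("b", "male", 1.5))    # combined norms for Scale b.: 60.0
```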

     As noted in an earlier paragraph, highly favorable reliabilities have been repeatedly found and reported for the PSSPQ, the psychological test from which the PHI was developed.  Test-retest reliability for the PSSPQ has been estimated at about 0.94, and there is very good reason to believe that this may actually be a low estimate, because the sample employed was most likely not very interested in the task at hand, a factor that could only have lowered any estimate of reliability based on 'stability' of measurement over time.  Kuder-Richardson Formula 20 (KR-20) estimates, an internal consistency form of reliability, have also been reported for most of the sub-scales comprising the PSSPQ and generally fall in the 0.90s.  Split-half reliability estimates for most of the PSSPQ sub-scales (corrected for length using the Spearman-Brown formula) were all statistically significant, many of them in the 0.80s and 0.90s.
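For readers unfamiliar with KR-20: for k dichotomously scored items (1 = keyed response, 0 = not), KR-20 = k/(k - 1) x (1 - sum of p_i q_i / s_X^2), where p_i is the proportion of people giving the keyed response to item i, q_i = 1 - p_i, and s_X^2 is the variance of total scores. The minimal sketch below assumes a persons-by-items list of 0/1 responses and uses population (divide-by-N) variances throughout, which is one common convention; the response data are made up.

```python
from statistics import pvariance


def kr20(responses):
    """KR-20 internal consistency for 0/1 items; responses is a list of per-person item lists."""
    n_people = len(responses)
    k = len(responses[0])
    if k < 2:
        raise ValueError("KR-20 needs at least two items")
    # Sum of item variances p*q, where p = proportion giving the keyed (1) response.
    sum_pq = 0.0
    for i in range(k):
        p = sum(person[i] for person in responses) / n_people
        sum_pq += p * (1 - p)
    # Population variance of the total scores, consistent with the p*q item variances.
    total_var = pvariance([sum(person) for person in responses])
    return k / (k - 1) * (1 - sum_pq / total_var)


# Made-up responses: 4 people x 3 items.
example = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(kr20(example))  # 0.75
```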

     Similar reliability estimation results have now been obtained for the PHI itself.  Using PHI data from the normative sample of 206 civilian government employees described above, all holding TS-SCI access, reliability was estimated in several ways.  Coefficient alpha, a generalization of the Kuder-Richardson Formula 20 model for estimating internal consistency reliability, was computed for each of the PHI sub-scales.  Coefficients alpha for these sub-scales are as follows:  a. Undesirable Character Traits, 0.65; b. Financial Irresponsibility, 0.38; c. Alcohol Abuse, 0.82; d. Illegal Drugs & Drug Abuse, 0.41; e. Record of Law Violations, 0.44; f. Security/Confidentiality Violations, 0.56; and g. LIE, 0.78.  Product-moment correlation coefficients between the sub-scale scores and the PHI Total Score were also calculated as another form of internal consistency; these correlation coefficients are as follows:  a. 0.81, b. 0.62, c. 0.91, d. 0.41, e. 0.67, f. 0.75, and g. 0.88.  As should be expected, the scales with the greater number of items correlated most highly with the Total Score.
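Coefficient alpha replaces the p_i q_i terms of KR-20 with general item variances, alpha = k/(k - 1) x (1 - sum of s_i^2 / s_X^2), so it also covers items that are not scored 0/1; for dichotomous items it reduces to KR-20. The sub-scale/Total correlations reported above are ordinary Pearson product-moment correlations. A minimal sketch, assuming population variances throughout; the responses and the sub-scale and total scores are made up for illustration.

```python
import math
from statistics import pvariance


def coefficient_alpha(responses):
    """Coefficient alpha; responses is a list of per-person item-score lists."""
    k = len(responses[0])
    item_vars = [pvariance([person[i] for person in responses]) for i in range(k)]
    total_var = pvariance([sum(person) for person in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up 0/1 responses: alpha matches KR-20 for dichotomous items.
example = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(coefficient_alpha(example))  # 0.75

# Part-whole correlation: a made-up sub-scale score vs. a made-up total score.
subscale = [3, 2, 1, 0]
total = [10, 9, 5, 2]
print(pearson_r(subscale, total))
```

Because a part-whole correlation includes the sub-scale's own items in the total, it tends to rise with the sub-scale's share of the items, which fits the observation that the longer scales correlated most highly with the Total Score.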

     For each PHI sub-scale, the items were divided into two halves with equal numbers of items, in all possible item combinations.  For each combination, a product-moment correlation coefficient was computed; an average correlation was then computed using Fisher r to z transformations, and this average was regarded as a type of split-half reliability estimate for the sub-scale involved.  These split-half estimates were corrected for length using the Spearman-Brown formula.  The corrected split-half reliabilities for the PHI sub-scales are as follows:  a. 0.53, b. 0.51, c. 0.80, d. 0.50, e. 0.44, f. 0.70, and g. 0.78.  Of these split-half reliability estimates, three were statistically significant at the .001 level, one at the .02 level, two at the .05 level, and one at the .10 level (this lowest level of statistical significance was associated with Scale e, which has only three items, hardly enough for this kind of reliability estimation approach).  Another form of internal consistency was also calculated: the average item/sub-scale score correlation for each PHI sub-scale, with the item/sub-scale correlation coefficients averaged using the Fisher r to z transformation.  These average item/sub-scale score correlation coefficients, for each PHI sub-scale, are as follows:  a. 0.51, b. 0.53, c. 0.77, d. 0.51, e. 0.48, f. 0.80, and g. 0.55.
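The all-possible-splits procedure described above can be sketched as follows: enumerate every division of a sub-scale's items into two equal halves (each division counted once), correlate the half scores, average the correlations through Fisher's z = atanh(r), back-transform with tanh, and step the average up with the Spearman-Brown formula 2r/(1 + r). This sketch assumes an even number of items and that no split yields r = +1 or -1 (where atanh is undefined); the same fisher_mean helper would serve for averaging the item/sub-scale correlations. The function names and data are illustrative, not the author's code.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_mean(rs):
    """Average correlations via Fisher's r-to-z transformation, then back-transform."""
    return math.tanh(sum(math.atanh(r) for r in rs) / len(rs))


def spearman_brown_double(r_half):
    """Step a half-length correlation up to full-length reliability."""
    return 2 * r_half / (1 + r_half)


def all_splits_reliability(responses):
    """Return (corrected split-half reliability, list of per-split correlations)."""
    k = len(responses[0])
    if k % 2:
        raise ValueError("equal halves need an even number of items")
    rs = []
    # Keep item 0 in half A so each split is counted once, not twice.
    for rest in combinations(range(1, k), k // 2 - 1):
        half_a = (0,) + rest
        half_b = [i for i in range(k) if i not in half_a]
        a_scores = [sum(person[i] for i in half_a) for person in responses]
        b_scores = [sum(person[i] for i in half_b) for person in responses]
        rs.append(pearson_r(a_scores, b_scores))
    return spearman_brown_double(fisher_mean(rs)), rs


# Made-up responses: 5 people x 4 items gives 3 distinct splits.
example = [[1, 1, 1, 1], [1, 1, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0]]
reliability, split_rs = all_splits_reliability(example)
print(len(split_rs), [round(r, 3) for r in split_rs], round(reliability, 3))
```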

     Some of the above reliability estimates for the PHI sub-scales appear reasonably high and sufficient; however, some are only in the 0.40s and 0.50s, not in the ranges usually deemed sufficient for a personality measuring instrument.  It should be noted, however, that just about all of the observed reliability estimates would be regarded as statistically significant.  The major problem here was the truncation, or restriction of range, due to the very high homogeneity of the sample employed.  Inspection of the standard deviations for each sub-scale, computed from the same sample used in the reliability estimations, reveals that these values are almost all smaller than those seen with comparable scales (with almost exactly the same items) of the PSSPQ in its normative sample.  The sample used to define, statistically, PHI normative values for the possession of high levels of honesty and trustworthiness was ideal for that purpose but sorely lacking in heterogeneity for purposes of estimating reliability.  More heterogeneous samples will have to be obtained to better estimate the reliabilities of the PHI.
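The restriction-of-range argument can be made concrete with a classical test theory result: if error variance is the same in two groups, then from a known reliability r and score SD s, the reliability in a group with SD s_target is r_target = 1 - (s / s_target)^2 x (1 - r). The sketch below applies that formula; the constant-error-variance assumption is essential and is not verified here, and the numbers are hypothetical, since the PHI and PSSPQ standard deviations are not given in this presentation.

```python
def reliability_at_sd(r_known, sd_known, sd_target):
    """Classical-test-theory projection of reliability to a group with a different score SD.

    Assumes error variance is identical in both groups, so that
    error_var = sd_known**2 * (1 - r_known) = sd_target**2 * (1 - r_target).
    """
    error_var = sd_known ** 2 * (1 - r_known)
    return 1 - error_var / sd_target ** 2


# Hypothetical: a scale with reliability 0.80 in a group where SD = 4 ...
print(reliability_at_sd(0.80, 4.0, 2.0))  # ... projected into a homogeneous group with SD = 2
print(reliability_at_sd(0.20, 2.0, 4.0))  # the projection also runs in the other direction
```

In the first call, halving the SD cuts the projected reliability from 0.80 to about 0.20 even though measurement error is unchanged, which is the mechanism the paragraph above attributes to the homogeneous TS-SCI sample.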

     Another major problem may be that some of the PHI sub-scales, even though they contain only several items, may be measuring markedly multidimensional qualities.  This has been shown to be the case with the PSSPQ: many of its sub-scales have been shown, through factor analysis, to measure multidimensional characteristics.  An additional psychometric matter that may be causing some of the reliability estimates to be lower than one would like is that most of the sub-scales are in fact extremely brief and contain only a few items.  Scale f., for example, contains only three items.  The longest sub-scale of the PHI is the LIE Scale, and it contains only 13 items.  Undoubtedly, higher scale reliabilities could be obtained if the scales were lengthened.  However, this would defeat part of the purpose for creating the PHI in the first place: what was wanted was a short instrument that could be quickly and easily administered.  The multidimensionality issue raised earlier in this paragraph has been systematically studied for each of the PHI sub-scales using some of the same procedures (i.e., factor analysis) as were employed for the PSSPQ.  Tentatively, results similar to those seen with the PSSPQ do appear when the analyses are carried out with PHI data.
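The trade-off between scale length and reliability can be quantified with the Spearman-Brown prophecy formula, r_n = n r / (1 + (n - 1) r), where n is the factor by which a scale is lengthened; solving for n gives n = r_target (1 - r) / (r (1 - r_target)). The formula assumes the added items are parallel to the existing ones, an assumption the multidimensionality noted above may violate. As an illustration only, the sketch uses the 0.44 alpha reported for Scale e. and an arbitrary target of 0.80.

```python
def spearman_brown(r, n):
    """Projected reliability when a scale is lengthened by factor n with parallel items."""
    return n * r / (1 + (n - 1) * r)


def length_factor(r_current, r_target):
    """Lengthening factor needed to move reliability from r_current to r_target."""
    return r_target * (1 - r_current) / (r_current * (1 - r_target))


n_needed = length_factor(0.44, 0.80)
print(round(n_needed, 2))  # roughly five times as many items
print(round(spearman_brown(0.44, n_needed), 2))
```

A roughly fivefold increase in length for a single sub-scale illustrates why lengthening the scales would undercut the goal of a short, quickly administered instrument.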
 

An Attempt to Obtain/Use Another Even More
Appropriate Normative Group

     This perhaps even more trustworthy group, from the same very 'sensitive' federal agency, was composed of male (N = 43) and female (N = 20) employees who had volunteered and applied for a new employment assignment in a very special program.  To be considered for entry into the program, each already-employed applicant had to be polygraphed again and have his/her background systematically studied again.  In addition, the employee and just about his/her entire family (the only exception being young children) who would accompany the employee on a 'permanent change of station' assignment (which normally lasted two or three years) were evaluated.  After every assignment was completed, the whole evaluation/vetting process was done again.  If any possible problem, or any personal or family change that might affect trustworthiness, arose for the employee (or for any of his/her accompanying family, which in a few cases even included grandparents), the employee would again be polygraphed and would again have to meet with an evaluating psychologist or any other agency human resources or security professional deemed appropriate to the particular situation.  Much of this reevaluation process also included the employee's accompanying family.  In a few rare cases, it was recalled, even some older children who might accompany the employee (and who for various reasons raised some suspicion that they were likely to cause trouble) were also polygraphed.

     The relatively small cadre of agency employees in this special personal-standards program were certainly not perfect people; they were not 100% honest, nor did they show 100% perfect integrity.  However, it could be argued that they were about as trustworthy as anyone could be defined to be, based upon a broad, continuously ongoing, expensive, and repeated evaluation system that classified them as trustworthy in a very general sense.  When the agency was ‘recruiting’ candidates for this special program (usually from its already-employed ranks), it explicitly indicated that only those with exceptionally ‘clean’ and ‘straight’ personal backgrounds were encouraged to apply.  It can therefore be assumed that the agency wanted to recruit only the ‘best of the best’ (i.e., those with the highest levels of trustworthiness) for this special assignment program.

     The PHI was administered to 63 agency employees who had been through the elaborate trustworthiness-evaluation process described above and who had already been working actively in the program, many of them for a number of years.  Because many had been officially assigned to the program for several years, many had completed multiple changes of duty station and had consequently been through the complete re-evaluation process multiple times.  These employees completed their PHI item responses on their own time and returned the testing materials to Dr. Stone.  No names or other personally identifying information were requested along with their responses to the PHI items.  As indicated earlier in this presentation, this kind of anonymity is believed to have helped increase the validity of their responses.

     In this sample of 63 special program employees, most (68%) were male and only 32% were female.  Because of the smaller number of females (roughly a two-to-one ratio, which reflected the composition of the relatively small special project work force), the male and female PHI sub-scale means were statistically tested for differences.  Unexpectedly, the very same pattern of statistically significant and non-significant gender differences that had been seen with the first normative group (N = 206) was found again; upon further reflection, however, this was not overly surprising.  The sub-scale means of the two groups were then compared: where male and female means had previously been shown to differ significantly, separate male and female means were used; where the observed male-female differences were not statistically significant, a single mean based upon the combined data was used.  All of these sub-scale means from the smaller group (N = 63) were compared with the corresponding sub-scale means from the earlier, larger standardization group (N = 206), and the result initially surprised the developers of the PHI.
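The presentation does not say which significance test was used for these gender comparisons. One common choice for two independent groups of unequal size (here 43 men and 20 women) is Welch's t-test, which does not assume equal variances. The sketch below computes the Welch t statistic and its Welch-Satterthwaite degrees of freedom from summary statistics; the means and standard deviations in the example are invented placeholders, since the actual PHI values are proprietary. Turning t and df into a p-value requires a t-distribution CDF (for example, from SciPy), which is left out to keep the sketch dependency-free.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Inputs are group summaries; sd1 and sd2 are sample (n - 1) standard deviations.
    """
    var_mean1 = sd1 ** 2 / n1  # sampling variance of the group 1 mean
    var_mean2 = sd2 ** 2 / n2  # sampling variance of the group 2 mean
    t = (mean1 - mean2) / math.sqrt(var_mean1 + var_mean2)
    df = (var_mean1 + var_mean2) ** 2 / (
        var_mean1 ** 2 / (n1 - 1) + var_mean2 ** 2 / (n2 - 1)
    )
    return t, df


# Placeholder sub-scale summaries for 43 men and 20 women (not actual PHI data).
t, df = welch_t(mean1=2.4, sd1=1.6, n1=43, mean2=1.7, sd2=1.2, n2=20)
print(f"t = {t:.2f}, df = {df:.1f}")
```

The Welch degrees of freedom always fall between min(n1, n2) - 1 and n1 + n2 - 2, so with groups this unbalanced they can be noticeably smaller than the pooled-variance value of 61.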

     The results of this comparison were not at all what had been expected.  The expectation had been that different scores would be obtained by the employees assigned to the very special project group, who (along with most of their immediate family) had originally gone through a kind of ‘super’ evaluation for trustworthiness and who went through reevaluations quite similar to the original ‘entry-to-project’ evaluation every time they were reassigned to a new location.  What was actually seen was that the sub-scale means obtained by this special project group were basically the same as those obtained earlier by the larger group of government employees who had been granted TS-SCI access status.  The more we thought about this, however, the less logical our original expectation of differing scores seemed.  For one thing, the level of trustworthiness expected of TS-SCI clearance holders was not much lower than that expected of employees in the very sensitive special project.  Only two major differences really exist.  First, the special project employees additionally had most of their immediate families go through psychological interviews/evaluations, whereas ‘regular’ agency employees do not normally expose their immediate family members to most of the evaluation steps when they are originally processed for TS-SCI access.  Second, employees in the special project were normally completely reevaluated about every three years (i.e., when reassigned to a new location), as compared with every five years for all other agency employees holding TS-SCI security clearances.

     The PHI sub-scale data obtained from these two groups suggest that, even though this Government agency has its special project employees go through more frequent and sometimes more thorough trustworthiness evaluations than it requires of its regular employees who have also been granted TS-SCI access, both groups are just about equally trustworthy.  In fact, with respect to trustworthiness, the smaller special project group would appear, as far as can be judged from the PHI data, to be merely a representative sample of the larger employee group.

     As a result of this conclusion, the PHI data from the special project group were combined with the earlier PHI data.  These combined data (now based upon N = 269) constitute the standardization norms for the PHI.  For most sub-scales, a single mean was computed and is presented as normative data; as noted earlier, for a couple of the sub-scales, separate mean scores for males and females are presented.  Again, because of the proprietary value of these sub-scale means and their associated standard deviations, they are not presented here; this decision is based upon a belief that this information has commercial value.
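Combining two normative samples into one set of norms can be done from each group's size, mean, and standard deviation alone: the combined mean is the size-weighted mean, and the combined sum of squares is the within-group part plus a between-group part reflecting any difference in group means. The presentation does not describe how the 206 and 63 cases were merged (pooling the raw responses gives the same result), and the summary values in the example are placeholders because the actual PHI norms are withheld. Where separate male and female norms are kept, the same calculation would be applied within each gender.

```python
import math
import statistics


def combine_groups(n1, mean1, sd1, n2, mean2, sd2):
    """Combine two groups' summaries into one (N, mean, SD), using n - 1 SDs."""
    n = n1 + n2
    mean = (n1 * mean1 + n2 * mean2) / n
    ss_within = (n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2          # spread inside each group
    ss_between = n1 * (mean1 - mean) ** 2 + n2 * (mean2 - mean) ** 2  # spread between group means
    sd = math.sqrt((ss_within + ss_between) / (n - 1))
    return n, mean, sd


# Placeholder summaries for one sub-scale: TS-SCI group (206) and special project group (63).
print(combine_groups(206, 3.1, 1.9, 63, 3.0, 1.8))

# Sanity check: combining summaries matches computing directly from pooled raw scores.
a, b = [1, 2, 3, 4, 5], [2, 4, 6]
n, m, s = combine_groups(len(a), statistics.mean(a), statistics.stdev(a),
                         len(b), statistics.mean(b), statistics.stdev(b))
assert math.isclose(m, statistics.mean(a + b))
assert math.isclose(s, statistics.stdev(a + b))
```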

     As described in the previous pages of this presentation, it can be noted that, for the very first time, an honesty or integrity type test has been constructed using a criterion sample that can be argued to represent high levels of honesty and trustworthiness.  Although the term "trustworthiness" has previously been used with some commercially offered integrity tests, Dr. Stone would like to propose a narrower definition of the term when it is applied to these types of tests.  If a test has actually been empirically validated using groups of people defined as trustworthy as the criterion measure, then it should be referred to as a ‘trustworthiness’ test.  If a test’s validity has mainly been defined in terms of simple ‘face’ or ‘content’ validity, then the test should be regarded as an integrity test.  By this definition, only the PHI, and of course the PSSPQ from which it was derived, are tests of trustworthiness.  Many of the so-called integrity tests that have been commercially successful were originally justified only by face validity.  Only after they had been in use for some time did correlational research begin to appear relating the integrity test scores that persons obtained before being hired to outcomes such as later becoming a problem employee, length of employment, the emergence of alcohol problems, etc.  In this fashion, these integrity tests did show some association with whether employees were later regarded as good or bad, which could be regarded as a type of construct validity.  Unfortunately, most of the reported associations between integrity test scores and later problem-employee difficulties are not very high or useful, although they have usually been reported as statistically significant.  In applied psychology, it is not unusual to find statistically significant relationships between variables of interest that are nevertheless so weak that they end up being seen as nearly useless for ‘real world’ purposes.  This is the kind of understanding that can be inferred from the several reported meta-analytic studies of the so-called validity of integrity tests, most of which were published during the 1990s.
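The distinction drawn above between statistical significance and practical usefulness can be made concrete. For a correlation r from n cases, the usual test statistic is t = r * sqrt(n - 2) / sqrt(1 - r^2), while the proportion of criterion variance accounted for is r^2. The correlations and sample sizes below are illustrative only and are not taken from the integrity-test meta-analyses mentioned; the p-value relies on a normal approximation to the t distribution, an assumption that is reasonable only for large n.

```python
import math


def correlation_significance(r, n):
    """Return (t, approximate two-tailed p) for testing whether a correlation is zero.

    The p-value uses the normal approximation to the t distribution with
    n - 2 degrees of freedom, which is only reasonable for large n.
    """
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    p_approx = math.erfc(abs(t) / math.sqrt(2))  # two-tailed normal tail area
    return t, p_approx


# Illustrative values only: weak correlations in large samples.
for r, n in [(0.10, 1000), (0.20, 5000)]:
    t, p = correlation_significance(r, n)
    print(f"r = {r:.2f}, n = {n}: t = {t:.2f}, p ~ {p:.4f}, "
          f"variance explained = {r ** 2:.0%}")
```

Both correlations would be reported as significant well beyond the .05 level, yet they account for only 1% and 4% of the variance in the outcome.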

     Getting back to the PHI, it was this group (N = 63) of very special agency employees (i.e., those who had been extensively evaluated and who had been assigned to a special, ‘highly sensitive’ program for at least a couple of years) whom Dr. Stone chose to serve as one of the empirical standardization groups for the Probity/Honesty Inventory (PHI).  It was hoped that the PHI would be regarded as a test that measures (or predicts) degree of trustworthiness.  Scores from the very special employee group served as a norm or ‘anchor’ group that defined an exceptionally high degree of trustworthiness.  How did Dr. Stone know that this group was trustworthy?  The answer is very simple.  The U.S. Government defined the group as possessing exceptionally high trustworthiness, and this very-hard-to-earn designation was based upon past and ongoing evaluative investigations that made (and make) use of just about everything that can be done (in the ‘real world’) to accomplish such a goal.  In other words, according to U.S. Government national security standards, the people in this particular employee group could be believed to be characterized by an unusually high degree of trustworthiness.

     Therefore, administering any test or questionnaire to a sufficiently large sample from this particular occupational group would establish, for that test or questionnaire, a set of responses that could be regarded as representative of those to be expected from highly trustworthy people.  This is consistent with what "norms" represent for a standardized psychological test.  If someone not in that occupational group "took" the test or questionnaire in question, that person’s results could be compared, for evaluative purposes, with the average (or other parameter) scores obtained by the group.  If his scores closely resembled the group’s mean scores, it could be said that his test scores looked like those obtained from a group of highly trustworthy people.  Conversely, if his score deviated markedly from the group’s mean score (with the direction of the deviation taken into account), it could be said that, based on his test performance, he did not appear, in a general sense, to resemble those in the group.  All of this would carry even more meaning if just about all of the test’s items, on the basis of their content alone, could be conceptualized as pertaining to the concept of trustworthiness.
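The comparison described above can be sketched as a standard-score (z-score) calculation: the test-taker's sub-scale score minus the normative mean, divided by the normative standard deviation, with a flag raised only when the deviation is large in the unfavorable direction. The presentation does not give the PHI's actual scoring rules, so the scale means, standard deviations, keying directions, and the 2.0 cutoff below are all hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Norm:
    mean: float
    sd: float
    higher_is_unfavorable: bool = True  # assumed keying; real keying may differ by scale


def compare_to_norm(score, norm, cutoff=2.0):
    """Return (z, flagged) for one sub-scale score against normative data.

    `flagged` is True only when the score deviates by at least `cutoff`
    standard deviations in the unfavorable direction (the cutoff is hypothetical).
    """
    z = (score - norm.mean) / norm.sd
    unfavorable_z = z if norm.higher_is_unfavorable else -z  # direction of deviation matters
    return z, unfavorable_z >= cutoff


# Placeholder norms; the actual PHI means and SDs are proprietary.
norms = {
    "Financial Irresponsibility": Norm(mean=1.5, sd=1.2),
    "Record of Law Violations": Norm(mean=0.8, sd=0.9),
}
applicant = {"Financial Irresponsibility": 2.0, "Record of Law Violations": 3.0}
for scale, score in applicant.items():
    z, flagged = compare_to_norm(score, norms[scale])
    print(f"{scale}: z = {z:+.2f}, flagged = {flagged}")
```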

     In this type of arrangement, the test in question, in this situation the PHI, could be said to possess at least two quite different types of validity.  Like the integrity tests, it possesses face or content validity; but, much better, it can also be regarded as possessing empirical validity.  In this latter validity arrangement, it can be used to determine, based on an empirical comparison, whether a test-taker’s scores resemble those of the type of persons who made up the validity criterion group.  In the case of the PHI, this criterion group was the government employees in the very special group described in the previous paragraphs.  This group could rather easily be conceptualized as trustworthy, having been almost continuously examined for this very characteristic.  It is hoped, and fully expected, that future research efforts to show construct validity for the PHI will be very successful.  Some limited early attempts to show construct validity have so far been quite successful.

     Well then, just exactly what is this so-called test of trustworthiness, the PHI?  Actually, it is largely a portion of another, somewhat longer test that has been repeatedly researched for reliability and validity.  The 50-item PHI is a moderately shortened version of the Personnel Security Standards Psychological Questionnaire (PSSPQ), about which a number of Web-site pages have been written.  Some of these pages can be found at the following Web addresses:

http://www.home.earthlink.net/~lastone2/psspq.html
http://www.home.earthlink.net/~lastone2/individualsales.html
http://www.home.earthlink.net/~lastone2/hrandsecdirectors.html
http://www.home.earthlink.net/~lastone2/psspqreliabilityvalidity.html
http://www.home.earthlink.net/~lastone2/psspqfaq/html
http://www.home.earthlink.net/~lastone2/increasesuccesschances.html
http://www.home.earthlink.net/~lastone2/onlyonepsspq.html
http://www.home.earthlink.net/~lastone2/atyourservicepsspq.html.

Although a number of other Web pages have been devoted to presentation and discussion of the PSSPQ, the pages listed above can be regarded as the most ‘major’ ones.  Anyone who wants to view all of the Web pages focused upon the PSSPQ is encouraged to enter "psspq" (including the quotation marks) into the Google search engine.

     Basically, the PSSPQ can be regarded as a very well researched and well developed psychological test.  It has been shown to possess favorable levels of the following types of reliability: test-retest and internal consistency (i.e., Kuder-Richardson Formula 20).  The following types of validity have been shown for the PSSPQ: face or content, empirical (predictive), construct, and factorial.  Since the PHI is derived largely from the PSSPQ, it can be strongly argued that it too possesses some of the same very desirable reliability and validity characteristics.  Apart from the deletion of 22 items, the most important change made when the PSSPQ was shortened by about a third is that its whole rationale also changed: shortening it rendered it no longer appropriate or usable for predicting who would or would not be granted high-level security clearances.  The first standardization or normative group employed was 206 U.S. Government workers, all with a very ‘sensitive’ intelligence agency, each of whom held TS-SCI security clearance status.  Such a clearance can be considered to represent just about the highest level clearance granted to any fairly large group of U.S. citizens by their government.  In addition, the PHI was administered to another 63 employees of the very same intelligence agency who were assigned to a very special project group within the agency and had been repeatedly and very aggressively evaluated for trustworthiness.  The evaluation process for this group was even more aggressive and thorough than the one followed when evaluating persons for TS-SCI access status.  It can easily be believed that this special group of 63 constituted an unusually good trustworthiness standardization group for the PHI.
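Kuder-Richardson Formula 20, named above as the PSSPQ's internal-consistency index, applies to items scored 0/1: KR-20 = k/(k - 1) * (1 - sum of p*q / variance of total scores), where k is the number of items, p is the proportion of respondents scoring 1 on an item, and q = 1 - p. The sketch below assumes dichotomous item scoring and uses population (divide-by-N) variances throughout; the small response matrix is made up for illustration and is not PSSPQ or PHI data.

```python
from statistics import pvariance


def kr20(responses):
    """Kuder-Richardson Formula 20 for 0/1-scored items.

    `responses` holds one row per respondent and one 0/1 entry per item.
    Population (divide-by-N) variances are used for both items and totals.
    """
    n_people = len(responses)
    k = len(responses[0])
    totals = [sum(row) for row in responses]
    total_var = pvariance(totals)
    if k < 2 or total_var == 0:
        raise ValueError("need at least two items and some variation in total scores")
    pq_sum = 0.0
    for item in range(k):
        p = sum(row[item] for row in responses) / n_people  # proportion scoring 1
        pq_sum += p * (1 - p)                                # item variance p * q
    return k / (k - 1) * (1 - pq_sum / total_var)


# Made-up responses: four people, three items.
demo = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(kr20(demo))  # 0.75
```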

     It is important to understand just how the PSSPQ was shortened to become the PHI.  To explain how and why this was done, it should be noted that the PSSPQ was composed of 11 different scales plus one LIE (i.e., positive dissimulation) scale.  All of the scales except the LIE Scale were designed around the 11 adjudication concerns that are considered when processing an individual who has been nominated for a possible Top Secret – Sensitive Compartmented Information (TS-SCI) security clearance, a very high-level clearance granted by the U.S. Government.  These adjudication standards or concerns were originally stated in a government document titled Director of Central Intelligence Directive 1/14 (DCID 1/14), which was replaced, with very few changes, by DCID 6/4 in 1998.  The 22 items omitted when creating the PHI all came from five complete PSSPQ scales that were eliminated.

     As a briefer form of the PSSPQ, the PHI was also developed to be relevant for assessing honesty and trustworthiness among general applicants for employment.  Unlike the PSSPQ, the PHI was not designed for evaluating persons being processed for security clearance status.  The PHI consists of seven of the original PSSPQ scales.  These seven are:
a. Undesirable Characteristics
b. Financial Irresponsibility
c. Alcohol Abuse
d. Illegal Drugs & Drug Abuse
e. Record of Law Violations
f. Security Violations
g. LIE

     Most of the scales listed above were modified somewhat from their PSSPQ form.  Some items were modified very slightly and others to whatever extent was needed to make them more relevant to their subject.  A few new items were created, all of them for the LIE Scale (which now has 13 items instead of the 10 in the PSSPQ).  The numbers of items in each of the six remaining non-LIE scales are (same scale designations as above): a. (nine items), b. (six items), c. (six items), d. (six items), e. (seven items), and f. (three items).  The three items comprising scale f. were modified so as to make this brief scale sensitive to something other than what it measured in the PSSPQ: in the PHI, these three items measure violations of employment confidentiality instead of only past governmental security violations.  This newly focused scale is more appropriately named the Security/Confidentiality Violations Scale.  All of the other items (in scales a., b., c., d., and e.) were almost unchanged from their form in the PSSPQ.  As this description shows, it is rather inaccurate to describe the PHI simply as a shortened version of the PSSPQ; it is essentially a new test.  However, it is similar enough to the PSSPQ, in most of its items and in its general purpose, that many of the reliability and validity determinations made for the PSSPQ can perhaps be generalized, to some extent, to the PHI.

     As indicated earlier, the revised PHI comprises 50 items.  This revision of the PSSPQ into what is now known as the PHI is therefore only about two-thirds the length of the PSSPQ, and its administration and scoring times are reduced proportionally.  Responding to all 50 items generally takes only about 8-15 minutes; some fast readers have needed only about five minutes.  The reading skill level required by the PHI items was examined and was found most likely to be a bit lower than that for the PSSPQ, because some of the omitted PSSPQ items appeared to require about a high-school reading level.  A 7th or 8th grade reading level can be expected to be, most likely, the minimum required for valid responses to the PHI.

     Therefore, with the exception of only three new items, the remainder of the PHI (i.e., the other 47 items) comes almost entirely from the well-researched PSSPQ, which has been shown to possess excellent predictive validity and good reliability in screening candidates for high-level security clearances.  Some of these 47 ‘original’ items were slightly or mildly modified to make them more suitable for general use outside of security clearance adjudication.  In the PSSPQ, these 47 items had been ‘proven’ to be up to the task for which they were created; in the PHI, they are entirely suitable.

     A factor analysis of the seven PHI sub-scales was carried out to explore whether a complex factor structure might be present.  The matrix of inter-correlations among the sub-scales was submitted to a principal components analysis, with the limiting eigenvalue set at unity.  The resulting communalities ranged from 0.56 to 0.94 (average 0.71).  The three factors obtained were rotated using a varimax strategy.  The first and largest factor was somewhat bipolar: its loadings for the Undesirable Character, Financial Irresponsibility, Alcohol Abuse, Drug Abuse, Law Violations, Security Violations, and LIE sub-scales were 0.78, -0.07, 0.14, 0.75, 0.73, 0.60, and -0.43, respectively, so the largest positive loadings were for the Undesirable Character, Drug Abuse, Law Violations, and Security Violations sub-scales, with the LIE sub-scale at the negative end.  The second extracted factor was also bipolar, with a loading of 0.89 for the Alcohol Abuse sub-scale at one end and -0.78 for the LIE sub-scale at the other.  Factor three consisted of only one sub-scale, Financial Irresponsibility, which had a very high loading of 0.97.  Interestingly, the LIE sub-scale had an essentially zero loading on this third factor.
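The analysis just described (principal components extracted from the sub-scale correlation matrix, components retained when their eigenvalue exceeds 1, i.e. the Kaiser criterion, then varimax rotation) can be sketched with NumPy. The 7 x 7 PHI correlation matrix is not reported, so the example uses a made-up 6 x 6 matrix with two clusters of variables. This is the raw varimax criterion without Kaiser row normalization; many statistical packages normalize rows first, so their rotated loadings can differ slightly. Communalities (row sums of squared loadings) are unchanged by an orthogonal rotation, which the example can be used to confirm.

```python
import numpy as np


def principal_components(corr, min_eigenvalue=1.0):
    """Unrotated component loadings from a correlation matrix.

    Components whose eigenvalue exceeds `min_eigenvalue` are retained
    (the Kaiser criterion, i.e. a limiting eigenvalue set at unity).
    """
    eigvals, eigvecs = np.linalg.eigh(corr)   # ascending order for symmetric matrices
    order = np.argsort(eigvals)[::-1]         # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > min_eigenvalue
    loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    return loadings, eigvals


def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6):
    """Orthogonal varimax rotation using the common SVD-based iteration."""
    p, k = loadings.shape
    rotation = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        col_ss = np.diag(np.sum(rotated ** 2, axis=0))  # column sums of squared loadings
        u, s, vt = np.linalg.svd(loadings.T @ (rotated ** 3 - (gamma / p) * rotated @ col_ss))
        rotation = u @ vt
        d_old, d = d, np.sum(s)
        if d_old != 0 and d / d_old < 1 + tol:          # criterion stopped improving
            break
    return loadings @ rotation, rotation


# Hypothetical correlation matrix: variables 0-2 and 3-5 form two clusters.
corr = np.full((6, 6), 0.2)
corr[:3, :3] = 0.7
corr[3:, 3:] = 0.5
np.fill_diagonal(corr, 1.0)

loadings, eigvals = principal_components(corr)
rotated, rotation = varimax(loadings)
print("eigenvalues:", np.round(eigvals, 2))
print("rotated loadings:\n", np.round(rotated, 2))
print("communalities:", np.round(np.sum(rotated ** 2, axis=1), 2))
```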

     What the above principal components analysis can be believed to show is that the PHI sub-scales basically measure three quite independent constructs.  What appears to be measured mainly is some sort of generalized character concept, exemplified by an undesirable character display, illegal drug use and drug abuse, a history of law violations, and difficulties with past security violations.  Interestingly, admissions of such past behaviors seem to be negatively influenced by denial and deception.  A second construct measured by the PHI seems to be centered mainly on alcohol abuse, and admission of such behavior also seems to be negatively influenced by denial and deception.  A third construct appears to involve only a history of financial irresponsibility, and, surprisingly, admissions of such problems seem not to be influenced by denial and deception.  This principal components analysis of the PHI can be regarded as showing a type of factorial validity for this instrument.  The emerging factor structure was quite interpretable, and what was especially interesting was how the LIE sub-scale was negatively associated with the first two (and largest) factors.  This by itself could be regarded as a very convincing form of both construct and factorial validity for this particular LIE sub-scale.

How the PHI and the PSSPQ Differ from Other Similarly Focused Psychological Tests

     As mentioned throughout this presentation, the PHI and the PSSPQ are psychometric testing instruments, argued to test for degree of honesty, integrity, or similar qualities, that for the first time have been constructed using a bona fide criterion sample representing the possession of high levels of what the tests were designed to measure.  With these two tests, the possession of high levels of trustworthiness has been operationally defined by the fact that the employed sample(s) possessed TS-SCI access status (as granted by the U.S. Government).  In other words, the Diogenesian tool used to find "an honest man" was simply whether he or she possessed TS-SCI access status.  In the case of the PHI, one of the criterion-defining groups was a rather rare group who possessed TS-SCI access status and who, on top of this extremely high security clearance status, had also been evaluated and assigned to a very special program involving an even higher clearance status above the TS-SCI level.  Of course, we are fully aware that even possession of TS-SCI access status does not totally or absolutely guarantee complete honesty or trustworthiness.  However, can anyone come up with better groups to use in trying to operationally define honesty, integrity, trustworthiness, etc.?

     As anyone familiar with the design of the U.S. Government security clearance structure knows, there are a number of clearances above the TS-SCI access level; however, they are granted only to very small groups of people at the very highest levels of our government, often fewer than 10 or 20 people.  These are for the very top White House, DoD, and Congressional civilian officials, along with the top generals/admirals.  For the special program described in this presentation, one already had to possess TS-SCI status, along with certain much-needed skills, even to be considered for processing for possible entry into the program.  If successful, candidates for entry into this special program were then granted even ‘higher than TS-SCI’ clearances.  If persons in this very special program, in a very sensitive governmental agency, cannot be considered very trustworthy, then it is difficult, in a real world situation, to come up with any identifiable group that could be considered to have this distinction.

Summary Comment

     Both the PHI and the PSSPQ test instruments have been described and discussed in this presentation, with the hope of communicating the unusual nature of the validity criteria for both tests.  Although the two tests are rather similar in form (they share a large number of the same items), they were designed for very different purposes.  The PSSPQ was designed and successfully validated to predict accurately, for persons being processed for possible high-level security clearance status, who would and who would not ultimately be granted such clearances.  In contrast, the PHI was designed to establish a response standard (i.e., standardized norms) for trustworthiness, based upon a large database of test responses obtained from a group of employees of a very ‘sensitive’ government agency who possess TS-SCI access status.  The PHI can then be used, based upon objective test responses and norms, to compare the degree of similarity between an obtained set of PHI sub-scale scores, Total score, and various ratios and combinations of sub-scale scores and the normative group test score information.  Deviation of scores from the normative standardization information can be interpreted to suggest, to varying degrees, a lack of similarity with those who have been defined as possessing a large degree of trustworthiness.  The PHI test is unique among just about all so-called integrity or trustworthiness type tests in that it has actually been empirically validated using a normative group that can, without any argument, be regarded as possessing a large degree of trustworthiness.  The PSSPQ test, designed for a quite different purpose, is also quite unique in that it is the ONLY testing instrument designed to predict success or failure in being granted high-level security clearance status.

_______________________________________________________________

For those readers who might wish to communicate with Dr. Stone regarding the PHI test, his Email address is:  lastone2@earthlink.net.


______________________________________________________________________