Psychometric Tests - Pinning The Donkey On The Tail

In Brief

  • Tests like Myers Briggs and DiSC should not be used in leadership candidate assessment.
  • The Big Five personality traits are the most reliable and valid taxonomy of personality we know.
  • Conducting any kind of psychometric test can potentially lead to misinformed biases.
  • The only way to assess a leadership candidate is with a properly structured interview.

The Details

The title may have conjured up an image of a blindfolded kid at a birthday party trying to pin a printout of a donkey's body onto the tail. Clearly, it's already the wrong way around. The correct game, of course, is to pin the tail on the donkey. However, to describe psychometric tests in leadership assessment as being like trying to pin the tail on the donkey would not do justice to the logic of the famous children's game.

Trying to pin the donkey on the tail much better represents the misleading and chaotic nature of psychometric tests as they relate to leadership hiring and succession planning. Blindly wandering around trying to reverse engineer the wondrous logic of statistics onto the infinite complexity of human behavior is impossible enough. That's what psychometric tests try to do, though, or at least those tests invented after the 1950s, which excludes DiSC and Myers Briggs. If we go all the way back to Myers Briggs, we're not even trying to pin the donkey on the tail; we're just blindfolded and standing there saying that tails have donkeys. That's really not very helpful.

What also isn't very helpful is running any psychometric test after a properly structured interview. If you do, you may confirm what you already know, which adds nothing, or, at worst, you'll find out about traits that don't manifest in the candidate's day-to-day career, which really isn't useful to know anyway. Besides, what you think you may learn from these tests is highly likely to be engineered by the candidate's careful consideration of their responses, which can paint whatever picture they choose. Therefore, the best we can hope for from psychometric tests is to create a bias in favor of one candidate over another based on results that tell us nothing new, tell us things that aren't relevant, or, worse still, tell us things that are not true.

I believe that one of the main reasons we're so keen to believe the 'science' of psychometrics is that we don't trust our own judgment. We also like the thought of something scientific backing up our opinions. However, psychology is different from traditional sciences like mathematics or chemistry. In behavioral science, two plus two doesn't have to equal four all the time for it to be considered a fact. With psychometrics, the measurement of behaviors, scientists base their 'facts' on statistics, where the bar is relatively low compared to the traditional sciences. If two plus two equals four 50% of the time, that is considered a good result. The issue is that scientists know how to differentiate between specific statistical outcomes; the average LinkedIn reader does not, so they see 'science' and believe it to be fact, no matter how dubious the findings are.
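To put a rough number on that low bar, here is a small illustrative sketch. The correlation values below are my own assumptions chosen for the example, not figures from any particular study, but correlations in this general range are routinely reported as meaningful findings in behavioral research. Squaring a correlation shows how little of the outcome it actually accounts for.

```python
# Illustrative arithmetic only: the correlation values are assumptions for the
# sake of the example, not figures from any particular psychometric study.
# Squaring a correlation (r) gives the share of outcome variance it explains.
for r in (0.1, 0.3, 0.5):
    variance_explained = r ** 2
    print(f"r = {r:.1f} -> {variance_explained:.0%} of the outcome variance explained")

# Output:
# r = 0.1 -> 1% of the outcome variance explained
# r = 0.3 -> 9% of the outcome variance explained
# r = 0.5 -> 25% of the outcome variance explained
```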

One reason people think personality tests work so well is that the tests are never administered blindly, in isolation from other information such as an interview. It's also due to the Leadership Assessment Fallacy (LAF). I identified the LAF in the research I conducted interviewing over 1,700 leaders, which led me to write "The CEO's Greatest Asset – The Art and Science of Landing Leaders." In essence, the LAF theorizes that people consider specific assessment methods to be effective because the candidates who get hired go on to do a good job most of the time, thus appearing to validate the assessment method. However, this is a classic 'correlation doesn't equal causation' scenario. More people drown in the ocean on hot days because more people are in the water, and more people also eat ice cream on those days, but we cannot infer that eating too much ice cream carries a risk of drowning.
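For anyone who prefers to see the confounding at work rather than take it on faith, here is a minimal simulation sketch. The variable names and coefficients are invented purely for illustration; the point is only that two outcomes driven by a shared third factor (hot weather) end up strongly correlated even though neither causes the other.

```python
# Toy simulation of a confounder: hot weather drives both ice cream sales and
# time spent in the water, so the two outcomes correlate with no causal link.
# All coefficients below are arbitrary choices for illustration.
import random

random.seed(42)

ice_cream_sales = []
drowning_risk = []

for day in range(1000):
    temperature = random.uniform(10, 40)                    # the confounder
    ice_cream_sales.append(5 * temperature + random.gauss(0, 20))
    drowning_risk.append(0.2 * temperature + random.gauss(0, 2))

def correlation(xs, ys):
    """Pearson correlation coefficient, computed from scratch."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = (sum((x - mx) ** 2 for x in xs) / n) ** 0.5
    sy = (sum((y - my) ** 2 for y in ys) / n) ** 0.5
    return cov / (sx * sy)

# Strong positive correlation (around 0.6 with these settings), yet ice cream
# consumption plays no causal role in drowning at all.
print(correlation(ice_cream_sales, drowning_risk))
```

The LAF works the same way: the quality of the shortlist is the hidden third factor that makes any assessment method sitting on top of it look effective.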

The LAF proposes that after a crude distillation of a long list of candidates down to a short list, the three to five candidates who go on to get interviewed are usually sufficiently qualified to do the job. There are exceptions, of course. Therefore, it is typically the case that more than one of these candidates, if appropriately shortlisted, will go on to do well if selected for the role. The candidates who are qualified and motivated to do a great job make the assessment look good, not the other way around. This is the Leadership Assessment Fallacy. You can find out more about this here.

The reference to pinning the donkey on the tail was about the endless and painful plight of psychometric companies trying to insist their tests can peek beyond the curtains of a candidate’s psyche and infer something that’s both new and useful about the candidate. This article is about leadership candidates specifically.

To illustrate how useless these psychometric tests are in leadership candidate selection, we’re going to look at Myers Briggs, DiSC, and the Big Five. Myers Briggs and DiSC fall into the same category of pre-scientific tests and are based on the theories of Carl Jung. I write pre-scientific because the standards for validity and reliability weren’t even invented when these two tests were first created back in the 1940s. The Big Five, on the other hand, is the most valid and reliable taxonomy of personality we have.  Therefore, if we can show that it is also not very useful in leadership candidate selection, then we don’t have to ponder too much on the Myers Briggs and DiSC methods.

The Pre-Scientific Tests

If you look online, you’ll see no shortage of articles making claims about the validity and scientific rigor of the Myers Briggs and DiSC tests. In fact, a lot of effort has been made to fill up the pages of Google searches, so people are left with no doubt about their validity. I’m guessing that this is in the hope that we all read these reports and think, “Well, these are psychologists, so we should trust them.” You don’t have to be a psychologist to understand what’s really going on here. What follows are some facts about these tests. You can decide for yourself whether you could still be convinced that these are in any way useful and reliable predictors of job success, especially when it comes to leadership roles.

Myers Briggs

Myers Briggs, or the MBTI questionnaire, was first published in 1943 by Katharine Cook Briggs and her daughter Isabel Briggs Myers. This test was based on the theories outlined in Carl Jung’s 1921 book Psychological Types.

There are many issues with this test. Forget for a moment the issues of reliability and validity (the MBTI is neither reliable nor valid). These concepts are important because they're not just words; they're the qualifying criteria that determine whether a psychometric test can be trusted, as laid out in the American Psychological Association (APA) guidelines.

The official Myers-Briggs site actually carries a disclaimer that reads: "It is not ethical to use the MBTI instrument for hiring or for deciding job assignments." Would they have put that there if the test could legitimately be used as a diagnostic tool for hiring? Exactly.

Have you ever heard of the Barnum effect? It's also known as the Forer effect, after Professor Bertram Forer, who investigated the phenomenon in 1948. The Barnum effect in psychology refers to people's tendency to accept vague, general descriptions of personality as uniquely accurate descriptions of themselves. Forer gave students a personality test and then gave every student the same generic feedback, claiming it was specific to them, using general statements like, "You have a tendency to be self-critical." When the students were asked to rate how accurate the feedback was, they were overwhelmingly convinced it was representative of them. (Source: The Journal of Abnormal and Social Psychology, 44(1), 118–123. doi:10.1037/h0059240)

With the Myers Briggs test, there are 16 personality types, with each one represented by four letters, namely, ISTJ, ISFJ, INFJ, INTJ, ISTP, ISFP, INFP, INTP, ESTP, ESFP, ENFP, ENTP, ESTJ, ESFJ, ENFJ, and ENTJ. One of the issues with this is that we’re all supposedly typecast as one of these sixteen types. However, that’s not the main issue. Don’t let me convince you, though. Let’s try a little test so you can discover this for yourself.

For each of the following statements, choose one of two responses. Feel free to keep a tally of how many 1s and 2s you collect.

  1. This does not describe me at all.
  2. This describes me, at least in part.
  • Practical, matter-of-fact, realistic, and responsible.
  • Loyal, considerate, notice and remember specifics about people who are important to them.
  • Organized and decisive in implementing their vision.
  • High standards of competence and performance – for themselves and others.
  • Interested in cause and effect, organize facts using logical principles, value efficiency.
  • Loyal and committed to their values and to people who are important to them.
  • Adaptable, flexible, and accepting unless a value is threatened.
  • Have an unusual ability to focus in depth to solve problems in their area of interest.
  • Flexible and tolerant, they take a pragmatic approach focused on immediate results.
  • Enjoy working with others to make things happen.
  • Make connections between events and information very quickly and confidently proceed based on the patterns they see.
  • Resourceful in solving new and challenging problems.
  • Organize projects and people to get things done, and focus on getting results in the most efficient way possible.
  • Want to be appreciated for who they are and for what they contribute.
  • Loyal and responsive to praise and criticism.
  • Usually well-informed, well-read, enjoy expanding their knowledge and passing it on to others.

Source: https://www.myersbriggs.org/my-mbti-personality-type/mbti-basics/the-16-mbti-types.htm

So how did you do? How many were number two – "This describes me, at least in part"? Ten or twelve? All of them? Well, these sixteen statements are each taken from one of the sixteen different types. Do you see why it's been so popular since it started to be sold as a commercial product in the 1970s? This is what the company earns tens of millions of dollars each year for: a test that concludes we are each just one type, when these statements show we are all many types at once. Of course it's going to lead to "Oh my God, this is totally me" eureka moments. Do we really need to be scientists to figure out this is bogus? What's more, none of the sixteen type descriptions contains a single negative or constructive word. Who doesn't love hearing only great things about themselves?

Besides, their website also reads, "All MBTI administrators should be very clear that taking the Indicator is voluntary and that regardless of scored results, respondents are free to choose their own best-fit type." So, in other words, "Don't worry about our 'licensed' practitioners who administer your test and charge you or your company money. If you don't like your result, just choose another one." That's like setting a math test that asks students to add two plus two and then writing in the test guidelines, "If you get the wrong answer, don't worry, just choose any number, and we'll still give you an A." You can't make this stuff up. And this is just a sample of a broader critique of the Myers-Briggs test.

DiSC

I’ll go a little easier on DiSC as it at least isn’t trying to convince us that we’re one of 16 wonderful types, which all say good things about us. We’ll also come on to the Big Five following this, and a lot of the criticism of the Big Five is also relevant to DiSC.

DiSC assessments are behavioral self-assessment tools based on the 1928 DiSC emotional and behavioral theory of psychologist William Moulton Marston. This was also based on the theories of Carl Jung. In 1940, Walter Clark took the theory of William Moulton Marston and developed the first DiSC personality profile.

At its broadest, DiSC claims to measure four aspects of personality: dominance (D), influence (i), steadiness (S), and conscientiousness (C). These are the foundational traits of the original DISC model. To get your score, you complete a test in which you rate your level of agreement with 80 different statements on a scale ranging from strongly agree to strongly disagree.

Here is something that is quite astonishing about DiSC and the Barnum effect that I referred to earlier. This has to be read at the source to be believed. On the DiSC website, discprofile.com, there is a question: "Are DiSC profiles accurate?"

Their answer: “Overall, participants report that the DiSC fit is good or excellent approximately 90% of the time. As documented under the Forer effect (1949), however, it’s not unusual for participants to show a high level of agreement with psychological test results.” Source: https://www.discprofile.com/.

So, one can only conclude that even the owners of this product are saying, “The accuracy is high, but then, it would be because people are gullible when it comes to psychometric results.” Besides, what does ‘accurate’ mean when it comes to leadership assessment? Even they say DiSC should not be used for leadership assessment. Here is another astonishing quote from their website:

“Although DiSC® profiles are often used as part of the hiring and onboarding process, they’re not recommended for pre-employment screening. DiSC does not measure specific skills, aptitudes, or other factors critical for a position; it describes one’s natural work behavior patterns or styles to help improve productivity, teamwork, and communication.”

So, wait, we can't use it for assessing candidates, but it does describe someone's behavior patterns or styles to help improve productivity, teamwork, and communication? Okay, so what is the job of a leader if not to act with the appropriate behavioral style to improve productivity, teamwork, and communication? That is what a leader does.

No doubt any DiSC fans will be saying, “Well, it shouldn’t be used in leadership assessment anyway,” which is true, but we all know it still is being used. They know it still is being used. Cigarette smokers know smoking is bad, and it says so on the packets, but cigarette companies are still selling them. Besides, any product that recommends Myers-Briggs as a useful tool to augment the findings of DiSC is immediately discrediting itself. Again, you can’t make this stuff up. This is on their website:

“Familiar to many is the Myers-Briggs Type Indicator® (MBTI®). A perennial discussion exists among consultants and others about which profile is the best. A better question is: Which assessment better fits your needs?” I can answer that. Neither Myers Briggs nor DiSC.

It continues, “Because the two instruments provide different kinds of information, they might very well augment each other as separate views of the same person.” We’re in an odd place when DiSC is recommending Myers-Briggs as a tag team partner. There’s plenty more to be critical of with the DiSC method, some of which also applies to the Big Five personality traits, so let’s go on to look at the Big Five in more detail.

The Real Personality Science – The Big Five

In the 1990s, what we now understand to be the Big Five, or Five Factor Model, was established by Costa & McCrae (1992). Their work was a refinement of earlier research that traces back to the American psychologist Gordon Allport in the early 20th century, who is credited with the "lexical hypothesis" (Allport, 1937).

The Big Five is the most reliable and valid taxonomy of personality that we have to date. The Big Five traits are Openness, Conscientiousness, Extraversion, Agreeableness, and Neuroticism. We all exist on a spectrum of each of these traits, which already sets it apart from DiSC and Myers Briggs, among other factors. The Big Five should be studied and understood, but until we find a way to test for them that isn't skewed by the candidate's motivation to get a job, we should look for other ways to spot the clues of their presence.

The primary Big Five test is a battery of over 200 questions which, like the DiSC test, presents a series of statements and asks you to rate your level of agreement with each, ranging from strongly agree to strongly disagree. If you're in the comfort of your home doing this test for yourself with nobody watching, you may well get accurate results, and they may be useful to you. However, these tests are so easy to manipulate that in any job assessment, you'd have to be Jim Carrey in Liar Liar not to massage your answers just a little.

Here are three examples of such statements:

  1. I can be trusted to keep my promises.
  2. I work well in teams.
  3. I tend to be messy.

As you can see, the statements tend to give away exactly how they should be answered when one is motivated to gain employment. Let's say you were a bad candidate with a good imagination. This is how your internal dialogue may go:

“Question one – ‘I can be trusted to keep my promises.’ Well, clearly, I can’t, but I’m not going to let them know this. They’ll want someone who is trustworthy, so I’m going to score a five here; strongly agree.”

"Question two – 'I work well in teams.' Ugh, I hate working in teams, but it's a job, and obviously, they'll want me to get along with people, so I'll score a five here, also."

“Question three – ‘I tend to be messy.’ Well, yes, I am messy, but I’m not telling them that. I’ll strongly disagree here so that they think I’m super tidy.”

Hopefully, you can see how easily such a test can be manipulated by anyone motivated to get a job, and how bad these tests are, especially when they're relied upon to provide data to hiring leaders. Don't you wonder how many leaders missed out on a job opportunity because a competing candidate did a better job of gaming the psychometric test? Isn't it extremely unfair that despite the psychometric companies' own disclaimers stating that their tests shouldn't be used in job assessment, they still are?
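To make the mechanics behind that internal dialogue explicit, here is a minimal scoring sketch. The items, the scoring key, and the answer patterns are all hypothetical; real Big Five instruments use many more items and proprietary norms, but the underlying Likert-and-average logic is just as easy to steer.

```python
# Hypothetical Likert scoring sketch: 1 = strongly disagree ... 5 = strongly agree.
# "Reverse-keyed" items (e.g. "I tend to be messy.") are scored as 6 - answer.
ITEMS = [
    {"text": "I can be trusted to keep my promises.", "reverse": False},
    {"text": "I work well in teams.",                 "reverse": False},
    {"text": "I tend to be messy.",                   "reverse": True},
]

def trait_score(answers):
    """Average the keyed responses into a single 1-5 trait score."""
    keyed = [(6 - a) if item["reverse"] else a for item, a in zip(ITEMS, answers)]
    return sum(keyed) / len(keyed)

honest_answers = [3, 2, 4]  # middling on promises, dislikes teams, admits being messy
faked_answers  = [5, 5, 1]  # the answers any motivated candidate can work out

print(trait_score(honest_answers))  # ~2.3 - an unremarkable profile
print(trait_score(faked_answers))   # 5.0 - a flawless profile, earned by guessing the key
```

Nothing in the arithmetic can distinguish the second candidate's genuine traits from a reasonable guess about what the hiring company wants to hear.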

Better Structured Interviews

Here's the final analysis of these psychometric tests, and it makes any debate about validity or usefulness completely irrelevant. You won't find any nuances in a personality test that you won't find in a properly structured interview designed to get much deeper insights – the Bremnus interview method, for example. Just like personality tests, not all structured interviews are created equal. It's not just about the questions being asked but also about whether the interviewer knows what to do with the answers. A psychologist doesn't become a psychologist because they learned to ask, "Tell me about your childhood." A professional interviewer isn't an effective interviewer just because they memorized questions like, "Tell me a bit about yourself."

Knowing what questions to ask isn't even half the battle. How does the interviewer know how to interpret the answers? What do they know about the scientific literature on things like intuition, biases, and personality that they can draw on to form more objective opinions? The futures of leaders are in the interviewer's hands during this assessment process. Therefore, interviewers owe it to the candidates, and to the sacrifices those candidates made to get where they are, to know how to interpret their answers. Otherwise, it is not a fair process, to say the least. And because Big Five personality traits are relatively stable over time, if you selected people using blind personality tests without a structured interview, you'd run into far more trouble than if you did interviews with no personality tests.

A properly constructed interview will uncover how one’s traits have so far manifested in their work. Doing further vaguely valid to completely invalid tests to discover dormant traits just adds misleading bias to the interview process. That bias occurs because you’re re-evaluating the candidate based on factors and traits that have never amounted to anything to date, so why will they now?

Why not find the candidate whose relevant and desired traits are self-evident from the structured interview? To those who say, "But a well-constructed psychometric test will validate some of the traits found in the interview so they can be cross-referenced," my reply would be, "We don't need to cross-reference traits that have clearly been demonstrated in the patterns of behavior shown throughout the candidate's career." Actions speak louder than words, and actions definitely speak louder than questionably discovered dormant traits.

Conclusion

People don't always trust their own judgment, and behavioral psychology aims to augment human decisions with some scientific credibility. We are quick to rely on any psychometric test to validate what can already be found out in a properly constructed interview, provided we've also learned how to correctly process the answers. All we have right now are interviewers armed with questions to ask, typically behavioral competency interviews, which is a problem in and of itself.

If, as a leadership interviewer, you’re faced with the following options to augment a well-structured interview process, which would you choose given what you now know? I’ll give you a clue. These are listed from the least to the most useful.

  • Myers Briggs Test.
  • DiSC Test.
  • Big Five Personality Test.
  • Have leadership candidates play ‘pin the tail on the donkey’ – the correct game. The one who gets the tail closest to the correct part of the donkey gets the job.
  • None of the above.

For leadership candidates to get a fair assessment, at least two things need to be in place. Firstly, a properly structured interview in which candidates have a better opportunity to tell more of their complete story. Secondly, interviewers who have been trained to extract the patterns of traits that are evident in those deep stories. Without these basics in place, we may as well just get the donkey picture out, print the resumes, and use them as tails. Whichever resume gets pinned closest to the donkey gets the job. Not the worst idea.

ABOUT THE AUTHOR

Fraser Hill is the founder of the leadership consulting and assessment company Bremnus, as well as the founder and creator of Extraview.io, an HR software company focused on experienced-hire interviewing and selection for corporations and executive search firms. His 20+ year career has taken him to London, Hong Kong, Eastern Europe, Canada, and now the US, where he lives and works. His new book is The CEO's Greatest Asset – The Art and Science of Landing Leaders.

Other Articles

DE&I Process Audits to Accelerate Change – New Research

Based on 300 pages of new research, this article outlines one of the many reasons why DE&I efforts are failing and what to do about it. In particular, it explores the failings of processes relating to hiring and succession planning as well as proposing new solutions.

The Unbiased History of Unconscious Bias

We have been led to believe that our unconscious biases are corrupt and cannot be changed. A whole training industry has been born out of this misguided belief, driven by misleading science and shared without the complete scientific history. Here we share that entire scientific history, which paints a much different picture.

The Leadership Assessment Fallacy

Leadership hiring at all companies appears to work just fine. Great leaders are still being hired into companies, and they go on to do well no matter how good or bad the interview process and psychometric tests are. Isn’t that interesting?