Algorithmic hiring systems: what are they and what are the risks?

Around 99% of Fortune 500 companies use talent-sifting software in some part of the recruitment and hiring process. Some of these tools use machine learning to predict candidate performance with less human input; this is what we mean by algorithmic hiring.

Notable users of algorithmic hiring systems include Facebook, Deloitte, Nestlé, Vodafone, Jones Day, Google, the Canadian Department of National Defence and many more. One of the top providers in the field, HireVue, claims to provide its services to over one-third of Fortune 100 companies.

Algorithmic hiring systems, whether semi- or fully automated, help determine individuals' access to important life opportunities, with significant implications for their life chances and future wellbeing. In 2018, Amazon famously stopped using its CV-scanning algorithm after it was found to be biased against women: because it had not been trained on enough examples of successful women, it favoured male candidates. The potential harms to fairness and equal access to work must be carefully scrutinised as an emerging area of workplace risk.

As part of the Institute for the Future of Work's new research project to create an algorithmic impact assessment (AIA) protocol for algorithmic hiring systems, we have reviewed 36 algorithmic hiring tools currently in operation. Most of the tools reviewed were designed and are operated in the United States. They typically use machine learning to automate the initial assessment and screening of candidates and then provide recommendations to managers, who make the final decision. These assessments use data that candidates submit voluntarily and knowingly to predict their job performance in a given role.

The review excludes algorithmic systems used for headhunting and recruitment or for workforce management, as well as systems that scrape data from the internet without candidates' knowledge or consent.

In this post, I outline the common workings of algorithmic hiring systems, describing the data they use, how they process it and how they contribute to decision making. I also briefly explore the implications of these systems for fairness and equality in the UK.

What are the different types of algorithmic hiring systems?

The reviewed systems range in complexity. The simpler ones typically rely on keyword matching applied to conventional modes of assessment such as CVs and cover letters. In such cases, machine learning is not necessarily used: CVs are scanned for overlap between the words used by the applicant and those in the job description. For instance, if the role is for a secretary, the tool searches the candidate's CV for the word "secretary" and for related words and competencies. These tools provide basic initial screening, primarily checking whether the candidate's work experience broadly overlaps with the requirements in the job description. They are less explicitly predictive and offer more limited decision support to hiring managers, who then make their own assessment of each candidate.
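To make this concrete, here is a minimal Python sketch of keyword-based screening. It assumes the employer supplies a short list of keywords, such as the job title plus related competencies. The function names and the keyword list are hypothetical and do not come from any reviewed provider. No machine learning is involved; the score is simply the share of keywords found in the CV.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lower-case the text and split it into a set of alphabetic word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def keyword_overlap(cv_text: str, job_keywords: set[str]) -> float:
    """Return the fraction of job keywords that appear in the CV (0.0 to 1.0)."""
    if not job_keywords:
        return 0.0
    cv_tokens = tokenize(cv_text)
    matched = {kw for kw in job_keywords if kw in cv_tokens}
    return len(matched) / len(job_keywords)


# Hypothetical keywords for a secretary role: the job title plus related competencies.
SECRETARY_KEYWORDS = {"secretary", "diary", "scheduling", "minutes", "filing", "reception"}

cv = "Experienced secretary responsible for diary management, scheduling and taking minutes."
score = keyword_overlap(cv, SECRETARY_KEYWORDS)
print(f"{score:.2f}")  # prints 0.67 (4 of 6 keywords found)
```

Exact matching of this kind is brittle. A CV that says "secretarial duties" scores zero here, which is one reason such tools also search for related words and competencies.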

More complex systems, which carry more potential for both reward and risk, are based on prediction, and keyword matching alone would not suffice for them. Predictive systems rely on probabilistic and statistical reasoning. They use machine learning techniques such as natural language processing (NLP) to predict a candidate's job performance from a potentially wide array of factors.

Setting up the hiring process

Employers first articulate their hiring needs to the developer, who either works with the employer to create a customised profile of success, based on the performance of existing employees, or uses ready-made models of success, depending on the mode of assessment. One of the main selling points of using an algorithmic system is the ability to tailor assessments to the employer's particular role and needs. For example, employee performance data and their results on personality tests could be combined to create a profile of what a “good” candidate looks like at a given company.

Applicants who score similarly to those employees on the same personality tests are then predicted to perform well in similar roles. Most systems provide a shortlist of candidates, numerical scores for each individual, or both.
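The "profile of success" approach can be sketched as a similarity score. This minimal example makes several assumptions: Big Five trait scores on a 0 to 1 scale, a profile built by averaging the scores of current top performers, and applicants ranked by how close they are to that profile. The trait scale, the averaging step, the distance-to-score mapping and all data are hypothetical. Real systems may combine many more inputs and use trained models rather than a fixed distance.

```python
import math

# Hypothetical trait names; scores are assumed to lie on a 0-1 scale.
TRAITS = ("openness", "conscientiousness", "extraversion", "agreeableness", "neuroticism")

Profile = dict[str, float]


def traits(*scores: float) -> Profile:
    """Build a trait dictionary from scores given in TRAITS order."""
    return dict(zip(TRAITS, scores))


def success_profile(top_performers: list[Profile]) -> Profile:
    """Average each trait across current top performers to form a 'profile of success'."""
    return {t: sum(p[t] for p in top_performers) / len(top_performers) for t in TRAITS}


def similarity(candidate: Profile, profile: Profile) -> float:
    """Turn Euclidean distance into a score in (0, 1]; 1.0 means identical to the profile."""
    distance = math.sqrt(sum((candidate[t] - profile[t]) ** 2 for t in TRAITS))
    return 1.0 / (1.0 + distance)


def shortlist(candidates: dict[str, Profile], profile: Profile, k: int) -> list[tuple[str, float]]:
    """Rank candidates by similarity to the profile and keep the top k."""
    scored = [(name, similarity(scores, profile)) for name, scores in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical data: two current top performers and three applicants.
top = [traits(0.6, 0.8, 0.5, 0.7, 0.3), traits(0.8, 0.8, 0.5, 0.5, 0.3)]
profile = success_profile(top)
candidates = {
    "A": traits(0.7, 0.8, 0.5, 0.6, 0.3),
    "B": traits(0.2, 0.4, 0.9, 0.3, 0.7),
    "C": traits(0.6, 0.7, 0.5, 0.6, 0.4),
}
for name, score in shortlist(candidates, profile, k=2):
    print(name, round(score, 2))
```

Nothing in this calculation checks whether a shortlisted applicant later performs well. The score only measures resemblance to existing staff.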

Generally, algorithmic hiring systems are not validated against real-world outcomes. That is, nobody checks whether new hires actually perform well once recruited. Instead, the systems predict performance according to how similar candidates are to the current top-performing employees at the company. In principle, some systems could validate their predictions by factoring in candidates' eventual performance. At present, however, none of the algorithmic hiring providers state that this is a feature.

Assessment types

The modes of assessment range widely, from the screening of CVs and cover letters, to analysis of video or chat interviews, to psychometric testing.

Interviews are typically processed using NLP to analyse the content and meaning of candidate speech, judging their comprehension of the question and quality of the answers. Some providers also use facial analysis for video interviews to assess the candidate's expressions and mannerisms.

The problems associated with sensitive biometric data, and the potential for facial analysis algorithms to malfunction or be misused, have been widely noted. Some US jurisdictions have legislated against them.

Psychometric testing includes cognitive and personality assessments, in traditional questionnaire or gamified form. Some of the most popular providers pride themselves on assessments that ostensibly measure candidates' intelligence, competencies and personality traits, such as resilience or creativity. These assessments take the form of rapid, games-based tasks that take minutes to complete. The most commonly used personality model is the Big Five, one of the most extensively researched and validated models in personality psychology.

Competency tests are commonly divided into hard skills tests, such as those evaluating programming or numerical reasoning, and soft skills tests, such as those evaluating time management, problem solving, teamwork or emotional intelligence.

In addition to job performance, a few providers also predict cultural fit, typically framed in terms of cultural and value add, as well as loyalty and retention.

Fairness and bias considerations

Although there is a great deal of work on fairness in machine learning, it is often unclear what algorithmic hiring providers actually do and how their procedures relate to equality legislation.

Currently, few providers publish anything beyond the limited, non-proprietary technical information they disclose about how their models work. However, some are starting to report on their efforts to improve the fairness of their models or of the overall decision-making process. The few that comment on such matters refer almost exclusively to compliance with guidance from the Equal Employment Opportunity Commission (EEOC), the US federal agency that enforces laws against workplace discrimination.

The EEOC recommends applying the four-fifths rule. This statistical fairness measure requires that the selection rate of the group with the lowest rate be at least 80% (four-fifths) of the selection rate of the group with the highest rate. Take a company that typically hires more men than women. If it applies this rule and invites 60% of male applicants to interview, it should invite no less than 48% of female applicants (0.8 × 60% = 48%).
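The four-fifths check reduces to a few lines of arithmetic. This sketch uses exact fractions so that a ratio of exactly 80%, as in the example above, is not misjudged by floating-point rounding. The function names and applicant counts are illustrative assumptions.

```python
from fractions import Fraction

FOUR_FIFTHS = Fraction(4, 5)


def selection_rates(invited: dict[str, int], applicants: dict[str, int]) -> dict[str, Fraction]:
    """Selection rate per group: the share of that group's applicants who were invited."""
    return {group: Fraction(invited[group], applicants[group]) for group in applicants}


def impact_ratio(rates: dict[str, Fraction]) -> Fraction:
    """Lowest group selection rate divided by the highest group selection rate."""
    return min(rates.values()) / max(rates.values())


def passes_four_fifths(rates: dict[str, Fraction]) -> bool:
    """True if the impact ratio is at least 4/5 (80%)."""
    return impact_ratio(rates) >= FOUR_FIFTHS


# Hypothetical applicant counts.
applicants = {"men": 100, "women": 50}

# The example from the text: 60% of men and 48% of women invited.
rates = selection_rates({"men": 60, "women": 24}, applicants)
print(impact_ratio(rates), passes_four_fifths(rates))  # prints 4/5 True

# Only 40% of women invited: 0.40 / 0.60 = 2/3, below the threshold.
rates_low = selection_rates({"men": 60, "women": 20}, applicants)
print(impact_ratio(rates_low), passes_four_fifths(rates_low))  # prints 2/3 False
```

A passing ratio only shows that the selection rates clear this threshold. It says nothing about why the rates differ.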

It is promising that some providers are demonstrating compliance with substantive equality provisions. However, the four-fifths rule has limitations, and meeting it does not guarantee compliance with the Equality Act 2010, because the tests used to determine discrimination in the UK are not based on this rule.

The most commonly mentioned fairness procedure is “blinding” models to protected characteristics, that is, excluding these variables from decision making. Yet computer scientists have long argued that blinding alone is insufficient to combat discrimination in machine learning, because other variables can act as proxies for protected characteristics. In our previous reports, Mind the Gap and Machine Learning Case Studies, we demonstrate the shortcomings of blinding for building fair and accurate models.
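A toy sketch shows how blinding can fail. The scoring rule below never sees gender. In this made-up data, however, a hypothetical feature (`career_gap`) is correlated with gender, so selection rates differ sharply even though both groups have identical test scores. The feature, data and scoring rule are illustrative assumptions, not findings from our reports.

```python
from fractions import Fraction

# Hypothetical toy applicant pool. 'gender' is the protected characteristic;
# 'career_gap' is an illustrative feature that, in this toy data, is correlated with gender.
APPLICANTS = [
    {"gender": "woman", "career_gap": True, "test_score": 8},
    {"gender": "woman", "career_gap": True, "test_score": 7},
    {"gender": "woman", "career_gap": False, "test_score": 7},
    {"gender": "woman", "career_gap": True, "test_score": 6},
    {"gender": "man", "career_gap": False, "test_score": 8},
    {"gender": "man", "career_gap": False, "test_score": 7},
    {"gender": "man", "career_gap": True, "test_score": 7},
    {"gender": "man", "career_gap": False, "test_score": 6},
]


def blind(applicant: dict) -> dict:
    """'Blinding': drop the protected characteristic before scoring."""
    return {k: v for k, v in applicant.items() if k != "gender"}


def score(features: dict) -> int:
    """Hypothetical scoring rule, e.g. one that mimics past hiring by penalising career gaps."""
    return features["test_score"] - (3 if features["career_gap"] else 0)


def selection_rate(group: str, threshold: int = 6) -> Fraction:
    """Share of a group's applicants whose blinded score meets the threshold."""
    members = [a for a in APPLICANTS if a["gender"] == group]
    selected = [a for a in members if score(blind(a)) >= threshold]
    return Fraction(len(selected), len(members))


print(selection_rate("woman"), selection_rate("man"))  # prints 1/4 3/4
```

Here the impact ratio is (1/4) / (3/4) = 1/3, well below four-fifths, even though the protected characteristic was removed before scoring.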

Another development with significant implications for fairness and equality is the use of biometric data to analyse facial expressions and voices (see for example Interviewer.AI or Retorio). Facial analysis is still rare, and many video interview providers have sworn off the practice. However, facial or voice analysis could in principle generate more granular insights into a candidate's emotions and presentation. Interest is therefore likely to grow alongside facial recognition technologies and algorithmic hiring.

The use of biometric data poses unique challenges for human rights, privacy and dignity and represents a regulatory gap, as asserted by the Ada Lovelace Institute. Our previous work has identified ways in which voice analysis can lead to discrimination beyond the existing framework of protected characteristics, such as by discriminating on the basis of socioeconomic background.

The risks and harms that can and will be produced by algorithmic hiring systems are only just beginning to be understood. More systematic, empirical and rigorous accountability measures, such as algorithmic impact assessments (AIAs), will be needed to gain a better understanding of the potential risks to individual rights in the workplace and how to mitigate them.

Stay tuned for further research from IFOW on how algorithmic impact assessments may be applied to algorithmic hiring systems, making the processes of harm discovery, risk mitigation and stakeholder consultation clearer, more rigorous and more standardised.

Contact stephanie@ifow.org for more information on our upcoming work.