1. Introduction

Over the past years, researchers have investigated the effect of individual human biases on job recruitment decisions. Bias has been reported at several levels, ranging from gender (Davison & Burke, 2000), motherhood (Correll et al., 2007b), religion (Weichselbaumer, 2020), and ethnicity (Correll et al., 2007a) to even candidates’ attractiveness (Desrumaux et al., 2009) and hairstyle (Koval & Rosette, 2020). Accordingly, human resources departments have lately been shifting towards AI solutions for candidate screening. Besides reducing human biases, AI solutions also improve and speed up the screening process (Hmoud & Laszlo, 2019). Their efficacy has been particularly evident in times of high unemployment, such as the current COVID-19 situation (Bernstein et al., 2020). In such times, with enormous numbers of applicants, humans become more prone to missing well-suited applicants, and their efficiency is challenged by the limited recruitment time frame. Thus, as stated by Black & van Esch (2020), the perception of AI-based recruitment solutions has shifted from “nice to talk about” to “necessary to utilize”.

Most of the research done in this area has focused on analyzing candidates’ profiles against job descriptions (Al-Otaibi & Ykhlef, 2012; Celik, 2016; Mishra et al., 2020). While this approach helps as a pre-selection phase, it only reflects technical competency and neglects candidates’ personality traits, which have been shown to be crucial indicators of employees’ success (Judge et al., 1999; Rothmann & Coetzer, 2003). Lately, researchers have started working on AI systems that analyze videos to assess applicants’ personalities. This direction is further motivated by research in psychology showing that humans form first impressions quickly and intuitively, after as little as 100 ms of exposure (Willis & Todorov, 2006). The task of apparent personality recognition has thus recently attracted the interest of the AI community, where researchers have achieved high recognition accuracy using machine learning (Ponce-López et al., 2016; Subramaniam et al., 2016; Zhang et al., 2016; Güçlütürk et al., 2017; Kampman et al., 2018).

In OCEAIN, we contribute to this research area by providing a platform that assesses people’s personalities and competencies using AI in the fields of Computer Vision and Natural Language Processing. Our system provides two main functionalities: (1) automating the candidate pre-selection process, by providing a faster solution that involves less individual bias; and (2) identifying team coherency, strengths, and weaknesses, thus enabling human resources departments to analyze their teams as well as identify how well an applicant will fit into a desired organizational culture.

2. System Overview

Studies of personality prediction fall under two main categories: recognizing actual personality traits (automatic personality recognition) and recognizing apparent personality traits (automatic personality perception) (Vinciarelli & Mohammadi, 2014). Given that our goal is to support and enhance the recruitment process through interview evaluation, we are interested in the latter task, which involves first impression analysis. The aim of our system is to predict personality traits based on visual, audio, and language cues obtained from interview videos, as shown in Figure 1 and further discussed in Section 3. To model personality, we use the Big Five personality traits (Goldberg, 1992), a widely used model in personality research. According to the Big Five model, personality is represented by five broad categories: Openness, Conscientiousness, Extraversion, Agreeableness, and Neuroticism, each evaluated on a scale of [0, 1].

3. Our Methodology

3.1. Data Set:

We use the ChaLearn dataset (Ponce-López et al., 2016) to train and evaluate our models. The dataset consists of 10,000 high-definition (720p) video clips collected from YouTube, each spanning 15 seconds. The clips were chosen to involve only one person facing the camera and speaking in English, mimicking an interview scenario. The speaker’s five dimensions of personality were assessed by annotators using Amazon Mechanical Turk. Speakers’ self-assessments were not used because, in addition to being variable and biased, they would reflect the actual rather than the apparent personality traits, which is a different task. A typical drawback of relying on the “wisdom of the crowd” is exposure to annotators’ biases and prejudices. Therefore, the pairwise ranking method was used in the dataset collection process to alleviate the bias problem. In pairwise ranking, annotators rank pairs of participants with regard to each personality trait instead of labeling each participant separately with a rating for each trait. In total, around 2,500 annotators labeled 321,684 pairs. The provided rankings are later mapped to ratings using the Bradley-Terry-Luce method (Bradley & Terry, 1952). Such a setting has been shown to be easier and more engaging for annotators and to reduce bias (Chen et al., 2016).
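To illustrate how pairwise rankings can be turned into per-participant scores, the following is a minimal sketch of a Bradley-Terry fit using a standard iterative (minorization-maximization) update. It is not the dataset authors’ implementation: the function name `fit_bradley_terry`, its parameters, and the toy comparisons are assumptions made for illustration, and the sketch assumes every participant wins at least one comparison. How the resulting strengths are rescaled to the [0, 1] rating scale is not specified here and is left out.

```python
import numpy as np


def fit_bradley_terry(n_items, comparisons, n_iter=200, tol=1e-9):
    """Estimate Bradley-Terry strengths from (winner, loser) index pairs.

    Hypothetical helper for illustration. Assumes every item wins at
    least once and that the comparison graph is connected.
    Returns strengths normalized to sum to 1.
    """
    wins = np.zeros((n_items, n_items))
    for winner, loser in comparisons:
        wins[winner, loser] += 1.0

    total_wins = wins.sum(axis=1)   # W_i: number of wins of item i
    pair_counts = wins + wins.T     # n_ij: comparisons between i and j
    strengths = np.full(n_items, 1.0 / n_items)

    for _ in range(n_iter):
        # MM update: p_i <- W_i / sum_j n_ij / (p_i + p_j)
        denom = pair_counts / (strengths[:, None] + strengths[None, :])
        updated = total_wins / denom.sum(axis=1)
        updated /= updated.sum()
        converged = np.max(np.abs(updated - strengths)) < tol
        strengths = updated
        if converged:
            break
    return strengths


# Toy example: participant 0 is judged "more extraverted" than
# participant 1 in three of four comparisons.
toy_pairs = [(0, 1), (0, 1), (0, 1), (1, 0)]
toy_strengths = fit_bradley_terry(2, toy_pairs)
```

Under the Bradley-Terry model, the probability that participant i is ranked above participant j is p_i / (p_i + p_j), so the fitted strengths summarize all pairwise judgments in a single score per participant and trait.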

The ChaLearn dataset involves participants of different genders, ages, nationalities, and ethnicities. The dataset is roughly gender-balanced, with 54.4% female participants. When checking the distributions of participants’ trait scores, we observe close to normal distributions. For example, for the extraversion trait, as shown in Figure 2, 47.8% of participants lie in the mid-range of 0.4-0.6, 30.7% lie in the lower range, and 21.5% lie in the higher range. The distribution has a mean of 0.48 and a standard deviation of 0.15, and it covers nearly the whole scale. The means and standard deviations for all traits are reported in Table 1.

Trait              Mean   Standard Deviation
Openness           0.567  0.146
Conscientiousness  0.524  0.155
Extraversion       0.477  0.151
Agreeableness      0.549  0.134
Neuroticism        0.521  0.153

Table 1. Personality traits distributions across

3.2. Machine Learning Models:

Humans rely on both behavioral and language cues to infer the personality of a person. Accordingly, our models utilize the following modalities to predict personality traits: visual, audio, and language. The visual features are obtained from (1) video frames using OpenCV (Bradski & Kaehler, 2008), (2) facial action units and feature positions, gaze position, and head pose using OpenFace (Baltrušaitis et al., 2016), and (3) skeleton feature points (body gestures) using OpenPose (Cao et al., 2019). From the audio, we extract features using pyAudioAnalysis (Giannakopoulos, 2015). These features include Zero Crossing Rate, Energy, Entropy of Energy, Spectral Centroid, Spectral Spread, Spectral Entropy, Spectral Flux, Spectral Rolloff, Mel Frequency Cepstral Coefficients, Chroma Vector, and Chroma Deviation. Finally, in the case of English interviews, we utilize the transcriptions provided by the ChaLearn dataset to integrate further language information into our models. We use NLTK (Loper & Bird, 2002) to obtain the frequencies of words that are reported to have the highest correlations with the Big Five personality traits (Yarkoni, 2010). The frequencies are then normalized by the total number of words in the participant’s speech. We also perform sentiment analysis using Flair (Akbik et al., 2019). After the feature extraction phase, we train our models using a convolutional neural network in an end-to-end fashion, where the input to the network consists of the concatenation of the features obtained from the different modalities.
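The language features and the fusion step described above can be sketched as follows. This is only an illustration under stated assumptions: the marker word list is a made-up placeholder (the actual list is drawn from Yarkoni, 2010), a simple regular-expression tokenizer stands in for NLTK so that the example has no external dependencies, and the helper names `word_frequency_features` and `fuse_features` are hypothetical.

```python
import re
from collections import Counter

import numpy as np

# Placeholder marker words for illustration only; the real list comes
# from the correlations reported by Yarkoni (2010).
MARKER_WORDS = ["love", "awful", "party", "lazy", "poet"]


def word_frequency_features(transcript, marker_words=MARKER_WORDS):
    """Frequency of each marker word, normalized by the total word count."""
    # Regex tokenizer used here as a stand-in for an NLTK tokenizer.
    tokens = re.findall(r"[a-z']+", transcript.lower())
    total = len(tokens)
    if total == 0:
        return np.zeros(len(marker_words))
    counts = Counter(tokens)
    return np.array([counts[word] / total for word in marker_words])


def fuse_features(visual, audio, language=None):
    """Concatenate per-modality feature vectors into one input vector.

    Passing language=None corresponds to a model that does not use
    language features.
    """
    parts = [np.asarray(visual, dtype=float).ravel(),
             np.asarray(audio, dtype=float).ravel()]
    if language is not None:
        parts.append(np.asarray(language, dtype=float).ravel())
    return np.concatenate(parts)


language_vec = word_frequency_features("I love this party, I really love it")
fused = fuse_features(np.zeros(4), np.ones(3), language_vec)
```

Normalizing by the total word count keeps the language features comparable between speakers who say more or fewer words within the same clip length.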

4. Evaluation:

For the evaluation, we report the accuracy achieved by the OCEAIN models on the ChaLearn test set, which consists of 2,000 videos. In Table 2, we report the overall accuracies as well as the accuracies per personality trait for the OCEAIN models. We report both English and generic models, where the English model is trained on visual, audio, and language features, while the generic model does not utilize the language features, to enable interview evaluations in other languages that are not well supported by speech recognition tools. We also compare our results against the IBM Watson model, as it is publicly available. Results show the superiority of the OCEAIN models in both the English and generic options with regard to the overall system accuracy. When looking into the systems’ accuracies per trait, the OCEAIN models outperform the IBM Watson model in predicting Openness, Conscientiousness, Agreeableness, and Neuroticism. By also plotting the absolute error across videos per trait, shown in Figure 3, we observe that for most of the test set, the absolute error is less than 0.1. In Table 3, we report our system’s accuracy across genders. We use the English model for this evaluation. We show that our model performs equally well across genders.

Trait              English Model  Standard Model  IBM Watson Model
Openness           88.42          88.34           73.85
Conscientiousness  88.85          88.79           81.90
Extraversion       88.30          88.30           83.65
Agreeableness      88.28          88.34           77.74
Neuroticism        88.18          88.11           76.26
Overall            88.41          88.38           78.68

Table 2. Accuracy scores of OCEAIN models vs. IBM Watson Model
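The text does not spell out how accuracy is computed. A common convention for the ChaLearn First Impressions benchmark is one minus the mean absolute error between predicted and annotated trait scores, which would be consistent with the absolute errors discussed above. The sketch below assumes that definition; the function name `trait_accuracies` and the toy arrays are illustrative and are not taken from the OCEAIN system.

```python
import numpy as np

TRAITS = ("Openness", "Conscientiousness", "Extraversion",
          "Agreeableness", "Neuroticism")


def trait_accuracies(y_true, y_pred):
    """Per-trait and overall accuracy, assumed to be 1 - mean absolute error.

    y_true, y_pred: arrays of shape (n_videos, 5) with scores in [0, 1].
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    abs_err = np.abs(y_true - y_pred)
    per_trait = 1.0 - abs_err.mean(axis=0)
    result = {trait: float(acc) for trait, acc in zip(TRAITS, per_trait)}
    # Overall score taken as the average of the per-trait accuracies.
    result["Overall"] = float(per_trait.mean())
    return result


# Toy example: every prediction is off by 0.1.
toy_true = np.array([[0.5] * 5, [0.6] * 5])
toy_pred = np.array([[0.4] * 5, [0.7] * 5])
toy_scores = trait_accuracies(toy_true, toy_pred)
```

Multiplying the returned values by 100 gives scores on the same scale as those in Table 2, provided the assumed definition matches the one used for the reported numbers.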

5. Future Work

While the task of apparent personality prediction has received increasing attention from researchers in the past decade, it is still in its initial phases, with several open questions and areas for research advancement. Among the directions that need to be further assessed are privacy, bias, and ethical issues. Even though bias in annotators’ judgments was taken into consideration in the process of data collection, further analysis needs to be conducted to assess the level of bias existing in the data as well as that reflected in the system’s output. One way to address system bias is through explainability, where the recruiter is provided with an explanation for the system’s output. Such a feature would increase the level of transparency as well as trust in the system, and thus further support the recruiter in the selection process. For future work, we also plan to continue improving our models by incorporating other sources of information as well as investigating different approaches for using the different modalities.


Akbik, A., Bergmann, T., Blythe, D., Rasul, K., Schweter, S., and Vollgraf, R. Flair: An easy-to-use framework for state-of-the-art NLP. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), pp. 54–59, 2019.

Al-Otaibi, S. T. and Ykhlef, M. A survey of job recommender systems. International Journal of Physical Sciences, 7(29):5127–5142, 2012.

Baltrušaitis, T., Robinson, P., and Morency, L.-P. OpenFace: An open-source facial behavior analysis toolkit. In 2016 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 1–10. IEEE, 2016.

Bernstein, J., Richter, A. W., and Throckmorton, N. Covid-19: A view from the labor market. 2020.

Black, J. S. and van Esch, P. AI-enabled recruiting: What is it and how should a manager use it? Business Horizons, 63(2):215–226, 2020.

Bradley, R. A. and Terry, M. E. Rank analysis of incomplete block designs: I. The method of paired comparisons. Biometrika, 39(3/4):324–345, 1952.

Bradski, G. and Kaehler, A. Learning OpenCV: Computer vision with the OpenCV library. O’Reilly Media, Inc., 2008.

Cao, Z., Hidalgo, G., Simon, T., Wei, S.-E., and Sheikh, Y. OpenPose: Realtime multi-person 2D pose estimation using part affinity fields. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(1):172–186, 2019.

Celik, D. Towards a semantic-based information extraction system for matching résumés to job openings. Turkish Journal of Electrical Engineering & Computer Sciences, 24(1):141–159, 2016.

Chen, B., Escalera, S., Guyon, I., Ponce-López, V., Shah, N., and Simón, M. O. Overcoming calibration problems in pattern labeling with pairwise ratings: Application to personality traits. In European Conference on Computer Vision, pp. 419–432. Springer, 2016.

Correll, S. J., Benard, S., and Paik, I. Getting a job: Is there a motherhood penalty? American Journal of Sociology, 112(5):1297–1338, 2007a.

Correll, S. J., Benard, S., and Paik, I. Getting a job: Is there a motherhood penalty? American Journal of Sociology, 112(5):1297–1338, 2007b.

Davison, H. K. and Burke, M. J. Sex discrimination in simulated employment contexts: A meta-analytic investigation. Journal of Vocational Behavior, 56(2):225–248, 2000.

Desrumaux, P., De Bosscher, S., and Leoni, V. Effects of facial attractiveness, gender, and competence of applicants on job recruitment. Swiss Journal of Psychology, 68(1):33–42, 2009.

Giannakopoulos, T. pyAudioAnalysis: An open-source Python library for audio signal analysis. PLoS ONE, 10(12):e0144610, 2015.

Goldberg, L. R. The development of markers for the big-five factor structure. Psychological Assessment, 4(1):26–42, 1992.

Güçlütürk, Y., Güçlü, U., Baró, X., Escalante, H. J., Guyon, I., Escalera, S., Van Gerven, M. A., and Van Lier, R. Multimodal first impression analysis with deep residual networks. IEEE Transactions on Affective Computing, 9(3):316–329, 2017.

Hmoud, B. and Laszlo, V. Will artificial intelligence take over human resources recruitment and selection? Network Intelligence Studies, 7(13):21–30, 2019.

Judge, T. A., Higgins, C. A., Thoresen, C. J., and Barrick, M. R. The big five personality traits, general mental ability, and career success across the life span. Personnel Psychology, 52(3):621–652, 1999.

Kampman, O., Barezi, E. J., Bertero, D., and Fung, P. Investigating audio, visual, and text fusion methods for end-to-end automatic personality prediction. arXiv preprint arXiv:1805.00705, 2018.

Koval, C. Z. and Rosette, A. S. The natural hair bias in job recruitment. Social Psychological and Personality Science, pp. 1948550620937937, 2020.

Loper, E. and Bird, S. NLTK: The natural language toolkit. arXiv preprint cs/0205028, 2002.

Mishra, R., Rodriguez, R., and Portillo, V. An AI based talent acquisition and benchmarking for job. arXiv preprint arXiv:2009.09088, 2020.

Ponce-López, V., Chen, B., Oliu, M., Corneanu, C., Clapés, A., Guyon, I., Baró, X., Escalante, H. J., and Escalera, S. ChaLearn LAP 2016: First-round challenge on first impressions-dataset and results. In European Conference on Computer Vision, pp. 400–418. Springer, 2016.

Rothmann, S. and Coetzer, E. P. The big five personality dimensions and job performance. SA Journal of Industrial Psychology, 29(1):68–74, 2003.

Subramaniam, A., Patel, V., Mishra, A., Balasubramanian, P., and Mittal, A. Bi-modal first impressions recognition using temporally ordered deep audio and stochastic visual features. In European Conference on Computer Vision, pp. 337–348. Springer, 2016.

Vinciarelli, A. and Mohammadi, G. A survey of personality computing. IEEE Transactions on Affective Computing, 5(3):273–291, 2014.

Weichselbaumer, D. Multiple discrimination against female immigrants wearing headscarves. ILR Review, 73(3):600–627, 2020.

Willis, J. and Todorov, A. First impressions: Making up your mind after a 100-ms exposure to a face. Psychological Science, 17(7):592–598, 2006.

Yarkoni, T. Personality in 100,000 words: A large-scale analysis of personality and word use among bloggers. Journal of Research in Personality, 44(3):363–373, 2010.

Zhang, C.-L., Zhang, H., Wei, X.-S., and Wu, J. Deep bimodal regression for apparent personality analysis. In European Conference on Computer Vision, pp. 311–324. Springer, 2016.