Openreq™ The Premier HUB for HR, Staffing & Recruiting Professionals

A Critical Incident for Big Data

The rise of big data did not produce a desire for businesses to analyze data, identify patterns, build models, and better understand the world. No, this desire for analytics has been around for a long time, with the only changes being the tools that are used and the increased ease of data acquisition (Asay, 2013). Big data both requires and enables new approaches to long-established and contemporary problems. I-O psychologists are capable of handling big data analyses but are not as involved as engineers and hard scientists who dominate the field of big data analytics, even in the domain of human resource analytics. To improve I-O psychology’s contribution, I-O psychologists must better market their current data analytics skills and take steps towards acquiring new skills. An important first step is being informed as to what the term “big data” really means. In the following discussion, we review an emerging definition of big data and the process of big data analytics to outline how I-O psychologists can market their current skills and build the necessary skills that big data organizations seek.

Big Data Defined

In their October 2013 article, “The Modern App: ‘Big Data’ Technologies: Problem or Solution?” Poeppelman, Blacksmith, and Yang offered an excellent starting point for understanding “big data.” To expand on their work, I offer a definition used by big data practitioners, which will give I-O psychologists a shared understanding of big data with current practitioners and make them more marketable for big data analytics positions. Big data is precisely defined using the three Vs: volume, variety, and velocity (Eaton, Deutsch, Deroos, Lapis, & Zikopoulos, 2012). Volume refers to the amount of data and implies that the data is too large to fit into a single computer’s memory. The volume characteristic is similar to Poeppelman et al.’s definition of big data, but variety and velocity represent the breadth of the big data definition. Variety refers to the number of different types of data, from unstructured social media data to ordered data in databases and Excel tables. Velocity refers to the speed of the incoming data, which in big data comes as a concurrent stream from myriad sources, moving much too fast for traditional chunk-sized data-analysis methods to analyze. Consider two examples. First is Google’s self-driving car, which takes in 750 megabytes (volume) of data per second (velocity) from a laser, GPS, four radars, camera, inertial measurement unit, and a wheel encoder that tracks the vehicle’s movements (variety), and must analyze this data as fast as it arrives in order to respond accurately to an erratic environment (Guizo, 2011). In an employment setting, consider an application that takes in all the data available to human resources and uses it to predict turnover, among other important outcomes. Normally, regression analyses are conducted in chunks to make such a prediction, but in big data the analyses would be conducted continuously, as fast as the data arrived. The data would be of a wide variety, consisting of applicant tracking system (ATS) data, survey data, point-of-sale data, financial data, employee social media and email data, performance management data, and customer feedback data, which amount to a large volume of the 1.8 million gigabytes of data created at high velocity every second (McAfee & Brynjolfsson, 2012). In sum, “big data” is a lot of data that is too varied and too fast for traditional analyses.
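To make the contrast between chunked and continuous regression concrete, here is a minimal Python sketch of a one-predictor least-squares model that is updated with every incoming record instead of being refit on a stored batch. The class name, the variable names, and the sample stream (an engagement score predicting months until exit) are hypothetical illustrations, not part of any tool cited in this article, and a production system would likely use a numerically sturdier update than raw running sums.

```python
class OnlineSimpleRegression:
    """Ordinary least squares for one predictor, updated one record at a time."""

    def __init__(self):
        # Running sums are all that is stored; the raw records are never kept.
        self.n = 0
        self.sum_x = 0.0
        self.sum_y = 0.0
        self.sum_xy = 0.0
        self.sum_xx = 0.0

    def update(self, x, y):
        """Fold a single (x, y) observation into the running sums."""
        self.n += 1
        self.sum_x += x
        self.sum_y += y
        self.sum_xy += x * y
        self.sum_xx += x * x

    def coefficients(self):
        """Return (intercept, slope), or None until the slope is identifiable."""
        denom = self.n * self.sum_xx - self.sum_x ** 2
        if self.n < 2 or denom == 0:
            return None
        slope = (self.n * self.sum_xy - self.sum_x * self.sum_y) / denom
        intercept = (self.sum_y - slope * self.sum_x) / self.n
        return intercept, slope

    def predict(self, x):
        """Predict y for a new x using the current coefficients."""
        coef = self.coefficients()
        if coef is None:
            return None
        intercept, slope = coef
        return intercept + slope * x


# Hypothetical stream: (engagement survey score, months until exit).
stream = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]

model = OnlineSimpleRegression()
for score, months in stream:
    model.update(score, months)  # the model is current after every record
```

Because only five running sums are stored, the memory cost stays constant no matter how many records arrive, which is the property that matters when volume and velocity rule out refitting on the full data set.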

Now that a precise definition of what “big data” actually means has been established, who designs the analyses? Analyzing the large, varied, and fast-changing big data requires data scientists, whom IBM describes as individuals with “a solid foundation typically in computer science and applications, modeling, statistics, analytics and math” and “a strong business acumen, coupled with the ability to communicate findings to both business and IT leaders in a way that can influence how an organization approaches a business challenge” (Eaton et al., 2012). The I-O psychologist is a domain expert in human behavior and has strong analytical skills. With further training in computer programming, advanced mathematics, and large-scale data analytics, an I-O psychologist could become an ideal human resource (HR) analytics data scientist. Even if such training is impractical, an I-O psychologist could benefit from learning the capabilities of the latest data analytics tools and then partnering with data science experts to implement HR analyses.

Dr. B. J. Gonzalvo, a PhD in I-O psychology and regular contributor to Data Science Central, outlined the relationship of HR analytics and big data in a personal correspondence:

Big data brings plenty of opportunities to investigate relationships between HR variables such as job satisfaction, attrition, demographics, etc., and overall business productivity. HR doesn’t have to have big data but it’s the automation and digitization of business transactions that opened up these opportunities for analytics and data science in HR. (B. Gonzalvo, personal communication, October 15, 2013)

However, most HR analytics companies have been founded by computer scientists and engineers, who possess the technical background and business acumen but not the human behavior expertise. Gild, TalentBin, RemarkableHire, Identified, and Entelo (Google 2013) are just a few examples of companies founded by non-human-resource professionals that are leading the way in HR analytics. Instead of using employer-generated content (e.g., a job analysis) typical of valid selection assessments, these companies rely on user-generated content from social networks, blogs, and online communities to assess individuals, which raises questions about the veracity of the data, the legality of its use, and the ability of such data to reliably predict future performance. Prophecy Science, a recruitment startup founded by neuroscientists, uses physiological responses such as heart rate, eye tracking, and electrodermal activity during a 30-minute test to make predictions about individuals (Empson, 2013), which possibly violates the Americans with Disabilities Act (1991) prohibition of pre-employment medical examinations. Therefore, it’s important for I-O psychologists to become more involved, but to do so they need to augment their skill set. A recent review of graduate curricula demonstrates I-O psychologists’ expertise in statistics, evaluation, and psychometrics but also a lack of courses on even basic information technology literacy (Tett, Walser, Brown, & Simonet, 2011). Meanwhile, other professional programs such as MBA programs and law schools are recognizing the need for more technical training. A search on the Law School Admissions Council reveals 113 law schools that offer technology courses, and MBA programs offer technology-centered degrees, business analytics electives, and data science programs (Bednarz, 2011; Durupt & Natale, 2013). Although there is a need to review the utility of technology integration into graduate programs in I-O psychology, this article will instead focus on how I-O psychologists can market their current skills and gain new skills to conduct the analyses themselves or at least gain the knowledge needed to partner with data science experts.

New York University’s Center for Data Science describes the big data analysis process as occurring in four major stages: acquire and parse, filter and mine, analyze and refine, and interaction (New York University, 2013). These stages have been used as the basis for the big data analysis stages discussed in this article. Specifically, the stages are asking the right question, mining and refining, and data interpretation. Each stage will be presented along with the recommendations for I-O psychologists to apply their current skills and learn new ones.

Stage 1: Asking the Right Question

Every worker in the U.S. is a prolific data creator, but much of the resulting data is unstructured and often publicly available despite ethical concerns. The first important step in managing big data is asking the right research question, which involves determining which data to collect (or to analyze, if it has already been collected) and determining ethical methods of collecting and handling that data. In determining how to choose and acquire human performance data, an I-O psychologist can draw upon his or her expertise in human behavior, business, and ethics to best meet the needs of an organization while retaining the trust of employees. Studying an employee’s data has privacy implications, and the resulting decisions can affect employees’ lives and trust in the organization; hence, much care is needed in handling it. The recent spotlight on government data collection from private citizens has only magnified these concerns. Applicant characteristics such as gender, ethnicity, and political affiliation can be easily collected with a cursory Google search (if they’re not already evident in an application), but the use of any of that data to differentiate candidates for hiring or promotion decisions poses serious legal implications. Similarly, the collection and use of physiological data or publicly available data by recruitment startups presents serious issues of privacy, fairness, and legality.

In addition to ethical and legal concerns, there are concerns of scientific integrity. Handler (2013), president and founder of, argued that pure statistical models without support of theory are just “dustbowl empiricism” and unscientific. According to C. James Goodwin, author of Research in Psychology (2010), statistical models deprived of theory do not meet the goals of psychological research or of scientific progress. Generating hypotheses to match findings impedes the development of general theories that can explain divergent findings within different contexts. Without stated hypotheses, such models are unfalsifiable, and an unfalsifiable theory is a useless theory, argues Popper (1959). The falsification process is also crucial to developing comprehensive theories that can be applied to more than a single data set and can move towards the ultimate goals of psychological research: prediction and explanation of phenomena. An example of generating a hypothesis to match findings comes from Cognizant, an information technology (IT), consulting, and business process outsourcing company, which used social networking data to discover that employees who were highly active online performed almost 100% better than those who were not active online (Davenport, Harris, & Shapiro, 2010). However interesting the findings may be, they were not based on a theoretical model. Because many other unknown variables likely confound the relationship, the utility of the finding is minimal. Therefore, models firmly grounded in human performance theory, and individuals with expertise in human performance, are needed to guide data collection. Data does not speak for itself; it is influenced by human biases and can be only as good as the people who manage it. As statistician Nate Silver states in his 2012 book The Signal and the Noise, “Before we demand more of our data, we need to demand more of ourselves....unless we work actively to become aware of the biases we introduce, the returns to additional information may be minimal—or diminishing.” I-O psychologists are experts in recognizing human biases and preventing them from influencing data. This is a key advantage I-O psychologists can sell to clients and use to convince business leaders that I-O psychologists have an important role to play in any big data project.

I-O psychologists must seek to bring together science and business, advocating for the building of models that move both the business and the science ahead. One example of achieving this is a strategic literature review aimed at improving a product or aspect of the business. As a graduate student interning at, a big data startup that automates person–job fit predictions, the author’s first project was an extensive review of the person–environment fit literature. The company had built a tool to predict person–job fit and recognized the need for theory but had not yet taken it into consideration. So the author reduced the vast corpus of literature into an outline summarizing each article and explicitly stating its direct relevance to the business’s primary product, while identifying potential new directions to study and advance theory. The review became a reference tool for the data side of the business going forward, grounding product design in theory and providing substantiated, science-backed claims for the marketing team. In addition, the literature review led to the development of a thesis project to further advance the science. Literature reviews are a common task for I-O psychologists and can be quite useful if targeted towards specific business or client needs. Gonzalvo recommended a broadening of I-O psychology’s understanding of human behavior: “There’s an even bigger opportunity to know social psychology and even behavioral economics. Big data needs interpretation and psychologists can put to use their understanding of psychological theories to interpret the findings” (B. Gonzalvo, personal communication, October 15, 2013). So when it comes to asking the right question, it’s less about I-O psychologists acquiring new skills and more about highlighting their current skill set, asserting their expertise, and continually monitoring and communicating advances in both psychology and related fields.

Stage 2: Mining and Refining

The second stage in analyzing big data is mining and refining, or acquiring and cleaning the data. Once the source of data has been chosen, methods for collecting, filtering, analyzing, and refining the data are needed. I-O psychologists are skilled in choosing the right analyses for large datasets and in building predictive models. Like other scientists, I-O psychologists are trained to scrutinize findings for confounds or other explanations, to identify and apply different strategies for solving a problem, and to implement a correct solution. They are trained, however, in an imprecise science, which forces them to always be skeptical and questioning of the data and to be comfortable with uncertain findings and imperfect models, placing them at a distinct advantage when managing big data.

Mining and refining is a familiar intermediary step of model development in I-O psychology research, but a crucial difference exposes a skill gap. Model development in I-O psychology is slow and can take years of debate and synthesized research to unfold. With big data, results must emerge much sooner; in the Google car, for example, within milliseconds. Automated tools are essential to streamline the collection of data and analyze it on the fly, but I-O psychologists lack explicit training in computer programming, advanced mathematics, and automated data analytics, which are necessary to develop and tweak the tools for continuous analyses and scalable data manipulations. In reviewing the admissions requirements for three graduate programs in data science at NYU, Illinois Institute of Technology, and UC Berkeley, three common applicant requirements emerged: knowledge of linear algebra, advanced calculus, and at least one programming language (other preferred coursework included probability, algorithms, and relational databases). Linear algebra and calculus are courses often offered as part of a general education requirement in undergraduate programs, but computer programming is less common. Like survey design or SPSS, computer programming is a data analysis tool. Powerful high-level computer languages like Python, more versatile statistical packages like R, and distributed computing platforms like Hadoop are a few examples of tools that I-O psychologists could learn to diversify their skill sets and improve their value for companies. The easiest way to acquire these additional skills is through free online sources. The programming languages commonly used in data analysis, along with courses in linear algebra, probability, algorithms, and relational databases, are all available for free online. There are self-paced tutorials on sites like Codecademy or Khan Academy, and there are more traditional massive open online courses (MOOCs) taught by leading academics or industry experts on the Coursera and Udacity platforms. In fact, Sebastian Thrun, who developed Google’s self-driving car and founded Udacity, just announced a new data science track on Udacity, which “will help you go from beginning analysts all the way to big data experts” (Thrun, 2013). This review covers only a small subset of the free course offerings that are available, and while most are targeted at beginners, more advanced courses are available as a learner progresses. There is much opportunity to improve one’s skill set with the only cost being time. A few hours a day is all it takes to acquire the basics of any computer language or to gain a basic understanding of the important subjects within big data analytics. Even if it’s not practical to complete numerous courses, the first few lectures in any class provide an adequate basis for understanding the tools well enough to design analyses that data science experts can implement.
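As a small illustration of what the refining step can look like in a language such as Python, the sketch below normalizes and filters a handful of performance records before any analysis is run. The field names, the 1-to-5 rating scale, and the sample records are assumptions made up for this example; real HR data sources will differ.

```python
def refine(records):
    """Keep records with a usable employee id and rating; normalize both.

    Assumes (hypothetically) that ratings are on a 1-to-5 scale and that a
    later record for the same employee supersedes an earlier one.
    """
    cleaned = {}
    for rec in records:
        # Normalize ids so " e101 " and "E101" refer to the same person.
        emp_id = str(rec.get("employee_id", "")).strip().upper()
        try:
            rating = float(rec.get("rating"))
        except (TypeError, ValueError):
            continue  # missing or non-numeric rating
        if not emp_id or not 1.0 <= rating <= 5.0:
            continue  # no id, or rating outside the assumed scale
        cleaned[emp_id] = rating
    return cleaned


# Hypothetical raw records merged from several systems.
raw_records = [
    {"employee_id": " e101 ", "rating": "4"},
    {"employee_id": "E102", "rating": None},
    {"employee_id": "E103", "rating": "n/a"},
    {"employee_id": "e101", "rating": 4.5},
    {"employee_id": "", "rating": 3},
    {"employee_id": "E104", "rating": 9},
]

usable = refine(raw_records)
print(usable)
```

Every rule in the function (what counts as missing, which duplicate wins, what range is valid) is a measurement decision of the kind I-O psychologists already make by hand; the code only makes those decisions explicit and repeatable.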

Stage 3: Data Interpretation

At the third stage of big data analysis, data interpretation, I-O psychologists could provide insight into presenting and visualizing data. As applied researchers, I-O psychologists are practiced in presenting data in digestible formats such as expectancy tables or lucid presentations, or in terms of pragmatic business strategy. Although much work remains to bridge scientists and practitioners in I-O psychology, I-O psychologists’ experience in writing for business publications and practitioner journals puts them ahead of other scientists who remain entrenched in esoteric journals and business-impractical research topics.
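For readers who have not built an expectancy table programmatically, the following Python sketch shows one simple way to do it: group employees into predictor score bands and report the percentage in each band who were later rated successful. The band boundaries and the sample data are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def expectancy_table(pairs, lower_bounds):
    """Percent of successful employees within each predictor score band.

    pairs        -- iterable of (score, succeeded) tuples
    lower_bounds -- ascending lower bounds of the bands, e.g. [0, 50, 75]
    Returns a list of (lower_bound, n_in_band, percent_successful) tuples.
    """
    counts = [[0, 0] for _ in lower_bounds]  # [successes, total] per band
    for score, succeeded in pairs:
        band = None
        for i, lower in enumerate(lower_bounds):
            if score >= lower:
                band = i  # keep the highest band the score reaches
        if band is None:
            continue  # score falls below the lowest band
        counts[band][0] += int(bool(succeeded))
        counts[band][1] += 1
    table = []
    for lower, (successes, total) in zip(lower_bounds, counts):
        pct = round(100.0 * successes / total, 1) if total else None
        table.append((lower, total, pct))
    return table


# Hypothetical data: (selection test score, rated successful after one year).
sample = [(40, False), (45, False), (48, True), (60, True),
          (65, False), (80, True), (90, True)]

table = expectancy_table(sample, [0, 50, 75])
for lower, total, pct in table:
    print(f"score >= {lower:>3}: n={total}, successful={pct}%")
```

The same function could be rerun as new records arrive, so the table a manager sees stays current without anyone rebuilding it by hand.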

Ignored findings are a failure; thus the responsibility is upon the scientist to present data in a way that suits both the audience and the findings. Eric Doversberger is a member of Google’s Personnel Analytics department, where he uses visualization techniques to clearly communicate his team’s findings to the entire company. The visualizations engender trust between employees and the people analytics team, and win the praise of managers who can apply the findings to improve their effectiveness (Bryant, 2011). As in data collection, establishing trust is important when explaining findings and will become more important as big data is used to make business decisions, presenting a major challenge for organizations. I-O psychologists can draw on the social identity approach and other models of trust to help businesses gain trust from employers and clients.

I-O psychologists would benefit from improving data presentation skills, using new computer programming methods such as Data-Driven Documents (D3), a JavaScript library designed to “help bring data to life” (Bostock, 2013), or the text summarization tool TextTeaser, to name a couple of examples. The D3 project is open source and allows the creation of everything from basic bar graphs to complex collapsible trees. Data visualizations not only help a business interpret results for its own needs but are also useful in communicating findings to current and potential clients. Another data science solution for quick interpretation is automated text summarization. Perhaps one needs to quickly summarize a large report for a client; the TextTeaser tool could be used for exactly that purpose. The online application TextTeaser takes a block of text or a hyperlink and, in a few seconds, returns a summary of the text. The tool is best used for news articles, but the underlying technology is intelligent and soon to be open sourced (Shu, 2013), meaning an enterprising I-O psychologist could make some tweaks to the algorithm and model to optimize it for reports or research. Learning to program visualizations and summarizations provides the researcher with the most versatility but may not always be necessary. Tools such as DataWrangler from Stanford, Many Eyes from IBM, or iCharts allow researchers to upload and interact with data through a user interface instead of through programmed commands. They have been designed to facilitate the data visualization process for nontechnical professionals and are no more difficult to use than Excel. Working in an applied field, psychologists are better prepared for translating their findings into coherent and clear business-relevant presentations but could use further training in learning to use or at least understand live data visualization and summarization techniques.
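To give a feel for how automated text summarization can work, here is a minimal Python sketch of a word-frequency extractive summarizer: it scores each sentence by how common its words are in the whole text and returns the top-scoring sentences in their original order. This is a deliberately naive technique chosen for illustration; it is not TextTeaser’s algorithm, and the stopword list, function name, and sample report are all assumptions made for this example.

```python
import re
from collections import Counter

# A tiny, hypothetical stopword list; real summarizers use much larger ones.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "are",
             "for", "on", "that", "this", "with", "as", "it", "be", "by"}


def content_words(text):
    """Lowercase the text and return its words, minus stopwords."""
    return [w for w in re.findall(r"[a-z']+", text.lower())
            if w not in STOPWORDS]


def summarize(text, max_sentences=2):
    """Return the highest-scoring sentences, kept in their original order."""
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text)
                 if s.strip()]
    freq = Counter(content_words(text))

    def score(sentence):
        tokens = content_words(sentence)
        # Average word frequency, so long sentences are not favored by length.
        return sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)


# Hypothetical four-sentence report.
report = ("Turnover rose in the sales team. "
          "Turnover in sales was linked to low engagement. "
          "The cafeteria menu changed in March. "
          "Engagement surveys predict turnover.")

summary = summarize(report, max_sentences=2)
print(summary)
```

Adapting a summarizer to reports or research, as suggested above, would largely mean changing the scoring function, for example by weighting words that appear in headings or in a domain vocabulary more heavily.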


Managing big data requires creativity, just as I-O psychologists must be creative in their solutions to the complicated problems of work. In the Harvard Business Review article “Competing on Talent Analytics,” the authors recommended I-O psychologists as key candidates for data science teams, due to skills in psychometrics, human resource management systems, employment law, and creating analytical initiatives (Davenport, Harris, & Shapiro, 2010). John Merrill, who leads a team of data scientists at Zest Finance, cited psychologists as one of the key members of a data science team (Merrill, 2013). In early 2012, General Motors hired Michael Arena to lead its global talent and organizational group specifically because he was an organizational psychologist with a strong analytics background (Overby, 2013). I-O psychology has the attention of some leaders in big data analysis, but there is still more work to be done to better market I-O psychology’s current skills and develop new ones. With an improved technical skill set or at least improved technology literacy, I-O psychologists could become more valuable assets in the emerging field of big data human resource analytics.

Acknowledgments: A thank you to editor Morrie Mullins and two anonymous reviewers, Kevin Eschleman and Paul Holman-Kursky for their feedback on prior drafts, and B. J. Gonzalvo for his willingness to be interviewed.

