The Advent of Big Data: To Analyze or Be Analyzed
By Esther S. Weon
May 2, 2017

Big Data has generated a buzz-worthy frenzy within the tech sphere in recent years, largely because it is a breed apart from what the Valley has otherwise been feverishly churning on. Data science does not simply add a veneer of convenience to an existing solution, like many of Silicon Valley’s most recent darlings: mobile apps that deliver your lunch sushi more quickly, get your laundry done more easily, or hire you a chauffeur more affordably. Technologists equipped with a sound knowledge of how to wrangle Big Data don’t just get advantages in the tech game; the rules of the game change altogether in their favor. With this paradigm-shifting power comes a heavy responsibility for us technologists: to vigilantly steer Big Data’s growth in a way that protects those who will inevitably fall within its omnipresent reach.

Big Data emerged as the trend du jour in the tech sector within the last decade, but the concept of gathering and analyzing large amounts of data has existed for much longer and spans industries. Though often mistaken for related disciplines like machine learning or AI, data science is simply “the machine-based collection and analysis of astronomical quantities of information”. In short, it uses technology to analyze staggering amounts of previously idle data and uncover previously obscured insights. It helps us understand our data in real time, instead of retrospectively, bringing to the surface observable patterns that can help us make decisions around that data and even predict its future trajectory.

Insurance companies, for example, have been analyzing adjusters’ reports and correlating them to instances of fraud to reclaim millions of dollars in debt collections. Retailers analyze a steady stream of supply and demand data to adjust pricing in near-real time for millions of products. Big Data has infiltrated the ranks of government and public service as well: the Los Angeles Police Department repurposed an algorithm originally used to predict earthquakes to ingest crime data. The algorithm now predicts where crimes are likely to occur with startling accuracy, resulting in a roughly 20% reduction in violent crimes in areas where the software is used.

With Big Data, we can zero in on any trend or correlation and put it up for scrutiny and manipulation. The potential for good is enormous. Government agencies can use traffic data to inform when to schedule construction or how to alleviate congestion. Large corporations can study their monthly cash flow trends to more efficiently manage department resources throughout the year. The healthcare industry can analyze millions of patients’ medical histories to extract common trends and improve future patient care. The education system can ensure students are on-track for academic success and accurately identify at-risk students.

But in the wrong hands, Big Data, like any other technology, can wreak incredible damage. Perhaps most obviously, Big Data heralds a loss of privacy. As more companies in more sectors realize the treasure troves of data they are leaving untouched or unanalyzed, there’s a growing expectation and financial pressure to mine that data. As more and more companies add data science to their toolbelt and demonstrate how profitable mining personal data can be, we as consumers will inevitably need to be more vigilant about protecting ourselves from those who would exploit our personal information for financial gain.

Furthermore, incorrectly interpreted data can “justify” decisions that cause more harm than good. Data scientists ingest data from a bewildering array of sources, in different formats, in real time. These three properties, Velocity, Variety, and Volume, are the 3 Vs of Big Data that make their work so difficult. Big Data alone also cannot distinguish between cause and effect. It can only identify correlations, and not necessarily statistically significant ones. At its most ill-steered or misinformed, Big Data may recommend against loans for historically lower-income demographic groups, even though it is clearly backwards logic to infer someone’s financial success via something like race. It may also recommend a higher insurance premium for applicants who live in a particular county, or who eat at certain restaurants. It is up to the humans, the data science practitioners, to intervene and analyze the conclusions the data is surfacing to determine which are logically sound.
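
The gap between correlation and cause can be made concrete with a small simulation. The Python sketch below is purely illustrative: the variable names (income, dining_spend, claim_risk) and the numbers are invented assumptions, not real insurance data. Two quantities are generated so that neither influences the other, yet both depend on a hidden third factor, and a naive analysis still finds them correlated.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000

# Hypothetical hidden factor: standardized household income (made-up data).
income = [rng.gauss(0, 1) for _ in range(n)]

# Both observed variables depend on income plus independent noise.
# Neither one depends on the other.
dining_spend = [i + rng.gauss(0, 1) for i in income]
claim_risk = [-i + rng.gauss(0, 1) for i in income]

# Naive view: correlate the two observed variables directly.
raw = pearson(dining_spend, claim_risk)

# In a simulation we know the income effect exactly, so we can subtract it
# and correlate what is left over.
dining_resid = [d - i for d, i in zip(dining_spend, income)]
risk_resid = [r + i for r, i in zip(claim_risk, income)]
adjusted = pearson(dining_resid, risk_resid)

print(f"raw correlation:       {raw:+.2f}")
print(f"after removing income: {adjusted:+.2f}")
```

Under these assumptions the raw correlation should come out clearly negative (around -0.5 in theory), while the adjusted one should sit close to zero. A model that saw only the first number could "learn" that dining habits predict insurance claims, when the pattern is entirely a by-product of the hidden factor. Real data rarely lets us subtract the confounder this cleanly, which is why human judgment stays in the loop.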

Because Big Data is open to this human misinterpretation, it also has the potential to become an echo chamber, repeatedly reinforcing misconstrued data alongside correct data. Big Data could easily be used to judge people by their networks: by their Facebook friends, their family members, or even people who patronize the same stores or consume the same types of entertainment. This line of application could then easily be extrapolated to justify dystopian no-fly or no-hire blacklists, populated by the usual suspects. It has the potential to be a vicious cycle of stereotype reinforcement, of preconceived notions calcifying under the guise of mathematical objectivity and precision, until those stereotypes become even more pronounced, even more fatalistically “true”.

Even when used properly, Big Data exhibits a reach chillingly disproportionate to the number of people wielding its power. Much has been made of the data analytics firm that allegedly helped Donald Trump rocket to the top of the Republican Party and successfully secure office as the 45th President of the United States. By using “psychographic modeling” to identify those people most susceptible to sensational content on social media platforms, the firm was able to more efficiently tarnish their opinion of the opposition — oftentimes with “fake news” or sound bites taken out of context. The fact that one private company could have used social media users’ information to dramatically change the fate of millions of Americans is humbling, if not also terrifying.

It’s this frightening duality that makes Big Data so ominous and promising, all at the same time. After all, Big Data works a lot like we thought magic did when we were children. It can fundamentally change people’s behavior and beliefs in invisible ways. It can pick out insights and exploit them to “change the future”. It can manipulate entire swaths of the population at lightning speed. Most chilling, it can reinforce stereotypes and widen the divide between the haves and have-nots ever faster. If nothing else, Big Data is one big slippery slope, one that could be used to justify someone being deprived of insurance coverage, a loan, or even a job.

As technology inexorably outpaces our wildest predictions and marches into our brave new world, it’s easy to be trampled underfoot in the process and forget an essential truth — that technology should be put in the service of humanity. Big Data’s moral gray areas and ethical questions are one thing. But it’s our responsibility as technologists and as human beings to do our due diligence as this technology grows and ensure it does not outpace our ethics. We want to be thoughtful about its growth, not clean up the havoc it may wreak after the fact. We want it to empower, rather than exploit — we want it to evolve in a way that leaves our world more inclusive and prosperous.
