Big data has proven itself to be more than just a fad. It has real staying power as an important development that cuts across many fields. No matter what service you use, whether it’s Amazon Web Services, Google BigQuery, or another offering, privacy remains a concern: who can access personal data, and how widely available is it? In fact, that concern is growing in importance as more businesses and organizations start tapping into the power of data collection. In this post, we’ll discuss the relationship between privacy and big data.
The most obvious topic that comes to mind in connection with big data is the problem of data security. Every few months, there’s a major hack, leak, or breach that exposes the data of thousands or even millions of people. That might be financial information, identifying information, health information, or anything else. Criminals can use that data for fraud or other purposes. The more data companies collect, the bigger targets they become, and no security is completely impregnable. As a result, over time more and more personal data becomes vulnerable to exposure. This has been an uncomfortable awakening for many people who did not realize just how much of their information was online or stored in a company’s database. Handing over data is a tradeoff between the risk of a massive privacy breach and the benefit of better service. Perhaps more intimidating is the idea that the decisions about what data to collect and how to protect it are in the hands of companies, not the people that the data describes. By now, so many different kinds of information are available online that it’s hard to go back and protect all of it. The cost of big data has been a lasting loss of privacy, in the form of a standing risk that personal data will be hacked.
On the other hand, the definition of privacy has also changed over time. The rise of social media as a means of mass communication and exchange has illustrated that many people are perfectly happy sharing personal information. That ranges from biographical details to updates about relationships, jobs, opinions, and networks of friends. If users are willing to reveal this information themselves, then maybe society’s definition of what counts as truly private has narrowed. Of course, that assumes that social media users fully understand privacy settings and how their platforms work.
The underlying theme is knowledge. Even computer-savvy people may not fully appreciate just how much of their data is public and who can access it. Whether the data in question is a Social Security number sitting in the database of a compromised bank or a controversial opinion posted on a platform with weak default privacy settings, one explanation for why people share so freely is that they don’t realize how exposed they really are. Some of this is the result of deception: companies are often vague or opaque about collecting data and aren’t forthcoming about what they do with it. At other times, it’s by mistake: some users just don’t realize that they have their Facebook profile set so that anyone can see anything.
Big data is not going away. Its achievements are too valuable and broad to ignore. At the same time, it’s hard not to be concerned about what it means for people to place a growing amount of data about themselves online, and about how secure that data is. One potential idea is to model data collection on the standard used in medical care: informed consent. Right now, plenty of companies ask for permission to collect or expose data, but aren’t clear about what, why, and how. Others hide the details behind profile settings or don’t disclose collection at all. Requiring informed consent would mean requiring transparency, so that potential users of any software, service, or platform would understand what data the provider might collect, how the provider stores it, why the provider needs it, and who can access it. That would allow for the full use of big data tools without undercutting individual privacy.