Think AI tools aren’t harvesting your data? Guess again

The meteoric ascent of generative artificial intelligence has created a bona fide technology sensation thanks to user-focused products such as OpenAI’s ChatGPT and Dall-E, as well as Lensa. But the boom in user-friendly AI has arrived in conjunction with users seemingly ignoring or being left in the dark about the privacy risks posed by these projects.

In the midst of all this hype, however, international governments and major tech figures are starting to sound the alarm. Citing privacy and security concerns, Italy just placed a temporary ban on ChatGPT, potentially inspiring a similar block in Germany. In the private sector, hundreds of AI researchers and tech leaders, including Elon Musk and Steve Wozniak, signed an open letter urging a six-month moratorium on AI development beyond the scope of GPT-4.

The relatively swift action to try to rein in irresponsible AI development is commendable, but the wider landscape of threats that AI poses to data privacy and security goes beyond one model or developer. Although no one wants to rain on the parade of AI’s paradigm-shifting capabilities, tackling its shortcomings head-on now is necessary to avoid the consequences becoming catastrophic.

AI’s data privacy storm

While it would be easy to say that OpenAI and other Big Tech-fuelled AI projects are solely responsible for AI’s data privacy problem, the subject had been broached long before it entered the mainstream. Scandals surrounding data privacy in AI have happened prior to this crackdown on ChatGPT—they’ve just mostly occurred out of the public eye.

Just last year, Clearview AI, an AI-based facial recognition firm reportedly utilized by thousands of governments and law enforcement agencies with limited public knowledge, was banned from selling facial recognition technology to private businesses in the United States. Clearview also landed a fine of $9.4 million in the United Kingdom for its illegal facial recognition database. Who’s to say that consumer-focused visual AI projects such as Midjourney or others can’t be used for similar purposes?

Clearview AI, the facial recognition tech firm, has confirmed my face is in their database. I sent them a headshot and they replied with these pictures, along with links to where they got the pics, including a site called “Insta Stalker.” pic.twitter.com/ff5ajAFlg0

— Thomas Daigle (@thomasdaigle) June 9, 2020

The problem is they already have been. A slew of recent deepfake scandals involving pornography and fake news created through consumer-level AI products has only heightened the urgency to protect users from nefarious AI usage. These scandals turn the once-hypothetical concept of digital mimicry into a very real threat to everyday people and influential public figures.

Generative AI models fundamentally rely upon new and existing data to build and strengthen their capabilities and usability. It’s part of the reason why ChatGPT is so impressive. That being said, a model that relies on new data inputs needs somewhere to get that data from, and part of that will inevitably include the personal data of the people using it. And that much data can easily be misused if centralized entities, governments or hackers get hold of it.

So, with a limited scope of comprehensive regulation and conflicting opinions around AI development, what can companies and users working with these products do now?

What companies and users can do

The fact that governments and other developers are raising flags around AI now actually indicates progress from the glacial pace of regulation for Web2 applications and crypto. But raising flags isn’t the same thing as oversight, so maintaining a sense of urgency without being alarmist is essential to create effective regulations before it’s too late.

Italy’s ChatGPT ban is not the first strike that governments have taken against AI. The EU and Brazil are both passing acts to restrict certain types of AI usage and development. Likewise, generative AI’s potential to cause data breaches has sparked early legislative action from the Canadian government.

The issue of AI data breaches is quite severe, to the point where OpenAI even had to step in. If you opened ChatGPT a couple of weeks ago, you might have noticed that the chat history feature was turned off. OpenAI temporarily shut down the feature because of a severe privacy issue that exposed strangers’ prompts and revealed payment information.

While OpenAI effectively extinguished this fire, it can be hard to trust that programs spearheaded by Web2 giants, which are slashing their AI ethics teams, will preemptively do the right thing.

At an industrywide level, an AI development strategy that focuses more on federated machine learning would also boost data privacy. Federated learning is a collaborative AI technique that trains AI models without any central party having access to the raw data. Instead, multiple independent sources each train the algorithm on their own data sets and share only the resulting model updates.
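To make the idea concrete, here is a minimal sketch of one common federated approach, weighted averaging of locally trained models. Everything in it is an illustrative assumption rather than a description of any real product: the function names (local_update, federated_average, train_federated), the toy straight-line model and the sample data are all hypothetical. The point it shows is that each participant trains on its own records and the coordinator only ever sees model weights.

```python
from typing import List, Tuple

Weights = Tuple[float, float]        # (slope, intercept) of a toy linear model
Dataset = List[Tuple[float, float]]  # (x, y) pairs that never leave the client


def local_update(weights: Weights, data: Dataset,
                 lr: float = 0.05, epochs: int = 20) -> Weights:
    """Hypothetical client step: gradient descent on the client's own data."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


def federated_average(updates: List[Weights], sizes: List[int]) -> Weights:
    """Hypothetical server step: average client weights by data set size."""
    total = sum(sizes)
    w = sum(u[0] * s for u, s in zip(updates, sizes)) / total
    b = sum(u[1] * s for u, s in zip(updates, sizes)) / total
    return (w, b)


def train_federated(client_datasets: List[Dataset], rounds: int = 50) -> Weights:
    """Repeat: send global weights out, train locally, average the results."""
    global_weights: Weights = (0.0, 0.0)
    sizes = [len(data) for data in client_datasets]
    for _ in range(rounds):
        # Only the updated weights come back; raw records stay with each client.
        updates = [local_update(global_weights, data) for data in client_datasets]
        global_weights = federated_average(updates, sizes)
    return global_weights


# Two made-up clients whose private points both lie on y = 2x + 1.
clients = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (3.0, 7.0), (4.0, 9.0)],
]
slope, intercept = train_federated(clients)
print(f"slope={slope:.3f}, intercept={intercept:.3f}")
```

In this toy setup the shared model should settle close to a slope of 2 and an intercept of 1, even though neither client ever reveals its points. A sketch like this does not by itself guarantee privacy, since model updates can still leak information; production systems typically layer on protections such as secure aggregation or differential privacy.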

On the user front, becoming an AI Luddite and forgoing using any of these programs altogether is unnecessary, and will likely be impossible quite soon. But there are ways to be smarter about what generative AI you grant access to in daily life. For companies and small businesses incorporating AI products into their operations, being vigilant about what data you feed the algorithm is even more vital.

The evergreen saying that when you use a free product, your personal data is the product still applies to AI. Keeping that in mind may cause you to reconsider which AI projects you spend your time on and what you actually use them for. If you’ve participated in every single social media trend that involves feeding photos of yourself to a shady AI-powered website, consider skipping the next one.

ChatGPT reached 100 million users just two months after its launch, a staggering figure that clearly indicates our digital future will utilize AI. But despite these numbers, AI isn’t ubiquitous quite yet. Regulators and companies should use that to their advantage to create frameworks for responsible and secure AI development proactively instead of chasing after projects once they get too big to control. As it stands now, generative AI development is not balanced between protection and progress, but there is still time to find the right path to ensure user information and privacy remain at the forefront.

Ryan Paterson is the president of Unplugged. Prior to taking the reins at Unplugged, he served as the founder, president and CEO of IST Research from 2008 to 2020. He exited IST Research with a sale of the company in September 2020. He served two tours at the Defense Advanced Research Projects Agency and 12 years in the United States Marine Corps.

Erik Prince is an entrepreneur, philanthropist and Navy SEAL veteran with business interests in Europe, Africa, the Middle East and North America. He served as the founder and chairman of Frontier Resource Group and as the founder of Blackwater USA — a provider of global security, training and logistics solutions to the U.S. government and other entities — before selling the company in 2010.

This article is for general information purposes and is not intended to be and should not be taken as legal or investment advice. The views, thoughts and opinions expressed here are the authors’ alone and do not necessarily reflect or represent the views and opinions of Cointelegraph.