Attacks on generative AI: where the dangers lurk

The use of generative AI in companies promises major efficiency gains and rapid idea generation – and new areas of attack on data security. These risks and dangers must be addressed by AI and security managers.

3 May 2024, Text Andreas Heer 4 Min.

The boom (and hype) around generative AI opens the door to the business world for large language models (LLMs). This often unintentionally creates a back door for cybercriminals or at least for the leaking of confidential business data. These security risks need to be addressed by companies’ IT and security managers. This article highlights the main threats and possible safety measures.

Data security and integrity of LLMs

‘The most obvious aspect is data security with LLMs,’ says Beni Eugster, Cloud Operation and Security Officer at the Swisscom Outpost in Silicon Valley. ‘This is about controlling what data the models are allowed to access and, above all, what happens to that data.’ Data or AI governance helps to define which information employees with an AI model are allowed to use – and which information they are not.

With externally hosted models, i.e. SaaS offerings, companies have little control over what happens in the background. This is the case especially if the provider does not offer a data security agreement. Is the data being used for training, or has the model even been manipulated with attackers reading all input and output in the background? What is the risk of encountering a manipulated version with the 470,000 or so models on HuggingFace(opens in new tab)?

Cybersecurity may not be the top priority for some providers in the current phase, says Eugster: ‘In a technology boom, providers focus first on the business opportunities and only afterwards on the risks.’ But the second phase is already rolling in, observes Eugster: ‘In San Francisco, the AI city par excellence, more and more start-ups are being created that deal with the risks of generative AI.’

Discover the latest cybersecurity trends and relevant threats.

Prompt injection or misuse of generative AI

But there is also a danger at the interface to the models when prompts (instructions given to the generative AI) are entered. Clever wording makes it possible to trick the safety measures, referred to as ‘guardrails’. A recent example from research is the ArtPrompt(opens in new tab) attack, in which keywords are transmitted as character graphics (ASCII art), which overrides the guardrails. Prompt injection can also be used to extract confidential data from a model. Among other things, this includes the sources of the information provided.

An excerpt from the prompt used by the researchers to get GPT-4 to generate instructions on how to make counterfeit money. Source: ArtPrompt research paper.

However, the reliability of the models themselves must also be checked during output – the key words here are hallucination and wrong answers. This is all the more true if a company uses an LLM for customer contact, or if the answers are used for decision-making. For example, Air Canada recently had to grant a passenger a subsequent refund because the chatbot on the website recommended this option – contrary to other information on the same website.

Filtering and validating input and output for AI-based corporate services is a key security measure – as is extensive testing. ‘This is where cybersecurity and AI experts need to work together and combine their knowledge to get meaningful test results,’ recommends Eugster. ‘Companies also need to supplement their cybersecurity strategies and requirements accordingly.’

Attacks on infrastructure and training data

It can be assumed that few companies train and operate their own models. These cases give rise to other aspects relevant to security:

AI poisoning: Training data is manipulated in order to generate wrong answers. This undermines the value of an LLM and undermines trust. Effective security measures include controlled data security and appropriate data governance to ensure the integrity of training data.
Security loopholes in the supply chain: Third-party models and plug-ins may have security loopholes that allow attackers to access information such as prompts and training data. A security check of all components is crucial here.
API security: Insufficiently secured APIs can allow unauthorised users to use and abuse LLMs. The usual safety measures for APIs, such as authentication mechanisms and access control, can help.

Protecting LLMs, a new discipline in cybersecurity

Securing large language models requires a holistic strategy encompassing both direct use and the underlying infrastructure and training data. Access rules, data security measures, filtering/checking input and output and proactive risk management ensure safe operation and reliable use. Or, as Eugster sums it up: ‘Understanding the model, the use and the use cases is half the battle to provide good protection for AI projects. Developers and cybersecurity teams need to work together closely to identify vulnerabilities and risks at an early stage.’

Attacks on generative AI: where the dangers lurk

Data security and integrity of LLMs

Prompt injection or misuse of generative AI

Attacks on infrastructure and training data

Protecting LLMs, a new discipline in cybersecurity

Stay up to date

More on the topic