Protecting Privacy in ChatGPT, Credential Stuffing Strikes 23andMe, Freebie Bots

Season 02, Episode 06

19th October 2023

To start this month’s episode, we once again weigh in on AI – this time considering the privacy implications when feeding prompts into generative AI tools like ChatGPT and Bard. We’ll discuss whether it’s safe to share company IP or your own personal information with such tools, before hearing how we approach this at Netacea from Principal Software Engineer John Beech.

Next, we’ll look to the news of another major data breach, as it was recently revealed that millions of stolen records from genetics testing site 23andMe were available for sale from an underground forum. The attackers even touted that the data identifies those with Jewish genealogy. 23andMe held customers responsible for reusing their passwords on other sites that had been hacked previously, but where does responsibility for protecting this kind of sensitive information lie and what can each party do to keep data safe? Having spent five years of his career in biotech, Engineering Manager Karol Horosin has plenty to add to this story.

Finally, our security researcher extraordinaire Cyril returns to tell us about freebie bots – a type of scalping bot that targets discounted goods to resell in bulk at retail prices. Sounds like a “prime” bot attack type to target recent and upcoming sales events…

Speakers

Danielle Middleton-Wren

Head of Media, Netacea

Cyril Noel-Tagoe

Principal Security Researcher, Netacea

Karol Horosin

Engineering Manager, Netacea

John Beech

Principal Software Engineer, Netacea

Episode Transcript

[00:00:00] Dani Middleton-Wren: Hello and welcome to Cybersecurity Sessions. I'm Dani Middleton-Wren and I am joined today by a panel of experts who are going to be talking us through some of the cybersecurity topics that have been hitting the news in the last month.

We'll start today's introductions with the wonderful, the practically famous on the podcast stage at this point, Cyril Noel-Tagoe.

[00:00:26] Cyril Noel-Tagoe: Hi everyone, my name is Cyril Noel-Tagoe, I'm Principal Security Researcher at Netacea.

[00:00:31] John Beech: Hi, I'm John Beech. I'm Software Team Lead for the engineering enablement team at Netacea.

[00:00:37] Karol Horosin: Hi everyone, my name is Karol Horosin and I'm an Engineering Manager here at Netacea.

[00:00:42] Dani Middleton-Wren: Great. Thank you, everyone. And if it wasn't clear by everyone's titles, today we have got a mixture of expertise joining us. So we'll make sure to cover the threat research analysis, the business analysis and the technical expertise. So we will be talking about the importance of data privacy in generative AI as it starts to take over the world; the 23andMe credential stuffing attack; and the attack of the month, freebie bots. So let's chat a little bit about ChatGPT. As I just said, it has been taking the world by storm, along with Google Bard, over the last year.

The applications for these tools are seemingly endless, from writing emails and suggesting content ideas to creating business plans and generating code. Both businesses and individuals have been boosting their productivity using generative AI, but how many people have considered the privacy pitfalls of inputting sensitive data into their prompts?

Cyril, I'm going to come to you first. Do you think that tools like ChatGPT make clear what they do with users' data?

[00:01:54] Cyril Noel-Tagoe: ChatGPT in particular does. You may have to look for it, but on the OpenAI website there is a security section which talks about how they use the data you provide. It's split between whether you're an enterprise customer or a normal consumer. For enterprises, things that you share with them are still owned by the enterprise, and OpenAI doesn't train its models on the data you input. However, if you're just a regular user, the stuff that you input can be used to train their models. So you need to be careful with what you're actually providing there.

[00:02:28] Dani Middleton-Wren: Is that something that you think individuals are aware of when they use the platform? Like you said, you had to dig quite deep to find those privacy details.

[00:02:37] Cyril Noel-Tagoe: It depends on how security and privacy conscious an individual is. You know, with privacy policies, especially after GDPR, nearly every app or website you go to has a privacy policy, but how many people actually spend the time to read through it, understand how the data is being used?

[00:02:55] Dani Middleton-Wren: Yeah, or do they just blindly click? I mean, how many of us are guilty of just blindly clicking, accept all for cookies? It's kind of the same thing, isn't it?

[00:03:03] Cyril Noel-Tagoe: It is, and I think it's actually worse with cookies, because with a lot of them you have to manually unselect the ones you don't want, so yeah, everyone just goes accept all.

[00:03:12] Dani Middleton-Wren: So, if people are doing that, if they're just, for instance, blindly accepting these standard privacy policies and not really reading into them, what risks are they exposing themselves to?

[00:03:22] Cyril Noel-Tagoe: So, the main risk there is that they are providing sensitive data externally, so that data is now outside of their control. That means if OpenAI does get breached, that data might get exposed. If attackers find another way of getting the information out of OpenAI, that data is now exposed. And also, just because it's out of their control, especially for an enterprise customer that isn't using the enterprise version of ChatGPT (let's say this is an organization where people are just going on ChatGPT themselves), they might be leaking sensitive company information that the company doesn't want out there as well.

[00:04:01] Dani Middleton-Wren: Do you think then that businesses need to put something in place to protect themselves against such data breaches, make their company, well, everybody in the business aware of the risks of using ChatGPT? So I suppose in the same way that we did with GDPR, everybody had to go through compliance training.

We have bribery training as standard. Do you think that we're going to see similar come into force for ChatGPT? If you're going to use this tool, you need to go through this training.

[00:04:32] John Beech: So I think that all existing training still applies, and now's the time to do a refresher on all those kinds of courses about what personally identifiable information is, and what information you might be leaking out of the company via these tools. From one perspective, if you just think of the tools themselves as computers, you're sending your information to a different computer, potentially in a different country.

The problem with a lot of these AI tools, so ChatGPT, Bard and so on, is that they're very new enterprises, very new companies, and we haven't had the time to establish trust with those companies in the same way that we have with other vendors. There have already been a lot of leaks within ChatGPT from a user's perspective, in terms of people seeing other people's conversations and seeing things they shouldn't.

And it's classic: they're building so fast that they've hit these bugs, and they've got hundreds of millions of users. I think it's going to take time for them to mature and to build in those protections. So, like Cyril says, we've got an enterprise mode now for companies, but it's probably built on exactly the same tech as the consumer release. So it's going to take some time for them to find all the bugs and properly understand the implications, and that's before we even start to talk about the sort of data we're sending to these models and what kind of information could be gleaned from the types of questions we're asking.

[00:05:58] Dani Middleton-Wren: What do you think, Karol?

[00:05:59] Karol Horosin: Well, I've got to agree with John: all of the current training applies, but it needs to be framed in a new way to account for these tools. And that's what we did at Netacea. I think every company has a requirement, or a point in their ways of working, not to share company data with unapproved or outside tools.

You cannot copy files off your laptop, but people seem to forget that pasting something into ChatGPT's website is actually taking the company's data and putting it outside of the company's bounds. And as John mentioned, we have good settings in ChatGPT, but there's a question of trust. What we do internally at Netacea is that we sometimes don't allow the use of tools that we think are not trustworthy. There have been several company-wide tools that we tested for a month. We saw that the company was not very serious about their security measures, and we said, okay, we're not going to share even, you know, basic data or user emails with them, because we don't think we can trust them.

So all of the old rules apply, but we need to reinforce them given all the enthusiasm around AI. And also purchase relevant subscriptions that actually make sure that our company data is not used for training. ChatGPT states that clearly, but I've used Bard for testing and I still don't know how they're going to use my data, because it's hidden quite deep. And as we see with many other services, when a company can benefit from the use of users' data, they're usually not very upfront about how they're going to use it, about all the opt-ins, and about what the opt-in really means.

[00:07:48] Dani Middleton-Wren: That's what Italy have done recently, isn't it? They said, we don't really know how you're using our data. So until we know more about it, we're going to ban the use of ChatGPT in Italy.

[00:07:57] John Beech: The companies that are making these models base them on training data. And the higher quality the training data they can get hold of, the better their models and the better the performance of those AI machines is going to be.

So they've got a vested interest in gathering data from users in order to improve their service. For years, we've seen analytics cookies like, "Oh, can we send some anonymous data metrics, please, so we can improve the quality of our service?" That's a standard thing. But it's literally in the design of these applications that they want that data so they can feed it back into a training loop.

So if every day or every week they're releasing a new build or a new model that's improved things, that's a real risk for our kind of business, or any kind of business that wants to start engaging with these tools to automate things, because we're giving away so much data. So my thought from that is that the only way we really protect against this is for these models to become small enough or compact enough that we can start to run them in house.

And then you've got an additional layer of trust where you're saying either we're training our own versions of these models, or we're running the models purely on our own hardware, so that we know 100 percent for certain that the data never leaves our estate. And then we can start to take full advantage of some of the more advanced data processing.

[00:09:06] Dani Middleton-Wren: And that's, I guess, the next thing, isn't it? Not everything related to generative AI is negative, so let's talk about some of the positive applications of ChatGPT. What do you think we're going to see over the next 12 to 18 months, without frightening us all?

[00:09:23] Karol Horosin: Well, I think there's a lot of developer experience related stuff, and also things for every business user. I can speak to what we're doing with our engineering teams and what I'm doing personally. With our engineering teams, John is actually part of a test run of the GitHub Copilot enterprise version.

So that's speeding up development and making sure that developers spend more time actually designing solutions, not writing code and remembering function call details, like how to use certain functions. But there are all sorts of generation tasks it can be used for: putting data into the right formats, and getting more quickly from your idea to the execution of that idea. Say you create a presentation: usually you have to come up with an outline and content, and then spend all of this time doing a good slide layout.

Why do you have to do that? You put all this work into the ideas and how to do it; just transferring that into the proper layout can be automated with your guidance. So there are all sorts of assistive tasks that AI will help with. It's still very bad at doing whole projects and whole initiatives, but it's very good at small tasks.

The way Microsoft advertises it, as a copilot or an assistant, is a great way to put it, I think. Automation, conversions, file generation, everything. It's really good and we've been using it with success.

[00:10:59] Dani Middleton-Wren: I mean, you seem extremely excited about it.

[00:11:02] Karol Horosin: Oh yeah, I'm actually writing a lot of LinkedIn posts and blog posts related to it, because I see a huge benefit myself and others do as well.

[00:11:10] John Beech: Yeah, so, in order to open up these tools to the developer community and also to employees, what we've done is build a layer between the external models, so ChatGPT, Bard, et cetera, and our own systems. The idea is that we can authenticate as Netacea employees to safely use these tools, and then we've got an opportunity to filter out PII and customer information. So we've got the idea of blocklists and pattern matching, where we can basically substitute the data that goes out. Maybe you want to ask a question about a customer but you don't want to name the customer, so we can substitute out the name of the customer and still ask the question. It'll tokenize it, send it off to the model to get some sort of answer, and then substitute the data back in. We can also use that for auditing, so we can see what messages are going through these tools, and we've got a bit of legal protection to say we never actually sent your details out; we kept those locally, in secret. And that means hopefully we can build tools which are actually closer to the customer, closer to the user, without exposing customer data to a third party.
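The substitution step John describes can be illustrated with a minimal Python sketch. It assumes a hard-coded blocklist and two simple regular expressions; the names `BLOCKLIST`, `PATTERNS`, `redact` and `restore` are hypothetical, and this is not Netacea's actual implementation. Sensitive values are swapped for placeholder tokens before the prompt leaves the company, and the token-to-value mapping stays local so the model's answer can be restored afterwards.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical blocklist of customer names; a real system would load this
# from an internal source rather than hard-coding it.
BLOCKLIST: List[str] = ["Acme Corp", "Globex"]

# Deliberately simple PII patterns; production filters need far more rules.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "IPV4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def redact(prompt: str) -> Tuple[str, Dict[str, str]]:
    """Replace sensitive values with tokens; return safe text and the mapping."""
    mapping: Dict[str, str] = {}
    counter = 0

    def token_for(value: str, kind: str) -> str:
        nonlocal counter
        # Reuse an existing token if this exact value was already replaced.
        for tok, original in mapping.items():
            if original == value:
                return tok
        counter += 1
        tok = f"<{kind}_{counter}>"
        mapping[tok] = value
        return tok

    # Blocklisted names first, matched case-insensitively.
    for name in BLOCKLIST:
        prompt = re.sub(re.escape(name),
                        lambda m: token_for(m.group(0), "CUSTOMER"),
                        prompt, flags=re.IGNORECASE)
    # Then pattern-based PII such as emails and IP addresses.
    for kind, pattern in PATTERNS.items():
        prompt = pattern.sub(lambda m, k=kind: token_for(m.group(0), k), prompt)
    return prompt, mapping


def restore(text: str, mapping: Dict[str, str]) -> str:
    """Substitute the original values back into the model's answer locally."""
    for tok, original in mapping.items():
        text = text.replace(tok, original)
    return text


safe, mapping = redact("Why did Acme Corp see logins from 10.0.0.5 for bob@acme.com?")
print(safe)  # Why did <CUSTOMER_1> see logins from <IPV4_3> for <EMAIL_2>?
```

Only `safe` would be sent to the external model. The `mapping` never leaves the company, which is also what makes an audit log of outgoing prompts meaningful.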

[00:12:10] Dani Middleton-Wren: So do you think then, if it's a copilot kind of setup with generative AI, that you still need to be an expert yourself? You can't lean on the AI. Do you still need to have that base knowledge, that foundation, and then you can apply it to do the easier bits? Could a little marketer like me just go, okay, well, I want to do a bit of development, I'm going to use ChatGPT or Google Bard to help me out and try and whip something up? Or are we way off that kind of thing yet? Or is it not possible?

[00:12:44] John Beech: I think exactly someone like you could do a huge amount with the current set of tools, simply because you can put your thoughts into text, but you might not know how that translates into computer code. These models are very good for giving you ideas and pointing you in the right direction.

So as a learning tool, people are starting their developer journeys through Copilot and GPT.

So I think that's a fantastic way to get into coding and build stuff, build products. It helps to be an expert. Like, as an expert using these tools, I'm using it for boilerplate. Sometimes I ask it to do more complex algorithms, and it fails, it just gets it wrong, or it gets it mostly right, but not quite right.

So it takes an expert to spot when it's got things wrong, and you need to go back and edit those things. There are loads of use cases for, like, template transformation. It's like, oh, I've got this data set, or I've got this paragraph, these sentences with some numbers in it.

Draw me a graph, or make this a table, or convert it into CSV format, or convert it into JSON format. Those kinds of very factually accurate tasks, these language models do really, really well. So rather than spending, I don't know, an hour writing a script to do the conversion for you, you can just ask these models, "Oh, could you look at this article and make it a different format for me?"

And that massively accelerates our ability to transform information from one structure to another. I'm really excited for us to start building those into our workflows, because they do save a lot of time. They're not as, quote unquote, efficient as having a script that does that. But if you can just explain your problem in plain text and then have one of these systems figure it out for you, that's a whole new way of interacting with computers that's been unlocked, one we never had before.

And that's just at the text level. What I'm looking forward to next year and beyond is these multimodal models, where you give it a problem and it'll generate you an image, it'll generate you audio, maybe even 3D models in the future. There's all this potential that becomes unlocked as these models become more and more clever.

And then suddenly that's a really powerful aid to everyone's workflows.
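For contrast, this is the kind of small, rigid conversion script John describes writing by hand. It is a minimal Python sketch that assumes a hypothetical input format: blank-line-separated blocks of `key: value` lines. The function names `parse_records`, `to_csv` and `to_json` are illustrative. The script is fast and deterministic, but it breaks as soon as the input drifts from the expected format. A language model can absorb that kind of variation from a plain-text request.

```python
import csv
import io
import json
from typing import Dict, List


def parse_records(text: str) -> List[Dict[str, str]]:
    """Parse blank-line-separated blocks of 'key: value' lines into dicts."""
    records: List[Dict[str, str]] = []
    for block in text.strip().split("\n\n"):
        record: Dict[str, str] = {}
        for line in block.splitlines():
            key, _, value = line.partition(":")
            record[key.strip()] = value.strip()
        records.append(record)
    return records


def to_csv(records: List[Dict[str, str]]) -> str:
    """Render records as CSV; column order follows the first record's keys."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


def to_json(records: List[Dict[str, str]]) -> str:
    """Render records as a pretty-printed JSON array."""
    return json.dumps(records, indent=2)


recs = parse_records("site: shop\nattempts: 120\n\nsite: bank\nattempts: 45")
print(to_csv(recs))
```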

[00:14:42] Dani Middleton-Wren: I mean, you really sold it to me there. I'm now thinking, you know, maybe I'm going to start using it to build something exciting. I don't know what it's going to be off the top of my head, but who knows?

[00:14:52] John Beech: I feel like fire's been handed down from the gods again. It's a really exciting time to be alive.

[00:14:56] Dani Middleton-Wren: Oh, wow.

[00:14:57] Karol Horosin: Yeah. Have you guys tested the new ChatGPT voice conversation feature? It's so scary. Actually today I've been preparing some research for a topic. And basically I was browsing the web and at the same time had it on and asked some questions so it can look it up when I'm reading some other stuff.

It is scary good. And the voice is made realistic because it introduces pauses in speaking. You know, those ums, and there's some static, so it sounds a little bit like a phone conversation. The realism is really scary, and it answers really well. Sometimes the answers are long, because ChatGPT is quite verbose and likes to talk a lot, but when you ask it to be concise, it's scarily close to a conversation with a human. I think they're rolling it out to Plus users.

[00:15:50] Dani Middleton-Wren: Wow.

[00:15:51] Cyril Noel-Tagoe: Maybe next podcast, we can have ChatGPT join us.

[00:15:55] Karol Horosin: We could actually have a few questions back and forth with it.

[00:15:59] Dani Middleton-Wren: We should definitely do that. We've talked about ChatGPT a lot on this podcast. The next logical phase is to have, well, we could bring in two of the biggest guest stars of this season's podcast. We could have ChatGPT talking in Taylor Swift's voice.

[00:16:15] Karol Horosin: Well, don't forget about the third biggest star, which is Cyril.

[00:16:19] Dani Middleton-Wren: Yeah, of course, naturally.

[00:16:20] John Beech: I think the thing to watch for is that over the next year, there are going to be a lot more models available to us. Facebook are in the running as well: they've got their open source LLaMA model, which is comparable to GPT-3.5. At the moment, it looks like GPT-4 from OpenAI is the leader.

But there are lots of companies training their own AIs in the background and working on ways to train open models, so that companies, countries, etc. can have their own models that aren't constrained by this. So it's going to be really exciting to see these tools disseminate through the business world and people find new ways of using them.

[00:17:01] Dani Middleton-Wren: Yeah. And how businesses start to get comfortable with using it across their teams as well. That'll be a really interesting thing to see.

So, next up we are going to be discussing the 23andMe credential stuffing attack. So a couple of weeks ago it was revealed that millions of records taken from genetics testing site 23andMe were available for sale from an underground forum.

The company confirmed that the breach was genuine, blaming a credential stuffing attack. Because 23andMe users can opt into sharing their information with relatives also using the service, millions of users could be exposed in the attack. The attackers exfiltrated details including ethnicity, with the first batch of records advertised as identifying members from Jewish backgrounds, which is concerning considering recent world events.

So Cyril, I'm going to start with you again. Would you be able to give us an overview of credential stuffing attacks and how they work?

[00:18:05] Cyril Noel-Tagoe: Yeah. So a credential stuffing attack is basically where an attacker gets a list of username and password pairs from a previous, often unrelated, hack or data breach, and then uses a bot to test these usernames and passwords against a new website or web service at scale by attempting to log in.

Now, if any of the usernames and passwords have been reused between the previous data breach and this new website, the attacker now gets access to those accounts. So an attacker can take a list of maybe a million username and password pairs from an old breach and throw them at a new site, and if, let's say, 1 percent of them have been reused, they've now got access to 10,000 accounts.
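Cyril's description also suggests one way defenders can spot the attack: a single source cycles through many different usernames and almost none of its logins succeed. The following Python heuristic is a deliberately simple sketch built on that assumption. The `LoginAttempt` type, the thresholds and the per-IP grouping are all illustrative. Real attackers spread traffic across large proxy pools, so production bot detection relies on much richer signals than this.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass
class LoginAttempt:
    source_ip: str
    username: str
    success: bool


def flag_stuffing_sources(
    attempts: Iterable[LoginAttempt],
    min_distinct_users: int = 20,     # illustrative threshold, not a tuned value
    max_success_rate: float = 0.05,   # illustrative threshold, not a tuned value
) -> List[str]:
    """Flag sources that try many usernames with a very low success rate."""
    users: Dict[str, Set[str]] = defaultdict(set)
    totals: Dict[str, int] = defaultdict(int)
    successes: Dict[str, int] = defaultdict(int)
    for a in attempts:
        users[a.source_ip].add(a.username)
        totals[a.source_ip] += 1
        successes[a.source_ip] += int(a.success)

    flagged = []
    for ip, names in users.items():
        rate = successes[ip] / totals[ip]
        if len(names) >= min_distinct_users and rate <= max_success_rate:
            flagged.append(ip)
    return sorted(flagged)


# One source replaying 100 leaked pairs (1 hit) vs. one user mistyping once.
attempts = [LoginAttempt("203.0.113.7", f"user{i}", i == 0) for i in range(100)]
attempts += [LoginAttempt("198.51.100.2", "alice", False),
             LoginAttempt("198.51.100.2", "alice", True)]
print(flag_stuffing_sources(attempts))  # ['203.0.113.7']
```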

[00:18:53] Dani Middleton-Wren: And so why is it then that 23andMe are blaming their users for the exposure? Is it because of, like you said, because they think that users are reusing passwords?

[00:19:07] Cyril Noel-Tagoe: Yeah, it's an interesting one because a credential stuffing attack only works if the credentials have been reused, but... in any cyber attack, there's never just one party to blame. I think it's always a bit more nuanced than that. And yes, on one hand, you can say that the users shouldn't have reused passwords.

And, you know, as a security community, we're trying to do a lot to educate people that they should have unique passwords for every site. But as a company, it's also up to you to protect your users' data in spite of the user. There's not just one way of protecting against credential stuffing attacks.

And especially, we've done some research recently in our Death by a Billion Bots report, talking about the loss of customer satisfaction from attacks such as credential stuffing. If, as a company, your users' accounts are getting breached and you're blaming them instead of offering ways to protect them, that's not going to be very good for customer satisfaction either.
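A site can also protect users regardless of password reuse by rejecting passwords already known from breaches, at sign-up or at password change. This is one example technique, not one named in the discussion. Have I Been Pwned's Pwned Passwords range API uses k-anonymity for this check. The client sends only the first five hex characters of the password's SHA-1 hash to `https://api.pwnedpasswords.com/range/{prefix}`, then compares the returned `SUFFIX:COUNT` lines locally. The Python sketch below covers only the hashing and matching steps. It leaves out the HTTP request, so the response body is passed in as text. The function names are hypothetical.

```python
import hashlib
from typing import Tuple


def sha1_prefix_suffix(password: str) -> Tuple[str, str]:
    """Split the uppercase SHA-1 hex digest into a 5-char prefix and 35-char suffix."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def breach_count(password: str, range_response: str) -> int:
    """Return how often the password appears in a range response, or 0 if absent.

    `range_response` is the text returned for the password's hash prefix,
    with one "SUFFIX:COUNT" entry per line.
    """
    _, suffix = sha1_prefix_suffix(password)
    for line in range_response.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix:
            return int(count)
    return 0


prefix, _ = sha1_prefix_suffix("password")
print(prefix)  # 5BAA6 (only this part would be sent to the API)
```

Because only the prefix leaves the server, the service never learns which password was being checked. A non-zero count can be used to block the password or prompt the user to choose another.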

[00:20:06] Dani Middleton-Wren: No, I don't think I've ever seen any company totally blame the customers and their users before for a credential stuffing attack. Usually it's, okay, this attack has happened. Here's what you need to do. Rather than it saying, well, this is quite frankly your fault.

[00:20:22] Cyril Noel-Tagoe: I think part of it might be trying to reduce liability on their end. Especially with regulations like GDPR, you've got a duty to protect your customers' data. And by shifting the blame, they're saying, actually... and I think in their press release they said something like, there was no intrusion in our system.

But even as recently as a few months ago, I think, data protection authorities such as the Information Commissioner's Office and others spoke about credential stuffing and how big a threat it is, so it's not a new attack that companies are getting blindsided by.

This is a known attack and the supply of leaked credentials is only growing. So companies really need to take some responsibility and do a bit better.

[00:21:08] Dani Middleton-Wren: Yeah, because like you said, it's going to come back to bite them in the bum when it comes to their reputation later, isn't it?

[00:21:14] Karol Horosin: Oh yeah. And this can make or break a company. That's what we see here: a leak based on internal data sharing, matching relatives within a genetic testing service. One thing I think was pivotal for Facebook's market dominance was the whole Cambridge Analytica scandal.

They were heavily blamed for what happened as part of legitimate data sharing with external companies. And I think Facebook has not been the same ever since. It has a decreasing user base, and Meta is still strong with the other products they bought, but Facebook itself is just groups and Marketplace.

[00:21:57] Dani Middleton-Wren: And mums.

[00:21:59] Karol Horosin: And mums.

[00:22:00] Dani Middleton-Wren: Yeah. But I think that speaks to the fact that younger people are perhaps more aware of the significance of the Cambridge Analytica scandal than the generation above, who maybe don't necessarily recognize its significance and the damage it could do to their own digital health.

That is a great point. It is, it's long term damage to your business, damage to your reputation, how your consumers are going to behave around your product. So another part of this is that 23andMe say that no genome data was leaked, but access to certain accounts does expose relatives, which is part of why anyone who obtains this data could identify Jewish members, for example.

So what risk are users of services like 23andMe under in cases like this?

[00:22:45] Karol Horosin: Oh, there's a plethora of risks, and what 23andMe said is not entirely true. That's part of the lawsuit, because they claim that no genome data was leaked, but at the same time they agree that credentials to accounts were leaked, which allow access to DNA data. But I want to be a little bit reassuring to all the users here.

I actually have a lot of experience in DNA analysis, and 23andMe doesn't do full genome sequencing. They do what we call genotyping: they identify only the gene variants that are relevant for several diseases and for things like finding ancestors or matching relatives. So in no case was your whole genome leaked.

The reason for that is that full genome sequencing costs, say, $300, $500, or even $1,000, while a 23andMe kit right now is $130, and you can sometimes get 50 percent off, so about $65. So they do partial DNA matching. But what happened there is very interesting, business- and product-wise, and we have a lot of executives and people responsible for products listening. The main reason this leak was so big was 23andMe's DNA Relatives option.

So you could opt in, and you were very encouraged to opt in, to be matched with your potential relatives using the service. And this feature had some really heartwarming stories connected to it. People were finding lost sisters. You were able to find your biological parents. There were a lot of, you know, stories in the news of reunions after decades, this was great, but this feature previously also helped catch criminals, or even, I have a good story for you, and that's not a personal one this time. There was this case where actually through a very similar service, the user found that the doctor she consulted on her IVF is the father of her child.

What actually happened is that this doctor used his own sperm to perform IVF, and he had 50 to 100 kids over his years of practice. There are several stories like that, actually. Those doctors are no longer practicing, because they clearly breached the boundaries of ethics, so...

[00:25:27] Dani Middleton-Wren: And many other boundaries! I hope they're in prison. That's unbelievable!

[00:25:34] Karol Horosin: Yeah, so there are good stories out of these features, but also stories like that, where criminals were caught. Police sometimes use these services not to find criminals directly, but to find relatives of criminals, and in this way identify who the criminals were.

So it's a really useful feature, but it had one huge drawback: users were highly encouraged to use it, because it was a very public feature. That's part of why people were using the service. So you might have opted in not knowing that potentially hundreds or thousands of people would see your name and surname and be able to connect with you.

So a lot of the data in this leak came not directly from the hacked accounts, but from the details of relatives connected to those accounts. This way, millions of people were exposed in this leak, even though not all of them had their accounts hacked.
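The toy Python model below shows why a relatively small number of compromised accounts can expose many more people. As a simplification, and not a description of 23andMe's actual system, it assumes that an opted-in account can see every matched relative who has also opted in. The data and the function name `exposed_profiles` are hypothetical.

```python
from typing import Dict, Set


def exposed_profiles(
    compromised: Set[str],
    relative_links: Dict[str, Set[str]],
    opted_in: Set[str],
) -> Set[str]:
    """Return every profile an attacker can see via the compromised accounts.

    Simplifying assumption: an account only sees relative matches if it has
    opted in itself, and only relatives who also opted in are shown.
    """
    exposed = set(compromised)  # the hacked accounts themselves
    for account in compromised:
        if account not in opted_in:
            continue  # no matching feature, so no relatives are visible
        for relative in relative_links.get(account, set()):
            if relative in opted_in:
                exposed.add(relative)
    return exposed


# Hypothetical example: two hacked accounts, only one of which opted in.
links = {"a": {"b", "c", "d"}, "x": {"y"}}
opted = {"a", "b", "c", "y"}
print(sorted(exposed_profiles({"a", "x"}, links, opted)))  # ['a', 'b', 'c', 'x']
```

In this example, two hacked accounts expose four profiles. With real users who have hundreds of matches each, the same fan-out explains how the exposure reached millions while far fewer accounts were actually accessed.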

[00:26:35] Dani Middleton-Wren: I'm absolutely flabbergasted. And just, you know, for the audience's benefit, Cyril and I were sat here with our mouths agog as Karol was talking through that IVF story. I cannot fathom that that's happened, but then, you know, we can, we've all heard horror stories similar to that effect.

[00:27:13] Karol Horosin: And while we're talking about those data sharing features, I think I'd like to put them in the context of other data sharing features that others have. Because we're talking DNA, that sounds a little bit abstract, but I mentioned that product people may be interested in that because many of the products have data sharing capabilities.

We have unintentional ones, like in ChatGPT, where users could see other users' conversations. But ChatGPT also has a feature where you can share a conversation, and you can potentially expose some of your data that way. And there's a lot of stuff like that... Facebook has, you know, suggested connections.

They based suggested connections, suggested friends, on your location data and your interests, so this could potentially expose your interests and location data. They made sure that's not the case, but they had riskier features like that before. Facebook had a feature where they could identify you in other people's photos.

So what you could do is upload a photo of a person you saw in a coffee shop, and then let Facebook identify them. They don't offer this feature anymore. And Instagram, several years back, had a feature where you could see all the likes and activity of other people.

[00:28:33] Dani Middleton-Wren: Mm hmm. I remember it.

[00:28:34] Karol Horosin: It was heavily used by stalkers.

[00:28:37] Dani Middleton-Wren: Yeah.

[00:28:37] Karol Horosin: Or maybe many people.

[00:28:38] Dani Middleton-Wren: Yeah, I couldn't say I used it myself, but...

[00:28:42] Karol Horosin: Yeah. Some people looked at this tab a lot.

[00:28:45] Dani Middleton-Wren: Certain people with too much time on their hands.

[00:28:46] Karol Horosin: And there's a lot of stuff like that, you know, streaming services, their friend activity, fitness apps. So this is a reality of many products. In most of the products, it's not as serious as genetic and ethnicity data that may be used in a malicious form or by hate groups, but still it's a risk.

So, I think those cases, cases like 23andMe are really educational for people that design products.

[00:29:14] Dani Middleton-Wren: Yeah, this is why we're kind of talking about it today as well. As well as the leak itself, the significance of the credential stuffing attack, but if it's exposed people who are from like certain backgrounds, then like you said, it puts people at risk of being targeted by hate groups. So what can businesses like 23andMe do to protect their users? And what should users be doing to protect themselves?

[00:29:42] John Beech: I want to jump in on the question of whether it's the user's fault, and I don't think it can be. You sign up to these websites with the intent of getting some service, and in a lot of cases these are paying customers. It comes back to trust again. They're entrusting their credentials.

They give their password to that company to gain access to the service. From our perspective, when we're protecting customers, we're playing defense: we're looking at all of those logins and those attacks, and we can proactively protect our customers on behalf of their customers. That's an active job that the company is responsible for.

It's not on the individual. What can the individual do? They can change their password on a regular basis. Are you likely to do that four or five months down the line? Probably not. Are you going to set a reminder on your phone: "Oh, I need to go round the websites to check things"?

Probably not. You're trusting that the information you gave those companies is going to be kept secure. So I think it's the responsibility of those companies to keep an eye on credential stuffing attacks and to protect that data as best they can. To come back to what Karol was saying about relationship mapping:

it's kind of like a social network, right? You're building all these relationships between users, building that data map going back generations. That's incredibly sensitive data, especially, as we've talked about, with hate groups, but there are also governments around the world who penalize and punish families, or set immigration policy based on where your great-grandparent was born.

If people can access that information and misuse it, they will, because it's part of their policy. So yeah, I'm horrified at the response from 23andMe. I think in that situation it's never the user's fault.

[00:31:20] Karol Horosin: Yeah. What's important to notice here is that some product decisions were made to make things easier at the cost of security, and that's a trade-off that always has to be made. 23andMe made the decision that, even though they're handling medical data, they wanted to make the user journey a little bit easier by not forcing users to do multi-factor authentication, and by not requiring more complex passwords with a policy stricter than other tools have.

So, coming back to your question about what companies can do: when they are making product decisions, they need to weigh the risks accordingly. And what we've learned analyzing multiple cases on this podcast is that two-factor authentication is usually non-negotiable, even when you're handling mundane data that shouldn't harm anyone.
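To make "two-factor authentication" concrete: the six-digit codes produced by most authenticator apps follow the TOTP scheme (RFC 6238), an HMAC over the current 30-second time step. The sketch below uses only the Python standard library and assumes a shared secret was already exchanged when the user enrolled. It is an illustration of the scheme, not 23andMe's or Netacea's implementation; a production system should use a vetted library and also handle enrollment, rate limiting and replay protection.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226) using HMAC-SHA1."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # "dynamic truncation": low 4 bits pick a 4-byte slice
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, at: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    """Time-based one-time password (RFC 6238): HOTP keyed on the current time step."""
    now = time.time() if at is None else at
    return hotp(secret, int(now // step), digits)


def verify_totp(secret: bytes, submitted: str, at: Optional[float] = None,
                step: int = 30, digits: int = 6, window: int = 1) -> bool:
    """Accept the code for the current step, or +/- `window` steps to tolerate clock drift."""
    now = time.time() if at is None else at
    for drift in range(-window, window + 1):
        expected = totp(secret, now + drift * step, step, digits)
        # Constant-time comparison so response timing doesn't leak matching digits.
        if hmac.compare_digest(expected, submitted):
            return True
    return False
```

The one-step tolerance window is a common design choice that trades a slightly longer validity period for fewer failures when the phone's clock drifts.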

[00:32:20] Dani Middleton-Wren: Any data can be used maliciously.

[00:32:23] Karol Horosin: Yeah. So what companies can do, and what 23andMe didn't do according to a lawsuit: there is a class action lawsuit against 23andMe, and the lawyers are demanding up to $1,000 per user in damages, which is massive. One of the things they say 23andMe didn't do was properly communicate the extent of the breach, which was something their users expected.

As Cyril mentioned, they probably did that to decrease their liability. But the fact is that they didn't say exactly what data was leaked or what the extent of the leak could be. They just said it happened. So one question the lawyers have in the lawsuit is: what was really exposed?

And I think 23andMe really doesn't know, or maybe they can find out now. But if the credential stuffing was done well and they had no protection measures, maybe someone just tested passwords from, you know, legitimate IPs. It probably could have been noticed in the historical data, but it wasn't.

So I think it comes down to the basic features we talk about here all the time: multi-factor authentication and strong password policies. And one other thing we mention all the time: training users on how to protect their data, and also making the right product decisions. You need to weigh financial benefit against security.

That applies to the user journey when it comes to passwords, but also to features that benefit your company, like DNA Relatives. Do you warn users enough, or do you convince them at all costs to sign up for the feature?
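A "strong password policy" can mean more than length and complexity rules: since credential stuffing replays passwords leaked elsewhere, a common complementary check is rejecting passwords that already appear in breach corpora. The sketch below assumes the k-anonymity format used by the Have I Been Pwned "Pwned Passwords" range API, where a client sends only the first five hex characters of the password's SHA-1 hash and receives `SUFFIX:COUNT` lines. To stay self-contained it takes that response text as an argument instead of making a network call. The helper names and the 12-character minimum are illustrative assumptions.

```python
import hashlib
from typing import List, Tuple


def sha1_prefix_suffix(password: str) -> Tuple[str, str]:
    """Split the uppercase SHA-1 hex digest into a 5-char prefix and a 35-char suffix."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def breach_count(password: str, range_response: str) -> int:
    """Find this password's suffix in a 'SUFFIX:COUNT' range response (0 if absent)."""
    _, suffix = sha1_prefix_suffix(password)
    for line in range_response.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix:
            return int(count)
    return 0


def password_problems(password: str, range_response: str, min_length: int = 12) -> List[str]:
    """Return policy violations; an empty list means the password is acceptable."""
    problems = []
    if len(password) < min_length:
        problems.append(f"shorter than {min_length} characters")
    if breach_count(password, range_response) > 0:
        problems.append("appears in a known breach corpus")
    return problems
```

Only the five-character prefix would ever leave the server in this design, which is why the range format is popular: the breach service never learns which password was checked.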

[00:34:08] Dani Middleton-Wren: And I think you hit the nail on the head there: there's a strong chance that they didn't incorporate strong password requirements and two-factor authentication from the off because they didn't want to deter people from using the platform, and they wanted to make it as accessible as possible.

And I suppose you may not consider that data as valuable as, say, your bank details or data stored elsewhere, where you will have multi-factor authentication and you want it to be as secure as possible. Whereas with this information you're like, okay, well, I just need a password.

I don't necessarily need it to be a strong password. But if they'd had more of a security-by-design approach, would that have protected customers? And then, to your point, John, it would have given 23andMe more of a case to say, look, we gave you the option to protect yourself, rather than just saying it's your fault you didn't have a good enough password.

[00:35:06] Karol Horosin: Yeah, so the message out of this, whether it's ChatGPT or other tools, is: watch out where your data goes. One cautionary tale to sum it up for users: in 2018, GlaxoSmithKline, which is a big pharma company, acquired some of 23andMe's shares, and they entered a partnership in which 23andMe shared their patients' data for drug research.

That wasn't clearly stated as a possibility before. It was there in really small print, and you could opt into it. But they followed that up by going public on the U.S. stock exchange, so they clearly aimed for profits there.

You cannot blame them. They used their data to their advantage. But there's a huge risk, and a huge temptation to use those services, and companies have to make money. You always have to ask yourself, like John mentioned: what is the interest of the company? What are they making money on?

And if they're making money on your data, you should be cautious about what you put there.

[00:36:11] John Beech: The perfect answer from a company after a breach is obviously admitting fault, but also being able to say: we've got an active security team that's investigating the issue, or investigating attacks. That reassures users that the company was actively trying to defend against such attacks,

and that this is one that got through, that they weren't able to protect against, and that they've learned lessons from it. Another common technique I've seen is where inactive users get prompted: they get sent an email. So it's the MFA route again: why is this user logging in after six months?

Maybe we should send them an email to check they're actually a real user trying to log into the website, and not just a credential stuffing attack. So there are relatively simple things that companies can do to protect their users that just didn't happen in this case.
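The dormant-account check John describes can be expressed as a small policy function: even when the password is correct, a login to an account that has been idle for a long time is held until the owner confirms by email. This is a minimal sketch under stated assumptions: the function name is hypothetical, the 180-day threshold simply encodes "six months", and accounts with no recorded login are treated as dormant as a conservative default.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical policy: "six months" from the discussion, expressed as 180 days.
DORMANCY_THRESHOLD = timedelta(days=180)


def requires_email_confirmation(last_login: Optional[datetime],
                                now: Optional[datetime] = None) -> bool:
    """Decide whether a login with a correct password should still be confirmed by email.

    Accounts with no recorded login are treated as dormant (the conservative choice).
    """
    now = now or datetime.now(timezone.utc)
    if last_login is None:
        return True
    return now - last_login >= DORMANCY_THRESHOLD
```

Because credential stuffing lists are often old, many of the accounts they hit are exactly these long-idle ones, which is why this cheap step-up can catch a meaningful share of attacks.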

[00:36:56] Cyril Noel-Tagoe: I think one of the things this hack has shown is a lack of adequate detection. There are multiple facets to security, and the fact that 23andMe became aware of this through a post on a hacker forum means that they didn't have the internal detection tools required, or at least they weren't able to catch this when it happened without some external notification, which I think is the case in a lot of these types of attacks.

And it serves to reduce user trust even more: you didn't actually find this until someone told you they had done this to you.

[00:37:31] John Beech: When you've got an attack like this, because it's scripted, there's often an automated pattern you can spot. I think the sort of systems we develop would very quickly have seen commonality in the attack pattern and done something to block it. And not only do you get to block it, you also know that you're under attack, and then you can start to look more deeply at why there's suddenly a spike in logins. That's the kind of active monitoring of your own estate you need to see if any breaches are occurring.
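As a rough illustration of "a spike in logins" as a signal, the sketch below compares each time window's failed logins with a rolling baseline of recent normal windows, and also requires a high failure ratio, since replayed leaked credentials mostly fail. It is not how Netacea's systems work; it assumes login attempts have already been bucketed into fixed windows (for example, one minute), and every threshold is an illustrative assumption. Aggregating across the whole login endpoint, rather than per IP, matters here because, as Karol noted, attacks can come from many legitimate-looking IPs.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class WindowStats:
    attempts: int  # all login attempts seen in the window (e.g. one minute)
    failures: int  # attempts rejected for bad credentials


class LoginSpikeDetector:
    """Flag windows whose failed logins jump far above the recent baseline."""

    def __init__(self, history: int = 60, spike_factor: float = 5.0,
                 min_failures: int = 50, max_failure_rate: float = 0.5):
        self.baseline_windows = deque(maxlen=history)  # failure counts of normal windows
        self.spike_factor = spike_factor
        self.min_failures = min_failures
        self.max_failure_rate = max_failure_rate

    def observe(self, window: WindowStats) -> bool:
        if self.baseline_windows:
            baseline = sum(self.baseline_windows) / len(self.baseline_windows)
        else:
            baseline = 0.0
        failure_rate = window.failures / window.attempts if window.attempts else 0.0
        is_spike = (
            window.failures >= self.min_failures
            and window.failures > self.spike_factor * max(baseline, 1.0)
            # Replayed leaked credentials mostly fail, so the failure ratio is high too.
            and failure_rate > self.max_failure_rate
        )
        if not is_spike:
            # Keep attack windows out of the baseline so a sustained attack keeps alerting.
            self.baseline_windows.append(window.failures)
        return is_spike
```

A busy but legitimate period (for example, a sale) raises attempts and failures together, so the failure-ratio condition keeps it from alerting on volume alone.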

[00:37:58] Dani Middleton-Wren: OK, let's move on to our attack of the month. So just last week we had Amazon Prime Day and next month we have Black Friday. So with that in mind, our attack of the month is one that is likely to exploit such events.

Freebie bots. Cyril, as always, can we start with you? Would you like to explain exactly what freebie bots are and how they exploit things like hype drops to their advantage?

[00:38:34] Cyril Noel-Tagoe: Yeah, so they don't exploit hype drops, which is the difference with freebie bots. Freebie bots are a special kind of scalper bot that targets discounted or mispriced items. The goal of a scalping attack is to purchase a product and then resell it for a profit, and typically a scalper bot would purchase a hype product or high-demand product and capitalize on that demand to sell it for a profit. When we talked about Taylor Swift tickets in the past, those aren't cheap by any stretch of the imagination, but they could be sold for a massive markup. A freebie bot flips that equation on its head a little: if you buy something that's already discounted, you can sell it at its normal retail price and still make a profit.

That's basically how freebie bots operate: if an item is discounted, or if it's been mispriced because of a pricing error by the retailer, they can pick it up and then sell it for a profit.

[00:39:38] Dani Middleton-Wren: Okay, so in instances such as Prime Day, where you've got items that are heavily discounted, and Black Friday will be a similar situation, Cyber Monday or Cyber Week, etc. Are you seeing discussion in forums of people targeting certain items, saying "we know this item is going to be discounted, we're going to buy loads of these and sell them at RRP"? Or is it a case of people just having a look on the day, and it's less organized than that? How does that work?

[00:40:08] Cyril Noel-Tagoe: You can set up the bots to look for items with a specific item code, like a generic scalper would. So in cases like Prime Day, when you know that certain items are going to be discounted, you can target them. But you can also set up the bot just to look for items with a discount over a certain amount.

So anything over 80 percent off, you want to buy. We track a few of these bots and their purchases, and some of the items we see coming through that they've purchased, I'm just like... that's the most random item in the world, but I guess...

[00:40:44] Dani Middleton-Wren: Like what?

[00:40:45] Cyril Noel-Tagoe: Aha, there's been everything from... so at one point one of the bots I've seen purchased lots of standing desks randomly.

And actually I think they posted on their Twitter talking about how good the posture of their users was going to be following the standing desks they'd been able to purchase. Typically we see a lot of beauty items, sunglasses, just lots of random items which are just discounted.

[00:41:07] Dani Middleton-Wren: And where do they tend to go for sale? Is it just on general resale sites?

[00:41:12] Cyril Noel-Tagoe: Oh yeah, everything from your eBays to your Facebook Marketplace, anywhere you can resell. A lot of people will just resell it locally as well, like a garage sale.

[00:41:21] Dani Middleton-Wren: Old school.

Very analog. What items would you expect to see being targeted by freebie bots this year? Have we got the 2023 Furby or the Tamagotchi equivalent?

[00:41:34] Cyril Noel-Tagoe: I mean, one of the interesting things that not just freebie bots but retail bots in general have been going for recently is water bottles, like tumblers, which has been really interesting to see. And I think this brings up an important point: it's very hard for organizations to predict in advance what a freebie bot is going to go for.

With a normal scalper bot you can predict: okay, we've got a hype release, we know it's going to be targeted, and we can try to protect against it. With a freebie bot, yes, you can try to protect whenever you've got a big sale going on, but you might have a pricing error, and you get hit.

It's a lot, lot harder to predict in advance.

[00:42:14] John Beech: Cyril, does this only work on bulk items? Because in a mature marketplace for high-value items, occasionally someone will list something that's, I don't know, 40 percent cheaper than the going rate for that item. But that quickly rebalances, because once that item's bought, it basically goes back on resale at its normal price.

So do these freebie bots tend to target bulk items?

[00:42:38] Cyril Noel-Tagoe: That's a good question, and I'm not sure how the economics play into it at that level. I would say that if the item is one that will sell out, that gives them an advantage: they can price it at whatever they want, and we move into traditional scalping territory where they can mark it up, especially if there's still some latent demand for it. If it doesn't sell out, they can even sell it slightly below the normal price once the sale ends and try to steal customers that way.

[00:43:09] John Beech: Because my thought was that the other side of this is Christmas. Toy buying in the run-up to Christmas is usually one of those areas where you try to corner the market: you hold the stock for one or two months, and then as you get to Christmas there's latent demand, so everything you bought two months before is suddenly jacked up in price.

[00:43:27] Cyril Noel-Tagoe: Yeah, and we definitely see that with traditional scalping. We've seen some items being purchased which, again, just make no sense, like gingerbread houses, and it's like, why would someone purchase those? But then again, it's a case of: can you get the item before the hype, get ahead of the hype, and then once it's hyped and maybe you've cornered the market, you can profit that way.

[00:43:48] Dani Middleton-Wren: And when you say cornering the market before something becomes hyped, do you mean that if I wanted to buy a bunch of gingerbread houses now, I'd be saying, right, I know they're going to be really popular in a month and there's going to be loads of demand for them.

I'm going to buy them all at a discounted price now and resell them at a future date at RRP. Is that what you mean?

[00:44:08] Cyril Noel-Tagoe: Yeah, whether or not they're discounted now: buy them now and sell them later. With some things, especially seasonal things, you can try to buy them out of season when they're discounted and save them for the season, depending, obviously, on whether it's something that has an expiry date. I can see people using freebie bots after Christmas to buy old Christmas-related items and save them for next Christmas.

[00:44:32] Dani Middleton-Wren: Like Christmas cards and such.

[00:44:34] John Beech: So is this actually a problem for companies or is this just good business?

[00:44:39] Cyril Noel-Tagoe: That's a very good question. It depends on what it's doing to your site, right? With the scalper bot problem, there are essentially two ways it becomes a problem for a company. One is that the traffic degrades your site. The other is that it reduces customer satisfaction because customers can't get what they want.

For freebie bots, the customer satisfaction part might not be as heavy because, unless they're exhausting all the stock, the customer can still get the item, although they might not get the discount they wanted. So if I'm a customer, I've been looking forward to this item, it's finally gone on sale, and I can't buy it because the freebie bots have bought it all, there's still that risk. And then there's site degradation: if enough bots are hitting your site, that can bring it down.

I think there's a third risk with freebie bots that you don't have with typical scalper bots, and that's mispricing. If there's an accidental pricing change on your website, the bots will hit it because they're monitoring, right? They might see that change before you do and exhaust your stock, and you've lost the revenue you would have made from that item.

[00:45:46] Karol Horosin: Yeah, what was really interesting to me when I read about this is that freebie bots often target not just discounted items but free items. And they become free through human error or automation error. Imagine you have an online store and you add an item to your offer. You save it, but you forgot to add a price, or you wanted to update the price after saving the description.

In the window between saving the item's description, when it gets added to the store, and changing the price to the proper one minutes later, someone can use a bot to buy all of your stock. And you won't even notice what happened because, say, if you're using a service like Fulfillment by Amazon, they will ship it out automatically. You won't have to ship it yourself, and you'll lose all your stock and pretty much sell your items for free.

So this is really, really dangerous. And I even saw regular people with questionable ethics looking for those bots on Reddit. They were like, oh yeah, just buy this bot, buy some proxy service, and you'll get one free item from Amazon per day.
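One way a retailer might close the gap Karol describes is a pre-publish check that refuses to put a listing live automatically when it has no price, a zero price, or an implausibly deep discount against a reference price. The sketch below is hypothetical: the `Listing` fields, the function name and the 80% limit (borrowed from Cyril's "anything over 80 percent off" bot setting) are assumptions, and a real catalogue system would route blocked listings to a human for review.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import List, Optional

# Hypothetical policy: listings discounted by more than 80% need human sign-off,
# mirroring the "anything over 80 percent off" freebie bot setting from the episode.
MAX_AUTO_DISCOUNT = Decimal("0.80")


@dataclass
class Listing:
    sku: str
    price: Optional[Decimal]  # None models "saved before a price was entered"
    reference_price: Decimal  # e.g. RRP or the last approved price


def publish_blockers(listing: Listing) -> List[str]:
    """Return reasons a listing should not go live automatically (empty list = OK)."""
    if listing.price is None:
        return ["no price set"]
    if listing.price <= 0:
        return ["price is zero or negative"]
    if listing.reference_price <= 0:
        return ["no valid reference price to compare against"]
    discount = 1 - listing.price / listing.reference_price
    if discount > MAX_AUTO_DISCOUNT:
        return [f"discount of {discount * 100:.0f}% exceeds the auto-publish limit"]
    return []
```

`Decimal` is used instead of `float` so that prices and the discount comparison are exact, which matters when the check sits right at a policy boundary.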

[00:47:00] Dani Middleton-Wren: That brings us nicely to the legality of using freebie bots. Is it black and white? Is it a case of, yeah, you shouldn't be using these? Or is it a case of it's just unethical?

[00:47:10] Cyril Noel-Tagoe: I think it's a gray area at the moment. There's no legislation that prohibits using a freebie bot, as far as I'm aware. There are obviously going to be ethical considerations around buying mispriced items. If you know that the retailer did not intend for something to be this price, and you purchase it at that price, or even use automation to purchase all of the items at that price, from an ethical point of view that's definitely on the wrong side. From a legal point of view, I'm not sure. And I think maybe that's something that needs to be discussed by governments, similar to ticket scalping.

[00:47:49] Karol Horosin: Yeah, but I think the police are listening, because I don't know if our listeners heard, but there was a police siren in Cyril's background. So, you know, maybe they're trying to do something about it. I think anyone listening would agree that it's not fair, but I think many people would find justifications for it.

Like, you know, you're stealing from a large corporation, not from an individual. So people have different ethical outlooks on this, sadly. I think what's usually lacking in those websites and services is this: when you have a drop, you know you have to protect it, but here businesses are just unprepared. If you're dropping a pair of Nikes, you know that this particular product page has to be rate limited.

You have to have a queue or something else in front of it. But when we're talking about simple pricing errors, especially when you have a system for merchants rather than a whole web store, or really in any other case, if you're not watching all the traffic and what's happening, you will miss those attacks, and many businesses are not analyzing what's going on.
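Rate limiting a drop page, as Karol mentions, is often done with a token bucket: each client gets a small burst allowance that refills at a fixed rate. The sketch below swaps in this common technique purely as an illustration; keying by a client identifier (IP, session or account) is an assumption, and on its own it is only one layer, since bots rotating through proxy pools spread their requests across many identifiers. The injectable clock exists only to make the behavior easy to test.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Allow `rate` requests per second per client, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens: Dict[str, float] = {}
        self.updated: Dict[str, float] = {}

    def allow(self, client_id: str) -> bool:
        now = self.clock()
        tokens = self.tokens.get(client_id, self.capacity)  # new clients start full
        last = self.updated.get(client_id, now)
        # Refill for the time elapsed since this client's last request, capped at capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        self.updated[client_id] = now
        if tokens >= 1:
            self.tokens[client_id] = tokens - 1
            return True
        self.tokens[client_id] = tokens
        return False
```

Requests that `allow` rejects could be sent to a waiting-room queue rather than dropped, which matches the "queue or something else in front of it" idea from the discussion.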

[00:49:02] Dani Middleton-Wren: Is there any way that businesses can predict or put something in place to help them prevent freebie bots taking advantage of their website?

[00:49:11] Cyril Noel-Tagoe: I think the prediction part is hard. I don't think this is something you can predict. As Karol said, this isn't a hype drop that you're expecting to be hit. So it becomes, again, a matter of detection. At the item level: is there an unexpected spike in the number of purchases of this item? And then looking into that.

[00:49:32] Karol Horosin: Yeah, and this has to be done carefully in the case of retailers, because in several countries there are laws stating that clicking and making a purchase actually creates a legal contract between the seller and the customer. So detection before the sale is crucial.

You need to know that someone has malicious intent, or has taken certain actions, before they buy, because in many countries, probably especially in the EU where there's heavier legislation, the click means something similar to signing a contract. We've actually talked recently about how even sending an emoji can be considered agreeing to a legal statement. I think that's something we were trained on at Netacea: if you say okay to a customer on Slack, that can later be considered in court as agreeing to their terms. So even users' clicks on websites have contractual repercussions.
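Cyril's item-level question ("is there an unexpected spike in purchases of this item?") can be sketched as a z-score of the current window's orders against that item's own history. This is a hypothetical illustration, not a description of any vendor's detection: it assumes orders are already aggregated per SKU per window, and the thresholds (4 standard deviations, at least 20 orders) are arbitrary starting points. Items with no history but a sudden burst are flagged too, since a freshly mispriced listing may have no baseline at all.

```python
import statistics
from typing import Dict, List


def unexpected_purchase_spikes(
    history: Dict[str, List[int]],
    current: Dict[str, int],
    z_threshold: float = 4.0,
    min_orders: int = 20,
) -> List[str]:
    """Return SKUs whose orders in the current window sit far above their own history."""
    flagged = []
    for sku, orders in current.items():
        if orders < min_orders:
            continue  # ignore small absolute volumes to limit noise
        past = history.get(sku, [])
        if len(past) < 2:
            flagged.append(sku)  # no baseline: a burst on an unseen item is itself suspicious
            continue
        mean = statistics.mean(past)
        stdev = statistics.stdev(past) or 1.0  # flat history would otherwise divide by zero
        if (orders - mean) / stdev > z_threshold:
            flagged.append(sku)
    return flagged
```

Flagged SKUs would feed a human review or an automated hold before fulfilment, which ties in with Karol's point that detection needs to happen as early as possible in the purchase flow.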

[00:50:37] Dani Middleton-Wren: That was going to be my next question. Does it depend on the emoji used? Does it have to, you know, if you send a cat, for instance, a dancing cat in response to a customer, will that be perceived as an OK, as a positive sign?

[00:50:52] Karol Horosin: I think these are going to be really funny trials and cases. Because, I don't know, I know that some people in the company created an emoji with my face, and I'm not sure what that means, so...

[00:51:05] John Beech: I think there's this concept of regularity in law. The example given was that if you post an envelope, then by all things normally considered, you'd expect the envelope to be delivered to the company you posted it to. So having proof that you posted the envelope is enough to say that they should have received it.

So I wonder if in the future we could have tests for the common meaning of emojis. Could we use a standard trained model to say, no, in that context a happy face means X? We've got the data; we've sampled 10 million interactions to establish the baseline meaning of that response.

[00:51:43] Dani Middleton-Wren: Absolutely.

[00:51:44] Cyril Noel-Tagoe: I mean, you were talking earlier about multimodal AI. I don't think we need voice and all that stuff for ChatGPT; I think we need emoji intelligence.

[00:51:52] Dani Middleton-Wren: We do. There must be some sort of sample data that we can pull from in certain age ranges or maybe just across Slack. Like this emoji is used most frequently to denote celebration. It's Karol's face.

[00:52:08] John Beech: The best part is they're already encoded. So you can have conversations in emojis. It's quite fun. You can configure them to only reply in emojis.

[00:52:17] Dani Middleton-Wren: Wow, that does sound extremely fun. Well, thank you all very much for joining us for today's cybersecurity session. It has been a brilliant conversation. We've all learned a lot, a lot more than we probably bargained for when it came to the 23andMe DNA conversation. Thank you so much, Karol, for those terrifying anecdotes.

My jaw dropped about three times. Thank you all for joining us. Thank you, John, for your podcast debut. And hopefully it's been a fascinating conversation for everyone listening. If you would like to follow us on Twitter, you can find us @CyberSecPod, and if you would like to send any questions to the podcast, to the panel, then you can email us at podcast@netacea.com. Thank you all very much and speak to you all next month.

Block Bots Effortlessly with Netacea

Book a demo and see how Netacea autonomously prevents sophisticated automated attacks.


Related Podcasts


Navigating Cybersecurity Leadership w/ Simon Brownhill, DWL Partners

Podcast

S03E06

Explore cybersecurity leadership with Simon Brownhill's journey from Navy engineer to CISO and valuable insights into risk management.


Open-Source Security Frameworks w/ OWASP Board Member Sam Stepanyan

Podcast

S03E05

Discover the influence of OWASP on the security world. Learn from OWASP Global Board Member and London Chapter Leader Sam Stepanyan.


Dr. Christoph Burtscher (AI Researcher & Author)

Podcast

S03E04

Join us for an engaging discussion on how AI is reshaping cyber defense. Learn about the shift from human-led security to machine-led defenses.


