reddragdiva: (geek)
[personal profile] reddragdiva

I've been doing GDPR stuff at the day job.

tl;dr: Nothing about this is hard ... unless your business model is to abuse your customers' personal data. Then it might be hard.

Fundamentally: REGULATORY COMPLIANCE IS NOT OPTIONAL. Complaining on Hacker News won't make it so.

(I routinely see the loudest complainers about the onerous nature of GDPR compliance suddenly get vague or stop posting when you ask for details of precisely what bit is so hard for them in particular. So far, it seems a safe assumption that they're abusing personal data, and they know they're abusing personal data. Perhaps one day a clear exception will show up.)

There are no roving gangs of GDPR inspectors, waiting for you to slip up so they can fine you 20m EUR. This year, in fact, I would say that the most important thing is to do your sincere best. That alone will put you in the top 5% of companies.

Actual GDPR compliance in practice for me so far involves fairly mundane dealing with technical debt. You need to approach this as "we have run up a pile of technical debt, we need to clear it down."

The threat model we're working to is: "querulous upset customer sends GDPR Nightmare Letter, will complain to the ICO if we don't fulfil our obligations."

The GDPR "Nightmare" Letter is not that nightmarish — and it makes a lot of sense if you read it as A List Of Technical Debt You Can Finally Get The Mgt. To Pay For. Because, you know, it actually is. That letter is a blessing.

Despite the increasingly fevered GDPR horror fan-fiction favoured by American commenters, there's no reason to panic — but there is excellent and useful material to get management to finally pay for you to do things properly. I've greatly enjoyed having a GDPR stick to wave and say "no, actually, it's illegal for us not to do this right" or saying "no" to marketing when they think they're being clever.

I must note — we're doing this by the seat of our pants, because, like most businesses, we didn't get into the heavy-duty slog of breaking down our GDPR issues until the last moment either. There's probably better ways to do lots of this, and important stuff we haven't thought of.

The universal GDPR experience is "I never knew just how many systems we had." Someone's going to need to make a proper list.

Our business's interest is to keep our users happy and thinking well of us and keep them as customers for decades. I am delighted to note that the techies are very onside with the GDPR, and what it means in terms of your responsibility as a technologist for the things you build.

The GDPR effectively mandates that you make any database with personal data in it easily redactable. Every pile of data containing personal data needs to be easily redactable — or it needs to be deleted as absolutely soon as possible. Make redaction easy for yourself.
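A minimal sketch of what "easily redactable" can look like, assuming a single SQLite database and a hand-maintained map of which columns hold personal data. The table, the column names and the `redact_person` helper are all hypothetical; the point is only that redaction becomes one function call rather than an archaeology project.

```python
import sqlite3

# Hypothetical map: table -> (key column, columns holding personal data).
PD_COLUMNS = {"customers": ("id", ["name", "email", "phone"])}


def redact_person(conn: sqlite3.Connection, person_id: int) -> int:
    """Null out every known personal-data column for one person.

    Returns the number of rows changed across all listed tables.
    """
    changed = 0
    for table, (key, columns) in PD_COLUMNS.items():
        assignments = ", ".join(f"{col} = NULL" for col in columns)
        # Table and column names come from the trusted map above;
        # only the id is user-supplied, so only it is a bound parameter.
        cur = conn.execute(
            f"UPDATE {table} SET {assignments} WHERE {key} = ?", (person_id,)
        )
        changed += cur.rowcount
    conn.commit()
    return changed


def demo_db() -> sqlite3.Connection:
    """Build a throwaway in-memory database with one example customer."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT,"
        " email TEXT, phone TEXT, plan TEXT)"
    )
    conn.execute(
        "INSERT INTO customers VALUES"
        " (1, 'Alice Example', 'alice@example.com', '555-0100', 'pro')"
    )
    conn.commit()
    return conn
```

Calling `redact_person(demo_db(), 1)` blanks the name, email and phone but leaves the non-personal `plan` column alone. Keeping the map next to the schema means a new PD column is a one-line change.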

If you decommission an application — you don't keep the final database dump around "just in case." Backups containing Personal Data also need to be deleted as soon as possible.

(I've personally taken great joy in killing a bad idea by saying "certainly, we can save that for you! I'll just tell the data protection officer that your unit's accepting redaction responsibility, and ... oh, you want to delete it? I'll get right on that.")

We've just realised that some applications will need to run (at least) two separate databases — one handling PD and one handling mundane data. Responsible businesses already handle credit card numbers separately, for instance — but you need to do this with any PD.
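A rough sketch of the two-database split, assuming SQLite and invented table names. The PD store holds names and emails keyed by an opaque random reference; the mundane store only ever sees that reference. None of this is our actual schema, it is just one way the idea could be laid out.

```python
import sqlite3
import uuid


def open_stores():
    """Create two separate in-memory databases: one for PD, one for the rest."""
    pd_db = sqlite3.connect(":memory:")
    main_db = sqlite3.connect(":memory:")
    pd_db.execute(
        "CREATE TABLE person (person_ref TEXT PRIMARY KEY, name TEXT, email TEXT)"
    )
    main_db.execute(
        "CREATE TABLE orders (order_id INTEGER PRIMARY KEY,"
        " person_ref TEXT, item TEXT)"
    )
    return pd_db, main_db


def add_person(pd_db, name: str, email: str) -> str:
    """Store personal data and return the opaque reference for it."""
    ref = uuid.uuid4().hex
    pd_db.execute("INSERT INTO person VALUES (?, ?, ?)", (ref, name, email))
    pd_db.commit()
    return ref


def add_order(main_db, person_ref: str, item: str) -> None:
    """Record an order against the opaque reference only."""
    main_db.execute(
        "INSERT INTO orders (person_ref, item) VALUES (?, ?)", (person_ref, item)
    )
    main_db.commit()


def erase_person(pd_db, person_ref: str) -> bool:
    """Delete the PD row; the mundane database is not touched."""
    cur = pd_db.execute("DELETE FROM person WHERE person_ref = ?", (person_ref,))
    pd_db.commit()
    return cur.rowcount == 1
```

After `erase_person`, the order rows survive for accounting but no longer link to a name or email. Whether what remains counts as impersonal is a call for your data protection officer, not for the schema.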

When we do a new project, one of the handover steps before it's allowed to go live is a GDPR assessment. Note that staff data counts as PD, e.g., employee actions — it may or may not be redactable, but you should definitely note it.

Dev/stage DBs are typically a snapshot of live. PD in these counts! We've had a redaction where we had to redact the dev and stage databases just as we did on live, 'cos updating dev and stage was very long-winded. (The proper solution is, of course, to make updating easier.)

Apache logs count as PD — they contain IP numbers, and probably login cookies. So if you want to analyse these, do it early, so you can throw the PD away and keep only the impersonal aggregate. We now keep these for 30 days on the server and in our Kibana — we're pretty confident that's legit sysadmin/security usage — and need to work out what to do with them after that. (Ops is heavily advocating Just Delete It.)
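Here is a sketch of "analyse early, keep only the aggregate", assuming logs in Apache's common or combined format. The `aggregate` function and the choice of (day, path, status) as the key are illustrative assumptions; the IP address and the username field are parsed past but never stored, and query strings are dropped because they can carry PD as well.

```python
import re
from collections import Counter
from datetime import datetime

# Matches the leading fields of Apache's common/combined log format.
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) '
)


def aggregate(lines):
    """Count requests per (day, path, status), discarding all personal fields."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # skip lines that are not in the expected format
        day = datetime.strptime(m["ts"], "%d/%b/%Y:%H:%M:%S %z").date().isoformat()
        path = m["path"].split("?", 1)[0]  # strip the query string
        counts[(day, path, int(m["status"]))] += 1
    return counts
```

The output is small enough to keep indefinitely, which leaves the raw logs free to be deleted once the retention window is up.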

So far the only real pain point has been a redaction request for data in our Magento — and at least half of that is because the company we thankfully outsourced the horrible pile of trash to are not so great sometimes. I would be delighted if the business were to decide Magento was too much trouble GDPRwise.

All of this is sensible and obvious with a moment's thought. But the thing is — this is technical debt you had piling up for the past two years anyway. And were ignoring all that time. Personal data is a radioactive toxic waste pool, and must be handled like one.

Everything in the GDPR is stuff you should have been doing anyway, and you know it. That's precisely why the apocalyptic GDPR fanfic is so weird. They're going "BUT WHAT IF YOU HAVE TO DO REDACTIONS FOR THE MARTIANS" and I'm going "dude I've literally been doing GDPR and it's easy if you're not a dick."

I posted the above to LWN and got a few responses. Main difficulty is how git should handle the likely GDPR redactability of email addresses, which is a tricky one.

So! What have you been doing? Is there anything I've missed?

Apocalyptic GDPR horror fanfic is off-topic and liable to be deleted. Looking for your practical on-the-ground issues.

(no subject)

Date: 2018-07-31 07:24 am (UTC)
ewx: (Default)
From: [personal profile] ewx
Any thoughts about what application and service developers can do to help (or at least, not hinder)? The applications I work on are fairly general-purpose but (i) some generate things like web server logs and other records that capture information about who did what, and (ii) our customers will certainly use them in ways that touch PD, some of it very sensitive, and not necessarily in ways we predict.

(no subject)

Date: 2018-07-31 10:41 am (UTC)
xtina: A mini house with a combo lock and chain. (security)
From: [personal profile] xtina
I love every part of this post unreservedly. Signed, a security engineer.

(no subject)

Date: 2018-07-31 06:47 pm (UTC)
fluffymormegil: @ (Default)
From: [personal profile] fluffymormegil
An actual experience I am having here:

You have to find all the "backup" tables left over in the schema from when someone did a bulk UPDATE or DELETE FROM that they were worried might not go perfectly smoothly, then didn't bother to DROP TABLE after everything went without a hitch.

Including the ones that were created ten years ago which contain data about the customers of a client you no longer deal with.
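A first pass at the hunt can be automated. This is only a sketch, assuming SQLite and a guessed set of naming habits (`_bak`, `_backup`, `_old`, `_copy`, `_tmp`, or a trailing date); the `SUSPECT` pattern and `suspect_tables` helper are made up for illustration. On MySQL or PostgreSQL the same idea would query `information_schema.tables` instead of `sqlite_master`.

```python
import re
import sqlite3

# Hypothetical naming habits for "just in case" copies; adjust to your shop.
SUSPECT = re.compile(r"(_bak|_backup|_old|_copy|_tmp|_\d{6,8})$", re.IGNORECASE)


def suspect_tables(conn: sqlite3.Connection) -> list:
    """List tables whose names look like forgotten backup copies."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows if SUSPECT.search(name)]
```

It will not catch a table someone named `fred`, so the output is a list to review by hand, not a list to drop.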


Date: 2018-08-09 08:52 pm (UTC)
From: (Anonymous)
You also have to find those entire backup *databases* meant for QA but containing unencrypted, unblinded complete dumps of customer data, some of which will just be database files detached for years from actual running db instances, some of which might be on unmounted filesystems (but, no doubt, not in any way *protected* ones) or scattered around on random USB disks floating around the office. I have seen systems which were perennially short of disk space with heavy complaints about this where 90% of the disk space turned out to be consumed by junk like this kept around "just in case". At least GDPR compliance might force people to find useless radioactive crap like this and deal with it at long last.

(Yes, I saw exactly this, and worse, in my last job. Thankfully this was only high finance, not medical data, and it would be hard to do more damage with that than the banks themselves did on their own without assistance.)

-- Nix

(no subject)

Date: 2018-08-01 04:10 am (UTC)
tcpip: (Default)
From: [personal profile] tcpip
> Apocalyptic GDPR horror fanfic is off-topic and liable to be deleted.

Goddamn bait :p

Seriously good article. And congrats on being on LWN.

(no subject)

Date: 2018-08-01 02:46 pm (UTC)
lovingboth: (Default)
From: [personal profile] lovingboth
Presumably - this translates as 'I hope' - there is a limit of reasonableness in some of this.

So... backups of event attendees' data exist on DVD+R discs that have other material. They might even be readable discs. Is there a need to find them all, read them, delete that data and nothing else that I might want to have, make a new backup, then destroy the discs?

Or is that unreasonable? No-one is reading or processing the data.

On a scale of 'fine' to 'call the lawyers now', how is <> ?
