I've been doing GDPR stuff at the day job.
tl;dr: Nothing about this is hard ... unless your business model is to abuse your customers' personal data. Then it might be hard.
I routinely see the loudest complainers about the onerous nature of GDPR compliance suddenly get vague or stop posting when you ask for details of precisely what bit is so hard for them in particular. So far, it seems a safe assumption that they're abusing personal data, and they know they're abusing personal data. Perhaps one day a clear exception will show up.
Fundamentally: REGULATORY COMPLIANCE IS NOT OPTIONAL. Complaining on Hacker News won't make it so.
There are no roving gangs of GDPR inspectors, waiting for you to slip up so they can fine you 20m EUR. This year, in fact, I would say that the most important thing is to do your sincere best. That alone will put you in the top 5% of companies.
Actual GDPR compliance in practice for me so far involves fairly mundane dealing with technical debt. You need to approach this as "we have run up a pile of technical debt, we need to clear it down."
The threat model we're working to is: "querulous upset customer sends GDPR Nightmare Letter, will complain to the ICO if we don't fulfil our obligations."
The GDPR "Nightmare" Letter is not that nightmarish — and it makes a lot of sense if you read it as A List Of Technical Debt You Can Finally Get The Mgt. To Pay For. Because, you know, it actually is. That letter is a blessing.
Despite the increasingly fevered GDPR horror fan-fiction favoured by American commenters, there's no reason to panic — but there is excellent and useful material to get management to finally pay for you to do things properly. I've greatly enjoyed having a GDPR stick to wave and say "no, actually, it's illegal for us not to do this right" or saying "no" to marketing when they think they're being clever.
I must note: we're doing this by the seat of our pants, because, like most businesses, we didn't get into the heavy-duty slog of breaking down our GDPR issues until the last moment either. There are probably better ways to do lots of this, and important stuff we haven't thought of.
The universal GDPR experience is "I never knew just how many systems we had." Someone's going to need to make a proper list.
Our business's interest is to keep our users happy and thinking well of us and keep them as customers for decades. I am delighted to note that the techies are very onside with the GDPR, and what it means in terms of your responsibility as a technologist for the things you build.
The GDPR effectively mandates that you make any database with personal data in it easily redactable. Every pile of data containing personal data needs to be easily redactable — or it needs to be deleted as absolutely soon as possible. Make redaction easy for yourself.
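One way to make redaction easy is to keep a single registry of where the personal data lives, and drive every redaction from it. What follows is a minimal sketch, not what we actually run: the table names, column names and the `PD_COLUMNS` registry are all invented for illustration, and it uses SQLite only so that it is self-contained.

```python
import sqlite3

# Hypothetical registry: table -> (column identifying the person, PD columns).
# Every name here is an assumption made up for this example.
PD_COLUMNS = {
    "users": ("id", ["name", "email"]),
    "orders": ("user_id", ["shipping_address"]),
}


def redact_person(conn, person_id, placeholder="[redacted]"):
    """Overwrite every registered PD column for one person; return rows changed."""
    changed = 0
    with conn:  # single transaction: all tables are redacted, or none are
        for table, (key, columns) in PD_COLUMNS.items():
            assignments = ", ".join(f"{col} = ?" for col in columns)
            params = [placeholder] * len(columns) + [person_id]
            cur = conn.execute(
                f"UPDATE {table} SET {assignments} WHERE {key} = ?", params
            )
            changed += cur.rowcount
    return changed


def make_demo_db():
    """Build a tiny in-memory database to try the redaction against."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER,
                             shipping_address TEXT, total_pence INTEGER);
        INSERT INTO users VALUES (1, 'Alice', 'alice@example.com');
        INSERT INTO users VALUES (2, 'Bob', 'bob@example.com');
        INSERT INTO orders VALUES (10, 1, '1 High St', 1999);
        INSERT INTO orders VALUES (11, 2, '2 Low Rd', 500);
        """
    )
    return conn
```

Calling `redact_person(make_demo_db(), 1)` blanks Alice's name, email and shipping address but leaves the order total alone, so the books still add up. The point of the registry is that adding a new PD column means adding one entry, not hunting through code.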
If you decommission an application — you don't keep the final database dump around "just in case." Backups containing Personal Data also need to be deleted as soon as possible.
(I've personally taken great joy in killing a bad idea by saying "certainly, we can save that for you! I'll just tell the data protection officer that your unit's accepting redaction responsibility, and ... oh, you want to delete it? I'll get right on that.")
We've just realised that some applications will need to run (at least) two separate databases — one handling PD and one handling mundane data. Responsible businesses already handle credit card numbers separately, for instance — but you need to do this with any PD.
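The two-database split might look something like this. It is a sketch under assumptions: the `person` and `purchase` tables and all the function names are hypothetical, and two in-memory SQLite connections stand in for what would really be separate databases or servers. The mundane side only ever sees a random reference.

```python
import sqlite3
import uuid


def open_stores():
    """Open two separate stores: one for PD, one for mundane data."""
    pd_db = sqlite3.connect(":memory:")
    main_db = sqlite3.connect(":memory:")
    pd_db.execute("CREATE TABLE person (ref TEXT PRIMARY KEY, name TEXT, email TEXT)")
    main_db.execute("CREATE TABLE purchase (person_ref TEXT, item TEXT, pence INTEGER)")
    return pd_db, main_db


def add_customer(pd_db, name, email):
    """Store the PD and hand back an opaque reference for use everywhere else."""
    ref = uuid.uuid4().hex  # random, so the reference itself reveals nothing
    with pd_db:
        pd_db.execute("INSERT INTO person VALUES (?, ?, ?)", (ref, name, email))
    return ref


def record_purchase(main_db, ref, item, pence):
    """The mundane database stores the reference, never the name or email."""
    with main_db:
        main_db.execute("INSERT INTO purchase VALUES (?, ?, ?)", (ref, item, pence))


def forget_customer(pd_db, ref):
    """Redaction is one DELETE in one place; returns the number of rows removed."""
    with pd_db:
        return pd_db.execute("DELETE FROM person WHERE ref = ?", (ref,)).rowcount
```

After `forget_customer`, the purchase rows are still there for the accounts, pointing at a reference that no longer resolves to anyone. This only works if the mundane data really is mundane: a detailed enough purchase history can identify a person all by itself.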
When we do a new project, one of the handover steps before it's allowed to go live is a GDPR assessment. Note that staff data counts as PD, e.g., employee actions — it may or may not be redactable, but you should definitely note it.
Dev/stage DBs are typically a snapshot of live. PD in these counts! We've had a redaction where we had to redact the dev and stage databases just as we did on live, 'cos refreshing dev and stage from live was very long-winded. (The proper solution is, of course, to make refreshing dev and stage from live easier.)
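One way to keep PD out of dev and stage altogether is to scrub the snapshot as part of the refresh, before anyone can query it. A rough sketch, assuming a hypothetical `users` table with `id`, `name` and `email` columns (not our real schema), again using SQLite so it runs standalone:

```python
import sqlite3


def scrub_snapshot(conn):
    """Replace PD in a restored snapshot with synthetic values built from row ids.

    Returns the number of user rows in the scrubbed snapshot.
    """
    with conn:  # commit the scrub as one transaction
        conn.execute(
            "UPDATE users SET name = 'User ' || id, "
            "email = 'user' || id || '@example.invalid'"
        )
        return conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]


def restore_demo_snapshot():
    """Stand-in for 'restore last night's dump of live into dev'."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
        INSERT INTO users VALUES (1, 'Alice', 'alice@example.com');
        INSERT INTO users VALUES (2, 'Bob', 'bob@example.com');
        """
    )
    return conn
```

The synthetic values are derived from the row id rather than from the real data, so nothing of the original survives, and each row still has a distinct, well-formed email for the application to chew on. With this in the refresh path, a redaction on live never needs repeating on dev and stage.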
Apache logs count as PD: they contain IP addresses, and probably login cookies. So if you want to analyse these, do it early, so you can throw the PD away and keep only the impersonal aggregate. We now keep these for 30 days on the server and in our Kibana (we're pretty confident that's legit sysadmin/security usage) and need to work out what to do with them after that. (Ops is heavily advocating Just Delete It.)
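"Keep only the impersonal aggregate" can be as simple as counting hits per day, path and status, and never copying the IP address or username anywhere. This is a sketch that assumes the stock Apache common/combined log layout; the regex and the `aggregate` function are mine, not anything we run in production.

```python
import re
from collections import Counter

# Matches the start of a common/combined-format line. The IP address and the
# remote user are matched but deliberately never stored anywhere.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<day>\d{2}/\w{3}/\d{4}):[^\]]+\] '
    r'"(?P<method>\S+) (?P<target>\S+)[^"]*" (?P<status>\d{3}) '
)


def aggregate(lines):
    """Count requests per (day, path, status), discarding everything else."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m is None:
            continue  # skip lines that do not look like access-log entries
        # Drop the query string: it can carry session tokens or email addresses.
        path = m["target"].split("?", 1)[0]
        counts[(m["day"], path, m["status"])] += 1
    return counts
```

Feed it the log file line by line, store the resulting counts, and the raw log can go. One caveat: paths can themselves contain PD (think `/users/alice`), so a real version would probably want to whitelist or normalise the paths it counts.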
So far the only real pain point has been a redaction request for data in our Magento — and at least half of that is because the company we thankfully outsourced the horrible pile of trash to are not so great sometimes. I would be delighted if the business were to decide Magento was too much trouble GDPRwise.
All of this is sensible and obvious with a moment's thought. But the thing is — this is technical debt you had piling up for the past two years anyway. And were ignoring all that time. Personal data is a radioactive toxic waste pool, and must be handled like one.
Everything in the GDPR is stuff you should have been doing anyway, and you know it. That's precisely why the apocalyptic GDPR fanfic is so weird. They're going "BUT WHAT IF YOU HAVE TO DO REDACTIONS FOR THE MARTIANS" and I'm going "dude I've literally been doing GDPR and it's easy if you're not a dick."
I posted the above to LWN and got a few responses. The main difficulty raised was how git should handle email addresses, which are likely to be subject to GDPR redaction. That's a tricky one.
So! What have you been doing? Is there anything I've missed?
Apocalyptic GDPR horror fanfic is off-topic and liable to be deleted. Looking for your practical on-the-ground issues.
Update: since I wrote the above, our internal counsel advised us that keeping logs 30 days is almost certainly OK for sysadmin purposes, and anything over that needs a damn good reason. I've applied this swingeingly. I am not your lawyer, so if this is a problem you have, then call one, and don't rely on this post for serious purposes as anything more than the opinions of a non-lawyer.
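For what it's worth, a 30-day rule is about a dozen lines to enforce. The sketch below is illustrative only: the function names and the `*.log*` pattern are assumptions, it judges age by file modification time, and in practice your log rotation tool's own retention setting is probably the better place to do this.

```python
import time
from pathlib import Path

MAX_AGE_DAYS = 30  # the retention period from the update above
SECONDS_PER_DAY = 86400


def select_expired(entries, now, max_age_days=MAX_AGE_DAYS):
    """Given (name, mtime) pairs, return the names strictly older than the cutoff."""
    cutoff = now - max_age_days * SECONDS_PER_DAY
    return [name for name, mtime in entries if mtime < cutoff]


def purge_logs(log_dir, pattern="*.log*", now=None):
    """Delete expired files directly under log_dir; return the paths removed."""
    now = time.time() if now is None else now
    files = [p for p in Path(log_dir).glob(pattern) if p.is_file()]
    expired = select_expired(((p, p.stat().st_mtime) for p in files), now)
    for path in expired:
        path.unlink()
    return expired
```

Splitting the "which files" decision from the "delete them" step means you can dry-run `select_expired` and look at the list before letting `purge_logs` loose. Anything that holds a copy of the logs (Kibana, backups) needs the same rule applied separately.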