GitHub Arctic Code Vault is a GitHub data repository that preserves a snapshot of every active public GitHub repository as of 02/02/2020 in the Arctic World Archive (AWA) for future generations. The data is stored in a “very-long-term archive” facility located in a decommissioned coal mine deep in the permafrost of an Arctic mountain close to the North Pole.
Now, it turns out the Arctic Code Vault is storing sensitive patient medical records from multiple healthcare facilities. The private data leaked onto GitHub repositories last year, before the snapshot was taken, and is now part of an open-source collection bound to last for 1,000 years.
Though the patients’ personally identifiable information (PII) is protected by international copyright law and regulations, it will be very hard to locate and remove the sensitive data now archived deep in the permafrost.
Because of its popularity, GitHub has been used by anyone for all kinds of reasons, including by bad actors for hosting malware and leaked data like passwords and API keys that, of course, shouldn’t be on GitHub.
This week, the story took a surprising twist. Multiple medical facilities issued privacy incident and HIPAA breach notices as a result of a leak of PII at Med-Data, a revenue cycle management solutions company. It turns out that one of Med-Data’s former employees uploaded private patient records to GitHub around September 2019. After this, the data made its way into the Arctic Code Vault historic collection.
In August 2020, Dutch researcher Jelle Ursem and Dissent Doe of DataBreaches.net reported ten data leaks on GitHub containing medical records of 150,000 to 200,000 patients.
But only now, on March 31, were the impacted patients notified:
“Impacted covered entities whose patient’s data was affected were notified on February 8, 2021. Letters were mailed to impacted individuals and applicable regulatory agencies on March 31, 2021,” states Med-Data in the incident notice.
The leaked information may have included an individual’s name, physical address, date of birth, Social Security number, diagnosis, condition, claim information, date of service, subscriber ID (which may itself be a Social Security number), medical procedure codes, provider name, and health insurance policy number.
Med-Data consequently asked GitHub to remove the sensitive data from the Vault.
“We do not know what transpired after that, although there had been some muttering that Med-Data might sue GitHub to get the logs,” said Ursem and Doe in a report published April 1st, which the researchers said was not an April Fools’ Day joke.
“We hope that GitHub cooperated with Med-Data, but we raise the issue here because we will bet you that many developers and firms have never even considered what might happen that could go so very wrong,” the researchers concluded in their report.