2019-2020

Writing an Algorithm to Audit Fortune 500 Privacy Policies

Assessing the accessibility and readability of the Fortune 500's privacy policies.

Summary

To help inform the ongoing work of BEACON, a company I co-founded, I set out to inventory and assess the privacy policies of every Fortune 500 company.

To do this, I combined AppleScript, Apple's Automator utility, the Spider web scraper and Numbers to build a multi-step algorithm that captured, inventoried, audited and recorded privacy policy data for all 500 companies.

How it Worked

Create a List of Fortune 500 Companies + Website Links
First, the system visited the Fortune 500 list and captured each company's name, ranking, website link and any available labor market and financial data. The output was an ordered CSV list.
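The project captured this list with Spider and Apple's tools. The Python sketch below shows only the output side of that step: writing company records to CSV, assuming the list is ordered by rank. The column names and sample rows are hypothetical placeholders, not the project's actual fields or real Fortune 500 data.

```python
import csv
import io

# Hypothetical column names; the project's exact fields are not documented.
FIELDS = ["rank", "company", "website", "employees", "revenue_musd"]


def companies_to_csv(companies):
    """Return company records as CSV text, ordered by rank (assumed sort key)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for row in sorted(companies, key=lambda c: int(c["rank"])):
        writer.writerow(row)
    return buf.getvalue()


# Illustrative placeholder data, not real Fortune 500 figures.
sample = [
    {"rank": 2, "company": "Example Corp B", "website": "https://b.example",
     "employees": 1200, "revenue_musd": 900},
    {"rank": 1, "company": "Example Corp A", "website": "https://a.example",
     "employees": 5000, "revenue_musd": 4000},
]
print(companies_to_csv(sample))
```

Using the csv module rather than joining strings by hand means company names that contain commas (for example "Example, Inc.") are quoted correctly.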
Pull All Links from Each Company Website
Next, the system pulled each company's website URL from the CSV file and used Apple Automator to scan the site and build a list of every link on it.
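The project used Automator for this step. As a rough equivalent, the sketch below swaps in Python's standard-library html.parser to collect every link from one page of already-downloaded HTML. Fetching pages and crawling across a whole site are left out, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Turn relative links like "/legal/privacy" into absolute URLs.
                    self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return the unique links on a page, in first-seen order."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return list(dict.fromkeys(parser.links))


page = ('<a href="/about">About</a> <a href="/legal/privacy">Privacy</a> '
        '<a href="/about">About again</a>')
print(extract_links(page, "https://www.example.com/"))
```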
Filter Link List to Identify Privacy URLs
Each company's link list was then run through a filter that selected only links containing keywords related to "privacy" or "policy".
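A minimal version of the keyword filter might look like the sketch below. The two keywords come from the write-up; matching them as a case-insensitive substring of the URL is an assumption about how the filter behaved.

```python
# Keywords from the write-up; case-insensitive substring matching is an assumption.
KEYWORDS = ("privacy", "policy")


def filter_privacy_links(links, keywords=KEYWORDS):
    """Keep only links whose URL contains at least one keyword."""
    return [url for url in links if any(k in url.lower() for k in keywords)]


links = [
    "https://www.example.com/about",
    "https://www.example.com/legal/Privacy-Notice",
    "https://www.example.com/cookie-policy",
]
print(filter_privacy_links(links))
```

A broad keyword such as "policy" also catches pages like cookie or return policies, so the filtered list can still need some manual review.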
Update CSV File with Companies' Privacy Policy Links
The master CSV file was then updated with the resulting privacy policy URLs.
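Merging the filtered links back into the master file could be sketched as follows. Matching rows by company name, the privacy_urls column name and space-separating multiple URLs are all hypothetical choices made for this illustration.

```python
import csv
import io


def add_privacy_column(csv_text, privacy_links, column="privacy_urls"):
    """Return csv_text with an extra column holding each company's privacy links.

    privacy_links maps company name -> list of URLs (hypothetical structure);
    companies with no match get an empty cell.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames) + [column]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # URLs never contain spaces, so a space is a safe separator for several links.
        row[column] = " ".join(privacy_links.get(row["company"], []))
        writer.writerow(row)
    return out.getvalue()


master = ("rank,company,website\n"
          "1,Example Corp A,https://a.example\n"
          "2,Example Corp B,https://b.example\n")
found = {"Example Corp A": ["https://a.example/privacy"]}
print(add_privacy_column(master, found))
```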
Capture Privacy Policy Page Content
Next, the system iterated through each company's privacy policy links. A custom AppleScript opened each privacy URL in Safari, switched on the text-only Reader view, copied all of the content to the clipboard and then automatically generated an archived text document containing that content.
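The project relied on Safari's Reader view to strip pages down to their text. Outside Safari, a crude stand-in is to keep a page's visible text nodes while skipping script and style blocks, as in the Python sketch below. This is a swapped-in technique, far simpler than Reader's article detection, and the file-naming scheme is hypothetical.

```python
import re
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Collect visible text from an HTML page, skipping <script> and <style> contents."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text nodes that are not inside a skipped element.
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def page_text(html):
    """Return the page's visible text, one text node per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def archive_filename(company):
    """Hypothetical naming scheme: 'Example Corp, Inc.' -> 'example-corp-inc.txt'."""
    slug = re.sub(r"[^a-z0-9]+", "-", company.lower()).strip("-")
    return f"{slug}.txt"


def archive_text(company, text, out_dir):
    """Write the captured text into out_dir and return the file path."""
    path = Path(out_dir) / archive_filename(company)
    path.write_text(text, encoding="utf-8")
    return path


html = ("<html><head><style>p{}</style></head><body><h1>Privacy Notice</h1>"
        "<p>We collect data.</p><script>track()</script></body></html>")
print(page_text(html))
```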
Assess Privacy Readability with Hemingway App
The content of each privacy policy was then programmatically entered into the Hemingway App website, and the resulting readability assessment data was captured with a custom automation method. The readability data was then added to the CSV file, and a follow-on cleanup script was run to tidy the newly added data.
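Hemingway App's scoring internals are not described here, so the sketch below swaps in a well-known published formula, the Automated Readability Index (ARI), to show what a programmatic readability score can look like:

ARI = 4.71 × (characters / words) + 0.5 × (words / sentences) − 21.43

This is not Hemingway's method, and its numbers will not necessarily match Hemingway's grades. Splitting words and sentences with simple regular expressions is an assumption that will miscount abbreviations such as "e.g." and ignores non-ASCII letters.

```python
import re


def automated_readability_index(text):
    """ARI = 4.71*(chars/words) + 0.5*(words/sentences) - 21.43.

    Characters are letters and digits; sentences are runs of . ! or ?
    """
    words = re.findall(r"[A-Za-z0-9']+", text)
    if not words:
        return 0.0
    # Count at least one sentence so text without end punctuation still scores.
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    chars = sum(ch.isalnum() for w in words for ch in w)
    return 4.71 * chars / len(words) + 0.5 * len(words) / sentences - 21.43


print(automated_readability_index("We collect data. We share it."))
```

ARI approximates a U.S. school grade level, so very short, simple sentences like the example can score below zero, while long legal sentences push the score up.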
Identifying Errors & Cleanup
Finally, once all of the data had been consolidated and organized in the master CSV, the system ran a final check over every entry and flagged any that had not successfully captured the expected data.
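A final completeness check of this kind can be sketched as a pass over the master CSV that reports every row with an empty required field. The column names used here, including grade, are hypothetical.

```python
import csv
import io


def find_incomplete_rows(csv_text, required=("website", "privacy_urls", "grade")):
    """Return (rank, company, missing_fields) for each row lacking a required value."""
    problems = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # A missing column (None) and a blank cell are both treated as missing data.
        missing = [f for f in required if not (row.get(f) or "").strip()]
        if missing:
            problems.append((row.get("rank"), row.get("company"), missing))
    return problems


master = ("rank,company,website,privacy_urls,grade\n"
          "1,Example Corp A,https://a.example,https://a.example/privacy,12\n"
          "2,Example Corp B,https://b.example,,\n")
print(find_incomplete_rows(master))
```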