When The Guardian broke the story about the NSA demanding ‘telephony metadata’ from Verizon, a new word was introduced into the public lexicon. In the world of enterprise data management, metadata gets a bad rap. It’s generally perceived as a pain in the ass: something that must be tended to, like a perpetually leaky tire or a messy room. The cost of bad data can be massive, but you’ll likely never know it, because relatively few organizations possess the skills to understand the behaviours of data.

Taking a trip back in time provides historical perspective: since the dawn of library science at the Library of Alexandria in Egypt, where Callimachus in the third century BC conceived of the first bibliographic system (Pinakes), metadata developed as a way to catalog key elements of written works and make them easily searchable. Unfortunately, this also had the unintended consequence of divorcing metadata from data.

The next significant development in metadata came nearly two millennia later. In 1595 AD, Johan van der Does of Leiden University published Nomenclator, the first definitive publication of library holdings. The index was fairly crude, and it had taken the better part of two thousand years to arrive. Next up was Melvil Dewey, who created the Dewey Decimal System to organize all knowledge into ten main classes (each subdivided into ten divisions, and each division into ten sections). Decimals after the third digit allow the hierarchy to extend indefinitely. Other systems followed, such as the Universal Decimal Classification and Library of Congress Classification.
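For the code-minded, here is a minimal sketch of that ten-by-ten-by-ten hierarchy. The function name `dewey_levels` and the returned keys are my own invention for illustration, not part of any library standard or cataloging tool; the only assumption is that a call number looks like three digits with an optional decimal extension.

```python
def dewey_levels(call_number: str) -> dict:
    """Split a Dewey-style call number into its hierarchy levels.

    Hypothetical helper for illustration: assumes the form "DDD" or
    "DDD.xxx", where the first digit picks the main class, the second
    the division, and the third the section.
    """
    whole, _, extension = call_number.strip().partition(".")
    whole = whole.zfill(3)  # tolerate short input such as "5" -> "005"
    if len(whole) != 3 or not whole.isdigit():
        raise ValueError(f"not a Dewey-style number: {call_number!r}")
    return {
        "main_class": whole[0] + "00",    # one of ten main classes
        "division": whole[:2] + "0",      # one of ten divisions in that class
        "section": whole,                 # one of ten sections in that division
        "decimal_extension": extension,   # finer subdivision, any length
    }


levels = dewey_levels("782.42")
print(levels)
```

Each level is just a prefix of the one below it, which is why the scheme can keep subdividing without renumbering anything above.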

Fast-forward a few decades: libraries, archives, bookstores and other repositories of knowledge were filled to the brim with card catalogs and the like. Tremendous human effort was dedicated to manually drafting, distributing and maintaining indices of their collections. What to do with these massive storehouses of index cards?


Thankfully, the information age helped to solve the real estate problem by creating digital archives of metadata that could be searched. Beginning in the late 1960s, the OCLC took the lead in centralizing the digitization and storage of metadata for all types of content. Metadata had become completely divorced from that which it describes: metadata and data had undergone a bifurcation, simultaneously advancing society and holding it back. If I were interested in, say, a book of musical scores and wanted to know more about New York, New York, I might be SOL. Searching by “Sinatra” might help me get what I want, but the limitations of searching metadata remained a function of that which was indexed.

Then the Internet happened. Moore’s Law took root, processing power went exponential, and storage costs dropped like rocks. Purging data became more expensive than keeping it. Bookstores, newspapers, libraries and anything dead-tree oriented faced irrelevance. After a long separation, metadata was reunited with data, but the reunion did not come from the world of library science. Google Books and Amazon’s Search Inside are great examples of bringing content together with metadata, allowing users to simultaneously perform full-text searches and query metadata. Contrast this with the world of geospatial data, where metadata remained off to the side, divorced from content.

OK, what does this have to do with domestic surveillance? With the NSA demanding telephony metadata from Verizon, President Obama assuring Americans that nobody is listening to their phone calls, and the NSA gearing up to re-implement metadata collection, what exactly is the big deal? In short, this metadata can be more valuable than the data itself. Yep, one person’s metadata is another person’s data. Relationships, travel habits and patterns, and very personal matters can be inferred. But don’t take my word for it: see how an Australian journalist crowdsourced analysis of his metadata.
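To make that concrete, here is a toy sketch of the kind of inference that call metadata alone allows. Everything in it is hypothetical: the records, the names, the field layout and the "late night" cutoff are made up for illustration and say nothing about how any real carrier or agency stores or analyzes call records. Note that no call content appears anywhere.

```python
from collections import Counter

# Hypothetical call records: (caller, callee, start_hour_24h, duration_seconds).
# Invented data for illustration only.
records = [
    ("alice", "bob", 23, 1800),
    ("alice", "bob", 22, 2400),
    ("alice", "clinic", 9, 300),
    ("alice", "bob", 23, 1500),
    ("alice", "carol", 14, 60),
]


def contact_summary(records, caller):
    """Summarize who a caller talks to, how much, and how often late at night."""
    calls = Counter()
    seconds = Counter()
    late_night = Counter()
    for src, dst, hour, duration in records:
        if src != caller:
            continue
        calls[dst] += 1
        seconds[dst] += duration
        if hour >= 22 or hour < 6:  # assumed "late night" window
            late_night[dst] += 1
    return {
        dst: {
            "calls": calls[dst],
            "total_seconds": seconds[dst],
            "late_night_calls": late_night[dst],
        }
        for dst in calls
    }


summary = contact_summary(records, "alice")
for contact, stats in summary.items():
    print(contact, stats)
```

Three long late-night calls to one number and a short daytime call to a clinic already suggest a close relationship and a medical appointment, and that is from five rows. Scale it up to months of records and the picture gets very personal.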

My point here isn’t to critique government (though I might!), but to illustrate the power of metadata. The NSA surveillance issue is case study #1 in how metadata plays a critical role in surfacing actionable insights. The word has gone from wonky to pedestrian in two years, and hopefully future coverage will stop putting it in quotes. I’m working on another post that dives deep into the business value created by metadata, furthering my mantra that one person’s metadata is another person’s data. Stay tuned, and I welcome your thoughts!

