@Edd, organizer of #Strataconf, shared a link to @felienne’s talk on spreadsheets as “dark matter” in the enterprise. youtu.be/wbiVK6HKHHg

It is a compelling presentation of the problem of informal data representations of formal relationships.  I recall a report of a savings and loan that went bust in the 1980s, where the only record of its assets was a spreadsheet on a floppy.

This brings to mind that “bigdata” is not so much about the quantity of data as its lack of structure.  Sure, unstructured data is not a problem if there is very little of it.  A shopping list that doesn’t track the supermarket aisles is manageable.  But voluminous data like a digitized movie is not a big data problem either.  One frame comes after another.

Spreadsheets and Word docs aren’t generated at the volume and speed of website hits, but they get messy pretty quickly.  Which ones are relevant?  How are they related?  What do they mean?  Even a single individual can lose track of their own.  In groups, this is compounded.  Lots of organizations do not even use document management for important documents.  Those that do often have no way to really track the relationships.  In the absence of structure, docs are PigData.

It took me a while to see that doc management is part of BigData.  Felienne’s talk highlights this.

If the pig is on one side of the BigData continuum, order is at the other.