Archives for category: graph

Regarding the “datagram” essence of Prose Objects, I have just achieved better clarity on the model. I have long wondered whether the Value or the Map should be viewed as the fundamental element. It now seems clear that, in a sense, we need to think both ways. It is very important to maintain the notion of a “file with key=values” as the unit of conversation, because so much of conventional computing is organized this way. From config files to credit card receipts, address cards and email datagrams, a file as a map of k/vs is a very useful format.

But in understanding the paradigm and in building high-performance systems, it is helpful to focus on the key=values. Here is my attempt to do so.

The entire system can be viewed as Values – strings of text whose “keys” are names under which they are stored. For instance, in a file system, a Value is a string stored in a file, and the key is the file name. In a database, the key would be the ID of the database record.

The basic mechanism of Prose Objects is to assign _additional_ names to the Values, to give them additional keys used in the recursive expansion and inheritance.

Let’s express this in git-ish terms (with a nod to Ludovic Dubost, who may have anticipated this mapping).

The Value is a file whose name is the hash of the string. This is how git stores files. A Map is a tree (directory/folder) that references those files using meaningful names (the kind of names we now use for keys). These trees are organized into other trees (directories). These higher-level (2nd to Nth level) trees correspond to both 1) the current system of organization of files into folders [G/Z/ol/*] and 2) the links among Prose Objects. The trees can be traversed in two ways: 1) to add a collection of Values to the namespace of a tree (as is done currently by a link to another file) and 2) to obtain a Value’s string ({G/Z/ol/z6#sec}?). This unifies two concepts that seemed to want unification: the folder hierarchy and the system of links.
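To make the git-ish mapping concrete, here is a minimal Python sketch, assuming an in-memory dictionary stands in for the file system. The names `ValueStore`, `make_tree` and `resolve` are hypothetical, not part of any Prose Objects or Cmacc code. Each Value is stored under a hash of its own text, a Map is a tree of meaningful names pointing at those hashes (or at nested trees), and a slash-separated path walks the trees down to a Value's string.

```python
import hashlib


class ValueStore:
    """Hypothetical content-addressed store: each Value is kept under the hash of its text."""

    def __init__(self):
        self.objects = {}  # hash -> string, loosely like git's object database

    def put(self, text: str) -> str:
        # Plain SHA-1 of the text; git also hashes a short "blob <size>\0" header, omitted here.
        digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
        self.objects[digest] = text  # identical strings map to the same key, so they are stored once
        return digest

    def get(self, digest: str) -> str:
        return self.objects[digest]


def make_tree(store: ValueStore, entries: dict) -> dict:
    """Build a Map (tree): meaningful names point to Value hashes or to nested trees."""
    tree = {}
    for name, item in entries.items():
        tree[name] = make_tree(store, item) if isinstance(item, dict) else store.put(item)
    return tree


def resolve(store: ValueStore, tree: dict, path: str) -> str:
    """Walk a slash-separated path through nested trees down to a Value's string."""
    node = tree
    for part in path.split("/"):
        node = node[part]
    return store.get(node)


store = ValueStore()
root = make_tree(store, {
    "Acme": {"Name": "Acme Corp.", "City": "Paris"},
    "Beta": {"Name": "Beta LLC", "City": "Paris"},
})
print(resolve(store, root, "Acme/Name"))  # Acme Corp.
print(len(store.objects))                 # 3, because "Paris" is stored only once
```

Because the name of a stored Value is derived from its content, two Maps that share a Value end up pointing at the same stored object, which is the de-duplication the post describes.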

Note that this somewhat complicates the notion of a file in the current implementation: each k/v becomes a separate file (or is treated as such). This translation can be handled in the interface: each line of a file can be saved as a new file, and the “file” can become a folder. But there is a problem, or a needed refinement: while order does not matter for normal k/vs, it does matter for links (“=[top-priority.md] \n =[second-priority.md]” is not the same thing as “=[second-priority.md] \n =[top-priority.md]”). This needs further thought.
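The ordering problem can be illustrated with a small Python sketch. It assumes, hypothetically, that a "file" holds its k/vs in an unordered dictionary and its links in an ordered list, that local k/vs win, and that earlier links take priority over later ones. The `library` layout and `lookup` function are illustrative only, not the current implementation, and they do not settle the open question in the post; they just show why the links cannot be treated as an unordered set.

```python
from typing import Optional

# Hypothetical layout: each "file" has unordered k/vs plus an ordered list of links.
library = {
    "top-priority.md": {"kv": {"Term": "30 days"}, "links": []},
    "second-priority.md": {"kv": {"Term": "60 days", "Law": "France"}, "links": []},
    "a.md": {"kv": {}, "links": ["top-priority.md", "second-priority.md"]},
    "b.md": {"kv": {}, "links": ["second-priority.md", "top-priority.md"]},
}


def lookup(key: str, name: str, library: dict, seen: Optional[set] = None) -> Optional[str]:
    """Find key in a file: local k/vs first, then each linked file in listed order."""
    seen = set() if seen is None else seen
    if name in seen:  # guard against link cycles
        return None
    seen.add(name)
    doc = library[name]
    if key in doc["kv"]:
        return doc["kv"][key]
    for linked in doc["links"]:  # earlier links take priority (an assumption)
        found = lookup(key, linked, library, seen)
        if found is not None:
            return found
    return None


# Reordering k/vs changes nothing; reordering links changes the answer.
print({"A": "1", "B": "2"} == {"B": "2", "A": "1"})  # True
print(lookup("Term", "a.md", library))  # 30 days
print(lookup("Term", "b.md", library))  # 60 days
print(lookup("Law", "a.md", library))   # France
```

If each k/v becomes its own file inside a folder, something like a list (or ordinal names) would still be needed to record the link order.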

This Value orientation simplifies and adds power. For instance, you can traverse the file-folder tree for a Value or Map in a way that seems natural but was not previously possible. The Values also automatically de-duplicate and connect, because identical strings hash to the same name.

For Prose Objects generally – https://commonaccord.wordpress.com/fulcrum/data-model/


Decision-making involves the dual factors of rules and judgment.  In automated systems and drafting of legislation and contracts, the judgments are encoded (fossilized) during planning.  At the opposite end, litigation is an extreme form of post-hoc decision-making.

As food for thought about the significance of focus (and the role of judges), contrast the styles of U.S. judicial opinions and their French counterparts. Most readers will be familiar with the U.S. style. U.S. judicial opinions, often lengthy, have a narrative form. Along with a description of the facts and of the case’s history in the courts (procedural posture), they will often weigh factors and explain their reasoning. The decision is usually clear, but often interwoven with the narrative.

French judges, performing the same function, use only a single sentence: the legal premises, the facts and, finally, a verb, the result. The sentence is long, and one is impatient for the verb, but it will always be the disposition of the case. Full stop.

This means, of course, that the decisions are quick to read. They do not capture the attention of the reader for long. Attention moves elsewhere, to how the logic and decision fit in a framework. This is characteristic of legal thought in civil code jurisdictions. Thought flows more rapidly across the entire framework. It feels less local than in common law jurisdictions. More connections are made. For an excellent, and critical, introduction to French judicial opinions, see Michael Wells’s piece in the Yale Journal of International Law.

One of the oddities of decision-making is how difficult it is to integrate complexity.  @marclauritsen has been working on this problem with his choiceboxes.

In legal decision-making, my impression (as a student of law) is that complexity is often handled by alighting on a few factors, or even one, and declaring them determinative. With a turn of phrase, a subtle, complex question is turned into a simple one. Appellate decisions sometimes embrace this kind of reasoning.

Of course, it is hard, or even impossible, to express complex relationships in text except as stories.  My maternal grandfather often told anecdotes, the point of which seemed to be to open one up to the ironies of human affairs.  They often escaped my youthful grasp but stayed with me.  My father, a lawyer, also looks for cross-currents.  But he also says, with characteristic concision, that if you have more than three factors, you don’t have a rule.  Without rules, it is hard to structure human relations.

The hard parts, the rules, need connective tissue: judgment. My father also emphasizes reading between the lines of (US-style) judicial opinions to discern what the judges are thinking, as opposed to what they are saying. Morality and prejudgment are practiced in these gaps.

Now to the point, or rather, to the lines connecting points. Graph representations convey patterns of connections, visually and viscerally. One can see large groups of interactions. So, might the wide availability of visual representations of connections lead to a broader view of legal causation and of the possibilities of rules? Something more consistent with common sense or engineering?

As an example, might the ability to see the connections among various provisions in documents, and among different uses in similar documents (patterns of transacting), lift us above the phrase-level analysis that so strongly characterizes US law?

Might it help us to make judgments that are more contextually informed? Like those in close-knit communities, or in Bayesian networks.

Might I run out of question marks?

P.S. As an aside(?) see the next post, on a difference between French and US judicial opinions.

@Edd, organizer of #Strataconf, shared a link to @felienne’s talk on spreadsheets as “dark matter” in the enterprise. youtu.be/wbiVK6HKHHg

A compelling presentation of the problem of informal data representations of formal relationships. I recall a report of a savings and loan that went bust in the 1980s, where the only record of its assets was a spreadsheet on a floppy disk.

This brings to mind that “bigdata” is not so much about the quantity of data as about its lack of structure. Sure, unstructured data is not a problem if there is very little of it. A shopping list that does not follow the supermarket aisles is manageable. But voluminous data like a digitized movie is not a big data problem either: one frame comes after another.

Spreadsheets and Word docs are not generated in the volume and at the speed of website hits, but they get messy pretty quickly. Which ones are relevant? How are they related? What do they mean? Even a single individual can lose track of their own. In groups, this is compounded. Many organizations do not even use document management for important documents. Those that do often have no way to really track the relationships. In the absence of structure, docs are PigData.

It took me a while to see that doc management is part of BigData.  Felienne’s talk highlights this.

If the pig is on one side of the BigData continuum, order is at the other.

“The majestic equality of the law, which forbids both rich and poor to sleep under the bridges, beg in the streets and steal bread.” Anatole France, in Paris.

Further east and earlier, Leonhard Euler, the great mathematician, used the Seven Bridges of Konigsberg as an example of connections. He proved that a person could not walk through every part of the city while crossing each bridge exactly once. Like a person navigating the desks of a bureaucracy imagined by Kafka.

Connecting these ideas, the paperwork of law acts like tolls on the bridges of society. While paperwork is required of all, it is more burdensome to some. It masks inconsistent treatment. The slow-down at the toll booth affects even well-informed travelers.

Euler’s math and its offspring, graph databases, seem a good fit with codified text.

For bridge fans — Konigsberg’s islands resemble those of Paris. But the bridge count in Paris is different, so it is possible to visit all parts of Paris without crossing a bridge more than once. This seems to remain so even if we lawyer the problem by excluding, say, pedestrian or one-way bridges. Anyone see a differentiator that will make the number of bridges odd for three of the land masses? Paris, kilometre 0.
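The differentiator Euler used is the parity of the bridge count at each land mass: in a connected network, a walk that crosses every bridge exactly once exists only if zero or two land masses touch an odd number of bridges. Below is a minimal Python sketch of that check for Konigsberg's seven bridges. The land-mass labels are descriptive placeholders, and the function assumes the land masses are all connected; a Paris bridge list is not modeled here, but the same function could be run on one.

```python
from collections import Counter

# The seven bridges as pairs of land masses (labels are descriptive, not historical names).
KONIGSBERG = [
    ("Island", "North bank"), ("Island", "North bank"),
    ("Island", "South bank"), ("Island", "South bank"),
    ("Island", "East"), ("North bank", "East"), ("South bank", "East"),
]


def bridge_counts(bridges):
    """Count the bridges touching each land mass (the degree of each vertex)."""
    degree = Counter()
    for a, b in bridges:
        degree[a] += 1
        degree[b] += 1
    return degree


def euler_walk_possible(bridges):
    """Euler's criterion, assuming all land masses are connected: a walk that crosses
    every bridge exactly once exists iff zero or two land masses have an odd count."""
    odd = [land for land, d in bridge_counts(bridges).items() if d % 2 == 1]
    return len(odd) in (0, 2)


print(dict(bridge_counts(KONIGSBERG)))  # Island 5; North bank, South bank, East 3 each
print(euler_walk_possible(KONIGSBERG))  # False: all four land masses are odd

# Removing one bridge leaves exactly two odd land masses, so such a walk becomes possible.
fewer = [b for b in KONIGSBERG if b != ("North bank", "East")]
print(euler_walk_possible(fewer))       # True
```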

In the old, less connected days, Sun Microsystems communicated the message that connected computers can work together with the phrase, “the network is the computer.”  It does not all need to be on your desk.  It can be here, down the hall, across the world, in all those places at once.

The same is true for legal documents.  We trade, with others, often on behalf of employers or clients, in markets with established expectations.  Legal dealing is collaborative in its very nature.   Since we are sharing documents, we “might as well get good at it.”

The system of “lists” that is the core of the Cmacc data model was observed by @paulcapestany to look like a “graph.” Graphs are not what I thought they were, and a bit of investigation shows that Paul is right. More than right.

Graph theory is a field of mathematics founded by Euler in the 18th century. It has achieved new prominence as organizations try to understand and manage their many relationships, internal and external. Well-known “graphs” include Facebook, LinkedIn, and Twitter.

I had long understood Cmacc to be triples: this has such-and-such a relationship to that. Graphs systematize the bidirectional nature of chains of these relationships, the network of relationships. If X employs Y and Z cuts Y’s hair, then Z is the haircutter of the employee of X. And so on.
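The haircutter example can be written as a chain of triples read in both directions. This is a minimal Python sketch, not Cmacc code and not a graph database API; the `TripleGraph` class and its `walk` method are hypothetical. Each triple is indexed both forward (subject to object) and backward (object to subject), so a chain of relationships can be followed from either end.

```python
from collections import defaultdict

# Hypothetical triples (subject, predicate, object) for the example in the text.
TRIPLES = [
    ("X", "employs", "Y"),
    ("Z", "cuts hair of", "Y"),
]


class TripleGraph:
    def __init__(self, triples):
        self.forward = defaultdict(set)   # (subject, predicate) -> objects
        self.backward = defaultdict(set)  # (object, predicate) -> subjects
        for s, p, o in triples:
            self.forward[(s, p)].add(o)
            self.backward[(o, p)].add(s)

    def walk(self, start, steps):
        """Follow (predicate, direction) steps; '->' reads a triple forward, '<-' backward."""
        frontier = {start}
        for predicate, direction in steps:
            index = self.forward if direction == "->" else self.backward
            frontier = set().union(*(index.get((node, predicate), set()) for node in frontier))
        return frontier


g = TripleGraph(TRIPLES)
# "The haircutter of the employee of X": forward along employs, then backward along cuts hair of.
print(g.walk("X", [("employs", "->"), ("cuts hair of", "<-")]))  # {'Z'}
# The same chain read from the other end leads back to X.
print(g.walk("Z", [("cuts hair of", "->"), ("employs", "<-")]))  # {'X'}
```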

There is a fantastic amount of excellent work around graphs.  Cmacc is a way of rendering a document from a graph.  This means that the relationships expressed in graphs can be – more or less – legally self-executing.
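One way to picture "rendering a document from a graph" is recursive expansion: a top-level Value contains references to other keys, each of which may contain further references, until only plain text remains. The Python sketch below assumes a hypothetical `{Key}` reference syntax (loosely echoing the braces used earlier for `{G/Z/ol/z6#sec}`) and a flat dictionary of key/values; it is an illustration, not Cmacc's actual implementation.

```python
import re

REF = re.compile(r"\{([^{}]+)\}")  # hypothetical {Key} reference syntax


def render(key, values, depth=0):
    """Expand {Key} references in values[key] recursively; unknown keys are left visible."""
    if depth > 50:
        raise RecursionError(f"reference loop near {key!r}")
    text = values[key]

    def expand(match):
        ref = match.group(1)
        return render(ref, values, depth + 1) if ref in values else match.group(0)

    return REF.sub(expand, text)


VALUES = {
    "Doc": "{Seller.Name} agrees to sell {Item} to {Buyer.Name}.",
    "Seller.Name": "Acme Corp.",
    "Buyer.Name": "{Buyer.First} {Buyer.Last}",
    "Buyer.First": "Ada",
    "Buyer.Last": "Lovelace",
    "Item": "one widget",
}
print(render("Doc", VALUES))  # Acme Corp. agrees to sell one widget to Ada Lovelace.
```

Leaving unresolved references visible, rather than silently dropping them, makes gaps in the graph easy to spot in the rendered text.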

For some work on/in graphs, see, e.g., opencorporates.com, neo4j.org and @fredtrotter’s docgraph.