My foray into software analysis tools

My interest in online collaboration and collaborative knowledge development has rubbed my nose in the problem that tools for automatically analyzing and summarizing systems are substantially short of what ought to be possible.  This is not a new revelation for sure, but this time around, I’ve embarked on a foray into the analysis tools arena to understand it a bit better, and see if some key obstacles can be surmounted.

Detour from collaboration to analysis tools

Effective collaboration (or knowledge production, or publishing) depends on meeting relatively elaborate and detailed requirements, which can be quite specific to the arrangement of people and the nature of the tasks and the content.  These requirements may well change over time, with only a limited time window to meet them before collaboration participants (or clients or customers) lose interest.

Of course, there’s quite a spectrum of online applications and services which attempt to cover different parts of this territory:  wikis, email, forums, IM, task trackers, Google Docs/Calendar/Sites, Basecamp and Confluence and similar, blogs, content management systems (CMSs) and so on.  But inevitably it’s only practical for these services to offer limited configurability, so you can achieve only an approximate fit to your ideal requirements.

This prompts us to look to open source packages, in hopes that we can achieve a better fit through modification of source code if need be (without writing an app from scratch), and by taking advantage of the customizations and add-ons that other participants often contribute to open-source packages.

Adopting open source increases the need to interact with source code, both to understand the product (and add-ons) in the first place, and to implement initial and subsequent customizations of our system, not to mention to deal with subsequent versions of the adopted open-source package.

But interacting with source code is hard, and can easily become the bottleneck to progress on the original quest — the collaboration or publication effort.

So we are quite interested in tools that might digest source code and transform it into summaries or diagrams that we could understand vastly more rapidly than reading reams of text.  By the same token, these kinds of tools might help actual open source contributors to work more effectively, something we’d like to encourage if we want to enjoy the product of such labors.

This is certainly not the first time I’ve found myself traveling approximately this same path of reasoning.  This time it has persuaded me to take a detour for a closer look at the state of these tools, to identify key obstacles, and to consider how these tools could be brought to bear better than they are currently.

Tools for analyzing and summarizing a software system

A very broad-brush breakdown of a software system distinguishes between what the system knows, and what the system does.  In a typical online system, what the system knows (at least for the long term) corresponds to the data it stores in a database, while what it does corresponds to the functionality implemented by the program source code.  Each of these can be examined to gain an understanding of the system as a whole, and each sheds some light on the other.

The tools for examining, digesting and summarizing each of these aspects of a system are in quite different states of usefulness:

  • Database tools
  • Source-code analysis and summary tools

Database tools

(“Database diagrammer”, “entity-relationship tool”, “database modeling tool”) There are numerous tools in this category, and many do a good job of reading an existing database and summarizing it into diagrams that you can read quickly to learn most of what’s significant about the database’s coarse and detailed structure, elucidating what the system knows.

It would be useful to attach additional information to this kind of diagram, but most such tools don’t provide any facility to merge in and visualize additional information.  I’ve been helping to encourage richer database diagrams, by contributing to the ModelRight effort, which I describe in more detail here [link to come].
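To give a feel for the raw material a database diagrammer starts from, here is a minimal sketch in Python that reads the structure of an existing database and reduces it to tables, columns and references.  It assumes a SQLite database purely for convenience; the function name summarize_schema and the sample author/page tables are hypothetical, and this is not how ModelRight or any particular tool works.

```python
import sqlite3


def summarize_schema(conn):
    """Return {table: {"columns": [...], "references": [...]}} for a SQLite database."""
    summary = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
        # PRAGMA foreign_key_list rows: (id, seq, referenced_table, from, to, ...)
        references = sorted(
            {row[2] for row in conn.execute(f'PRAGMA foreign_key_list("{table}")')}
        )
        summary[table] = {"columns": columns, "references": references}
    return summary


# Hypothetical sample database, built in memory for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE page (
        id INTEGER PRIMARY KEY,
        title TEXT,
        author_id INTEGER REFERENCES author(id)
    );
    """
)

for table, info in summarize_schema(conn).items():
    print(table, info["columns"], "->", info["references"])
```

The resulting dictionary is essentially the node-and-edge list of an entity-relationship diagram, and because it is plain data, extra information (row counts, notes, which code touches which table) could be merged into it before anything is drawn.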

Source-code analysis tools

Compared to database structure, source code is a much more complex material to analyze, so the tools in this area are in a correspondingly less satisfying state. Though there is a variety of tools which programmers use to help work on the detail level of source code, there is a dearth of tools capable of providing intelligent and rich summaries of the important structure and relationships of source code.

Yet some of the obstacles to advancement in this area appear to be relatively mundane, raising hopes that progress might be encouraged.  A major obstacle has been the relative awkwardness of just parsing source code (successfully!) to derive reliable raw data as a basis for analysis.  This, I suspect, has been a substantial deterrent to developers creating ad hoc analyses when they need them, analyses which might otherwise have snowballed into more elaborate tools.

However, I’m encouraged that the basic parsing tools are getting more capable and domesticated.  In this vein I’ve been working to get a better grasp of ANTLR, and making some modest contributions to that effort, as I describe here [link to come].
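As an illustration of how small an ad hoc analysis can be once a reliable parser is at hand, here is a minimal sketch that lists which functions call which.  It uses Python’s standard ast module rather than ANTLR, simply because that parser ships with the language; the function name call_graph and the sample source are hypothetical, and the sketch only sees direct calls to plain names from top-level functions.

```python
import ast


def call_graph(source):
    """Map each top-level function to the sorted plain names it calls directly."""
    tree = ast.parse(source)
    graph = {}
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            calls = set()
            for sub in ast.walk(node):
                # Only calls like f(x) are recorded; method calls such as
                # obj.f(x) have an ast.Attribute as func and are skipped.
                if isinstance(sub, ast.Call) and isinstance(sub.func, ast.Name):
                    calls.add(sub.func.id)
            graph[node.name] = sorted(calls)
    return graph


# Hypothetical source text to analyze; it is parsed, never executed.
SAMPLE = '''
def load(path):
    return open(path).read()

def render(path):
    text = load(path)
    return format_page(text)
'''

for name, callees in call_graph(SAMPLE).items():
    print(name, "->", callees)
```

The parser does the hard part; the analysis itself is a dozen lines.  For languages without such a built-in parser, a grammar-driven tool like ANTLR would have to supply the equivalent syntax tree.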

