Wow, did I ever get behind in posting. I was doing so well keeping up the info stream… Well, I had a pretty good reason: I’ve been heads down in code on a tool I have wanted to release for some time now. It’s a relatively small tool (though I could see it morphing into something bigger), but one I think will be very useful in dealing with some of the data rot we all face as we interact with more people and do more things.

I am not releasing a lot of details at the moment, but I am pushing hard to release a small beta to a test group, then the full-on application soon after that. It’ll be a desktop application (yup, they still exist) and will release first for the Apple platform. This is a new way of deploying for me, as I have always released for the Windows platform first (and in most cases only), but working with my new MacBook Air has inspired me to go Mac first.

While I am not sure of the pricing model, I may go with something like the Freemium plan outlined by Fred Wilson way back in ’06. One thing is for certain: it won’t be an ad-driven model, as that doesn’t fit the application. Maybe a business pay / consumer free model. It’s odd… coding seems simple compared to finding the right revenue model in today’s marketplace.

Microsoft Powerset

July 3, 2008

I wonder if the tools Microsoft acquired via the Powerset acquisition will be used not only in their consumer live search environment but also to create new ways of looking at data in their enterprise search tools.

It seems to me that using Powerset to semantically slice, dice and present additional faceted data on a corporate information stack could be very useful indeed.

Some form of integration with SharePoint maybe…

Data vs. Information

June 27, 2008

Data is just data until it is contextually parsed; then it can become useful information. And as we all know, he who has the best information, not the most data, rules.

Sunday Short Takes

June 22, 2008

1. I’m annoyed with the editors of the Globe and Mail’s Number Cruncher series. They deliver their stories as blog posts, but no one on the editorial team bothers to respond to their readers. Maybe I am just the first person to ever ask a question, so they are unsure what to do, but come on: after making me go through an annoying sign-up process, you’d think someone could at least say “Can’t help ya mate”. The real tragedy is it’s an excellent series.

2. I haven’t seen a lot of innovation in user interfaces for help desk applications. I wonder if creating a riff on the basic Getting Things Done UIs I have seen would work and be helpful.

3. With all the unstructured data out there I wonder if a simple graphical tool designed to turn this data into actionable information would be helpful?  

What I am thinking is a visual environment with drag-and-drop parsing workflow creation, where one could use predefined rules (or create their own) that parse a document and extract the information needed in a structured format.

So, for instance, you could create a “parser” for a series of emails that always have essentially the same structure and get that data into a DB.

Hmm, maybe I need to think more about this; it may actually be quite useful…

4. Another interesting blog post today on /Message. Stowe is right: at every turn and with every interaction we are exposing ourselves to loosely coupled, information-rich data flows, and we need to be able to mine these flows to extract actionable information; otherwise we are stuck with static silos. The information is out there; we just need better means of extracting it from the crushing weight of all the data we see every day.

5. Another OpenID mechanism. From the site: “Emailtoid is a simple mapping service that enables the use of email addresses as OpenID identifiers.”

Information is everywhere and we need tools that are transparent and can extract the concepts, terms and contextual meaning from this day-to-day data flow for our interpretation.
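To make the email “parser” idea from item 3 a bit more concrete, here is a rough sketch in Python. Everything specific in it is an assumption made up for illustration: the field names, the sample emails, the `RULES` table and the `orders` table. The “rules” are just regular expressions standing in for whatever a drag-and-drop workflow would generate, and SQLite stands in for the DB.

```python
import re
import sqlite3

# Hypothetical rules: field name -> regex with exactly one capture group.
# A visual rule builder would generate something like this behind the scenes.
RULES = {
    "order_id": re.compile(r"^Order ID:\s*(\S+)", re.MULTILINE),
    "customer": re.compile(r"^Customer:\s*(.+)$", re.MULTILINE),
    "total": re.compile(r"^Total:\s*\$?([\d.]+)", re.MULTILINE),
}


def parse_email(body, rules=RULES):
    """Apply every rule to the email body; missing fields become None."""
    record = {}
    for field, pattern in rules.items():
        match = pattern.search(body)
        record[field] = match.group(1).strip() if match else None
    return record


def store(records, conn):
    """Write the parsed records into a (made-up) orders table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id TEXT, customer TEXT, total TEXT)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (:order_id, :customer, :total)", records
    )
    conn.commit()


# Invented sample emails that share the same structure.
SAMPLE_EMAILS = [
    "Hello,\nOrder ID: A-1001\nCustomer: Jane Doe\nTotal: $42.50\nThanks!",
    "Hello,\nOrder ID: A-1002\nCustomer: John Smith\nTotal: $7.00\nThanks!",
]

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    store([parse_email(body) for body in SAMPLE_EMAILS], conn)
    for row in conn.execute("SELECT * FROM orders"):
        print(row)
```

Keeping the rules as plain data (a name plus a pattern) is what would let a graphical tool add, remove or reorder them without touching the parsing code itself.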

Timely Data

June 9, 2008

Paul Kedrosky had an interesting (at least to me) tweet today pointing to an article in the Wall Street Journal entitled “Playing a Role In Oil’s Ascent: Crude Data”. Essentially, the article talks about the market being starved for TIMELY data.

I am curious what data the market is looking for in this area and how much of it is available from open sources? 

If this data is more or less available from open sources, I see an opportunity here to create an automated system to collect, collate and cluster this open data. Of course that is not enough; we also need automated tools to rapidly refine the data, pulling out specific data points of interest so the end consumer of the data does not have to.

Basically, I am thinking I’d take some openly available crawling technology and add a few of my own tools (statistical and others) to gather and present realtime (or near-realtime) reports.

So how about it, Paul: is this type of data publicly available (I’m not an oil analyst, so I am not sure precisely what data they would want), and would this be useful?

First, what do I mean by actionable information?

It is information that comes from a trusted source, is about something that’s important to you, and that, once known, compels you to take action.

So how do we acquire actionable information from the fire hose of data we see every day? While it often appears that the volume of information flowing through our various data pipes is inherently at odds with our ability to make it actionable, I think this misses one key point: the sheer volume of similar data can actually be a positive if the right “mining” tools are in place.

This flow of information more often than not has patterns that can be extracted statistically, syntactically or semantically, or using some combination, e.g. using syntactic and semantic features to improve statistical recognition.
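As a small illustration of combining the syntactic with the statistical, here is a minimal Python sketch. It is only one possible reading of the idea, not a description of any particular tool: a couple of regex “shapes” (the syntactic part) collapse each line into a template, and a simple frequency count (the statistical part) surfaces the templates that keep recurring. The `SHAPES` list, the function names and the sample lines are all invented for the example.

```python
import re
from collections import Counter

# Syntactic features: replace variable tokens with a placeholder so lines
# that differ only in their values collapse to the same template.
# Order matters: emails are replaced before bare numbers.
SHAPES = [
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"), "<EMAIL>"),
    (re.compile(r"\b\d+(?:\.\d+)?\b"), "<NUM>"),
]


def to_template(line):
    """Reduce a line to its shape by masking emails and numbers."""
    for pattern, placeholder in SHAPES:
        line = pattern.sub(placeholder, line)
    return line.strip()


def recurring_templates(lines, min_count=2):
    """Statistical step: keep templates seen at least min_count times."""
    counts = Counter(to_template(line) for line in lines if line.strip())
    return [(t, c) for t, c in counts.most_common() if c >= min_count]


# Invented sample data: three structured lines hidden among noise.
LINES = [
    "Invoice 1001 total 42.50",
    "Lunch on Friday?",
    "Invoice 1002 total 7.00",
    "Invoice 1003 total 19.99",
]

if __name__ == "__main__":
    for template, count in recurring_templates(LINES):
        print(count, template)
```

The point of the sketch is that volume helps: the more similar lines flow past, the more clearly the recurring template stands out from the one-off noise.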

A good overview of leveraging data using statistics can be found in a talk given by Peter Norvig at Paul Graham’s latest startup school.