Project Idea: metacortex, cognitive prosthetic for information storage/retrieval
Right on schedule, about once every quarter I feel the need to build an ubertool to capture, organize, present, and share all the information I keep around me, structured and unstructured, textual, audio, visual, etc. Each time it takes a different form, and each time I end up going nowhere with it, but I write it down nonetheless.
Now I’m thinking something (very) vaguely like the metacortex in Charlie Stross’ book Accelerando. In the book lots of SF liberties are taken, of course, and the metacortex includes all sorts of semi-intelligent autonomous agents tightly integrated with the wetware to expand its consciousness and abilities.
I’m thinking something a bit less dramatic. The idea came to me when I looked up the same Ruby syntax for the hundredth time, and wondered why I don’t keep notes on things like this that I keep needing. A cache, I suppose you could say. A tool I could keep running in the background, hit a key sequence to activate, and use a fast keyboard-based interface to query the information I want, then dismiss again. If the info isn’t found, I could go to Google or the Pickaxe Book or whatever, and upon finding it, add it to my metacortex in a few seconds.
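The lookup-then-add loop described above can be sketched in a few lines of Python. This is only a rough illustration, not a design: the `NoteCache` class, its JSON file format, and the "every query word must appear" matching rule are all assumptions made for the example, and the hotkey and UI parts are left out entirely.

```python
import json
from pathlib import Path


class NoteCache:
    """Hypothetical keyword-searchable store of short notes, kept in one JSON file."""

    def __init__(self, path):
        self.path = Path(path)
        # Load existing notes if the file is already there, else start empty.
        self.notes = json.loads(self.path.read_text()) if self.path.exists() else []

    def add(self, title, body):
        """Append a note and write the whole store back to disk."""
        self.notes.append({"title": title, "body": body})
        self.path.write_text(json.dumps(self.notes, indent=2))

    def query(self, text):
        """Return notes whose title or body contains every word of the query (case-insensitive)."""
        words = text.lower().split()
        return [
            note for note in self.notes
            if all(w in (note["title"] + " " + note["body"]).lower() for w in words)
        ]
```

A miss is just an empty list from `query`, at which point the answer gets looked up elsewhere and saved with `add` so the next lookup is a hit.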
I know what you’re thinking. It sounds like a text file and grep. But it’s more than that. Yes, I could keep a ruby_cheats.txt file around and grep it when I forget how to write a migration or whatever, but that’s just one more silo of information I care about, on top of gmail, my bug list, my library of ebooks, my blog posts, my furl bookmarks, and my research notes. What I keep craving is a system that can aggregate my information, and allow me to capture, store, organize, and retrieve it in whatever form and format makes sense. A system that makes my information portable, easy to access and share whenever I want, and yet without giving up the specialized tools I already use for special-case types of information (e.g. Word, Excel, Google Docs & Spreadsheets, Bugzilla, gmail, and on and on).
I suspect the reason I keep feeling the compulsion to build this tool is that I keep needing it. I can feel that I’m doing things The Hard Way, and intuitively I realize there must be a Better Way, if only it can be developed.
Some of the things I like:
- The way Quicksilver gives Mac users easy, keyboard-based access to bits of information and actions on that information
- The way Bugzilla organizes my and my team’s bugs by milestone
- The way I can tag my blog posts by arbitrary named tags
- The way Google Desktop Search gives me one place to search my email and local files
I have literally gigabytes of information I care about, taking the form of emails, chat transcripts, blog posts, bookmarks, feeds, bugs, account information, contacts, PDFs, images, MP3s, videos, etc. I don’t think the CS community yet has the sophistication to represent all this disparate information in such a way that it can on the one hand be readily accessible as a unified set of data, and on the other hand capture enough of the format-specific metadata to make full use of each type of information individually.
For example, how can I build a system that lets me keep my emails, PDFs, ebooks, and photos in one logical collection of information, but still makes it easy to manage my email by itself, or look up something in an ebook, or browse through my vacation photos? Obviously it’s not realistic to build a monolithic app that incorporates a feed reader, PDF viewer, image editor, and email client. Even if it were, there will always be superior one-function tools (like gmail or Picasa) that it would be preferable to use.
Of course, technically, the system I just described already exists, and is running right now on my machine. It’s called a filesystem, and it was invented decades ago. It’s a very simple interface to information, using a hierarchical organization structure mapping names to blobs of data (files) or other collections of names (directories). Its simplicity is probably why we still use it more or less unchanged from its original invention.
WinFS was supposedly going to turn this idea on its head. Files wouldn’t be located at a specific path, but would be accessible through a number of paths, like the collection of all MP3 files, or files tagged with ‘my music’, or whatever. This always seems to get pushed back, and I think this is in large part due to the fact that the hierarchical filesystem model of information organization works fine for most uses, and consumers barely grok it as it is, with no hope of grokking a multidimensional attribute-based filesystem.
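To make the attribute-based idea concrete, here is a toy sketch of reaching the same file through several "paths" at once. It says nothing about how WinFS itself worked; the `TagIndex` class and the example tags are hypothetical, and the index lives purely in memory.

```python
from collections import defaultdict


class TagIndex:
    """Hypothetical in-memory index from attribute tags to file paths."""

    def __init__(self):
        self._by_tag = defaultdict(set)

    def tag(self, path, *tags):
        """Attach one or more tags to a file path."""
        for t in tags:
            self._by_tag[t].add(path)

    def find(self, *tags):
        """Return the paths that carry every one of the given tags."""
        if not tags:
            return set()
        # .get avoids creating empty entries for tags nobody has used yet.
        matches = [self._by_tag.get(t, set()) for t in tags]
        return set.intersection(*matches)


index = TagIndex()
index.tag("/home/me/a.mp3", "mp3", "my music")
index.tag("/home/me/talk.mp3", "mp3", "lectures")
print(index.find("mp3", "my music"))
```

The file still sits at one location on disk; the tags are just extra routes to it, which is the part of the model that seems hard to explain to someone who barely groks directories.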
I’ve had all sorts of ideas to solve this problem. I had envisioned a tool that would operate as the hub for all my information, with adapters into and out of info silos like email and feeds and bug lists. This way, there’d be one tool to use for all your information needs. Of course, this breaks down pretty quickly, since I still want to use Outlook to read and write email, Bugzilla to manage my bugs, and Google Reader to read my feeds. It’s not that I dislike those tools. It’s that I wish they had the ability to integrate more tightly, and that I could insert my own systems in the pipes between tools to munge things however I wanted.
And that’s the problem. There’s not a doubt in my mind that the answer to this problem is not a monolithic tool. It will inevitably be a decentralized, loosely coupled system with autonomous tools interacting with each other. I just don’t know how.
I just read Dreaming in Code, which chronicles (in the incredibly annoying, breathless newspaper style) the first few years of the development of Chandler, aka Kapor’s Folly. By the sound of it, Mitch Kapor has had the same recurring need to build a metacortex tool as I have, only given his $100 million fortune, he founded a non-profit and launched a quixotic quest to build the software. Amazingly, it seems he and the crack former Apple, Microsoft, Netscape, and AOL developers he hired hadn’t learned any of the painful software engineering lessons of the preceding forty years. This, combined with the absence of corporate dysfunction and PHBs as a helpful constraint, seems to have bred disaster. Now, they’re excited to release a sort-of-working calendar app.
Along the way, the team had the same epiphanies I had. The fixation with object persistence, brief experimentation with RDF followed by horror and revulsion, fascination with peer-to-peer topologies, grappling with the Web vs. desktop app conundrum, and on and on. At no point did they (or, as far as I can tell, have they yet) make the leap from abstract things that should be easier to a list of concrete functionality. I have little hope that they’ll ever get there from what I’ve read.
By way of backstory, the book looks at previous attempts to construct what I call cognitive prosthetics, which apparently date back to the 1960s! In spite of all this, it seems no one has come up with a solution. It’s hard to believe the world’s best computer scientists have struggled with this problem for decades, that I discovered and grappled with it independently, and that we’re no closer to solving it now.
It seems none of the approaches that have been tried fit the requirements quite right. The Berners-Lee Semantic Web crowd are probably the furthest away from the right answer in my opinion. How you try to solve this problem with top-down elaborate ontologies and RDF is beyond me. I expected better from Sir Tim.
The rigid, statically typed approach currently taken by information management tools is also wrong, of course, but has the benefit of providing tools that work today, albeit imperfectly.
I suspect the right answer is to be found in even less order, tools that start with no structure at all, and add in only enough to power some basic computational primitives. Something closer to a spiral notebook than a spreadsheet or database, sort of an infinite scroll of digital paper with information atoms strewn about.
Another challenge to my mind is the legacy problem. Designing a new system of computing from scratch, a grand unified architecture of information-oriented computing primitives forming a cognitive prosthetic kernel upon which information stores can be built would be one thing. Sadly, untold petabytes of information are already stored in legacy silos, requiring the construction of a system that offers some migration path of sorts.
I’ve re-read my old OneNote notes on this subject. I think I’ve explored this problem from every imaginable angle, and inevitably I end up with something explosively complex and unworkable. I know I’m missing some key atomic concepts which will make this solution possible, but try as I might they continue to elude me. More basically, I’m missing some simple, easy to state, straightforward problem that I personally have that I want to solve, which can catalyze this work. I’m like Mitch Kapor trying to build Chandler based on little more than a sense that things are too hard and information should be integrated regardless of type.
I’ve tried to design an information tool that used OO modeling to capture information. I’ve tried using text with tags that fed into a processing pipeline to fire off actions in response to markup. I’ve tried building a content database from the filesystem. It never works. I’m trying to solve the wrong problem. I can make a list a mile long of unrelated things that the system should do, unified only by the fact that each thing involves the representation and manipulation of information.
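For the tagged-text approach, a stripped-down sketch looks something like the following. The `#tag` markup, the `run_pipeline` function, and the handler names are all made up for illustration, since the actual markup from those experiments isn't described here.

```python
import re

# Assumed markup: a tag is a '#' followed by word characters, e.g. "#todo".
TAG = re.compile(r"#(\w+)")


def run_pipeline(text, handlers):
    """Scan text line by line and call the handler registered for each tag found.

    `handlers` maps a tag name to a function that takes the full line.
    Tags with no registered handler are ignored.
    """
    results = []
    for line in text.splitlines():
        for tag in TAG.findall(line):
            handler = handlers.get(tag)
            if handler:
                results.append(handler(line))
    return results


notes = "buy milk #todo\njust a thought\ncall Bob #todo #call"
handlers = {
    "todo": lambda line: ("todo", line),
    "call": lambda line: ("call", line),
}
for action in run_pipeline(notes, handlers):
    print(action)
```

Even at this size the problem shows: each handler is easy to write, but nothing in the pipeline says which handlers ought to exist.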
Some of the recurring themes:
- Capture information with only enough structure to enable computer processing. As much structure as possible should be inferred by intelligent software rather than stated by humans
- The spiral notebook is a useful metaphor for the functionality I’m trying to find, as it’s infinitely flexible, offers random access, and is highly usable
- Information modeling used to create simple information systems like requirements lists, bug lists, contact databases, etc
- Cognitive tools like set theory, concept mapping, mind mapping, math, etc built into the system for easy, effortless cognition
- Activities as varied as scheduling, email, blogging, feed reading, free thinking, software modeling, code reading, all supported