Monday, February 27, 2006
Look to add value to the Aggregate Web of data
As a company with infrastructure that can scale to scan, retrieve, and analyze a significant portion of all the public on-line information in the world, think about how you can use those capabilities to improve the world. What can you do that someone looking at a much smaller set of the data cannot? What patterns can be found? What connections can be made? What can you simplify for people?
Build for normal users, developers, and machines
Make whatever you build easy to use, easy to hack, and make it emit useful data in a structured form. That means you need a usability geek, an API geek, and probably an XML/RSS/JSON geek.
Start designing with data, not pages
Figure out what data is important, how it will be stored, represented, and transferred. Think about the generic services that one can build on top of that repository. Only then should you get the wireframe geeks and/or the Photoshop geeks involved.
This is scary, because you won't have a mock-up right away. Your PowerPoint presentations will look as if they're missing something. But that's okay. This is about doing some engineering style design before product design and interface mocks.
Identify your first order objects and make them addressable
Figure out what your service is fundamentally about. If it's a social shopping application, you're probably dealing with people, items, and lists of items. Nail those before going farther. And make sure there's a way to access each object type from the outside world. That means there's a URL for fetching information about an item, a list, etc.
These are the building blocks that you'll use to make more complex things later on. Hopefully others will too.
Use readable, reliable, and hackable URLs
If the URL is hard to read over the phone or wraps in email, you're not there yet. Simplicity and predictability rule here. Consider something like http://socialshopping.com/item/12345. You can guess what that URL does, can't you?
You may not grasp how important this is, but don't let that stop you from worrying about it. This stuff really does matter. Look at how most URLs in del.icio.us are guessable and simple. Mimic that.
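As a rough sketch of what "guessable" can mean in practice, the snippet below builds and parses URLs shaped like the http://socialshopping.com/item/12345 example. The object kinds come from the social shopping example above; the function names `url_for` and `parse_path` and the integer-id convention are assumptions made for illustration.

```python
import re

# Hypothetical scheme: one predictable path per first-order object type.
BASE = "http://socialshopping.com"
KINDS = ("person", "item", "list")

_PATH = re.compile(r"/(person|item|list)/(\d+)")


def url_for(kind: str, object_id: int) -> str:
    """Return the public URL for an object, e.g. /item/12345."""
    if kind not in KINDS:
        raise ValueError(f"unknown object kind: {kind!r}")
    return f"{BASE}/{kind}/{object_id}"


def parse_path(path: str):
    """Map a request path back to (kind, id), or None if it does not fit."""
    match = _PATH.fullmatch(path)
    if match is None:
        return None
    return match.group(1), int(match.group(2))
```

Because the same small pattern covers every object type, someone who has seen one item URL can guess the URL for a list or a person without reading any documentation.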
Correlate with external identifier schemes
Don't go inventing completely new ways to represent and/or structure things if there's already an established mechanism that'd work. Not only is such effort wasteful, it significantly lowers the chance that others will adopt it and help to strengthen the platform you're building.
You are building a platform, whether you believe it or not.
Build list views and batch manipulation interfaces
Make it easy to see all items of a given type and make it possible to edit them as a group. Flickr does this when you upload a batch of photos. Search, in its many forms, is the classic example of a "list view."
Create parallel data services using standards
Developers (and the code they write) will want to consume your data. Do not make this an afterthought. Get your engineers thinking about how they might use the data, and make sure they design the product to support those fantasies. Again, always default to using an existing standard or extending one when necessary. Look at how flexible RSS and Atom are.
Don't re-invent the wheel.
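To make the "parallel data service" idea concrete, here is a minimal sketch that emits a list of items as an Atom feed using only the Python standard library. The item fields (`id`, `name`, `updated`) and the function name `atom_feed` are assumptions; a production feed would also need elements such as an author, so treat this as an illustration of reusing a standard, not a validated feed.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"


def atom_feed(title, feed_url, updated, items):
    """Serialize items as a minimal Atom feed string.

    items: iterable of dicts with hypothetical keys "id", "name", "updated".
    """
    ET.register_namespace("", ATOM)  # emit Atom as the default namespace
    feed = ET.Element(f"{{{ATOM}}}feed")
    for tag, text in (("title", title), ("id", feed_url), ("updated", updated)):
        ET.SubElement(feed, f"{{{ATOM}}}{tag}").text = text
    for item in items:
        # Each entry points at the same addressable URL a person would use.
        url = f"http://socialshopping.com/item/{item['id']}"
        entry = ET.SubElement(feed, f"{{{ATOM}}}entry")
        ET.SubElement(entry, f"{{{ATOM}}}title").text = item["name"]
        ET.SubElement(entry, f"{{{ATOM}}}id").text = url
        ET.SubElement(entry, f"{{{ATOM}}}updated").text = item["updated"]
        ET.SubElement(entry, f"{{{ATOM}}}link", href=url)
    return ET.tostring(feed, encoding="unicode")
```

The same list that backs the HTML page can be passed to this function, so the machine-readable view is built alongside the human one instead of being bolted on later.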
Make your data as discoverable as possible
The names and attributes you use should be descriptive to users and developers, not merely a byproduct of the proprietary internal system upon which they're built. This means thinking like an outsider and doing a bit of extra work.
Friday, February 24, 2006
One such feature was called ‘historization’ - capabilities to set an object’s validity. As with many other concepts within that framework, very smart people thought about this hard, and came up with an ingenious solution that enabled the creation of objects in the future, the past, the virtual might-have-been future and many other strange phenomena. For instance, you were able to get answers to questions such as “if I had asked two years ago what this particular object would look like three years from now (I mean, then), what would it have been?” (See what I mean?) All of this was hidden within the framework; you simply set a few time stamps, and if you didn’t, defaults were applied; you were able to set a view (a point in time), and the object’s state, including relations to other objects, would magically match that view, of course only within that logical transaction (which we built as part of our own transaction framework) … It was brilliant. It was hard to implement, but two very smart guys did it. It rocked. It was also totally unusable. The problem was that while it may have been able to provide lots of answers, nobody knew how to ask the matching questions.
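The framework's internals are not described, so the following is only a guess at the general idea: each version of an object carries two time stamps, one for when the value applies and one for when it was recorded, and a view picks a point on both axes. The names `Version` and `view`, the use of plain years, and the tie-breaking rule are all assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    value: str
    valid_from: int   # first year the value applies
    recorded_at: int  # year in which this statement was entered


def view(history, asked_in, about):
    """Answer: asked in year `asked_in`, what would the object look like in `about`?

    Assumed rule: only versions recorded by `asked_in` are visible; among those
    already valid in `about`, the latest valid_from wins, and the most recent
    recording breaks ties.
    """
    visible = [
        v for v in history
        if v.recorded_at <= asked_in and v.valid_from <= about
    ]
    if not visible:
        return None
    return max(visible, key=lambda v: (v.valid_from, v.recorded_at)).value
```

With a history of `Version("draft", 2000, 2000)`, `Version("planned-v2", 2005, 2003)` and `Version("actual-v2", 2005, 2005)`, asking in 2004 about 2005 yields the plan, while asking in 2006 about 2005 yields what actually happened. Even in this toy form, the caller has to keep two time axes straight, which hints at why nobody knew how to ask the matching questions.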
Monday, February 13, 2006
Previous research had shown that rats replayed specific brain firing sequences while sleeping.
. . .By measuring the amount and location of the hippocampus cell firing, the researchers were able to determine that [while awake] the neurons fired in the exact reverse order of the firing that occurred when the rat scurried from one end of the track to the other.
. . ."When awake, reverse replay occurs in situ, allowing immediately preceding events to be evaluated in precise temporal relation to a current anchoring event, and so may be an integral mechanism for learning about recent events,"
Wednesday, February 08, 2006
Here are some other basic things that I think fall through the cracks a bit if you're not paying attention:
- Database indexes. Too much, too little, or just the wrong indexing will drag your code down.
- The ACID rules and transactional boundaries. I think the vast majority of systems I've seen do a bad job of governing their transactional boundaries. I'm not talking about "look, those jokers didn't use the perfect transaction isolation level there," I mean zero transactions at all. When things blow up and go wrong, will your code and database end up in an invalid state? Will you lose data? Drop messages? It's not just important for database development either, you also need to consider the interaction with any kind of external system or messaging infrastructure as well.
- Database archiving. Databases can get awfully slow when table sizes get bigger. I watched a batch processing system implemented by a very large and prominent consulting company (here come the script kiddies!) become almost useless in its 6th week of production due to severe performance problems. The culprit? A very poor database design from the software vendor was compounded by the lack of a database archiving strategy. I got to lead a team that built a replacement a couple months later. Our system archived records the second that the workflow was completed. Our throughput was 2 orders of magnitude higher.
- File system archiving. Weird, nasty things happen when servers run out of disk space.
- One of the things I love about .Net over the "Set objEverything = Nothing" nonsense in VB6 is garbage collection, but it isn't magic. Gotta make sure that unmanaged resources are cleaned up.
- Database connection discipline. One of the many reasons I think we all need to stop writing so much ADO.Net code by hand is to eliminate the possibility of screwing up database connection usage. I've been told this isn't as big an issue with the newer Oracle and SQL Server versions as it was 5 years ago, but I wouldn't bet the farm on the database being able to handle a couple thousand orphaned database connections.
- This is probably rare, but I had to fight with an XP zealot PM over this one time. If you're going to use some sort of offline pessimistic lock, you need to have some sort of mechanism to clear out stale locks. Things will go wrong.
- Adequate tracing and auditing - that is easily accessible and useful to the production support folks. In Agile process speak - remember that the production support organization is an important stakeholder in your project. In my case we're the support as well, so I think we'll care a bit about this one. On the other hand I detest code with too much tracing because I think it obfuscates the code and generates too much noise to be useful.
- Coding standard. Consistency is the hobgoblin of little minds but the hallmark of a smoothly running software team. This is called out very specifically by XP as a core practice, but I think my company has done a poor job of developing and following a coding standard in the past.
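The point in the list above about transactional boundaries can be sketched with Python's built-in `sqlite3` module. The `accounts` table, the `make_db` helper and the `transfer` function are hypothetical; the only library behavior relied on is that using a connection as a context manager commits on success and rolls back if an exception escapes the block.

```python
import sqlite3


def make_db():
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)"
    )
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 0)])
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move `amount` between accounts as one unit of work."""
    # "with conn" marks the transaction boundary: both updates are committed
    # together, or neither is kept if anything inside raises.
    with conn:
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
        )
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE id = ?", (src,)
        ).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
        )


def balances(conn):
    """Return {account id: balance} for inspection."""
    return dict(conn.execute("SELECT id, balance FROM accounts"))
```

If the overdraft check fails halfway through, the debit that already ran is rolled back, so the database is never left with money missing from one account and not yet added to the other.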
I've seen Agile teams screw themselves by taking a too-literal approach to the XP ideals by putting on the blinders and only focusing on the functional requirements from the customer. There are additional technical requirements that could be termed a "cost of doing business" expense for building a system. You have to fulfil these requirements to make the system viable. It's our responsibility as developers and architects to communicate these issues to the project manager to ensure these purely technical stories are adequately accommodated by the iteration and release plans. Now, what to do when the PM doesn't listen to you? Um, run away?
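One of the purely technical stories listed above, clearing out stale offline pessimistic locks, might look roughly like this. It is an in-memory stand-in for what would normally be a lock table in the database; the class name `LockTable`, the time-to-live policy and the injectable clock are assumptions for the sketch.

```python
import time


class LockTable:
    """Offline pessimistic locks that expire after a time-to-live."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._locks = {}  # record_id -> (owner, acquired_at)

    def clear_stale(self):
        """Drop locks older than the TTL; return how many were removed."""
        now = self._clock()
        stale = [
            record_id
            for record_id, (_, acquired_at) in self._locks.items()
            if now - acquired_at >= self._ttl
        ]
        for record_id in stale:
            del self._locks[record_id]
        return len(stale)

    def acquire(self, record_id, owner):
        """Take the lock unless a different owner holds a live one."""
        self.clear_stale()
        holder = self._locks.get(record_id)
        if holder is not None and holder[0] != owner:
            return False
        self._locks[record_id] = (owner, self._clock())
        return True

    def release(self, record_id, owner):
        """Release the lock if `owner` is the one holding it."""
        holder = self._locks.get(record_id)
        if holder is not None and holder[0] == owner:
            del self._locks[record_id]
```

Sweeping stale entries on every `acquire` means a crashed client or an abandoned browser session blocks a record for at most one TTL, without needing a separate cleanup job.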
Tuesday, February 07, 2006
1.1 Modeling Meaning
So we want to study semantics. But how? To study meaning, we need a language for describing meaning. Human language is, however, notoriously slippery, and as such is a poor means for communicating what are very precise concepts. But what else can we use? Computer scientists use a variety of techniques for capturing the meaning of a program, all of which rely on the following premise: the most precise language we have is that of mathematics (and logic). Traditionally, three mathematical techniques have been especially popular: denotational, operational and axiomatic semantics. Each of these is a rich and fascinating field of study in its own right, but these techniques are either too cumbersome or too advanced for our use. (We will only briefly gloss over these topics, in section 23.) We will instead use a method that is a first cousin of operational semantics, which some people call interpreter semantics.
The idea behind an interpreter semantics is simple: to explain a language, write an interpreter for it. The act of writing an interpreter forces us to understand the language, just as the act of writing a mathematical description of it does. But when we’re done writing, the mathematics only resides on paper, whereas we can run the interpreter to study its effect on sample programs. We might incrementally modify the interpreter if it makes a mistake. When we finally have what we think is the correct representation of a language’s meaning, we can then use the interpreter to explore what the language does on interesting programs. We can even convert an interpreter into a compiler, thus leading to an efficient implementation that arises directly from the language’s definition.
A careful reader should, however, be either confused or enraged (or both). We’re going to describe the meaning of a language through an interpreter, which is a program. That program is written in some language. How do we know what that language means? Without establishing that first, our interpreters would appear to be mere scrawls in an undefined notation. What have we gained? This is an important philosophical point, but it’s not one we’re going to worry about much in practice.
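As a small illustration of the interpreter-semantics idea, here is a sketch of an interpreter for a tiny arithmetic language, written in Python. The language itself (numbers, identifiers, `+`, `*`, and a `with` form that binds a name) and its tuple representation are assumptions chosen for brevity, not the notation the text goes on to use.

```python
def interp(expr, env=None):
    """Evaluate an expression of the toy language and return a number.

    Expressions are numbers, identifier strings, or tuples:
      ("+", left, right), ("*", left, right), ("with", name, bound, body)
    """
    env = env or {}
    if isinstance(expr, (int, float)):
        return expr                      # a number means itself
    if isinstance(expr, str):
        return env[expr]                 # KeyError signals an unbound identifier
    op = expr[0]
    if op == "+":
        return interp(expr[1], env) + interp(expr[2], env)
    if op == "*":
        return interp(expr[1], env) * interp(expr[2], env)
    if op == "with":
        _, name, bound, body = expr
        # Evaluate the bound expression first, then the body in an
        # environment extended with the new binding.
        return interp(body, {**env, name: interp(bound, env)})
    raise ValueError(f"unknown expression: {expr!r}")
```

Each branch is a statement about meaning: the `with` case, for example, commits to evaluating the bound expression before the body. Running `interp(("with", "x", 5, ("+", "x", "x")))` gives 10, and changing a branch changes what the language means, which is the sense in which the interpreter serves as the definition. It also makes the philosophical worry above tangible, since this definition is only as precise as our understanding of Python.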