Windows Longhorn: A Case Study in Innovation Strategy

Photo Credit: whisperwolf via Compfight cc

Large, successful companies must inevitably confront the Innovator’s Dilemma: when you’re making billions of dollars from your core product, how can you do anything that might disrupt – or even disturb – that cash cow? In most cases, you don’t: you look for incremental gains, adding small wins here and there to keep the gravy train rolling, if not rocketing into the cosmos. This is the innovator’s dilemma; the successes of the past become the cage that keeps you from reaching deeply into the future.

Having worked at three different multi-billion-dollar companies, I’ve seen several different approaches to this play out at very close range. While far from scientific, I think they provide some interesting case studies; these are personal observations from up close, and I’m sure others who were in these situations have very different perspectives.

In this post I’m going to chat about one approach to “breaking the innovation logjam.”

Windows Longhorn: All-In Innovation

In the early 2000s, several senior executives at Microsoft – notably Bill Gates and Jim Allchin – were getting tired of being perceived as “not innovative.” This was just as OS X was getting back on its feet and Apple was beginning a major innovation streak that would culminate – much later – in the wholesale reinvention of the personal computing space. All we knew at the time was that Mac OS X was starting to look really slick, and that innovations like Quartz (the desktop graphics composition engine) and Core Animation were making it feel far more modern. The Windows team was given a clear and simple message: go for it. Take all those big ideas you’d been dying to get to but that didn’t make it into previous tentative, incremental releases, and take the time to do them right.

Longhorn was originally intended to be a “quick fix” release of Windows, knocking off some customer pain points before the next big moonshot. Instead, it became a moonshot itself, seeking to scratch every major itch in the Windows technology stack.

What did that entail? Primarily there were three components:

- Avalon: a completely new user interface framework, backed by markup (so that design teams and engineering teams could have a total separation of concerns) and with a deep and sophisticated animation architecture. The Windows desktop, or “shell,” was to be completely rewritten using Avalon. Early demos of the new shell showed such rich 2D and 3D animation deeply integrated into the desktop as to make Mac OS X’s minor window animations seem primitive in comparison.

- WinFS: Microsoft makes some of the most powerful database technology in the world in SQL Server. Wouldn’t this be a more powerful system for users to store their documents and data? WinFS sought to rebuild the end user’s storage and search experience atop one of the world’s most powerful (at the time) databases. (This is where I first entered the picture: as a new recruit to Microsoft Research, my first task was to design a new user model for document storage on a relational database. More on this another day.)

- Indigo: COM is the fundamental way that Windows applications talk to each other; Distributed COM, or DCOM, enables apps and components to talk to each other over the network. Indigo was “DCOM on steroids,” enabling Windows apps to communicate over the internet, with rich facilities for apps and components to find each other, negotiate capabilities, and cooperate.

So here we have every area of the desktop computing ecosystem given a blank check to invent its ultimate form. At the same time.

And one more thing: all of these systems were being written in a language – C# – that was also being invented at the same time.

So what happened?

Technical culture: Windows had one of the largest collections of C programming experts in the civilian world. Taking these world-class experts and making them feel like beginners on a language that was still being refined was not a recipe for happiness. Tricks that had worked in COM or the Windows API didn’t necessarily work in the new world. All the old ways of getting things done, of breaking logjams and “just shipping,” were up in the air. Never have I seen “getting it right” and “getting it done” in such direct opposition.

Building on quicksand: any one of these major subsystems would have been a massive endeavor – remember, this wasn’t an app, this was an ecosystem with millions of existing applications, drivers, and hardware variations. One senior Windows engineer referred to it as “doing open-heart surgery on a Formula 1 race driver – in the middle of a race – at 240mph.” Having virtually all the major subsystems of Windows deeply reinvented at once, while they depended on each other, was simply too much. Development slowed to a crawl, morale dropped through the floor, and simply getting the whole system to compile became problematic.

Meltdown: at a critical juncture in 2004, another critical problem came to the surface. One of the great benefits of a “managed” language like C# is that it manages memory for you using a process called garbage collection. In unmanaged languages, programmers must take care to return any memory they are no longer using to the system. It’s a huge pain and greatly increases code complexity, but it’s an established methodology. In managed code, all of this is handled for you. Unfortunately, there was a huge gotcha with C#: when the core “engine,” or runtime, runs out of memory itself, the whole thing just quits. This is tolerable – sometimes – if you’re an app. When you’re the operating system, it’s not acceptable.

At this point the execs had a brutal choice: push through and solve the problems, facing down a full-scale mutiny from some of their most precious technical talent, or admit defeat and junk the whole thing. They chose the latter: they dumped the entire code base, pulled the Windows Server 2003 kernel off the shelf, and started again, eventually delivering Vista on a painfully accelerated schedule for what was still an ambitious release. Vista kept deep architectural changes – GPU compositing of the desktop, security management of every internal API that could touch user data, and, my favorite feature, a new blindingly fast and powerful search engine – but it essentially threw away the new developer APIs while trying to deliver as much of the envisioned user experience as possible. In hindsight, Windows Vista was a pretty good beta for Windows 7.

What was the lesson? Watching this train wreck up close, I learned a lot about the difference between “good enough” and “perfect” – and about the need to manage the speed at which you attempt to innovate. A metaphor I’ve heard to describe this is “physics”: there is a certain pace beyond which a code base will not move, determined by a combination of legacy, culture, and clarity of purpose.

Could Longhorn have been saved? Hard to tell. It would have cost the company billions to find out. Would it have mattered? I think it would; a fresh code base, deep design tooling, and a new, more agile UX layer would have enabled a rapid pace of innovation for MSFT moving forward. Instead, it took years – until Windows 7 – to get Vista to a decent level of quality. Still, I’m not sure I would have decided differently from Gates and Allchin in 2004, and I’m glad it wasn’t a call I had to make.

How Hardware Acceleration in CSS Works

Want to get your CSS animations to run at 60 frames per second?

If you’ve been working with CSS3 transforms and transitions, you’ve likely come across the issue of hardware acceleration. This is a feature of the browser that allows certain graphics operations to run many times faster than their non-accelerated equivalents. Done right, this can result in high-frame-rate, silky-smooth animation that can actually save battery life.

Right now, tapping into HW acceleration is a bit of a dark art; if you’ve seen incantations like -webkit-transform: translateZ(0px), you may have wondered how such an apparently meaningless instruction can cause a div to render faster. And why does it cause some operations to be slower?

The answers are really pretty simple, but they involve stepping out of the standard CSS world and learning a little bit (very little, I promise) about how graphics hardware works. You probably know or have heard bits of this already, but hopefully looking at the whole picture will make it easier for you to design high-performance HTML5 and CSS3 web pages.

Our first player is the GPU, or Graphics Processing Unit: it’s a microprocessor, just like the CPU that runs your JavaScript, but it only knows how to play with pixels on the screen. Because it’s so specialized, it can perform graphics operations many times faster than the more general-purpose CPU. If you’ve heard of Nvidia, this is what they make. Pretty much every smartphone and computer made these days has a fairly powerful GPU; the latest version of the Nvidia Tegra, a popular mobile chip, has four independent processors, all putting together that retina display you love to play with. GPUs can’t run JavaScript, and they can’t manipulate the DOM. In many systems – like high-end PCs – the GPU and the CPU each have their own RAM, and it actually takes some time to copy pixels from the CPU’s RAM to the GPU’s RAM.

GPUs are processors and, while they can’t run JavaScript, they can run programs written in a special graphics programming language known as a shader language. (These programs are called shaders because they figure out how light or dark to make a given pixel on the screen.) If you’re writing a video game, you can make shaders do all kinds of crazy stuff, but in the web world, for the moment, the browser decides which shaders to run on the GPU. The bottom line is that, in CSS, there is a fixed set of things the browser will ask the GPU to do, and those things will be done very fast. Other things will be done on the normal CPU and will be slower – the browser splits your page into bits and hands them to either the GPU or the CPU to be turned into pixels.

Lots of details. So how should we think about this when architecting our web UX?

The first thing to realize is that the GPU in a browser is only used for a few different effects, and the ones to pay attention to are transforms (-webkit-transform), transitions, and translucency. This list will surely grow over time; color fills and gradients are very “acceleratable,” but it’s up to the browser vendors to do it. For now, focus on the transforms because a) they are super powerful and cool and b) they will help you understand how the whole thing works.

And here is how it works, simplified:

  • the browser looks through your web page and finds DIVs that either have a (-webkit-) transform or are translucent
  • each such DIV – let’s call it an “accelerated DIV” – is marked “turn into a surface and send to the GPU”
  • other DIVs are chopped into surfaces based on size and layering (not important)
  • all surfaces are filled in and sent to the GPU
  • the GPU layers all the surfaces properly and builds the final image

So far, none of this has resulted in a tremendous amount of speed except that last bit: taking a hundred surfaces (think of layers in Photoshop) and compositing them together with translucency is very very fast on a GPU.

Where we really get speed is next: once you’ve sent a DIV to the GPU, if you want to move it in 2D or 3D, you can tell the GPU “rotate this by 2 degrees” and it will happen virtually instantaneously. You don’t have to send it to the GPU again. The CPU will not even lift a finger, and your device battery will be spared. This is why transforms are really fast and efficient.
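To make that concrete, here’s a minimal sketch – the class names are invented for illustration, and I’m using the WebKit prefixes discussed in this post – of an animation that stays entirely on the GPU once the surface has been uploaded:

/* the transform promotes the element to its own GPU surface */
.card {
  -webkit-transform: translateX(0) rotate(0deg);
  -webkit-transition: -webkit-transform 300ms ease-out;
}

/* moving it is just a new transform; the GPU repositions the surface it already has */
.card.moved {
  -webkit-transform: translateX(200px) rotate(2deg);
}

Toggling the “moved” class from JavaScript is all it takes; the CPU never has to re-rasterize the element.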

Gotcha: some things have to happen on the CPU. Text is a classic example (although this is changing). If you’re going to put some text in a div and fly it around, great. But if you’re going to change that text as it flies around, you’re making it hard for the CPU to get out of the GPU’s way. So if you transform a div without changing what’s in it, it will be lightning fast; but if you change the contents of the div while transforming it, you can cause it to be brought back to the CPU, filled with the new content, sent back to the GPU, and then transformed. This is not as fast.

Simple rule: transforming DIVs without changing their content will be fast. Changing things inside the DIV may or may not slow things back down.

Aside from changing the contents of a DIV, changing the shape of the DIV can cause it to be re-sent to the GPU. A typical example of this is changing the height or width property. Since this needs a different-sized surface, current browsers appear to allocate a new surface, fill it, and send that over, because there is now a different number of pixels in the surface. Note that this is different from using transform: scaleY. Scaling happens on the GPU, doesn’t affect the contents, and can be fully accelerated. I think of scaling as happening on the “outside” of the surface, whereas changing the height affects what’s inside. It’s not important to mull this deeply; consider it an example of what to watch out for.
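Here’s a hedged sketch of that difference, again with made-up class names. The first rule changes the surface’s pixel dimensions on every frame, forcing re-rendering on the CPU; the second only changes how the GPU draws the surface it already holds:

/* slower: each frame needs a new, differently sized surface filled in on the CPU */
.panel.collapse-slow {
  -webkit-transition: height 300ms;
  height: 0;
}

/* faster: the existing surface is simply scaled at composite time on the GPU */
.panel.collapse-fast {
  -webkit-transition: -webkit-transform 300ms;
  -webkit-transform: scaleY(0);
}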

A little more about that translateZ(0px) trick we mentioned earlier. This has been recommended broadly as a way to make a div faster. The reason it works is that any transform property marks the DIV as an “acceleratable” surface; it will be sent to the GPU independently. This means even properties like “top:” can animate faster. There’s a caveat, though: when you first apply a transform, there is a tiny stall as the div is sent to the GPU. So don’t go from “non-transformed” to “transformed” at the last minute. I came across this in an app where dozens of objects moved quickly in response to an event. Initially I applied the transform when the event fired, which resulted in a nasty “hiccup” when the animation started. I then applied a “null” transform (translateZ(0px)) in the default style of the items, with the result that when my event fired, the DIVs were already copied to the GPU. Result: hair-trigger responsiveness. Nice.
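Roughly, that pre-warming pattern looks like this – a sketch with hypothetical selectors, not the actual app code. The important part is that the null transform lives in the default style, not in the style applied when the event fires:

/* default style: the null transform promotes each item to a GPU surface up front */
.flock-item {
  -webkit-transform: translateZ(0px);
}

/* applied when the event fires: the surface is already on the GPU, so motion starts instantly */
.flock-item.scattered {
  -webkit-transition: -webkit-transform 200ms ease-in;
  -webkit-transform: translateZ(0px) translateX(300px);
}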

If you’ve made it this far, congratulations – it can seem quite complicated. This has been a very light sketch of a complicated issue that is evolving rapidly. Used properly, however, hardware acceleration is extremely powerful and usable right now and can deliver real benefits – responsive UI and longer battery life – to users today.

The future is bright as well: almost everything that happens in a browser can be accelerated further and at some point we may find that the boundaries between CPU rendering and GPU rendering go away completely. Digging a little deeper into this topic and running your own experiments will make it easier for you to pick up new enhancements as they come down the pipe.

If you’d like a visualization of how the browser breaks up your page, Chrome has a useful visual debugging feature: enter “chrome://flags” and look for a setting called “Composited render layer borders.” This will put a red rectangle around each different surface that is being sent to the GPU.

There are many more useful details around, and I’ll likely update this post down the road. If you’re a browser implementor, please forgive my lightweight treatment and feel free to correct any details in the comments.

‘night all!

Lawnmower Man

Lawnmower Man is a low-budget ($5M) independent film released in 1992. It was the top-grossing independent film that year, and the first film that I know of to use full-frame 3D animation for key plot scenes.

It’s also the one motion picture I’ve worked on.

For Lawnmower Man, I managed a small team that designed and animated the simulated (i.e., fake) user interfaces for the characters to interact with. This was a side project when I was still working at Apple, conducted in the evenings and on strategically chosen vacation days.

My work got an unusual amount of “screen time” for a first effort, and the director shook his head as he told me what other directors were stuck with for their screen graphics. The Northern California interactive media boom had made a lot of new stuff possible: interactive 3D, 32-bit graphics, transparency, animation, and compositing were way ahead of Hollywood’s UI thinking. For many years after, movies were still cursed with command-line UIs and super-crude graphics. It was fun to solve the problems with real interaction design tools, even if the UI was completely made up.

In those days, directors didn’t want a computer anywhere near the rolling camera, as a locked-up computer could hold up shooting, costing hundreds of dollars per minute. After animating all the user interface in advance, I used a special video recorder, controlled by computer, to record one frame at a time onto a Betacam cassette. The result played back beautifully and smoothly – quite a treat for those of us used to the stutters of real-time animation in the 90s.

On camera, this tape was then played through the fake computer monitors while the actors pantomimed interaction with the user interface. The UI would move and change on its own; to appear to be using it, the actors (Pierce Brosnan and Jeff Fahey) memorized the timing and positions so they could move their hands correctly.

An interesting note: video (in the US) and film have different frame rates. Video is (around) 30 frames per second, and film is 24 frames per second. If you just point a film camera at a video monitor, the difference in frame rate will cause moving black bars to appear on the video screen, ruining the visuals. To fix this, they have special tape players that synchronize with the film’s shutter speed. There are companies that specialize in this, and you see them in the credits as “24 frame playback.”

For production, we used the 3D workhorse of the time for the Mac (the ’040 era): Infini-D, along with everyone’s favorite animation and interactivity authoring tool, MacroMind Director. This was a time when the ability to composite images with an alpha channel (transparency) was still somewhat new in the personal computer world – the first time you could seamlessly blend output from 2D and 3D graphics programs – and Director was a great tool for this type of work.

It was a great experience visiting the set to do the final transfers; it was my first view into how large-scale creative endeavors can be organized to maximize creative freedom and personal expression while tightly controlling cost and delivering a compelling product. During one all-nighter, as I recorded a few fixes to some of the animations, the set crew built an entire house – with paint and wiring – on the soundstage outside the art department. I was amazed at what this industry does on a daily basis. As the software industry becomes more design-driven, we are similarly challenged to be creative at scale – to maintain a coherent creative vision while coordinating the actions of thousands of individuals, many with their own creative sub-domains. But that’s another post.


Hybrid Software Design at eBay

Designers and engineers often approach product design differently. Designers might focus on research or narrative, while engineers might talk about frameworks or the algorithm that defines a particular behavior. To an engineer, designers can seem superficial: “thunder without lightning”; to a designer, engineers can seem reductive, as though they are prematurely “leaping to solution.” Even with best intentions, misunderstandings can be major and translation between these disciplines can be error-prone and expensive.

Regardless of cost, improving communication between design and engineering is critical. In an increasingly consumer-facing software enterprise, design quality correlates with bottom-line profit. Design quality is in turn strongly influenced by a team’s ability to review a wide range of designs quickly: its iteration speed. And iteration speed is largely driven by the quality of communication – the bandwidth – between the design and engineering functions. I call this the design bandwidth.

At eBay, one approach we’re taking to improving design bandwidth is to cross-train individuals in technology and design, and to deliberately hire people who have already trained intensively across disciplines. Because these individuals comprehend both design and engineering aspects of a problem, they are more able to resolve constraints, access efficiencies, and find synergies across domains – all essential to converging on an optimum experience efficiently.

Individuals who are skilled in both design and engineering can offer unique efficiencies in a corporate design environment. Some benefits of hybrid design individuals include:

  • evaluating feasibility and cost in real time during the design process
  • recognizing experience opportunities arising from technical issues
  • unearthing unforeseen issues through rapid prototyping
  • increasing usability research quality through prototype fidelity and rapid integration of usability findings into the prototypes
  • dynamic design deliverables – such as stylesheets, markup, and code – that eliminate specification ambiguity

Many engineers would like to do more design. Many designers would really enjoy doing more programming. And companies appreciate people who can do both: individuals with both design and coding skills can accelerate iteration speed and the depth (completeness) of design evaluations, resulting in increased code velocity, a greater volume of usable ideas, faster integration of usability findings, and higher confidence in “buildability” earlier in the design lifecycle. Many companies recognize these benefits and are actively recruiting for them. If you’re in the software experience industry, you know this already. If you’re one of these people, you may have noticed recruiters from major tech firms specifically looking for this hybrid skill set.

Historically, large-scale software development organizations (I have worked for Apple, Microsoft, and now eBay) tend to inadvertently discourage internal development of such cross-disciplinary talent, even while they actively recruit for the skill set externally.

To understand why, consider this: the typical major software organization is divided along discipline boundaries very high up in the enterprise. At eBay, we have “Marketplaces” and “PayPal” and, within each, separate Design, Software Engineering, and Program Management organizations. (We have a lot more at eBay, too, but that’s a different discussion.)

For a hybrid individual in such an organization, there is a strong organizational and cultural incentive to “choose sides” and seek to rise within a specific discipline – in particular, experience design and engineering are typically quite distant organizationally.

Last year (my first at eBay), I designed a new job family called Design Engineer – a track within the Design job ladder that I collaborated with our HR team and executive leadership to create. The Design Engineer track goes from college hire all the way up to Design Fellow, a VP-level position. A Design Engineer is someone who is skilled in both experience design and software engineering and is continuing to make progress in both disciplines. This latter part is very important – it’s all too easy to lose touch with the leading edge of a discipline as you focus more on management. By having VP-level Design Engineers who are not required to be people managers, we provide a career path for those who want to continue to become deeper and more powerful designers as they grow in responsibility and impact. While I do manage a team, I strive to continually update my IC skills; I don’t need to be the best on my team at coding, typography, or any specific IC skill, but I think it’s important for leaders to maintain currency with the creative landscape. Otherwise one runs the risk of missing major “sea changes” in what our creative organizations are capable of producing.


Kodu – why teach kids programming?

My favorite project ever was Kodu. It’s a code development environment and world design tool designed for young children – and for everyone else.

Kodu is still growing at Microsoft. Some folks tell me that installs are over 600,000, across numerous countries. It’s been localized into several languages, including Spanish, Hebrew, and Polish (Poland is a hotbed of sophisticated software design talent). I was always delighted to see that we had at least a few downloads in Yemen – I like to think some activist teacher somewhere is helping little kids who may not have an Xbox to program their own video games.

Why is programming good for kids?

Programming teaches people to design, predict, test, and modify complex systems. It also gives insight into how complex systems can arise from very simple input.

People today are surrounded by complex systems: from home theaters to government to the world economy. In the digital economy, creating new complex systems – like Twitter, Facebook, and YouTube – requires insight into how to grow and manage such systems. Increasingly, casting an informed vote requires a degree of sophistication about what you want government to do and not do. Teaching people to understand systems, you could argue, is a basic survival skill. This type of design is not static; it is algorithmic. Programming is the art of algorithm design. (Yes, I said art and not science. Programming is an art which, like painting, can be improved by science.)

If you are a programmer, you are probably nodding; if you haven’t programmed, you might have difficulty understanding my point. Which is itself part of my point. In today’s academic curricula, math and writing are seen as the primary tools for thinking about systems. That’s fine and good. But when you design a game world with multiple actors each trying to achieve different objectives, and modify that world to create a larger narrative, and then, finally, run that system and watch it do things you never imagined, you are doing something very, very different from deriving the angles of a right triangle.

Learn more about Kodu here: http://www.kodugamelab.com/

structural animation in HTML5 + CSS3

I was cooking up a new search preview control recently and came across a very interesting idiom in HTML5/CSS3 – one of those small technical details that suggests how we code web apps is about to change radically. This JSFiddle demonstrates what I consider to be an interesting and possibly important breakthrough in UI frameworks.

If you run the fiddle in a Chrome browser,* you’ll see an image. If you hover your mouse over the image, you’ll see something like the screenshot below:

Screenshot showing a user interface consisting of a picture of a "Tintin" statue with an array of images fanned out behind it like a deck of cards.

A control for showing a set of images in the effective space of one image. The user can “spin” the deck so that cards are rapidly rotated through the top position, allowing rapid preview of set contents. When the mouse moves away, the “deck” smoothly stacks up under the top image. see the demo

You’ll notice (on Chrome) that the cards in the background start out hidden, then animate smoothly out into the fan shape shown above.

There’s more: click on the top image and it will smoothly fade while flying towards you, the rest of the deck will smoothly rotate up with each card taking the place of the previous one, the original will reappear smoothly at the bottom – allowing infinite cycling – and everything happens at the best frame rate your machine can produce. On a MacBook Pro, it’s a solid 60 frames per second. It’s an entrancing effect, tuned to the nearest millisecond. In one of my apps I have it hooked to scrolling and it runs like recombinant mercury.

Here’s the interesting part: the code that runs when you click (or scroll) looks like this:

box.prev = function() {
  $div.append($div.children(".page").first().remove());
};

This is superficially jQuery, but the practitioner will recognize that jQuery is not being used for animation here. What this says is:

“remove the first child of the div and put it onto the end of the div”

In other words, a purely structural statement. There are no pointers to the top page, no “firstHiddenPane” variables keeping track of things, no ad hoc circular buffer implementations. There is literally zero presentation logic for this arguably state-of-the-art control.

This control’s behavior is built entirely in CSS3. You might expect the appearance – the static image – to be in CSS, but here a fairly complex set of dynamic state is being managed by the CSS engine, with high GPU utilization.** While we have been approaching this kind of architecture in CSS for quite some time, I think this example represents a threshold in the evolution of the model, where some large, complex, and hard-to-master user interface constructs suddenly become easier to build on JS/HTML/CSS than on any other UI platform – and promise to run with optimal performance characteristics on modern and future OS platforms.

As an example, I’m rewriting a fairly large and sophisticated view control to use this technique and finding the already-heavily-refactored code to be reduced by more than 50%. At the same time, it’s getting faster by a large measure. The code is also getting far prettier, always a key indicator.

More to share in a future prototype. I’d love to hear your thoughts about this technique – is it really something new, or just another idiom?

If you don’t want to parse the code, the key CSS concepts used are nth-child, transition, and transform. The insight was how they work together – more on this in a later post, if you can stand not to tinker it out for yourself.
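In the meantime, here is a rough, hedged sketch of the idea – the selectors and values are invented for illustration and are not the production control. Each card’s resting transform is a function of its position in the DOM, so when the JavaScript above moves a child to the end of the DIV, the remaining cards match new nth-child rules and the CSS engine animates each one from its old transform to its new one:

/* every card animates its transform; vendor prefixes omitted for brevity */
.page {
  position: absolute;
  transition: transform 400ms ease;
}

/* resting positions are purely structural: nth-child decides where each card sits */
.page:nth-child(1) { transform: translate(0, 0) rotate(0deg); }
.page:nth-child(2) { transform: translate(16px, -8px) rotate(8deg); }
.page:nth-child(3) { transform: translate(32px, -16px) rotate(16deg); }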

* all current-gen browsers can do this technique, but cross-browser CSS3 (including all vendor variants) would be much harder to read. For my production work, I typically use a script called -prefix-free from Lea Verou that automatically inserts the right browser prefixes at runtime.

** certainly all the discrete visual elements are loaded onto GPU “surfaces”; with a suitable set of shaders, the position and opacity values could easily be interpolated on the GPU, meaning all the CPU-side CSS engine would need to do is switch shader parameters on the DIVs’ surfaces.


the Wild Coast

Here we are back in Santa Cruz – very very happy.

I’m now at eBay working as a Senior Director of Design at the corporate headquarters. Sounds fancy, but I’m not building an empire – currently I’m hand-picking a very small team of interdisciplinary design innovators to make cool stuff as quickly as possible. We look at trends, predict where they’re leading, then build stuff that will be useful in that predicted future, and ideally hasten that future’s arrival. Sometimes this sort of speculative design gives you something that is useful today – but that you might not have found if you hadn’t been daydreaming about the future. This is the latest iteration of a design innovation framework I’ve developed at Apple, MediaX, and Microsoft Research.

Since this blog is intended to be more permanent than previous company-specific blogs, I also get to talk about things that are outside of my current job, like old projects from the past and interesting trends that affect all of us digital creatives.

This new year marks six months since I departed Microsoft and began at eBay. On day one, I began full immersion in open web standards (especially HTML5 and CSS3). What I’ve learned has been eye-opening; we have reached an inflection point in design expressiveness, providing huge leverage via what is becoming a core design literacy. More to share here soonish.

conference room with multiple temporary workstations and a pretty view of the outdoors

My team currently shares a conference room at eBay. We're about to outgrow it but the view is great and the company is fantastic.
