Yay, it's great to see my old group Microsoft Applied Sciences in the press! Lots of great display and interaction work being done there. Fantastic mix of software, hardware, and optics people. Congrats!
As it becomes increasingly cost effective to manufacture a more diverse set of computing form factors, exploring new ways of providing input to a computer and sending output to a user will become an essential part of developing new genres of computing products. The time when raw computing horsepower was the key differentiator passed several years ago, and the rate of device specialization has shot up dramatically. Less computing power is fine, if it is where you need it, when you need it, in the form you need it. As a result, you will likely see the speed at which wild interface technology research moves into product accelerate as well.
Consequently, if you are a young engineering student, take note: there will be a steady stream of good jobs for people who like to write software for new kinds of input/output devices. =o)
Yay! This makes me happy. Microsoft officially announces support for Windows Drivers for the Kinect Camera as a free download in the Spring.
This was something I was pushing really hard on in the last few months before my departure, and I am glad to see the efforts of colleagues in the research wing of Microsoft (MSR) and the Xbox engineering team carry this to fruition. It's unfortunate this couldn't have happened closer to launch day. But, perhaps it took all the enthusiasm of the independent developer community to convince the division to do this. It certainly would have been nice if all this neat work had been done on Microsoft software platforms.
I actually have a secret to share on this topic. When my internal efforts for a driver stalled, I decided to approach AdaFruit to put on the Open Kinect contest. For obvious reasons, I couldn't run the contest myself. Besides, Phil and Limor did a phenomenal job, much better than I could have done. Without a doubt, the contest had a significant impact in raising awareness about the potential for Kinect beyond Xbox gaming both inside and outside the company. Best $3000 I ever spent.
In my opinion, all the press coverage around the independent projects brought a lot of additional positive attention to the product launch. That in itself became the topic of international news.
But to take this even further, it would be awesome if Microsoft went so far as to hold a small conference to actually showcase people doing interesting projects with Kinect. It is a really great device, and such an outreach program would give Microsoft an opportunity to engage with very enthusiastic partners to potentially build new applications around it both inside and outside of gaming. At the very least, it would be a cheap way to recruit potential hires.
There are lots of smart people outside of Microsoft who would like to build interesting stuff with it. Most of it probably won't be a "Microsoft-scale" business initially, but it is worth enabling and incubating in aggregate. Though, a large portion of the expert community is already using the Kinect camera in their own projects on just about every OS and every development tool in existence. So, Microsoft will need to give researchers and independent developers a reason to come back to their platform - be it opportunities to engage with people at Microsoft/MSR, other Kinect developers, or opportunities to share their work through larger distribution channels such as XNA, app stores, or Xbox downloadable games. We have just seen the beginning of what can be done with low-cost depth cameras.
Since I relocated down to Mountain View, I wanted a good way to keep in touch with my fiancée, who is still back in Seattle. So, I decided to mount an old netbook I had on top of an iRobot Create to create a video chat robot that I could drive around the house remotely. Since it was a good procrastineering project, I decided to document it here.
There are two major components to the project: the iRobot Create, which costs around $250 (incl. battery, charger, and USB serial cable), and the netbook, which I got for around $250 as well. At $500, this is a pretty good deal considering many commercial ones go for several thousand dollars. The software was written in C# with Visual Studio Express 2010 and only tested on Windows 7 with the "Works on my machine" certification. =o) I'm sure there are TONS of problems with it, but the source is provided. So, feel free to try to improve it.
Included are the executable, C# source, and two PDFs: one describing installation and usage of the control software, the other more information about modifying the charging station.
The software does a few nice things: it tries to set up UPnP router port forwarding automatically, queries the external IP needed to make a connection over the open internet, maintains a network heartbeat that stops the robot if the connection is lost, supports a control password and auto-connect-on-launch options, and even mediates the maximum acceleration/deceleration of the motors so the robot doesn't jerk around or fall over.
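For those curious, the acceleration mediation is conceptually just a slew-rate limiter on the wheel speed commands. Here is a minimal C# sketch of the idea; the class name, units, and limit value are mine for illustration and are not taken from the released source.

// Minimal sketch of mediating acceleration by limiting how much the
// commanded speed can change per control tick. The limit value passed in
// is a made-up example, not the one the released software uses.
class SpeedMediator
{
    private double currentSpeed = 0;          // speed currently commanded to the Create (mm/s)
    private readonly double maxDeltaPerTick;  // largest change allowed per control tick

    public SpeedMediator(double maxDeltaPerTick)
    {
        this.maxDeltaPerTick = maxDeltaPerTick;
    }

    // Called every control tick with the speed the remote driver is requesting.
    // Returns the speed that should actually be sent to the robot this tick.
    public double Update(double requestedSpeed)
    {
        double delta = requestedSpeed - currentSpeed;
        if (delta > maxDeltaPerTick) delta = maxDeltaPerTick;
        if (delta < -maxDeltaPerTick) delta = -maxDeltaPerTick;
        currentSpeed += delta;
        return currentSpeed;
    }
}

The same clamp handles deceleration as well, since the change is limited in both directions.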
The UPnP port forwarding is far from perfect and is not well tested at all. If it works for you, consider yourself lucky. Otherwise, ask a friend how to set up port forwarding to enable remote control over the internet.
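For reference, one common way to attempt automatic port forwarding on Windows is through the NATUPnP COM interface. The sketch below shows the general idea only; I can't promise this is exactly how the released software does it, and the port numbers and internal IP are placeholders.

using System;
using NATUPNPLib;   // add a COM reference to "NATUPnP 1.0 Type Library"

class PortForwardSketch
{
    static void TryAddPortMapping()
    {
        UPnPNATClass nat = new UPnPNATClass();
        IStaticPortMappingCollection mappings = nat.StaticPortMappingCollection;
        if (mappings == null)   // router doesn't answer UPnP requests
        {
            Console.WriteLine("UPnP not available; forward the port manually on your router.");
            return;
        }
        // external port, protocol, internal port, internal client IP, enabled, description
        mappings.Add(8080, "TCP", 8080, "192.168.1.100", true, "Video chat robot control");
    }
}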
Once you have all the parts (the netbook, the robot, the serial cable, and the software), you can probably be up and running within 5 minutes. Assembly is merely plugging cables together. Mounting the netbook can be done with velcro or tape. Building the riser stand is more challenging, but entirely optional. I happened to have access to a laser cutter to make my clear plastic stand, but you can probably make something adequate out of wood.
Optional: Modifying the Charging Station
Probably one of the more interesting parts of this project from a procrastineering standpoint is the modification to the docking station so that it charges something else in addition to the robot base.
What I did is admittedly pretty crude and arguably rather unsafe. So, this is HIGHLY NOT RECOMMENDED unless you are very comfortable working with high voltage electricity and accept all the personal risks of doing so and potential risks to others. This is provided for informational purposes only and I am not responsible for any damages or harm resulting from the use of this material. Working with household power lines can be very dangerous posing both potential electrocution and fire risk. This is also unquestionably a warranty voiding activity. DO NOT ATTEMPT THIS without appropriate supervision or expertise.
Now that I've hopefully scared you away from doing this... what exactly did I do? A high level picture is shown here:
The PDF document in the download describes the changes in more detail. But, I had a lot of trouble trying to tap the existing iRobot Create charging voltage to charge something else, primarily because the charging voltage periodically dips down to 0V and holds there for several milliseconds. Working around that would require building some kind of DC uninterruptible power supply, which made the project much more complex. The easiest way to support a wide range of devices that could ride on the robot was to somehow get 120V AC to the cargo bay... for those of you with some familiarity with electronics, you can probably see the variety of safety hazards this poses. So, again, this is HIGHLY NOT RECOMMENDED and is meant to just be a reference for trying to come up with something better.
I actually do wish iRobot would modify the charging station for the Create to officially provide a similar type of charging capability. It is such a nice robot base, and it is an obvious desire to have other devices piggyback on the robot that might not be able to run off the Create's battery supply. I personally believe it would make it a dramatically more useful and appealing robot platform.
Usage Notes
At the time of this post, I've been using it remotely on a regular basis for about a month between Mountain View and Seattle. My nephews in Washington DC were also able to use it to chase my cat around my house in Seattle quite effectively. Thus far, it has worked without any real major problems. The only real interventions on the remote side have been when I ran it too long (>4 hours) and the netbook battery died, or when the optional 4th wheel on the iRobot Create popped off, which can be solved with some super glue. Otherwise, the control software and the charging station have been surprisingly reliable. Using remote desktop software like TeamViewer, I can push software changes to the netbook remotely, restart the computer, put Skype into full screen (which it frustratingly doesn't have as a default option for auto-answered video calls), and otherwise check in on the health of the netbook.
I have some big news to announce on a personal front: Very recently, I have left Microsoft to join a special projects team at Google. After more than 2 and a half years working as a core contributor to the human tracking algorithms for Kinect, it was an extremely difficult decision and I leave behind many great colleagues in Redmond.
It was a wild ride, helping Kinect along through the very early days of incubation (even before it was called "Project Natal") all the way to shipping 8 million units in the first 60 days. It's not often you work on a project that gets a lavish product announcement by Cirque du Soleil and a big Times Square Launch party. The success of Kinect is a result of fantastic work by a lot of people. I'm also very happy that so many other people share my excitement about the technology.
It was great to be a part of such a unique project. I look forward to seeing all the creative and unexpected ways that game developers will use the data from the camera to create fun experiences. The Xbox is exceptionally well positioned to do great things in the entertainment space. It's a great console, and a great platform, with a lot of potential. I genuinely look forward to seeing how it will evolve over the next few years and I absolutely wish the Xbox team the best of luck.
A few months ago, I rediscovered the Khan Academy after stumbling across a presentation by Salman Khan. If you aren't familiar with this, I recommend making it an absolute priority to take a quick scan of some of the videos. Here's a summary video from the website:
Sal has an astonishingly approachable and understandable method of explaining topics in his videos, and he also has an incredibly deep understanding of the material he talks about, hinting at deeper levels of complexity that he may be skimming over but sometimes revisits in future videos. The Khan Academy videos are, in my opinion, perhaps one of the most interesting things to happen to education in a very, very long time. If I may, "disruptive". Anyone with an Internet connection can go from basic fundamentals all the way up to a college level education in many topics in a clear, organized, understandable manner... all for free. His teaching style is more effective along many dimensions than any I have personally experienced in any classroom.
Recently, I've realized that I need to learn more linear algebra. Over the years, I have picked up little bits here and there doing computer graphics and basic data analysis, but I never understood it well enough to know why it really works or, more importantly, to apply it to solve completely new problems that might be somewhat non-standard. I managed to never take a proper linear algebra course in college or grad school.
So, why do I suddenly care about learning linear algebra now? And why should you care? Well, if you want to understand how the Wiimote Whiteboard program works, you need a little bit of linear algebra. If you want to understand how video games are rendered on the screen, you need linear algebra. If you want to understand how Google works, how parts of Kinect work, how the $1 million Netflix Prize was won, how financial modeling is done, or in general how to analyze the relationship between two large data sets in the world... you need linear algebra. I'm discovering more and more that any modern sophisticated engineering, modeling, prediction, analysis, fitting, or optimization problem now usually involves computers crunching on linear algebra equations. Unfortunately, this fact was never properly explained to me in college, so I never prioritized taking a class.
I understood the basics well enough to do computer graphics, rendering stuff on the screen (3D to 2D). But much of modern computer vision is about reversing those equations, going from 2D back to 3D, which involves solving a lot of linear algebra equations to recover unknown data. And I've found that computer vision papers seem to be the worst places to look for a clear explanation of the math being performed. There almost appears to be a desire to see how obtusely one can describe one's work.
Fortunately, Khan Academy has over 130 videos on Linear Algebra. Since I knew I would be traveling this holiday, I decided to load up all of the videos on my phone to watch during down time. Watching videos here and there while sitting on the plane, trains, buses, or waiting in lines, I was able to watch all 130+ videos, which cover a 1st year college Linear Algebra course, in about 3 weeks. Pure awesome.
However, the Linear Algebra lectures stopped just as I thought they were getting to the interesting part. I was hoping they would go on to cover topics such as Singular Value Decomposition, numerical analysis, perspective projections (and reversing them), sparse matrices, bundle adjustment, and then real-world application examples. I'm going to order some books on these topics, but I really, really love the video lecture format Sal uses in the Khan Academy and wish the series continued.
The $5000 challenge: As an attempt to continue expanding this lecture series in the Khan Academy, I want to encourage people who feel like they can give clearly understandable lectures on these topics to pick up where Sal left off. Apparently, there is an informal method of adding your own videos to the academy. I've already donated some money to the Khan Academy (a not-for-profit 501(c)(3)). But, as a call to the community, to incentivize people who are able to produce good video lectures on advanced Linear Algebra: for each posted video that continues the Linear Algebra series and passes the "clearly understandable", Khan Academy-style, 10-minute video lecture bar, I will donate $100 to the Khan Academy, up to $5000. So, not only would you be educating thousands (possibly millions) of people, you would be ensuring that your material stays free.
If you do take me up on this offer and do post a video, let me know at johnny@johnnylee.net, and I will review the video. If it passes the bar, I will donate the money and then send you the receipt.
It's been a long while since I've posted a personal procrastineering project. The past two and a half years have been pretty heads down with developing Kinect (congrats to all those involved). But, since that has now successfully launched, I've had some time to spend on little side projects.
As an exercise to teach myself a little bit more real-time computer vision/robotics, I wanted to see if I could get a computer to autonomously play certain console video games. Video games are nice because they provide a relatively decent simulation of a 3D environment, they demand real-time vision processing, I don't have to go out in the field to run a test, and there is no penalty for screwing up. This could also be done with PC games, but rich games are more often console based, and using a console kind of black-boxes the activity so I am really forced to depend solely on the data contained in the video stream.
However, a prerequisite to this activity is programmatically sending controller commands to a console. This is not something that appears to be very common on the intertubes, so I thought I would detail my efforts here to hopefully fill that gap. There used to be a product that would allow you to do this with an Xbox 360 called the XIM2. It was primarily targeted at people who wanted to use a mouse and keyboard to play first person shooters on a console. Unfortunately, the XIM2 is now discontinued and the nice-looking, soon-to-be-available XIM3, I'm told, will not have PC-to-console control functionality. Bummer. So, off to build my own.
The approach I ended up taking is actually very similar to the origins of the XIM. I used a microcontroller to simulate a PS2 controller. The nice thing about the PS2 controller is that its communication protocol is relatively simple, and due to its vast popularity you can find low-cost adapters to use it with a PS3 or Xbox 360. These are each on the order of $5-$10. So, by creating a computer-controlled PS2 controller, you can support 3 consoles and programmatically control current generation games as well.
Simulating a controller
Most of my starting points were nice websites such as this and this, which neatly describe the Playstation controller protocol. Most of the material online is geared toward using a PS2 controller with your hobby/robot projects rather than trying to simulate a PS2 controller to command the console. In theory, you would simply reverse the instructions and it would magically work. Unfortunately, that wasn't quite the case. The PS2 console is very picky (sometimes erratic) about timing, and goes through a reasonably sophisticated handshaking and configuration process when you plug the controller in (which is game dependent) before it will accept data from the controller as input. Thus, my controller simulator had to survive that entire start up process looking like a valid controller.
The microcontroller platform I chose was the Teensy 2.0 USB development board which has an ATMEGA32U4 and is programmable via the built-in USB port (so, no external programmer needed). It can be configured to become a variety of USB devices and is only $18. Since it can become a USB serial interface and has hardware for SPI (for the PS2 controller protocol), it can do this entire project without any additional communication hardware.
The PS2 controllers use the SPI protocol for communicating with the console, typically using a transmission clock of 250kHz. But SOME games will cause it to shift to 500kHz unexpectedly between messages, so hardware SPI is a must in the microcontroller. BTW, if you are selecting a different micro, you need to make sure you can simultaneously clock data out as new data is clocked in. Lastly, you need to manually pull the ACK pin low to tell the console something is listening on the line, which isn't part of the standard SPI spec.
The PS2 controller initially boots up as a simple Playstation One controller which only provides digital button output (mode 0x41). The game console will make a variety of queries and configuration messages to enable the joysticks, analog button pressure, motors, etc. (mode 0x79). It's unclear what all of the configuration messages mean. There are a lot of them, and they vary quite a bit depending on what game is running. Not very simple. But, as long as you mimic the behavior of a valid controller under several games, it appears to work. There was a minor issue that the USB interrupts were causing de-synchronization between the console and the micro. So, USB interrupt handling is disabled while communicating with the console (which is roughly 100us every 15ms). This did not cause an issue with my computers, but may be one possible source of USB errors on other systems, just as an FYI.
I won't dive into the details, but above are the firmware files for the Teensy development board. These are (of course) provided at your own risk with no implication of support. There is a precompiled .hex file which you can point your Teensy programmer toward that should get you up and running quickly. The C source is in there. For those of you who are industrious, also provided are logs of the SPI messages sent between a PS2 and a valid controller for a few different games.
It's worth noting that I probably didn't have to make such a thorough simulation of a valid controller to work with the PS3 or Xbox 360 adapter, since they are likely much simpler and less varied in their setup handshaking. But, I wanted to use my PS2 since it is just lying around, and a better simulation increases the likelihood that some random PS3 or Xbox 360 adapter will work.
Wiring
Wiring the Teensy is pretty easy. You need a PS2 cable (from an old controller or extension cable) that you are willing to cut and solder. This only requires 6 wires, detailed in the picture below. I read online that if you bought a cheap knock-off cable, the color coding may not match. So, refer to this pinout diagram if you are unsure.
Program your Teensy with the .hex file in the archive provided, and the LED should begin to blink in "search mode". When you plug the PS2 connector into the console (while the USB side is connected to your PC, since it is USB powered), the LED should go solid but slightly dim. This means it is steadily chatting with the console. If it is flickering without your involvement, that means it is struggling to pass the setup handshake and you should feel free to debug that problem on your own. =o) It seems to work on my PS2 and my PS3 (via adapter). I haven't gotten an Xbox 360 adapter yet, but I'm optimistic this will work assuming the adapter works at all.
Sending commands
As I mentioned, the Teensy 2.0 can be configured to provide a USB serial interface. If you are using Windows, you need to download these drivers. Mac and Linux don't need drivers. Now, a serial port should appear in your device list. Any programming environment that can access a serial port can now send button presses to the game console. When you do send a message, the LED on the Teensy will blink bright.
The Teensy is expecting 7 byte messages of the format:
Joystick neutral is 0x80, up and left are 0x00.
These are the bit masks for the two button bytes, which should be set to 0 when pressed and 1 otherwise:
The starting 0x5A byte is used as a start byte to help segment message packets. If you send data without prepending this byte, you will get 'x' characters sent back to you on the serial port, which is the microcontroller complaining that your messages aren't formatted correctly.
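To make that concrete, here is a small C# sketch that opens the Teensy's virtual serial port and sends one message using System.IO.Ports. The exact ordering of the button and joystick bytes after the 0x5A start byte is my own assumption here, so double-check it against the documentation in the download; the COM port name is of course machine-specific.

using System.IO.Ports;

class ControllerSender
{
    static void Main()
    {
        using (SerialPort port = new SerialPort("COM3"))  // port name is machine-specific
        {
            port.Open();
            byte[] msg =
            {
                0x5A,  // start byte the firmware looks for
                0xFF,  // button byte 1 (all bits set = nothing pressed)
                0xFF,  // button byte 2
                0x80,  // joystick bytes: 0x80 is neutral, 0x00 is up/left
                0x80,
                0x80,
                0x80
            };
            // NOTE: the ordering of the button and joystick bytes above is an
            // assumption; verify it against the included documentation.
            port.Write(msg, 0, msg.Length);
        }
    }
}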
Autonomously Playing Guitar Hero
As I mentioned at the beginning of the post, the whole purpose of this project was to see if I could analyze the video stream from the game console to autonomously play the game. Arguably one of the easiest games to automate is Guitar Hero, which basically turned millions of willing people into bad midi file readers. So, I'm about to use a lot more processing power and hardware to also make a bad midi file reader. =o) In case it wasn't clear from my description above, this is a flow chart of what is going on here:
The Playstation spits out the video image, the PC analyzes that image, and then sends controller events back to the Playstation. Now, this may seem absolutely absurd to a normal human being. But, I'm an engineer at heart and this makes total sense to me. The nice thing is that assuming I can make the "analyze that image" software infinitely smart, I can hopefully play all sorts of games autonomously - perhaps even driving games, first person shooters, or platformers. Doing all that will be tough, but I'll learn a lot in the process. Some of that may even be useful for analyzing things in the real world as well. But, for now, let's just start with Guitar Hero on Expert mode.
For video capture, I'm just using a $30 USB capture device. Since real-time processing was a priority for me, 640x480 images @ 30Hz is plenty of resolution to start with. In fact, for this starter project, I'm just using 320x240. For a computer vision library I am using OpenCV. Although it has some eccentricities, it is a very powerful toolkit for getting up and running quickly. I'm not going to go into too much detail about OpenCV. But roughly, what I do for playing Guitar Hero is:
1. Unwarp the fret board so it is square (using a homography)
2. Use a template match to find notes along each column.
3. Track notes over time to ensure they are not spurious.
4. Once notes hit a trigger line, queue the button press.
There are a bunch more subtle details to make it work just right. It seems to work pretty well, hitting 95% of the notes on expert mode. It doesn't really handle star notes, sustains, star power, or other weird special effects. But, it can finish all the songs on Expert Mode. Thanks to the simple UI design of "just hit the X button to advance", it can play the entire Career mode from start to finish without me touching it. Here's a video of it finishing the last riff of the last expert song.
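To give a flavor of steps 2 through 4, here is a stripped-down C# sketch that searches one column of the already-unwarped grayscale frame for a note using a naive sum-of-absolute-differences template match, then checks whether the best match has reached the trigger line. The real project uses OpenCV's template matching and is considerably more involved; all the names, sizes, and thresholds below are illustrative only.

using System;

class NoteFinderSketch
{
    // Returns the row of the best template match in the given column of an
    // unwarped grayscale frame (indexed [y, x]), or null if nothing matches
    // well enough. The threshold is a made-up tuning value.
    static int? FindNoteRow(byte[,] frame, byte[,] template, int columnX, int sadThreshold)
    {
        int th = template.GetLength(0), tw = template.GetLength(1);
        int bestRow = -1, bestScore = int.MaxValue;

        for (int y = 0; y <= frame.GetLength(0) - th; y++)
        {
            int sad = 0;   // sum of absolute differences for this candidate position
            for (int dy = 0; dy < th; dy++)
                for (int dx = 0; dx < tw; dx++)
                    sad += Math.Abs(frame[y + dy, columnX + dx] - template[dy, dx]);

            if (sad < bestScore) { bestScore = sad; bestRow = y; }
        }
        return bestScore < sadThreshold ? bestRow : (int?)null;
    }

    // Hypothetical caller: once a tracked note reaches the trigger line,
    // queue the corresponding button press to send to the console.
    // int? row = FindNoteRow(frame, greenTemplate, greenColumnX, 5000);
    // if (row.HasValue && row.Value >= triggerLineY) QueueButtonPress(GreenButton);
}

Tracking notes over time (step 3) then amounts to remembering matches from previous frames and requiring them to move down the fret board consistently before trusting them.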
For those extremely interested, here is a copy of the OpenCV project that does this. This is DEFINITELY provided without any support and requires getting OpenCV installed and running.
It's that time of year again, and we are coming up to UIST 2010 - a research conference on new interface hardware and software technology. Once again, Microsoft Applied Sciences is sponsoring the student innovation contest. Last year, it was the pressure sensitive keyboard. This year, it is the Adaptive Keyboard.
From the Applied Sciences page, "[The adaptive keyboard] is a research prototype developed by Microsoft Hardware to explore how combining display and input capabilities in a keyboard can allow users to be more productive. The keyboard incorporates a large, touch-sensitive display strip at the top. In addition, the display continues underneath the keys, allowing the legends to be modified in real time. This lets you do things like change the character set to a different language or display command icons."
If you are interested in participating, visit the UIST student contest page for rules and important dates. The entry deadline is August 17th!
I am an engineer living in the Bay Area. This project blog is for personal projects outside the scope of my work. In 2008, I graduated from Carnegie Mellon University with a PhD in Human-Computer Interaction. My research interests are in exploring novel interface technology that can influence the lives of many people. My main website can be found at johnnylee.net