Monthly Archives: September 2016
One way to handle big data is to shrink it. If you can identify a small subset of your data set that preserves its salient mathematical relationships, you may be able to perform useful analyses on it that would be prohibitively time consuming on the full set.
The methods for creating such “coresets” vary according to application, however. Last week, at the Annual Conference on Neural Information Processing Systems, researchers from MIT’s Computer Science and Artificial Intelligence Laboratory and the University of Haifa in Israel presented a new coreset-generation technique that’s tailored to a whole family of data analysis tools with applications in natural-language processing, computer vision, signal processing, recommendation systems, weather prediction, finance, and neuroscience, among many others.
“These are all very general algorithms that are used in so many applications,” says Daniela Rus, the Andrew and Erna Viterbi Professor of Electrical Engineering and Computer Science at MIT and senior author on the new paper. “They’re fundamental to so many problems. By figuring out the coreset for a huge matrix for one of these tools, you can enable computations that at the moment are simply not possible.”
As an example, in their paper the researchers apply their technique to a matrix — that is, a table — that maps every article on the English version of Wikipedia against every word that appears on the site. That’s 1.4 million articles, or matrix rows, and 4.4 million words, or matrix columns.
That matrix would be much too large to analyze using low-rank approximation, an algorithm that can deduce the topics of free-form texts. But with their coreset, the researchers were able to use low-rank approximation to extract clusters of words that denote the 100 most common topics on Wikipedia. The cluster that contains “dress,” “brides,” “bridesmaids,” and “wedding,” for instance, appears to denote the topic of weddings; the cluster that contains “gun,” “fired,” “jammed,” “pistol,” and “shootings” appears to designate the topic of shootings.
Joining Rus on the paper are Mikhail Volkov, an MIT postdoc in electrical engineering and computer science, and Dan Feldman, director of the University of Haifa’s Robotics and Big Data Lab and a former postdoc in Rus’s group.
The researchers’ new coreset technique is useful for a range of tools with names like singular-value decomposition, principal-component analysis, and latent semantic analysis. But what they all have in common is dimension reduction: They take data sets with large numbers of variables and find approximations of them with far fewer variables.
Machines that predict the future, robots that patch wounds, and wireless emotion-detectors are just a few of the exciting projects that came out of MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) this year. Here’s a sampling of 16 highlights from 2016 that span the many computer science disciplines that make up CSAIL.
Robots for exploring Mars — and your stomach
- A team led by CSAIL director Daniela Rus developed an ingestible origami robot that unfolds in the stomach to patch wounds and remove swallowed batteries.
- Researchers are working on NASA’s humanoid robot, “Valkyrie,” who will be programmed for trips into outer space and to autonomously perform tasks.
- A 3-D printed robot was made of both solids and liquids and printed in one single step, with no assembly required.
Keeping data safe and secure
- CSAIL hosted a cyber summit that convened members of academia, industry, and government, including featured speakers Admiral Michael Rogers, director of the National Security Agency; and Andrew McCabe, deputy director of the Federal Bureau of Investigation.
- Researchers came up with a system for staying anonymous online that uses less bandwidth to transfer large files between anonymous users.
- A deep-learning system called AI2 was shown to be able to predict 85 percent of cyberattacks with the help of some human input.
Advancements in computer vision
- A new imaging technique called Interactive Dynamic Video lets you reach in and “touch” objects in videos using a normal camera.
- Researchers from CSAIL and Israel’s Weizmann Institute of Science produced a movie display called Cinema 3D that uses special lenses and mirrors to allow viewers to watch 3-D movies in a theater without having to wear those clunky 3-D glasses.
- A new deep-learning algorithm can predict human interactions more accurately than ever before, by training itself on footage from TV shows like “Desperate Housewives” and “The Office.”
- A group from MIT and Harvard University developed an algorithm that may help astronomers produce the first image of a black hole, stitching together telescope data to essentially turn the planet into one large telescope dish.
Tech to help with health
- A team produced a robot that can help schedule and assign tasks by learning from humans, in fields like medicine and the military.
- Researchers came up with an algorithm for identifying organs in fetal MRI scans to extensively evaluate prenatal health.
- A wireless device called EQ-Radio can tell if you’re excited, happy, angry, or sad, by measuring breathing and heart rhythms.
Today, loading a web page on a big website usually involves a database query — to retrieve the latest contributions to a discussion you’re participating in, a list of news stories related to the one you’re reading, links targeted to your geographic location, or the like.
But database queries are time consuming, so many websites store — or “cache” — the results of common queries on web servers for faster delivery.
If a site user changes a value in the database, however, the cache needs to be updated, too. The complex task of analyzing a website’s code to identify which operations necessitate updates to which cached values generally falls to the web programmer. Missing one such operation can result in an unusable site.
This week, at the Association for Computing Machinery’s Symposium on Principles of Programming Languages, researchers from MIT’s Computer Science and Artificial Intelligence Laboratory presented a new system that automatically handles caching of database queries for web applications written in the web-programming language Ur/Web.
Although a website may be fielding many requests in parallel — sending different users different cached data, or even data cached on different servers — the system guarantees that, to the user, every transaction will look exactly as it would if requests were handled in sequence. So a user won’t, for instance, click on a link showing that tickets to an event are available, only to find that they’ve been snatched up when it comes time to pay.
In experiments involving two websites that had been built using Ur/Web, the new system’s automatic caching offered twofold and 30-fold speedups.
“Most very popular websites backed by databases don’t actually ask the database over and over again for each request,” says Adam Chlipala, an associate professor of electrical engineering and computer science at MIT and senior author on the conference paper. “They notice that, ‘Oh, I seem to have asked this question quite recently, and I saved the result, so I’ll just pull that out of memory.’”
“But the tricky part here is that you have to realize when you make changes to the database that some of your saved answers are no longer necessarily correct, and you have to do what’s called ‘invalidating’ them. And in the mainstream way of implementing this, the programmer needs to manually add invalidation logic. For every line of code that changes the database, the programmer has to sit down and think, ‘Okay, for every other line of code that reads the database and saves the result in a cache, which ones of those are going to be broken by the change I just made?’”