Marti Hearst talks about user interfaces and visualization for information retrieval, empirical computational linguistics, text data mining, and the Flamenco Search Interface Project.
Episode 12: In Search of a Better Search
It's hard enough to find a piece of paper on your physical desk. At least that's a finite physical space. What happens when the data collection is vast?
Searching for information on the Web, or within a particular domain (for example, a doctor trying to make sense of all the medical papers being published), is increasingly becoming a huge problem. That's why our latest DevSource video interview is with Marti Hearst, an associate professor in the School of Information at UC Berkeley. Hearst discusses user interfaces and visualization for information retrieval, empirical computational linguistics, text data mining, and the Flamenco Search Interface Project.
Despite many interesting alternatives, Hearst says, the search UI for the Web and for other data collections remains text-based. We find it easiest to scan a list of titles, perhaps helped along by highlighted keywords.
But, says Hearst, people want organization in a query result: some structure they can navigate and search within. As a developer, you want to build web sites that let users navigate and browse the collection easily, move from one category to another, and browse systematically. The aim, Hearst says, is to create a natural experience, as though you're browsing bookshelves. That is one of the motivations behind the Flamenco Search Interface Project she's involved in, which uses "faceted metadata": organizing items along several different categories, and adding hierarchy within those categories.
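To make the idea of hierarchical faceted metadata a little more concrete, here is a minimal Python sketch of faceted browsing. It is an illustration only, not code from the Flamenco project: the recipe data, the facet names, and the helpers `matches`, `facet_counts`, and `titles` are all hypothetical. Each item carries a path in every facet (for example, Asian > Thai), the user's selections narrow the collection, and the interface shows counts for the next level of each facet so the user can see where to go next.

```python
from collections import Counter

# Hypothetical collection: each facet value is a path from general to specific.
RECIPES = [
    {"title": "Green curry", "cuisine": ("Asian", "Thai"), "course": ("Main",)},
    {"title": "Pad see ew", "cuisine": ("Asian", "Thai"), "course": ("Main",)},
    {"title": "Miso soup", "cuisine": ("Asian", "Japanese"), "course": ("Starter",)},
    {"title": "Ratatouille", "cuisine": ("European", "French"), "course": ("Main",)},
    {"title": "Tarte Tatin", "cuisine": ("European", "French"), "course": ("Dessert",)},
]


def matches(item, selections):
    """True if, for every selected facet, the item's path starts with the chosen prefix."""
    for facet, prefix in selections.items():
        if item[facet][:len(prefix)] != prefix:
            return False
    return True


def facet_counts(items, selections, facet):
    """Count matching items under each child of the current position in one facet."""
    depth = len(selections.get(facet, ()))
    counts = Counter()
    for item in items:
        path = item[facet]
        if matches(item, selections) and len(path) > depth:
            counts[path[depth]] += 1
    return counts


def titles(items, selections):
    """Titles of the items that satisfy every current selection."""
    return [item["title"] for item in items if matches(item, selections)]
```

For example, after the user chooses Asian in the cuisine facet, `facet_counts(RECIPES, {"cuisine": ("Asian",)}, "course")` reports two mains and one starter. Showing those counts before the user clicks is one way to let people move from category to category without running into empty results.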
The tagging phenomenon is especially interesting in the context of text search. Tags let people spontaneously add meaningful information to data collections, but, Hearst points out, people don't necessarily choose the same tags. "All we need is some algorithms that look for commonalities among the tags," she says.
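Hearst doesn't say which algorithms she has in mind. One simple possibility, sketched here in Python purely as an illustration, is to normalize tag spellings and then treat two tags as related when they were applied to largely the same items, measured by Jaccard similarity (shared items divided by all items either tag covers). The `normalize` rules, the 0.5 threshold, and the sample taggings are assumptions, not anything from the interview.

```python
from collections import defaultdict
from itertools import combinations


def normalize(tag):
    """Crude normalization: lowercase, trim, unify separators, drop a trailing plural 's'.

    A real system would likely use a proper stemmer or lemmatizer instead.
    """
    t = tag.strip().lower().replace("_", " ").replace("-", " ")
    if len(t) > 3 and t.endswith("s") and not t.endswith("ss"):
        t = t[:-1]
    return t


def similar_tags(taggings, threshold=0.5):
    """Return (tag_a, tag_b, jaccard) for normalized tags whose item sets overlap enough.

    taggings is an iterable of (item_id, raw_tag) pairs.
    """
    items_by_tag = defaultdict(set)
    for item, raw in taggings:
        items_by_tag[normalize(raw)].add(item)

    pairs = []
    for a, b in combinations(sorted(items_by_tag), 2):
        sa, sb = items_by_tag[a], items_by_tag[b]
        jaccard = len(sa & sb) / len(sa | sb)
        if jaccard >= threshold:
            pairs.append((a, b, round(jaccard, 2)))
    return pairs


# Hypothetical taggings by different users.
TAGGINGS = [
    ("photo1", "NYC"), ("photo1", "new-york"),
    ("photo2", "nyc"), ("photo2", "New_York"),
    ("photo3", "Bridges"), ("photo3", "nyc"),
    ("photo4", "bridge"), ("photo4", "sunset"),
    ("photo5", "bridge"),
]
```

Here normalization alone merges "Bridges" with "bridge" and "new-york" with "New_York", while the overlap test pairs "nyc" with "new york" (0.67) because people applied them to mostly the same photos. Both steps are deliberately simple; the point is only that commonalities can be found from how tags are used, not just how they are spelled.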
Perhaps the most useful part of this video is the clear distinction Hearst draws between search (finding one item among billions of options and moving what you aren't interested in out of the way), text mining (knitting together something new from several data nuggets), and data mining (looking for patterns and trends). She gives a few useful examples from the business intelligence community, such as using text analysis to improve customer service, and automatically detecting or hypothesizing causes of disease by uncovering links in the text that no one had anticipated.
Watch all the videos in the Great Minds in Development series!
Tell us what you think of the video (and the series!) in the DevSource Forum.