Opera Software on Wednesday revealed a search engine that indexes structural information about Web pages so Web developers and standards bodies can see what technologies are being used to build Web sites and how they are being used.
The Metadata Analysis and Mining Application search engine -- "MAMA" for short -- is being tested by the company and should be released in an invitation-only beta by the end of the year, said Snorre Grimsby, vice president of quality assurance at Opera in Oslo, Norway.
MAMA grew out of tests Opera routinely runs to make sure its own browser products work well with existing Web pages built with the most widely used Web site-creation technologies, he said.
"We realized internally that we needed to be able to find lots of live sites out there that used certain technologies in certain combinations so we could test our browser on them," Grimsby said.
The resulting search engine crawls the Web, but instead of indexing the content of Web sites, as most search engines do, it discards the content and indexes the types of technologies being used on sites, such as Cascading Style Sheets (CSS), Hypertext Markup Language (HTML), Extensible HTML (XHTML) and the like, Grimsby said.
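The article does not describe how MAMA works internally, so the following is only a minimal sketch of the general idea: parse a page, throw away its text, and keep structural facts. The class name `StructureIndexer`, the function `index_structure` and the fields it records are hypothetical choices for illustration, not Opera's implementation.

```python
from collections import Counter
from html.parser import HTMLParser


class StructureIndexer(HTMLParser):
    """Collects structural facts about a page and ignores its text content.

    Hypothetical illustration only; not how MAMA is actually built.
    """

    def __init__(self):
        super().__init__()
        self.doctype = None
        self.tag_counts = Counter()
        self.hyperlinks = 0
        self.uses_css = False

    def handle_decl(self, decl):
        # Called for declarations such as <!DOCTYPE ...>.
        if decl.lower().startswith("doctype"):
            self.doctype = decl

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        self.tag_counts[tag] += 1
        if tag == "a" and "href" in attrs:
            self.hyperlinks += 1
        # CSS can appear as a linked stylesheet, a <style> element
        # or an inline style attribute.
        rel = (attrs.get("rel") or "").lower()
        if tag == "style" or "style" in attrs or (tag == "link" and "stylesheet" in rel):
            self.uses_css = True

    # handle_data is deliberately not overridden, so page text is discarded.


def index_structure(html):
    """Return a dictionary of structural facts for one HTML document."""
    parser = StructureIndexer()
    parser.feed(html)
    parser.close()
    return {
        "doctype": parser.doctype,
        "tags": dict(parser.tag_counts),
        "hyperlinks": parser.hyperlinks,
        "uses_css": parser.uses_css,
    }
```

Calling `index_structure` on a page's markup returns only counts and flags; none of the page's visible text is kept.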
This information is helpful for Web developers, who can use MAMA to identify sites that are using certain kinds of technology and see how other developers have implemented it, he said.
"It's a known fact that Web developers borrow ideas from each other," Grimsby said. For example, developers building a Web application that needs a new menu system can use MAMA to find sites that already use the technology they are considering, then draw on those sites for ideas for their own implementation.
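Finding sites that use "certain technologies in certain combinations" amounts to a lookup by technology rather than by keyword. The sketch below shows one simple way to do that with an inverted index. The `TechnologyIndex` class, its methods and the example URLs are hypothetical; MAMA's query interface has not been published.

```python
from collections import defaultdict


class TechnologyIndex:
    """Maps each technology name to the set of URLs that use it (illustrative only)."""

    def __init__(self):
        self._sites_by_tech = defaultdict(set)

    def add(self, url, technologies):
        """Record that the page at `url` uses each of the given technologies."""
        for tech in technologies:
            self._sites_by_tech[tech.lower()].add(url)

    def find(self, *technologies):
        """Return the sorted URLs that use every listed technology."""
        if not technologies:
            return []
        matches = [self._sites_by_tech.get(tech.lower(), set()) for tech in technologies]
        return sorted(set.intersection(*matches))


# Hypothetical usage with made-up URLs.
index = TechnologyIndex()
index.add("http://example.com/a", ["XHTML 1.0", "CSS"])
index.add("http://example.com/b", ["HTML 4.01", "CSS"])
both = index.find("CSS", "XHTML 1.0")
```

Intersecting the per-technology sets keeps the query cheap even when many technologies are combined, which is why this layout is a common choice for this kind of lookup.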
Developers also can use MAMA to see how well sites conform to current World Wide Web Consortium (W3C) specifications for commonly used Web standards, such as CSS, HTML and others. The W3C oversees the creation and maintenance of specs for many of the most prevalent Web-site development technologies.
Grimsby said that in its own use of MAMA, Opera found that the average Web page has 47 discrepancies between how it uses W3C-maintained technologies and what the W3C specifications require.
MAMA also can be useful for the W3C and other standards bodies to help them set priorities for developing specifications. For example, if a technology is used a certain way on the majority of Web sites, or not used very much at all, the W3C "can change the spec or take something out of the spec," Grimsby said.
During an interview Wednesday, Grimsby demonstrated MAMA in real time by using it to crawl an International Data Group Web page, http://www.idg.net/idgns, to find out what technologies the site used.
According to the search engine, the site runs version 2.2.8 of the Apache Web Server on a 32-bit Windows server, has 56 hyperlinks and uses XHTML 1.0 and CSS, he said.
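The article does not say where MAMA gets its server details. One plausible source is the HTTP `Server` response header, and the sketch below assumes that. The function name `parse_server_header` and the header string used in the example are assumptions, chosen to match the values Grimsby reported.

```python
import re

# Matches values shaped like "Product/Version (Platform)"; version and platform are optional.
SERVER_PATTERN = re.compile(
    r"^(?P<product>[^/\s]+)"
    r"(?:/(?P<version>\S+))?"
    r"(?:\s+\((?P<platform>[^)]*)\))?"
)


def parse_server_header(value):
    """Split a Server header value into product, version and platform.

    Returns None when the value is empty or does not start with a product name.
    """
    match = SERVER_PATTERN.match(value.strip())
    if match is None:
        return None
    return match.groupdict()


# Assumed header value, not taken from the article.
info = parse_server_header("Apache/2.2.8 (Win32)")
```

Servers can suppress or alter this header, so any index built on it would reflect what sites choose to report rather than a verified configuration.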
In the next eight weeks Opera expects to publish a series of articles on its developer Web site about its own internal use of MAMA, noting key findings, statistics and trends the search engine discovers, he said.
By the end of the year, the company will invite key people within standards bodies to test the search engine, with a goal of releasing it publicly to developers sometime in the first or second quarter of next year, Grimsby said.