Wednesday, April 30, 2008

Google Comes Knocking In Search Of Hidden Data

By crawling using HTML forms (and abiding by robots.txt), Google claims it leads search engine users to documents that otherwise would not be easily found -- but privacy concerns remain.


By Thomas Claburn, InformationWeek
April 14, 2008
URL: http://www.informationweek.com/story/showArticle.jhtml?articleID=207200561



Google on Friday said that it has been testing ways to index data that is normally hidden from search engine crawlers, a change that should broaden the information available through Google.

The so-called "hidden Web" that Google has begun indexing refers to content beyond static Web pages, such as pages generated dynamically from a database in response to input submitted through a Web form.

"This experiment is part of Google's broader effort to increase its coverage of the Web," Google engineers Jayant Madhavan and Alon Halevy said in a blog post. "In fact, HTML forms have long been thought to be the gateway to large volumes of data beyond the normal scope of search engines. The terms Deep Web, Hidden Web, or Invisible Web have been used collectively to refer to such content that has so far been invisible to search engine users. By crawling using HTML forms (and abiding by robots.txt), we are able to lead search engine users to documents that would otherwise not be easily found in search engines, and provide Webmasters and users alike with a better and more comprehensive search experience."

Robots.txt is a file Web publishers place on their servers to specify what data crawling programs may or may not access, should those programs choose to abide by its rules.
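To make the mechanism concrete, here is a minimal Python sketch of how a well-behaved crawler might consult robots.txt before requesting a URL, using the standard library's `urllib.robotparser`. The robots.txt contents, the `example.com` URLs, and the helper name `build_parser` are hypothetical; a real crawler would download the file from the site's root rather than parse a hard-coded string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; a real crawler would download /robots.txt from the site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /private/
"""


def build_parser(robots_text):
    """Parse robots.txt text held in memory instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    for url in (
        "https://example.com/search?q=court+records",
        "https://example.com/private/report.html",
        "https://example.com/about.html",
    ):
        # can_fetch() applies the rules for the given agent; here the "*" group matches.
        print(url, "allowed" if rp.can_fetch("Googlebot", url) else "blocked")
```

Note that nothing in the file enforces these rules: the check happens only because the crawler chooses to call `can_fetch()` before each request.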

In their post, Madhavan and Halevy twice mention that Google follows robots.txt rules, perhaps to allay fears that Google's more curious crawler will expose sensitive data. Google's wariness of being seen as an invader of privacy is underscored by the fact that its two engineers characterize the Google crawler as "the ever-friendly Googlebot."

"Needless to say, this experiment follows good Internet citizenry practices," Madhavan and Halevy said in their post. "Only a small number of particularly useful sites receive this treatment, and our crawl agent, the ever-friendly Googlebot, always adheres to robots.txt, nofollow, and noindex directives. That means that if a search form is forbidden in robots.txt, we won't crawl any of the URLs that a form would generate. Similarly, we only retrieve GET forms and avoid forms that require any kind of user information."
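Because a GET form encodes its fields in the URL's query string, a crawler can in principle turn a form into a list of ordinary URLs by filling in candidate values, then drop any URL that robots.txt forbids. The Python sketch below illustrates that idea under stated assumptions: the helper names `form_urls` and `parser_from`, the form action, the field names, and the candidate values are all hypothetical, and the post does not say how Google actually chooses input values.

```python
from itertools import product
from urllib.parse import urlencode, urljoin
from urllib.robotparser import RobotFileParser


def form_urls(page_url, action, method, fields, robots, user_agent="Googlebot"):
    """Expand a form into crawlable URLs (hypothetical helper).

    fields maps each input name to a list of candidate values. Only GET forms
    are expanded, and any URL disallowed by robots.txt is dropped.
    """
    if method.upper() != "GET":
        return []  # POST submissions are not plain URLs; skip them entirely
    base = urljoin(page_url, action)
    names = sorted(fields)
    urls = []
    # Every combination of candidate values becomes one query string.
    for combo in product(*(fields[name] for name in names)):
        url = base + "?" + urlencode(list(zip(names, combo)))
        if robots.can_fetch(user_agent, url):
            urls.append(url)
    return urls


def parser_from(text):
    """Build a robots.txt parser from in-memory text."""
    rp = RobotFileParser()
    rp.parse(text.splitlines())
    return rp


if __name__ == "__main__":
    fields = {"make": ["ford", "honda"], "year": ["2007"]}
    open_site = parser_from("User-agent: *\nDisallow: /private/\n")
    closed_site = parser_from("User-agent: *\nDisallow: /cars\n")
    print(form_urls("https://example.com/", "/cars", "get", fields, open_site))
    print(form_urls("https://example.com/", "/cars", "get", fields, closed_site))
    print(form_urls("https://example.com/", "/cars", "post", fields, open_site))
```

With `/cars` disallowed, the form yields no URLs at all, which matches the policy quoted above: a form forbidden in robots.txt generates nothing to crawl.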

Given that Google has been, and continues to be, accused of disregarding privacy concerns (a charge it has consistently rebutted), such prudence is quite understandable.

In a 2001 paper, Michael K. Bergman, CTO of BrightPlanet, estimated that the hidden Web was 400 to 550 times larger than the exposed Web. Though it's not immediately clear whether this ratio still holds after seven years, Google's decision to explore the hidden Web more thoroughly should make its massive index even more useful, and perhaps even more controversial.

Indeed, not everyone has been won over. In a blog post, Robin Schuil, a software developer at eBay, criticized Google's approach for placing an extra burden on Web sites.

He said it's "really awfully close to what some of the search engine spammers do: targeted scraping of Web sites."


Monday, March 03, 2008

Google-Powered Hacking Makes Search A Threat

A hacker group has released Goolag Scanner, a tool that scans Web sites for vulnerabilities.


By Thomas Claburn, InformationWeek
Feb. 22, 2008
URL: http://www.informationweek.com/story/showArticle.jhtml?articleID=206801429



Over the past few years, cybersecurity professionals have watched as the cinematic cliche of police with pistols being outgunned by thieves with automatic weapons has become applicable to their industry. Increasingly, they find themselves defending against automated attacks that can easily overwhelm the technologically underequipped.

Wednesday saw the debut of the latest such tool, which derives its power from Google's vast index. That's when the Cult of the Dead Cow, the self-proclaimed "world's most attractive hacker group," released a Web auditing tool called Goolag Scanner.

"It's no big secret that the Web is the platform," said cDc official Oxblood Ruffin, in a statement. "And this platform pretty much sucks from a security perspective. Goolag Scanner provides one more tool for Web site owners to patch up their online properties. We've seen some pretty scary holes through random tests with the scanner in North America, Europe, and the Middle East. If I were a government, a large corporation, or anyone with a large Web site, I'd be downloading this beast and aiming it at my site yesterday. The vulnerabilities are that serious."

To prove that point, Ruffin provided InformationWeek with a list of 11 high-profile U.S. government agency and lab Web sites that had been scanned and found to have what appear to be significant security holes, including satellite access codes, credentials for VPNs and routers, and open proxies. He asked that the information not be published, as the group's intent is not to embarrass government officials or encourage attempts to hack government systems.

The Department of Homeland Security, which Ruffin said had been notified of the flaws several weeks earlier, did not respond to a request for comment.

Goolag Scanner presently exists only as a Windows application, though it is being ported to other platforms. It allows the user to quickly search Google's index for files on Web sites that may reveal security vulnerabilities. For example, Goolag Scanner can search Web sites for a file called "unattend.txt," which is used to drive unattended Microsoft Windows installations. The file may include information useful to hackers, such as administrator passwords.

Goolag Scanner doesn't do anything a hacker or penetration tester couldn't do by typing text into Google and using certain operator commands to constrain the search to a specific domain or file type. But it makes searching for holes much easier.
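For a site owner who wants to run the same kind of check by hand, the sketch below shows how such searches are composed from Google's `site:`, `inurl:`, `filetype:`, and `intitle:` operators, restricted to one's own domain. The query fragments and the `audit_queries` helper are illustrative assumptions, not Goolag Scanner's actual query list, and the code only builds search URLs; it does not submit them, since automated querying may conflict with Google's terms of service.

```python
from urllib.parse import urlencode

# Illustrative query fragments (assumed for this sketch); not Goolag Scanner's list.
CHECKS = {
    "Windows unattended-install answer file": "inurl:unattend filetype:txt",
    "Open directory listing": 'intitle:"index of"',
}


def audit_queries(domain, checks=CHECKS):
    """Build (description, query, search_url) tuples, each restricted to `domain`."""
    results = []
    for description, fragment in checks.items():
        # site: limits results to the given domain, so the check stays on your own site.
        query = f"site:{domain} {fragment}"
        search_url = "https://www.google.com/search?" + urlencode({"q": query})
        results.append((description, query, search_url))
    return results


if __name__ == "__main__":
    for description, query, url in audit_queries("example.gov"):
        print(f"{description}\n  {query}\n  {url}")
```

Each generated URL can be pasted into a browser; any hit means the file is already in Google's index and should be removed from the server.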

"The Goolag Scan tool isn't especially innovative in terms of the methods it implements," said Mark Kraynak, senior director of strategic marketing for data protection company Imperva, in an e-mail. "These techniques have been well known in the security community for some time."

What it does do, Kraynak said, is allow less-sophisticated attackers to exploit application and data layer vulnerabilities. "This will result in even more application attacks," he said. "This is bad news, since SQL Injection and Cross-Site Scripting already rank among the most common attacks lodged against online applications. ... The bad guys now have automatic weapons, so as a security community we need to upgrade our defense systems for these new threats."

What that means, in addition to addressing specific vulnerabilities, is defending against search.

As Petko D. Petkov, founder of security consulting firm GnuCitizen, explained in a blog post on Friday, search engines can be used very efficiently to collect information about vulnerabilities, particularly metadata that isn't ordinarily indexed.

Petkov proposes using the Amazon Web Services platform to build a custom search application for identifying vulnerabilities. "By using Amazon's Services and more specifically their Elastic [Compute] Cloud infrastructure, attackers can gain immense scalability, which they can use for their own evil good," he explained. "The cloud allows developers to spawn virtualized instances of any type of operating system, which can be instructed to go through any kind of heavy machine processing task, such as crawling Web sites, port-scanning, etc. The information can be stored on Amazon's Simple Storage Service. The whole package is quite cheap and very affordable."

But for the organization that gets hacked, the expense could be considerable.


For Sale: Passwords To Fortune 500's Servers

Cybercriminals are paying premiums based on compromised sites' Google PageRank to buy thousands of login names and FTP credentials, a security software company reports.


By Thomas Claburn, InformationWeek
Feb. 27, 2008
URL: http://www.informationweek.com/story/showArticle.jhtml?articleID=206900557



More than 8,700 FTP login names and passwords, some of which grant access to Fortune 500 servers, are being sold online through a sort of eBay for stolen data, a security company revealed this week.

Prices vary in relation to the Google PageRank of the compromised sites. The customers are cybercriminals who seek access to trusted sites in order to launch malware or hide files.

Finjan, a computer security company based in Israel, made the discovery and elaborates on its findings in its February Malicious Page of the Month report.

Finjan CTO Yuval Ben-Itzhak describes the online crime database application the company found as "the holy grail of hackers." It contains the "hacked FTP credentials of very large companies, some of them in the Fortune 500." More than 100 stolen login names are associated with one of the 500 most visited Web sites on the Internet, as measured by Alexa.com.

"There is a whole industry of buying and selling all these stolen credentials," said Ben-Itzhak. "It opens for us a new window to see how they really manage to infect all these companies and legitimate Web sites very quickly."

Ben-Itzhak declined to be more specific to avoid embarrassing the affected organizations but said that one set of FTP credentials the company found granted access to a state court Web site. A state court site appears on p. 14 of the Finjan report, but the URLs in the printed screen shot have been obscured to prevent identification.

However, a Google search for a conspicuous portion of one of the obscured URLs suggests that the featured site belongs to California's Mono County Superior Court. (The Great Seal of the State of California can be easily identified on the Web site screen shot in the report despite an effort to blur it.)

A spokesperson for Finjan said the company could not name the compromised organizations it had identified for legal reasons.

Robert Dennis, the executive officer of the Mono County Superior Court, said he is not aware of the Finjan report or of any current problem with the court's Web site. However, he said that in January he had moved the court's Web site to a new ISP, and from a .gov domain to a .org domain, and that there had been occasional security issues in the past with the court's old ISP and site. The semi-obscured court URL in the Finjan report shows a .gov address.

"When we were with the prior host, we would occasionally have a problem where someone would hack the site," Dennis said, noting that it might have happened two or three times over the course of a year. "Somebody was adding code to our home page."

Dennis declined to name the court's old ISP, a large hosting provider that had served the court for eight years, but said a technical contact there had told him about difficulties keeping a specific server clean. "The guy said they'd clean it out and [the malware] would come back," he said.

The countries of origin for the stolen FTP credentials include the United States (2,621), Russia (1,247), Australia (392), and various Asia-Pacific countries (354), to name a few.

The Finjan report also says that the creators of crimeware toolkits have adopted the software-as-a-service model. It describes Neosploit 2.0, a Web-based hacking application that provides detailed infection statistics and other attack management tools. The result, as Ben-Itzhak describes it, is push-button cybercrime.

