17 posts • joined 18 Dec 2007
A judge or jury should decide...
...if CloudFlare is a content delivery network, or if they are closer to a hosting provider. I think Stratfor should sue CloudFlare, and I don't even like Stratfor! See our tax-exempt, nonprofit site at www.cloudflare-watch.org
This is news?
This is news? Doesn't anyone ever learn anything about what goes on at Google?
Seven years ago, Scroogle.org made its first appearance with a tool that exposed the "Florida" update. You can still read about it at scroogle.org/fiasco.html in an essay titled "The Great Google Filter Fiasco." In December 2003 Google blocked our server to suppress our evidence. Within another month they fixed their algorithm so that adding a "nonsense term" no longer produced dramatically different results. This "nonsense term" worked the same way that adding a comma to CSCO works today.
Since then Scroogle.org has been just a Google scraper without a side-by-side comparison with alternative results, although for several years we offered a comparison with Yahoo's results. What you see on Scroogle.org today are the so-called "organic" results from Google. This is the old-fashioned Google, before the Universal Search, the customized results, the personalized results, and the IP-location results. When the day comes that we can no longer find a usable Google interface that provides these results, and also shows up to 100 links per page, Scroogle.org will give up.
There is enough funny business going on with these "organic" results to worry everyone. For example, read google-watch.org/goohate.html about how Google discriminated against my new anti-Wikipedia site for more than a year. I know that Google does "hand jobs" for sites they like, and also for sites they dislike. For years it was all about making more money while pretending to be objective. There's even a problem with "objectivity" -- PageRank was "objective," but only as a measure of popularity. As Google became a monopoly, PageRank operated as sort of a self-fulfilling prophecy, which undermined whatever objectivity it claimed.
Today with Google, the appearance of objectivity is no longer on its agenda. Now it's only about making more money. When you get to a certain size with billions in the bank, you no longer care what people or governments think about you.
Thanks for the suggestions, but...
I appreciate suggestions from Scroogle users, but there are issues that many of them have missed.
As far as I can tell, the www.google.com/pda, www.google.com/xhtml, and www.google.com/m (mobile) searches only serve up a maximum of 10 results per page. The www.google.com/search can do 100 results with the num=100 parameter. So can the news search and the blog search. This parameter has been a Google staple for ten years.
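If you want to check this yourself, num is just a query-string parameter on the /search URL. Here is a minimal sketch (in Python, which is fine for a demo even though Scroogle itself would never run it; the function name is mine):

```python
from urllib.parse import urlencode

def google_search_url(query, num=100):
    # num=100 is the maximum Google honors; the default interface
    # and the pda/xhtml/m interfaces only give you 10 per page
    return "http://www.google.com/search?" + urlencode({"q": query, "num": num})

print(google_search_url("obama"))
# http://www.google.com/search?q=obama&num=100
```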
The so-called "simple interfaces" everyone is recommending to me are pathetic in this respect. I like 100 results per page, and there is no way I'd do 10 successive fetches just so I can put them all in one package.
Also, I compared the standard Google output page with the output=ie page that I was using, for the same search term and using 20 links per page. My search was for "obama". The output=ie page came into my server at 7,201 bytes, and the regular page came in at 63,070 bytes. The regular page is more than eight times the bloat! Scroogle was doing over 300,000 scrapes a day with six or fewer servers, and had performance issues to consider.
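The arithmetic is easy to verify:

```python
ie_bytes = 7_201        # output=ie page for "obama", 20 links per page
regular_bytes = 63_070  # standard results page, same search

ratio = regular_bytes / ie_bytes
print(round(ratio, 1))  # 8.8 -- more than eight times the size
```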
Everything at Scroogle was written in compiled 'C' for speed and efficiency. There is no way I would use Perl, or PHP, or Python.
The other issue worth mentioning is that the output=ie interface skipped not only the ads, but all that "Universal Search" stuff that Google added three years ago. I'm talking about news links, book search links, image links, Youtube links, and whatever else Google uses to make you click harder for clean "organic" (I call them "generic") links from non-Google sites. The output=ie interface dates back to the day when Google wasn't making 97 percent of its revenue from ads. In fact, they were just starting to get interested in ads when it began. It hasn't changed since then, which is probably why it had to come down.
Scroogle will not return unless Google brings back the output=ie interface.
It still surprises me...
It still surprises me that all the attention is focused on the collection of unencrypted WiFi data. Yes, I understand that this is the one characteristic of Google's collection that most clearly crosses the line into illegality in numerous jurisdictions. But for me the line would have been crossed even if Google had not collected payload data.
The MAC address, which is globally unique, is burned into each piece of networking hardware. With the addition of fairly precise geolocation data, probably within a radius of several households or one or two apartment buildings, this information takes on an entirely new dimension. It not only identifies a specific WiFi router or WiFi-enabled smartphone, but also ties it to a time and place. In many cases, this amounts to a fairly short list of suspects. That makes it personally identifiable, given the investigative resources of any government.
This MAC address is part of the WiFi protocol, and all WiFi devices broadcast the MAC in the clear, whether the network is encrypted or unencrypted. This is what makes WiFi work. The SSID may or may not be present, but if it is present, it is also sent in the clear. The SSID will make it much easier to identify the owner exactly once the geolocation from the MAC is known, but it's not necessary. All you really need is a search warrant to check out the device that broadcast the MAC and confirm the number.
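For reference, a MAC address is 48 bits: the first three octets are the IEEE-assigned OUI that identifies the hardware vendor, and the last three are assigned by the manufacturer, which is what makes the full address globally unique. A quick sketch of the split (Python for demo purposes; the function name is mine):

```python
def split_mac(mac):
    """Split a MAC address into its vendor prefix (OUI) and device half."""
    octets = mac.lower().replace("-", ":").split(":")
    if len(octets) != 6:
        raise ValueError("expected six octets")
    # first three octets: IEEE-assigned vendor prefix (OUI)
    # last three octets: assigned by the manufacturer, unique per device
    return ":".join(octets[:3]), ":".join(octets[3:])

print(split_mac("00-1A-2B-3C-4D-5E"))
# ('00:1a:2b', '3c:4d:5e')
```

The OUI half alone already tells an observer what kind of hardware is broadcasting; the full address pins down the individual device.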
All governments would like to have a database of MAC addresses being used for WiFi within their jurisdiction, with a time stamp and precise geolocation data. It's an invaluable resource. Google has this information for some 30 countries.
Even assuming that government intelligence agencies are always good guys, I nevertheless object to Google acquiring this information. If my WiFi router was plugged in while the cam car drove by, Google knows where my WiFi router lives. In the future, my neighbors will be using devices that depend on good geolocation data for Google's advertising feeds, and their devices will be sniffing local MAC addresses automatically, in the background, in order to zero in on my neighbor's location. This information will provide an ongoing confirmation of not only their location, but also my MAC's location, once it is corroborated with other neighborhood MAC addresses. It amounts to an ongoing, dynamically updated, cross-referenced system that no longer needs a Google cam car driving by periodically.
Google did not ask anyone with devices that broadcast MAC addresses if they wished to be part of its evil system. An opt-out is not possible unless you stop using WiFi. That's the essence of the entire issue. Google's so-called "mistake" of collecting unsecured payload data is frosting on the cake because it is getting the attention of the proper authorities. But it's the cake itself that worries me.
Google doesn't know what you clicked on
At the moment, Google does not see the links you click on from a Google results page. This is true with or without SSL. Hold your mouse over a link, and in your browser's status bar you will see the real URL of the target page.
Compare this to Yahoo. Hold your mouse over a link, and instead of the target page's URL you will see a long ugly string that goes to rds.yahoo.com. Occasionally Google has done some redirects too, but as far as I know, only on a very limited "research" basis. Yahoo is a major culprit here -- for many years they've had at least one, and sometimes two, redirects on every link. I scraped Yahoo for about three years, and it was not much fun parsing out those redirects to get to the actual URL. Obviously, Yahoo and Google both know what search terms you used. But Yahoo also knows what you clicked on from their results page.
One thing to watch out for is whether Google will be tempted to install redirects on their links on SSL pages. That way they'd know what you clicked on, and could sell this data to webmasters. I actually don't think they will do this because it would be too obvious. It would undermine the public relations value of the SSL option they just rolled out.
Another advantage of SSL for search is that the search results page, with its links, comes back via SSL. If you click on a link to some non-SSL page (over 99 percent of all the links will be non-SSL), then when you arrive at that page you will arrive with your referrer stripped. The webmaster on that site won't know that you came from Google, and won't know what search terms you used to get there. He won't even know if you used a search engine (you could have just keyed in the URL in your address bar, which would also cause no referrer). Also, most bots that steal stuff all day long do so without a referrer, which makes you even more obscure.
Sometimes your search terms can be revealing, and it is best to keep these out of the logs of the pages you click on. Remember, these logs always have your IP address. Why give them your search terms too?
The stripped referrer when going from an SSL page to a non-SSL page is part of the SSL specification, which all browsers must follow.
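The rule itself is simple enough to model. Here is a toy sketch of the decision, assuming only the URL scheme matters (in real browsers the HTTP spec's "secure to non-secure" language is what governs this):

```python
def referer_sent(source_url, target_url):
    # Browsers omit the Referer header when moving from a secure (HTTPS)
    # page to a non-secure (HTTP) page; every other combination sends it.
    return not (source_url.startswith("https://")
                and target_url.startswith("http://"))

# Leaving Google's SSL results for an ordinary page: referrer stripped.
print(referer_sent("https://www.google.com/search?q=x", "http://example.com/"))  # False
# Plain-HTTP search page: referrer, search terms included, goes along.
print(referer_sent("http://www.google.com/search?q=x", "http://example.com/"))   # True
```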
Markoff's unnamed sources are unreliable
I suspect that Mr. Markoff's unnamed sources are pulling his leg again, or perhaps he is over-dramatizing some hearsay. The only time I've spoken with Markoff was to insist on a correction to an article that appeared in the New York Times on June 22, 2004. Here's the correction that was appended to the article:
"An article in Business Day on Tuesday about a revised prospectus for the initial public stock offering of Google misidentified the source of a 'Google bomb,' a form of online manipulation that causes a designated Web page to appear as the first response to a particular phrase search. (In this case, the tactic caused the search engine to reply to the phrase 'out of touch management' by displaying a Web page that described Google's top management.) While a person close to the company said Google employees had engaged in the practice, Daniel Brandt, the operator of a Web site critical of Google, later acknowledged that he was the source."
My effort was a modest little bomb that caused the phrase "out-of-touch executives" or "out-of-touch management" to show Google's corporate-executives page at number one. It took only a few links on a few domains to do this.
By the way, Google did a "hand job" and killed my bomb the following month. Now here was some real news, because Google constantly claimed at that time that they never mess with decisions made by their brilliant computers. Guru Danny Sullivan himself swore up and down that Google would never do such a thing, and he and I argued at length over this. The bomb still works in Yahoo and Bing, by the way, even though I took down my links immediately after Google's hand job.
This important news about the hand job didn't get reported by NYT. My next Google bomb will be "out-of-touch reporters." Will that one get blamed on his NYT colleagues, or will Markoff's editors catch him this time?
Here's what should happen next...
Now that the settlement is dead, the Justice Department should ask Google to stop all scanning of in-copyright works, and place all previously-scanned, in-copyright works that were scanned without express permission of the rights holder, in a dark archive. Google can use them when opt-in permission of the rights-holder is obtained, or when Congress or the Supreme Court resolves copyright infringement issues.
Google acquired those books illegally
As the plaintiffs argued when they filed their lawsuit four years ago, Google acquired most of these books illegally. What changed the plaintiffs' minds? Well, the proposed Settlement shows Google agreeing to pay up to $30 million to the plaintiffs' attorneys.
It's the broad, international public interest that's at issue, not the interests of pirate-Google fanboys who want the convenience of access by mouse.
It's a book grab, pure and simple, and there's a new website that tries to refocus on the grab itself rather than the intricacies of a convoluted settlement arranged by greedy attorneys who pulled a class-action rabbit out of their hat. Check it out: http://www.book-grab.com/
This is a job for the Antitrust Division, not for some district judge presiding over a civil case.
The settlement should be scuttled
In the original agreement between the University of Michigan and Google, which was confidential and was acquired in mid-2005 only by using a Michigan freedom of information law, Google indemnifies the University against any and all legal threats. At that time the major concern was whether the University might be sued under the "fair use" provisions of U.S. copyright law for allowing Google to copy its books. That guarantee from Google got Google's huge foot into Michigan's door, and the scanning commenced.
As it turned out, no participating universities were named as defendants by the authors and publishers, and the settlement itself avoids any mention of "fair use." Nevertheless, many feel that the "fair use" language in U.S. copyright law remains a key issue. The settlement, announced last October, included Google's agreement to pay the plaintiffs' attorneys' fees up to $30 million, subject to court approval. That helps explain why the plaintiffs' attorneys are happy with it.
But we still don't know why a handful of authors, publishers, universities, and Google are calling all the shots, and it doesn't explain why the concept of "reader privacy" is not mentioned anywhere in the settlement, or in the extended agreement with the University of Michigan announced this week. We do learn that Google is giving the University of Michigan a free ride on subscription fees for 25 years. Is it possible that Google's money is more powerful than a public university's interest in the public good?
"We are always concerned about protecting our users' privacy and privacy in general, but we have no particular concern with Google or other search engines in a networked world," said James Hilton, University of Michigan librarian, in June 2005. Does this mean that the University of Michigan plans to stuff Google's cookies down the throats of students and faculty for the next 25 years? We still have no idea how that's going to work. That's one of the reasons why the proposed settlement should be rejected by the judge.
Google is already watching you watch YouTube
For many months now, any YouTube video that you view anywhere on the web, even if it's on Obama's site, phones home to google.com with a GET request that shows the page that the YouTube video was on, and the ID of the video you are watching. It happens several seconds into the video, and the phone home is done by YouTube's Flash code. While at google.com, your browser offers up your Google cookie with the globally-unique ID. Google adds a date and time stamp, and your IP address.
So you may as well combine your accounts, because then you won't forget that Google is watching you watch YouTube. And you won't have to fire up your packet sniffer to discover this.
Schmidt wants to dumb-down the public?
Let me get this straight -- in Eric Schmidt's brave new world of goonews, readers will only see news that their past reading history suggests that they may want to read? In other words, if America suddenly institutionalizes torture in its war against terror, no one will read about it on Google -- simply because the issue has not been on the American political agenda prior to that war?
There are certainly problems with a lack of objectivity in mainstream media, due to various economic, political, and ideological pressures, as well as secrecy in high places. But putting readers in a box and feeding them only what they prefer to read is a huge step backwards. This might be okay for mindless entertainment media, but not for news media. Taken to the extreme, it would make everyone oblivious to global current events, and unqualified to vote in a democratic society.
We know that Schmidt pushes this because the ads alongside the news will be highly targeted and make Google richer than it already is. But for Schmidt to pretend that it's also a step forward for society is absurd and potentially dangerous.
Be sure to wash your hands after surfing
Think about how many times a day you click to watch a YouTube video, no matter which site it's on. It might be Obama's weekly chat at change.gov, or even Consumer Watchdog's YouTube video on Chrome's privacy problems. Before you even click to watch the video, you've already collected several YouTube cookies. And after you click to watch, about ten seconds into the video, Google reads your universal google.com cookie. This is the one with the globally-unique ID. It used to expire in 2038, but now it pretends to expire in two years. However, every time you visit any Google site, it gets pushed two years ahead, which means it expires when your hard disk is replaced.
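That sliding expiry is trivial to model. A sketch, assuming "two years" is a flat 730 days (the function name is mine, not Google's):

```python
from datetime import datetime, timedelta

def refresh_cookie_expiry(visit_time):
    # every visit to any Google property pushes the cookie's expiry
    # out two more years from the moment of the visit
    return visit_time + timedelta(days=2 * 365)

# As long as you touch a Google site once every two years, it never dies.
print(refresh_cookie_expiry(datetime(2009, 1, 1)).date())  # 2011-01-01
```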
If you don't already have a Google cookie, you get a new one with a new ID. If you have one already, it reads the old cookie. Put your PC on a packet sniffer and click on a YouTube video. The GET request to google.com, which apparently is done from the embedded Flash code from YouTube, includes the site you are on, as well as the video you are watching.
This information is available to the U.S. government without a court order. It's called a "National Security Letter" and when Google gets one, it comes with a gag order. How many other governments around the world have similar laws?
Delete your Google cookies and your YouTube cookies when you exit your browser. It's common-sense hygiene - the equivalent of washing your hands after you visit a dirty bathroom at a gas station.
I don't care about Accept-Encoding
I'm not acting on the Accept-Encoding information. I like the htaccess file shown at www.avg-watch.org and I'm not going to change my recommendation based on this new information. I guess you could say that I don't trust AVG to be consistent about this, even if it is reliable information at this point in time.
But my htaccess file does have an extra condition, in that it allows the HEAD requests from AVG to pass through without redirection. These requests are only 11 percent of my approximately 7,000 AVG requests per day. I discovered yesterday that the effect of AVG's HEAD request is such that if it is redirected, then AVG detects this and does not immediately follow with a GET request from that same IP address. This means that AVG's HEAD request is being used to detect redirection. I've lost any capacity to trust AVG, and I'm not willing to give them information about what my sites are doing or not doing with regard to LinkScanner. That's why I let the HEAD requests through unmolested. They're only a few bytes anyway. The GET request that immediately follows that HEAD request from the same user does get redirected.
Yes, I'm probably redirecting some non-LinkScanner users, but they must have an MSIE 6.0 user-agent that is truncated after the SV1, and they have to be coming into my site without a referrer. Both of these conditions are relatively rare. The first one is rare because there's usually lots of junk added by most browsers to the simple user-agent used by LinkScanner, and the second one is rare because my sites that redirect LinkScanner are not the sort of sites that users often bookmark for further reference. (If they come into my site from their own bookmark, then they'd come in without a referrer.) Put the two conditions together, and I don't think I'm redirecting very much legitimate traffic.
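The two conditions, plus the HEAD exception, boil down to a small decision function. Here is a sketch in Python of what the htaccess rules express (the names and the exact user-agent string are my own illustration, not anything published by AVG):

```python
def should_redirect(method, user_agent, referer):
    # Let AVG's HEAD probes through unmolested, so AVG doesn't detect
    # the redirect and suppress the GET that follows from the same IP.
    if method == "HEAD":
        return False
    # Real visitors almost always arrive with a referrer.
    if referer:
        return False
    # LinkScanner's user-agent ends right after "SV1)"; real browsers
    # usually tack extra tokens on after it.
    return "MSIE 6.0" in user_agent and user_agent.rstrip().endswith("SV1)")

linkscanner = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
print(should_redirect("GET", linkscanner, ""))   # True
print(should_redirect("HEAD", linkscanner, ""))  # False
```

A browser with ".NET CLR" or similar junk after the SV1, or any visitor arriving with a referrer, passes through untouched.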
I just started avg-watch.org
I think AVG has made a big mistake with LinkScanner.
We "common folk" webmasters need to protect ourselves from greedy dot-coms. I'm collecting log info from my sites, and unless AVG abandons prefetching of search-engine results, I plan to make available a list of the IP addresses of AVG users I've collected. It won't happen until I have a few thousand or so to start it off.
With such a list, webmasters won't have to rely on the user-agent. No, it will never be as good as a reliable, unique user-agent. But by adding an IP address search engine on this new site, as well as making the list available for download so that other webmasters can use it as they wish, it will help focus attention on AVG's users.
My message to these users is, "Turn the LinkScanner off! We're watching you watching our sites!"
Go for the dumb, plain sticks
I'm a firm believer in completely dumb flash sticks, and own several Kingston DataTraveler sticks that come FAT32-formatted with zero software on them. They work on XP, Mac, and Linux. I also own a 4 GB non-encrypted Sandisk stick that came with software. I spent hours trying to delete hidden directories, but they kept popping up. XP sees it as two drives, one that is read-only and auto-launched, and the other where you can put data. Sandisk is trying to lock you into their thing, and I'm ready to throw their stick into the trash.
The reason why a dumb flash stick is important is because sticks are finding all sorts of uses. I have a DVD player from Philips for my digital television. It can play my MP3 tracks through its USB port. It likes my Kingston sticks, but does not recognize the Sandisk stick at all.
If you want encryption, don't fall for any stick that comes with its own software. Get a dumb stick with zero files on it, and put your own encryption software on it.
"Conflict of interest" puts it mildly
It gets even more ridiculous in my case. An uber-administrator named SlimVirgin started a biographical article on me in September 2005. Nine months earlier she had dismissed my life's work on a talk page for an article she was acquiring for political purposes. Her dismissive comment said, "I removed Daniel Brandt. He's not a credible source..."
After months of Internet sleuthing, it turns out that SlimVirgin is one and the same as Linda Mack, who was a PanAm 103 researcher for Pierre Salinger at ABC News in London in 1989-91. Yes, this is the same Linda Mack who ordered sources from my library at that time because they were reliable, and never paid the invoice. The same Linda Mack whom ABC bureau chief Salinger came to believe had been working as a secret MI5 asset all along, and who was locked out of her office after ABC was raided by Scotland Yard. They were looking for information on the Libya suspects. The same Linda Mack who organized a petition against a documentary film on Lockerbie that presented an alternative theory that ran counter to the official CIA/FBI/MI6/MI5 line that blamed Libya.
More than two years after SlimVirgin started this biographical article on me (which immediately rose to the top of all search engines in a search for my name), I've finally managed to get it mostly deleted. That meant fighting anonymous teenagers for two years. It wasn't particularly fun, but at least it was educational. I now call it "Spookypedia."
Wikipedia is beyond repair. It needs to be dismantled, first one administrator at a time, and then one anonymous teenager at a time. Fortunately, there's considerable overlap between these two categories, which means it's not quite as hard as it looks.