Friam - Re: [sfx: Discuss] Fwd: DuckDuckGo

Posted by Steve Smith on
URL: http://friam.383.s1.nabble.com/Re-sfx-Discuss-Fwd-DuckDuckGo-tp6502174.html

Owen -
> I thought this might be of broader interest:
>
> This article: http://dontbubble.us/ discusses the "bubble effect" of
> search engines, where you slowly evolve into a bit of a ghetto. Your
> search usage creates a profile that can paint you into a corner.

After reading the DuckDuckGo article, I was (mildly) puzzled (offended?)
by their rhetoric. They act as if removing a feature (search results
personalized by prior searches) were a big plus. At best, its absence
might be a preferable default. Their strategy seems to be to prey on the
naivete and paranoia of the masses to make a less capable search engine
seem more capable. I don't doubt they are working hard on other
features, but to make their *lack of personalization* out to be the
prime feature seems... duplicitous.

Search engines are essentially "recommender" systems. One strategy for
improving the recommendations *is* to track searches and customize
results. If you don't sign in to Google, I don't think they apply this
to your search results at all (except perhaps by the subnet you search
from?). You can also clear your history, selectively edit it, or turn
it off ("Pause", they call it).

I happen to have multiple Google IDs but remain logged out of Google
most of the time. For those of you who have let Google manage your
mail, this is probably too inconvenient (logging out and in all the
time). I log in now and again for Google Docs and for blog management,
with my different IDs. Each ID roughly corresponds to one of my alter
egos or personalities. For example, my personal interests overlap my
professional interests, but only to a modest extent. In principle, my
personal and professional account profiles will be informed differently
and produce different results.

Three years ago, when one of our new kittens was dying and I was
searching far and wide for information, I was annoyed (offended?) by the
many ads popping up trying to sell me cat food, cat leashes, catnip, cat
toys, even pet insurance for cats. They knew I had a cat and was
interested in cat things, but didn't know that the very same cat was
nearly dead and wouldn't be needing any of the stuff they were peddling.
Had there been a human in the loop, it would have been quite rude.

I also think calling it a bubble is part of their duplicitous rhetoric
(any marketing or self-promotion is going to use framing like this). If
anything, I would compare it to canalization on an epigenetic landscape.
Of course, that metaphor would be lost on most of their (potential)
users. I use the term because it feels more accurate: essentially,
there is an "erosion" of the search landscape going on, shaped by the
searches that have gone before.

I tried to pitch Google maybe seven years ago on the idea of studying
search in this context... I never heard back... but I would not be
surprised if this is effectively what they are doing anyway.
>
> When Steve and I were working on a project to visualize SFI working
> papers, we stumbled across search engine mashups that gave you
> categories of responses... which seemed more useful than a single long
> list of results. Yahoo also seems, to me anyway, to have a better
> display of search results.

Your observation that a set of categories is more interesting/useful
than a simple ordered list of course raises the question of how one
arrives at the categories. Are they human-derived? Are they derived from
the structure of their relations? Are they derived from *your use*?! I
suppose your comments make a case for *exposing* more of the qualities
used to personalize your search... to help expose *why* the list is
ordered the way it is, or why you are being offered things in those
categories, etc.
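As one hypothetical illustration of categories "derived by the structure of their relations" (not any search engine's documented method), the sketch below treats two results as related when the Jaccard overlap of the sites they link to reaches a threshold, and takes the connected components of that similarity graph as the categories. The result names, link sets, and the 0.3 threshold are all made up for the example.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Size of the intersection over size of the union (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def categorize(outlinks: dict, threshold: float = 0.3) -> list:
    """Group results whose outlink sets overlap (Jaccard >= threshold);
    each connected component of that similarity graph is one category."""
    parent = {r: r for r in outlinks}  # union-find forest

    def find(r):
        while parent[r] != r:
            parent[r] = parent[parent[r]]  # path halving
            r = parent[r]
        return r

    for a, b in combinations(outlinks, 2):
        if jaccard(outlinks[a], outlinks[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for r in outlinks:
        groups.setdefault(find(r), []).append(r)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical search results and the (made-up) sites each one links to.
outlinks = {
    "vet-clinic": {"vetmed.example", "pethealth.example", "clinics.example"},
    "cat-illness": {"vetmed.example", "pethealth.example", "feline.example"},
    "cat-toys": {"petshop.example", "toys.example"},
    "cat-supplies": {"petshop.example", "toys.example", "food.example"},
}
print(categorize(outlinks))
```

Here the medical pages and the shopping pages fall into separate groups, which is the kind of split a single ordered list hides. Using the user's own click history instead of outlinks would give the "derived by *your use*" variant.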

In my vernacular, I suppose, it would be showing you the erosion
patterns of your own search landscape. And I agree; this is what I was
vaguely trying to propose to the Googleteers... to help us see the
basins of attraction carved out not only by our own personal searches
but by the linking and general search and follow-up patterns. I haven't
tracked their technical work in years, but at the time spectral graph
theory seemed to be an important part of the game.
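Which spectral methods Google actually used isn't known here; one widely published example of eigenvector-based link analysis is PageRank, whose scores are the stationary distribution (principal eigenvector) of a damped random walk over the link graph. The sketch below computes it by power iteration on a tiny hypothetical link graph; the graph, the damping factor of 0.85, and the function name `pagerank` are illustrative assumptions.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=1000):
    """Power iteration for PageRank on a dict {page: [outlinks]}.

    Pages with no outlinks ("dangling" pages) spread their rank uniformly.
    """
    pages = sorted(set(links) | {t for outs in links.values() for t in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Rank held by dangling pages is redistributed evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for p in pages:
            outs = links.get(p, [])
            for t in outs:
                new[t] += damping * rank[p] / len(outs)
        delta = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if delta < tol:
            break
    return rank


# Hypothetical link graph: two small clusters, {a, b} and {c, d}, joined via a <-> c.
links = {
    "a": ["b", "c"],
    "b": ["a"],
    "c": ["a", "d"],
    "d": ["c"],
}
for page, score in sorted(pagerank(links).items(), key=lambda kv: -kv[1]):
    print(f"{page}: {score:.3f}")
```

In landscape terms, the random walk "pools" where links concentrate: here `a` and `c` (about 0.325 each) draw more of the walk than `b` and `d` (about 0.175), a crude picture of basins carved by linking patterns alone.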

One of the problems is that this is a high-dimensional problem... and
by some measures, reducing it to one dimension (an ordered list) is only
a little worse than reducing it to a 2D landscape. I am often surprised
that Google doesn't offer multiple sorts on their results. Sometimes I
am interested in *recent* things (I'm now using Google Realtime at times
and wondering what happened to Collecta... currently offline?), and
other times I *might* be interested in ordering *without*
personalization and *with* personalization... or with personalization
weighted different ways, etc.
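The "multiple sorts" idea can be sketched with a tunable personalization weight. This is a hypothetical illustration, not how any search engine ranks results: each result is assumed to carry a personalization-free relevance score, an affinity score against the user's search history, and a publication date. A linear blend with weight `w` then gives a family of orderings, from no personalization (`w = 0`) to profile only (`w = 1`).

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Result:
    title: str
    relevance: float  # hypothetical query relevance in [0, 1], no personalization
    affinity: float   # hypothetical match to the user's search history in [0, 1]
    published: date


def blended(r: Result, w: float) -> float:
    """Linear blend: w = 0 ignores personalization, w = 1 uses only the profile."""
    return (1.0 - w) * r.relevance + w * r.affinity


def order(results, key):
    """Titles sorted by the given key, highest first."""
    return [r.title for r in sorted(results, key=key, reverse=True)]


# Made-up results for illustration only.
results = [
    Result("cat food coupons", relevance=0.6, affinity=0.9, published=date(2011, 6, 1)),
    Result("feline kidney failure", relevance=0.9, affinity=0.2, published=date(2009, 3, 15)),
    Result("veterinary emergency care", relevance=0.7, affinity=0.3, published=date(2011, 6, 20)),
]

print("most recent first:", order(results, key=lambda r: r.published))
for w in (0.0, 0.5, 1.0):
    print(f"personalization w={w}:", order(results, key=lambda r: blended(r, w)))
```

With `w = 0` the medical page comes first; as `w` grows, the history-driven shopping result takes over. Exposing `w` (and the recency sort) to the user is one way to let them choose how much of their own "erosion" to see.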

Visualizing complex multigraphs and hypergraphs is a holy grail for me.
I've done a bit with various real-world problems, but it remains an
interesting and hard one. More to the point, in this context, it is not
the actual *graphs* one needs to visualize but rather the systems and
data that are encoded in the graphs. In this case, the networks of
interconnected web sites/links and the search patterns and utilization
over those networks are what Google (or DuckDuckGo) has to work with,
and a structured ordering/layout of the available resources, possibly
annotated, is what we want returned when we enter a query.

I like the landscape metaphor for many reasons. In general, I believe
all visualizations are rooted in metaphors, even simple (usually
geometric) ones. The landscape is a familiar one (geometrically, a
simple single-valued function over a plane) that human brains evolved
to parse well. The Topic Maps of PNNL's Spire and Sandia's VxInsight are
a beginning of this. They, however, really only encode proximity and
density. In the case at hand, one would also like to encode erosion,
accelerated wear, and possibly growth and diversity. The "height" of the
landscape is one interesting and primary measure, but the relative
height, the raggedness, the size of an "ideashed", etc. are also
interesting/useful.
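A minimal sketch of a density "landscape" as a single-valued height function over a plane: documents are assumed to already have 2D positions (getting them, say from a graph layout or projection, is the hard part and is simply assumed here), and each contributes a weighted Gaussian bump. The positions, the weights (a hypothetical stand-in for usage, the "wear" above), and the bandwidth are illustrative assumptions. This is not a description of how Spire or VxInsight actually compute their maps.

```python
import math


def height_field(points, weights, size=20, bandwidth=2.0):
    """Sum of weighted Gaussian bumps evaluated on a size x size grid.

    points:  (x, y) positions in grid units, e.g. from some 2D layout of documents.
    weights: per-point mass; a hypothetical stand-in for usage or visit counts.
    Returns grid[y][x] heights.
    """
    two_s2 = 2.0 * bandwidth ** 2
    grid = []
    for gy in range(size):
        row = []
        for gx in range(size):
            h = sum(
                w * math.exp(-((gx - px) ** 2 + (gy - py) ** 2) / two_s2)
                for (px, py), w in zip(points, weights)
            )
            row.append(h)
        grid.append(row)
    return grid


def render(grid, levels=" .:-=+*#"):
    """Crude ASCII contour view: taller cells get denser glyphs."""
    top = max(max(row) for row in grid)
    return "\n".join(
        "".join(levels[min(int(h / top * len(levels)), len(levels) - 1)] for h in row)
        for row in grid
    )


# Hypothetical document positions: two topic clusters, the first more heavily used.
points = [(5, 5), (6, 4), (5, 6), (14, 14), (15, 13)]
weights = [3, 3, 3, 1, 1]
grid = height_field(points, weights)
print(render(grid))
```

Raising a document's weight each time it is visited would make heavily used regions grow taller over time; modeling actual erosion (channels that deepen) would need a different update rule, so this only covers the proximity-and-density part.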

I guess I've wandered into the other thread, started by Tom Johnson, on
data visualization in general.

- Steve

============================================================
FRIAM Applied Complexity Group listserv
Meets Fridays 9a-11:30 at cafe at St. John's College
lectures, archives, unsubscribe, maps at http://www.friam.org