Archive for the ‘Semantic Web’ Category

Can Algorithms Discern My Meanings?

Tuesday, August 7th, 2007

I suppose that, theoretically, they could, but only theoretically, and only given a lifetime’s worth of customized definitions.

In current reality? …not even close.

Let’s say that I spend hours exploring a site that sells sex toys. How could anyone know that I’m doing so because a friend told me how easy the site is to use and I’m interested in examples of user-friendly design? Or perhaps I stumbled across that site on a home computer my son had last used, and wandered around in a stunned state before learning that it was the cute girl my son’s friend brought over who had been hanging out there. Separately, my husband repeatedly snipes auctions of decidedly femme collectibles for me, but does not shop for himself online. Take a number of such examples without context, and you start constructing the image of a rather eccentric (to put it mildly) family, when the reality could be the opposite.

Most of those gathering information about what we do online aren’t specifically interested in me and my family, or in what might look like aberrant behavior for people in our demographic, although they would be if they could determine context. When the habits and behavior of millions are tracked, such anomalies fall by the wayside. If the systems are sophisticated and aided by additional market information tools, the resulting information can be developed in more detail, revealing individual wavelets within the waves across the tides of traffic movement. Little of it, though, is presently about individuals in particular.

Its primary usefulness is to marketers. For example, putting the wavelets together, they might discover that people who like to look at or buy red cars are twice as likely as other shoppers to buy a Camaro, and this information will lead them to put a red Camaro in their online ad. Even this group behavior could be an anomaly, though, and it needs to be consistently repeated over time to become reliable data. By also plugging this sort of information into a recommendation engine, they might increase online sales (or at least clicks) substantially. This process is generally inoffensive to us, and many online shoppers actually like the results, finding books, music, and other merchandise to their taste that they might never otherwise have found.

Marketers also include those packaging and presenting a political candidate to potential voters, and this segment’s agendas can be tangentially tied to the secondary group to whom this information is useful, namely researchers (including academics). A tie can occur, for example, where the researchers’ data aids political agendas and results in elected representatives supporting funding for more data collection.

Regardless of the goals and end uses, none of this data gathering and weighing involves relevance and meaning to the individual except when 1. the individual’s data is part of a wavelet accurate enough to filter back product recommendations, or 2. the individual seeks widely popular or repeatedly accessed/recommended data, thereby improving the shopping and research experience and results.

Developing better search engines and relevancy structures matters a lot to anyone with something to sell online. Sellers know that they can barely see the tip of the iceberg of things that I, or anyone else, might be interested in buying, joining, etc. Hundreds of millions of dollars are poured into tracking our paths, destinations, and online behavior in the hope of uncovering this ‘iceberg’. Building such tools could also matter a lot to societies as a whole, and to organizations and communities in particular. Viewed in an integrated way, it might matter more to that second constituency if the tools can be customized. Recognizing that could also turn out to be a boon to the sellers.

Such discovery could happen. It won’t happen, however, by following us all around with ever more complex and sensitive algorithms, and not just because the richer a user’s activities, the more likely that user is to be aware of being tracked and to learn how to evade it. It won’t happen until I, the user, have a stake in revealing my context and meanings, incentives to do so that benefit me and my fellows on levels far beyond those deemed sufficient so far, and the structural tools that empower me to do it.

Our personal meanings are important to us. They have value. They represent enrichment far beyond the scope of today’s online transactions, social or commercial. I believe that sharing and connecting them responsibly and respectfully would result in societal changes that we haven’t fully conceived yet.

User Classification?

Tuesday, December 5th, 2006

It seems to me that the focus on classification of data is disproportionately large compared to the focus on classification by, for, and of users. There’s a huge ‘either/or’ gap between data structures created by experts and streams of user-created data. Among other things, we’re so used to users being anonymous, and anonymity, by definition, means that no responsibility is taken for any data they generate.

Does classification of organically grown content on the web have to be an oxymoron? There’s quite a conundrum behind this question. Enabling free unlimited growth to create value results in both lower value and increasing chaos. Establishing structures and imposing rules limits growth and concentrates (shrinks) power.

We talk constantly about enabling users, but what we really mean is giving them a useful tool that we either sell to them or give in exchange for their tacit agreement to become part of our asset. No one talks about giving the individual an identity and a role. The best thing we have that does so is still eBay, and that is just a platform connecting individuals to one another. Blogging accidentally serves a corner of the human need for individualization, but what an unwieldy and disconnected hodgepodge it already is, and how does it connect, for most of us, to communal contribution and benefit beyond, once again, those individual personal connections?

The Google model, based on putting the search in the user’s hands, is really great, but its resulting offer of quantity without quality remains frustrating. I’ve noticed Google’s search results shifting somewhat as social networking aggregates the traffic of individuals who have learned to play the link game, combined with the element of popularity, which is supposed to reflect quality content. As a result, I now get a lot of blog and ‘news’ clutter on searches about certain medical or legal topics, for example. Specialized engines such as Lexis Nexis are fine for many things, but I know that there’s a lot more out there.

I see the internet as a looping, linking maze. There are billions of web sites, many of them formally organized by one authoritative entity or another. Anyone with a website learns how to get listed, categorized, and found. Many individual users, however, feel like flotsam, retreating most often to a safe corner (such as a community they’re comfortable in) and venturing out to wider realms only in determined forays for a specific result. Could addressing the differentiation of identities and acknowledging the value of contributions make a difference to them, and to each of us, as well?

Most users, for example, wouldn’t take time to tag, or to contribute to wikis, on today’s web. They come to find something for themselves, and then leave. I think only part of that is due to people being busy and/or selfish. ‘What’s in it for me?’ is a question most ask automatically in response to such a proposition from an anonymous stranger. It is not necessarily our first or only response to an identified person who recognizes and knows us in a community where we have an identity and a sense of belonging. Millions of us do things every day for the common good without wanting public credit or compensation. We often do these things anonymously. We don’t, however, spend time doing them for anonymous strangers about whom we know nothing. We have to be able to clearly connect our personal contribution to a specific rewarding result before we even reach square one and become open to motivation. Since the majority of users are not likely to ‘get’ the potential of the internet and become passionately devoted to it anytime soon, isn’t it worthwhile to work on a place they ‘will’ want to contribute to and inhabit? …unless, of course, the real future here belongs only to an elite few.

Non-Techie Musings: Can Searches and Tags Modify a Taxonomy?

Tuesday, December 5th, 2006

Has this already been done or tried?

Can a traditional hierarchical structure be automatically modified by searches and, separately, by tags? If there were set thresholds built into the modification instruction, such as 10 or 100 identical search word combinations, or 10 or 100 identical tags attached to the same image or word combination, would that address the random clutter problem? Could the potential risk to the database(s) be addressed sufficiently through security filters?
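To picture the threshold idea, here is a minimal Python sketch. It is not an existing system: the class name `ThresholdTaxonomy`, the choice to count distinct users rather than raw repetitions (so one person repeating a tag can’t force it in), and the threshold of 3 in the example are all hypothetical choices for illustration.

```python
from collections import defaultdict


class ThresholdTaxonomy:
    """Hypothetical sketch: a hierarchy that grows from repeated user input.

    Each node is identified by its path from the root, as a tuple of names.
    A tag or search term suggested under a node is only promoted into the
    hierarchy once enough distinct users have suggested it there.
    """

    def __init__(self, threshold=10):
        self.threshold = threshold        # assumed value; the post floats 10 or 100
        self.children = {(): set()}       # node path -> names of its children
        self.pending = defaultdict(set)   # (node path, tag) -> ids of users suggesting it

    def add_category(self, parent, name):
        """Authoritative edit: create a child directly under an existing node."""
        if parent not in self.children:
            raise KeyError(f"unknown parent {parent!r}")
        self.children[parent].add(name)
        self.children.setdefault(parent + (name,), set())

    def suggest(self, user_id, parent, tag):
        """Record one user's tag or search term under `parent`.

        Returns True if the tag is (now) part of the hierarchy, False if it
        is still waiting below the threshold or targets an unknown node.
        """
        if parent not in self.children:
            return False                  # ignore input aimed at nonexistent nodes
        if tag in self.children[parent]:
            return True                   # already in the structure
        voters = self.pending[(parent, tag)]
        voters.add(user_id)               # repeats from one user don't add weight
        if len(voters) >= self.threshold:
            self.add_category(parent, tag)
            del self.pending[(parent, tag)]
            return True
        return False


tax = ThresholdTaxonomy(threshold=3)
tax.add_category((), "food")
tax.add_category(("food",), "fruit")
for user in ["ann", "bob", "ann"]:
    tax.suggest(user, ("food", "fruit"), "apple")
print("apple" in tax.children[("food", "fruit")])  # False: only 2 distinct users so far
tax.suggest("cy", ("food", "fruit"), "apple")
print("apple" in tax.children[("food", "fruit")])  # True: the third user tips it over
```

Suggestions below the threshold simply wait in `pending`, which is one simple way a structure could absorb random clutter without being changed by it.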

The related question is how many modification rules and rights, and how wide a range of them, could be assigned for a single data structure. If you have a dozen or more classes of users with different access and privileges, does each class’s input enter via a different track, or can segments of it be pooled after leaving the space where the user status is defined and protected?

Credentialed users, for example, could create subcategories without limit, make multiple and faceted entries of items, etc. New users, at the other end of the spectrum, could make their own tags and links in their accounts, and these would form a component of communal classification, collated automatically by being pooled. Between the two, various levels of intermediate and non-tech expert users could be given appropriate levels of access and rights, and so on. So I’ve imagined it, anyway. :)
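One way to imagine separate tracks that are pooled once identity is stripped off is the Python sketch below. Everything in it is hypothetical: the three tiers, the class names, the per-tier weights (1 for new users, 3 for intermediate ones), and the vote cutoff of 3 are assumptions made only to show the shape of the idea.

```python
from collections import Counter
from dataclasses import dataclass, field
from enum import IntEnum


class Tier(IntEnum):
    NEW = 0
    INTERMEDIATE = 1
    CREDENTIALED = 2


@dataclass
class User:
    name: str
    tier: Tier
    personal_tags: dict[str, set[str]] = field(default_factory=dict)


class TieredCatalog:
    """Hypothetical sketch: one shared structure fed by separate input tracks."""

    def __init__(self):
        self.categories: set[str] = set()          # structure, edited by credentialed users only
        self.assigned: dict[str, set[str]] = {}    # item -> authoritative (faceted) tags
        self.pooled: Counter = Counter()           # (item, tag) -> weighted, anonymous votes
        self.weights = {Tier.NEW: 1, Tier.INTERMEDIATE: 3}  # assumed weights per tier

    def create_category(self, user, name):
        if user.tier < Tier.CREDENTIALED:
            raise PermissionError(f"{user.name} may not create categories")
        self.categories.add(name)

    def tag(self, user, item, tag):
        mine = user.personal_tags.setdefault(item, set())
        if tag in mine:
            return                                  # a repeat from the same user counts once
        mine.add(tag)                               # every tag is kept in the user's own account
        if user.tier is Tier.CREDENTIALED:
            self.assigned.setdefault(item, set()).add(tag)
        else:
            # Only the item, tag, and weight leave the user's track; the name does not.
            self.pooled[(item, tag)] += self.weights[user.tier]

    def communal_tags(self, item, min_votes=3):
        """Pooled tags on `item` whose combined weight reaches `min_votes`."""
        return sorted(t for (i, t), votes in self.pooled.items()
                      if i == item and votes >= min_votes)


catalog = TieredCatalog()
curator = User("curator", Tier.CREDENTIALED)
catalog.create_category(curator, "dessert")
catalog.tag(curator, "apple", "fruit")
for u in [User("a", Tier.NEW), User("b", Tier.NEW), User("c", Tier.INTERMEDIATE)]:
    catalog.tag(u, "apple", "cooking")
print(catalog.assigned["apple"])         # {'fruit'}
print(catalog.communal_tags("apple"))    # ['cooking'] (1 + 1 + 3 = 5 votes)
```

In this sketch the answer to the pooling question is "yes": each user’s tags stay in their own account, while the shared pool only ever sees item, tag, and weight.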

Giving a non-tech user a way to make faceted entries would obviously have to be done in common language rather than jargon. As an example, apples can relate to (beyond food and fruit) cooking and dessert and even biblical symbolism. Sorting things from one’s own special interest list could be both easy and popular fun. There’s an assumption in this concept that a user putting a ‘gemstone’ tag beside an apple picture or article is going to be an aberration. This example does, however, indicate a requirement for defining category levels and relationships between them, as apple could easily be found at a different level down a sculpture or jewelry branch.
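The requirement for category levels, with the same item filed at different depths on different branches, can be sketched as an index of paths. This is a hypothetical Python illustration: the class `FacetedIndex` and the category paths (including the made-up "art / sculpture / still life" branch) are assumptions, not taken from any real system.

```python
from collections import defaultdict


class FacetedIndex:
    """Hypothetical sketch: one item may be filed at several paths, at different depths."""

    def __init__(self):
        self.paths = defaultdict(set)   # item -> set of category paths (tuples of names)

    def place(self, item, *path):
        """File `item` under a category path, e.g. place("apple", "food", "fruit")."""
        self.paths[item].add(tuple(path))

    def under(self, *branch):
        """All items filed anywhere beneath the given branch prefix."""
        n = len(branch)
        return sorted(item for item, ps in self.paths.items()
                      if any(p[:n] == branch for p in ps))

    def levels(self, item):
        """(top-level branch, depth) for every place `item` is filed."""
        return sorted((p[0], len(p)) for p in self.paths.get(item, ()) if p)


idx = FacetedIndex()
idx.place("apple", "food", "fruit")
idx.place("apple", "cooking", "dessert")
idx.place("apple", "symbolism", "biblical")
idx.place("apple", "art", "sculpture", "still life")
idx.place("pear", "food", "fruit")
print(idx.under("food"))              # ['apple', 'pear']
print(idx.under("art", "sculpture"))  # ['apple']
print(idx.levels("apple"))            # [('art', 3), ('cooking', 2), ('food', 2), ('symbolism', 2)]
```

Keeping every path, rather than forcing one "correct" location, is what lets apple sit two levels down under food but three levels down a sculpture branch.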

Many, perhaps even all, of the different components in my imaginings already exist, but their applications, in my experience, are usually very limited. To use an analogy, it sometimes feels as though every basic body movement is already enabled somewhere, on a computer, over the internet, on a website, or through a web app. Continuing the analogy, I have to go over there and sign in to lift my little finger, somewhere else to walk, and somewhere else again to sit down. Spending too much time on the web is beginning to make ‘me’ feel like a jigsaw puzzle that no one has assembled, with my identity left in fragments. Enterprise applications, on the other hand, make me seem (to myself) rigidly two-dimensional, like a cardboard cutout figure. Sometimes, if they work well, the image is of a paper chain of figures: not exactly fragmented, in this case, but constrained within a third party’s narrow definition.

Although I framed the original question in this post specifically around a single traditional data structure, I am naturally also wondering about applying it to relational databases together with ontological meta-tagging. The first focus of my thinking, though, remains how to begin achieving a balance between authoritatively compiled data and user-generated data while retaining the maximum value of both.

Anonymity Search on Sphere

Friday, November 10th, 2006

My post yesterday on anonymity led me into several conversations on the topic (privacy and attribution/plagiarism are related to it but are quite different issues), and these led me to search the topic on Sphere.

I’d missed Jeff Jarvis’s post on the Tim Berners-Lee interview in the Guardian, as well as Tim’s response to that interview on his blog (numerous bloggers have linked to and commented on that one), and I found other related posts as well.

Out of curiosity, I searched “Tim Berners Lee” on Google Blog Search: 19,123 results. On Technorati: 185 results. On Sphere: 188 results. In this instance, Sphere’s reach matched Technorati’s, with comparable or better overall quality, depending on your interests and point of view.

Sphere is a blog search engine that has just come out of beta. So far, only a few of my searches there have returned no results, and that percentage is falling fast, while the overall quality of results remains excellent. Check out their cool bookmarklet, which displays topics related to the subject of any post you’re reading.