blogging AoIR – part 1

Amanda Spink
Tracking Web Searching Trends 1997 – 2005

– started in 1997 w/data from Excite
book: Web Search: Public Searching of the Web (Kluwer) – now Springer

– track web search trends
– identify characteristics of web searching – session length, query length, and use of operators
– examine the distribution of query topics, terms, queries, sources, and languages used on engines
– implications for theoretical & user models, web services, interfaces & system design

1st data set 1997 – 51,000 queries
10+million searches but different data

data sources: Excite archives (only 50% of the users ask questions, the rest use keywords), AlltheWeb (mainly European data), AltaVista, Vivisimo (small Pittsburgh-based engine), Dogpile; no Google or MSN

any given query will return only 1% of the same results on the 1st page of the search (complexity of query) (can get overlap study from her)
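To make the overlap number concrete for myself, here is a minimal sketch of one way it could be measured. This is not Dr. Spink's method (I don't know how the overlap study computed it); the function name, engine names, and URLs are all made up, and I'm assuming "overlap" means the share of distinct first-page results that every engine returned.

```python
def first_page_overlap(results_by_engine):
    """Share of distinct first-page URLs that every engine returned.

    results_by_engine maps an engine name to its list of first-page URLs
    for one query (hypothetical structure, not from the talk).
    """
    url_sets = [set(urls) for urls in results_by_engine.values()]
    all_urls = set().union(*url_sets)
    if not all_urls:
        return 0.0
    shared = set.intersection(*url_sets)
    return len(shared) / len(all_urls)


# Made-up first-page results for one query on three hypothetical engines.
example = {
    "engine_a": ["u1", "u2", "u3", "u4"],
    "engine_b": ["u1", "u5", "u6", "u7"],
    "engine_c": ["u1", "u8", "u9", "u2"],
}
overlap = first_page_overlap(example)  # one shared URL out of nine distinct
```

A different definition (pairwise overlap, or overlap weighted by rank) would give different numbers, so the 1% figure only makes sense alongside whatever definition the study used.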

– in data set – mostly from US
– 70% of users enter 3 or fewer queries per session (session? what is a session?)
– 46-60% of sessions include query modification
– 10% use multitasking (start looking for one topic, look for another, flip back)
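On my "what is a session?" question: the talk didn't define it, so the following is only a sketch of one common-sense approach, assuming a log of (user id, timestamp in seconds, query) rows and treating a gap of more than 30 minutes as the start of a new session. The cutoff, the field layout, and the function names are my assumptions, not anything from the data set.

```python
from collections import defaultdict

SESSION_GAP_SECONDS = 30 * 60  # assumed inactivity cutoff, not from the talk


def split_sessions(log, gap=SESSION_GAP_SECONDS):
    """Group (user_id, timestamp_seconds, query) rows into per-user sessions."""
    by_user = defaultdict(list)
    for user_id, ts, query in log:
        by_user[user_id].append((ts, query))

    sessions = []
    for rows in by_user.values():
        rows.sort()  # order each user's queries by time
        current = [rows[0]]
        for prev, row in zip(rows, rows[1:]):
            if row[0] - prev[0] > gap:  # long pause: close the session
                sessions.append(current)
                current = []
            current.append(row)
        sessions.append(current)
    return sessions


def share_with_at_most(sessions, max_queries=3):
    """Fraction of sessions containing max_queries or fewer queries."""
    if not sessions:
        return 0.0
    return sum(len(s) <= max_queries for s in sessions) / len(sessions)


# Made-up log: user "a" comes back after a long pause, user "b" does not.
log = [
    ("a", 0, "run dog"),
    ("a", 40, "dog running tips"),
    ("a", 5000, "britney spears"),
    ("b", 10, "cheap flights"),
    ("b", 20, "cheap flights pittsburgh"),
    ("b", 30, "flights pittsburgh"),
    ("b", 45, "pittsburgh airport"),
]
sessions = split_sessions(log)
short_share = share_with_at_most(sessions)
```

Changing the cutoff changes every per-session number, which is why the definition matters so much for stats like the 70% figure.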

56% less than one minute
72% less than 5 minutes
81% less than 15 min

60% use 1 or 2 terms
2004 – 70% of users enter 3 or fewer terms per query
not noticing significant gender difference (how would they know?)

low use of Boolean operators; many are incorrect. search engines are trying to figure out how to educate users

– 2004: very few users go beyond the first 2 pages of results
– 14% of users view pages for less than 30 seconds

Distribution of Terms
– very small # of distinct terms used with high frequency
– at the bottom: an unusually high number of distinct terms used with low frequency
– web query vocabulary contains a very large number of distinct terms (strings of numbers, unintelligible …looking for very
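The head-and-tail shape described above is easy to check on any query list. A rough sketch, assuming queries are plain strings split on whitespace (real logs would need much more cleanup); the function name and the sample queries are made up for illustration.

```python
from collections import Counter


def term_frequency_profile(queries, top_n=2):
    """Return (share of all term uses taken by the top_n terms,
    share of distinct terms that appear exactly once)."""
    counts = Counter(term for q in queries for term in q.lower().split())
    total = sum(counts.values())
    top = counts.most_common(top_n)
    head_share = sum(c for _, c in top) / total if total else 0.0
    singletons = sum(1 for c in counts.values() if c == 1)
    tail_share = singletons / len(counts) if counts else 0.0
    return head_share, tail_share


# Made-up queries: a few popular terms plus several one-off strings.
queries = [
    "britney spears",
    "britney spears photos",
    "free music",
    "free games",
    "run dog",
    "xk42 manual",
]
head_share, tail_share = term_frequency_profile(queries)
```

Even in this tiny sample most distinct terms show up only once, which is the pattern the talk described at a far larger scale.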

sex searches = more Boolean & more pages viewed

Britney Spears high search term

in real #s sex searches going up but because internet usage has also gone up – it’s less than 3%

long-term trends
– shift from entertainment queries to e-commerce/people queries
– increasing # of non-English queries
– more query reformulation
– fewer results & pages viewed
– large number of spelling errors (looking for correct spelling?)
– web definitely being used as source of information

Q – are the strings of “unintelligible” terms people looking for technical information and terms? a combination of languages?
A – she used the example of a search like “run dog”: qualitatively, what does that mean? Dr. Spink offered the question: are they looking for information on running their dog, etc.? how do you qualify that with more information?

personally i wonder if it’s a combination of informal use of the interface, looking for obscure info, lack of search skills & possibly poor language skills or non-native speakers?

Q – gender: how did they know?
A – really can’t get this data. the claim is based on consistency of the data. this really needs to be looked into more specifically

Dr. Spink offered the data set to those interested. i thought i overheard something about an Excel file… wondering if this massive data set is organized into just spreadsheets? sounds incredibly difficult to manage even for qualitative research. is it in a more complex system? what could Prolog do with this kind of data as far as virtually instantaneous “databasing”?

how do you take this kind of dataset and get deeper into it, to really see which subgroups are doing what with searching? what would query data coming out of a public library look like with the combination of “power user” librarians and the public? who uses what resources? how do they use them? are the user groups obvious just from looking at the data?

