Channel: IR Tutorials – IR Thoughts
Browsing latest articles

Why IDF is Expressed Using Logs

Recently a well-known SEO (name withheld) asked me about some aspects of IDF (Inverse Document Frequency). Below are three of his questions. I am partially reproducing/editing my responses, so it might...

Vector Normalization with Excel – Part II

Back in March, we explained how to normalize column vectors with Excel. But, what about normalizing row vectors? This question is addressed in the current QA column of IRW. I think it might be useful...

IR Videos in Spanish

I normally do not put my lecture notes (ppt, pdf, videos) online. However, there are two public conferences that event organizers taped. Both last over 1 hour and are in Spanish, but with slides in...

Thesaurus as a Complex Network

I came across Thesaurus as a complex network, a fascinating 2003 paper written by Adriano de Jesus Holanda, Ivan Torres Pisa, Osame Kinouchi, Alexandre Souto Martinez and Evandro Eduardo Seron Ruiz...

IRW-2010-June: Uncorrelation & Correlation: The Expectation Values Way

The current issue of the IRW Newsletter is out and should reach subscribers' inboxes during the day. The featured column is about Uncorrelation and Correlation: The Expectation Values Way. “Sooner or...

A Tutorial on Standard Errors

Sooner or later, those conducting data mining studies will need to compute standard errors for several statistics. Every statistic from a sample distribution has a standard error that is specific to that...

A Tutorial on Correlation Coefficients

I’m pleased to publish the tutorial on correlation coefficients. So, start ignoring the quack science from the usual SEO losers. Statistics is a loss for them, anyway. Instead of speculating about the...

On SEOMOZ “Knowledge” about Statistics

In the 04/23/2010 blog post, Beware of SEO Statistical Studies, we commented on incorrect statistical methods used by SEOs in two different blogs. In that post, it was mentioned that we agreed with the...

On Correlation Strength Scales and a tutorial update

Today, I’ve updated the Tutorial on Correlation Coefficients in order to add a new section on correlation strength scales. I feel this is warranted. In a 7/16/2010 Search Engine Watch post, a search...

Understanding Fisher’s Z Transformations

As mentioned in my Tutorial on Correlation Coefficients, the best-known technique for transforming correlation coefficient (r) values into weighted additive quantities is the r-to-Z transformation due...

Understanding Accuracy and Precision

Students often have a hard time understanding the difference between accuracy and precision, particularly when they read quack “science” “studies” when surfing the Web. This post might help them to...

On Correlation Coefficients and Sample Size

Today I updated my Tutorial on Correlation Coefficients to include a new section on the effect of sample size on the significance of correlation coefficients. This was motivated by some comments from...

On the Non-Additivity of Correlation Coefficients

Regardless of your research field, sooner or later you need to generate average statistics, for instance a weighted correlation coefficient between any two variables, x and y. Computing weighted averages...

A Tutorial on Okapi BM25

We have uploaded a new tutorial: Okapi BM25. See http://www.miislita.com/information-retrieval-tutorial/okapi-bm25-tutorial.pdf This is a tutorial on the classic Okapi Best Match 25. Enjoy it. Filed...

New Tutorials: Okapi BM25F and BM25

We have a new tutorial on Okapi Simple BM25 with Extension to Multiple Fields. http://www.miislita.com/information-retrieval-tutorial/okapi-simple-bm25f-tutorial.pdf Unlike the BM25, this model (known...

BM25 and BM25F: Implications to SEO and Web Design

Yesterday we published two great tutorials on the BM25 and BM25F algorithms. The take-home points from the theory behind these algorithms: 1. A term (e.g., a keyword) has more information gain when it...

The Scope Hypothesis in IR: Who is Right?

In previous posts, we have presented two tutorials on Okapi BM25 and BM25F, which are based on the Verbosity and Scope Hypotheses. However… Here I would like to reference research on both sides of the...

The Information Retrieval Collection (IRC)

A new miner is available at Minerazzi.com: The Information Retrieval Collection (http://www.minerazzi.com/irc). What can you do with it? Use this topic-specific search engine to find information...

Two Essential Tools for Data Miners

We are moving two tools from our old site to Minerazzi.com; both are essential to data miners interested in comparing data sets. 1. The Binary Distance Calculator, available at...

A Tutorial on Distance and Similarity

The first of a series of companion tutorials for some of our tools is available now at http://www.minerazzi.com/tutorials/index.php In this tutorial we present a general overview of two association...

Levenshtein Distance Calculator

The Levenshtein Distance Calculator is back. This tool was removed from our old site, but now is available at http://www.minerazzi.com/tools/index.php This is a visual and interactive tool great for...

A Tutorial on the Levenshtein Distance

A short tutorial on the Levenshtein Distance is available now at http://www.minerazzi.com/tutorials/index.php Did you know that Levenshtein Distance is at the heart of sequence analysis and text...

A Quantile-Quantile Tutorial

We have restored our quantile-quantile tutorial from our previous site, and it is now available at http://www.minerazzi.com/tutorials This is an Excel-based tutorial. Quantile analysis by means of...

The Self-Weighting Model Tutorial: Parts 1 and 2

This is a two-part tutorial on The Self-Weighting Model (SWM), available at http://www.minerazzi.com/tutorials/self-weighting-model-tutorial-part-1.pdf...

A Cosine Similarity Tool and Companion Tutorial

On Cosine Similarity: Cosine similarity is commonly used in data mining and information retrieval as a measure of the resemblance between data sets; i.e., how similar or alike these are. It is an...

How to Report Temperature Relative Errors

In the documentation of the Hydrocarbons Parser, we have explained why temperature relative errors should be expressed in the Kelvin scale. According to a Wikipedia article, a relative error only makes...

On IR Tutorials, Google, and SEOs

In a Google patent article on user similarities (https://www.google.com/patents/US8458195) my old tutorial on cosine similarity is cited. If you try to follow that link, you won’t be able to access it...

MTU, MSS, and IP Packet Fragmentation Legacy Tutorials are Back!

Two new legacy tutorials, aimed at those data mining information security and written back in 2009, are now available at http://www.minerazzi.com/tutorials/ MTU and MSS Tutorial: This tutorial...

A Novel Mnemonic for Rydberg Rule

We have uploaded a new tutorial based on the Rydberg Rule. It is available at http://www.minerazzi.com/tutorials/rydberg-rule-mnemonic.pdf We will soon upload a new version of the electron configurations...

A Novel Mnemonic for the Rydberg Rule: Updated Version

I have uploaded a new, updated, and improved version of the tutorial: A Novel Mnemonic for the Rydberg Rule, http://www.minerazzi.com/tutorials/rydberg-rule-mnemonic.pdf Abstract – This tutorial...

Classic Information Retrieval Tutorials Are Back!

1. Starting with our classic Term Vector Theory series, we are republishing our series of tutorials on Information Retrieval from the early and mid 2000s. See http://www.minerazzi.com/tutorials/ 2. A...

A Linear Algebra Approach to the Vector Space Model

Another fast-track tutorial, updated and improved, is back! http://minerazzi.com/tutorials/term-vector-linear-algebra.pdf This is a fast-track tutorial on vector space calculations. A linear algebra...

Binary and Term Count Models Tutorial

This is Part 2 of our introductory tutorial series on Term Vector Theory as used in Information Retrieval and Data Mining. The Binary (BNRY) and Term Count (FREQ) models are discussed. The tutorial is...

The Classic TF-IDF Vector Space Model

This is Part 3 of an introductory tutorial series on Term Vector Theory. The classic term frequency-inverse document frequency model, or TF-IDF, is discussed. Its advantages and limitations are...

An Introduction to Local Weight Models

This is Part 4 of a tutorial series on Term Vector Theory. An introduction to several local weight models is presented. The tutorial is available at http://www.minerazzi.com/tutorials/term-vector-4.pdf...

A Tutorial on Standard Errors

Our tutorial on standard errors is back! It is now available at http://minerazzi.com/tutorials/a-tutorial-on-standard-errors.pdf We have edited and updated the tutorial. New material was added. Enjoy...

Fisher Transformations Tool

A new tool, the Fisher Transformations Tool, is available at http://www.minerazzi.com/tools/fisher/transformations.php We have placed a link to its companion tutorial. BTW, we have fixed some typos made...

Introduction to Global Weights with Applications to MySQL

This is Part 5 of a tutorial series on Term Vector Theory. Several global weight models are discussed and a brief introduction to a MySQL implementation of the Vector Space Model is presented. The tutorial...

Extended Boolean Model Tutorial

Our 2006 legacy tutorial on the Extended Boolean Model is back, with its content edited and updated. It is available now at http://www.minerazzi.com/tutorials/term-vector-6.pdf For additional tutorials...

BM25 and Power Transformations: Introducing BM25IR

Local Term Weight Models from Power Transformations. Development of BM25IR: A Best Match Model based on Inverse Regression. In this article we show how power transformations can be used as a common...

72 Binary Similarity Measures

We have expanded the number of similarity measures that our Binary Similarity Calculator computes from 30 to 72 (and counting…). Identical measures with different names have been consolidated into a single...

Probabilistic Model Tutorial

This is an updated version of a tutorial on the Robertson-Spärck-Jones Probabilistic Model. It is available now at http://www.minerazzi.com/tutorials/probabilistic-model-tutorial.pdf The model computes...

OKAPI BM25 Tutorial

We have restored, refined, and updated this tutorial and added some historical background. Abstract: This is a light tutorial on OKAPI BM25, a Best Match model where local weights are computed as...

BM25F Model Tutorial

We have restored, expanded, and updated our tutorial on the BM25 Extension to Multiple Weighted Fields Model, best known as BM25F. It is now available at...

Memory Based Reasoning in AI

This is a nice course on Memory Based Reasoning in AI taught by Dr. Deepak Khemani and Dr. Sutanu Chakraborti at the Department of Computer Science & Engineering, Indian Institute of Technology...

On the Nonadditivity of Correlation Coefficients Part 1: Pearson and Spearman...

This is Part 1 of a tutorial series on the nonadditivity of correlation coefficients. We demonstrate why it is not possible to arithmetically add, subtract, and average Pearson’s r or Spearman’s rs....

On the Nonadditivity of Correlation Coefficients Part 2: Fisher Transformations

This is Part 2 of a tutorial series on the nonadditivity of correlation coefficients. This time we discuss Fisher r-to-Z and Z-to-r transformations and the risks of arbitrarily implementing these....

Why I chose to be a multidisciplinary scientist?

When you are a multidisciplinary scientist or teacher, one way of measuring your success is by looking at what students and others in different fields and countries do with the tools and resources you...

Semantic Similarity of Healthcare Data

In “Aggregating the syntactic and semantic similarity of healthcare data towards their transformation to HL7 FHIR through ontology matching”, published in the International Journal of Medical...

The Extended Boolean Model for Information Retrieval

The Extended Boolean Model for Information Retrieval. This is an IR tutorial I wrote circa 2006 (http://www.minerazzi.com/tutorials/term-vector-6.pdf). It may be useful to those interested in learning...
