Sharp bounds for the maximum of the chi-square index in a class of contingency tables with given marginals

Bahman Kalantari, I. Lari, A. Rizzi, B. Simeone

Research output: Contribution to journalArticle

7 Citations (Scopus)

Abstract

Our research is motivated by the problem of finding a joint frequency distribution with given marginals which is 'as far as possible' from the independent distribution with the same marginals. If the distance between these two distributions is measured by Pearson's chi-square index, the problem can be formulated as maximizing a quadratic convex separable function subject to transportation constraints. We present three heuristics for this problem: two of them are greedy heuristics; the third one amounts to solve a linear relaxation of the problem, and yields also an upper bound of the optimum; thus one can estimate the relative error E of any given heuristic. The lower bounds given by the three heuristics may be improved via Frank-Wolfe's algorithm. Numerical experiments on 600 randomly generated test problems with up to 50 rows and 100 columns show that the above heuristics provide sharp bounds on the optimum (often one has E < 0.01). Even more interestingly, these bounds become sharper and sharper as the problem size increases.

Original languageEnglish (US)
Pages (from-to)19-34
Number of pages16
JournalComputational Statistics and Data Analysis
Volume16
Issue number1
DOIs
StatePublished - Jan 1 1993

Fingerprint

Chi-square
Sharp Bound
Contingency Table
Heuristics
Experiments
Linear Relaxation
Greedy Heuristics
Relative Error
Test Problems
Numerical Experiment
Class
Contingency table
Lower bound
Upper bound
Estimate

All Science Journal Classification (ASJC) codes

  • Statistics and Probability
  • Computational Mathematics
  • Computational Theory and Mathematics
  • Applied Mathematics

Keywords

  • Association indexes
  • Contingency tables
  • Fréchet class
  • Heuristics
  • Quadratic transportation

Cite this

@article{c2d8d8bac83c4a9abb03a05e11eff789,
title = "Sharp bounds for the maximum of the chi-square index in a class of contingency tables with given marginals",
abstract = "Our research is motivated by the problem of finding a joint frequency distribution with given marginals which is 'as far as possible' from the independent distribution with the same marginals. If the distance between these two distributions is measured by Pearson's chi-square index, the problem can be formulated as maximizing a quadratic convex separable function subject to transportation constraints. We present three heuristics for this problem: two of them are greedy heuristics; the third one amounts to solve a linear relaxation of the problem, and yields also an upper bound of the optimum; thus one can estimate the relative error E of any given heuristic. The lower bounds given by the three heuristics may be improved via Frank-Wolfe's algorithm. Numerical experiments on 600 randomly generated test problems with up to 50 rows and 100 columns show that the above heuristics provide sharp bounds on the optimum (often one has E < 0.01). Even more interestingly, these bounds become sharper and sharper as the problem size increases.",
keywords = "Association indexes, Contingency tables, Fr{\'e}chet class, Heuristics, Quadratic transportation",
author = "Bahman Kalantari and I. Lari and A. Rizzi and B. Simeone",
year = "1993",
month = "1",
day = "1",
doi = "10.1016/0167-9473(93)90242-L",
language = "English (US)",
volume = "16",
pages = "19--34",
journal = "Computational Statistics and Data Analysis",
issn = "0167-9473",
publisher = "Elsevier",
number = "1",

}

Sharp bounds for the maximum of the chi-square index in a class of contingency tables with given marginals. / Kalantari, Bahman; Lari, I.; Rizzi, A.; Simeone, B.

In: Computational Statistics and Data Analysis, Vol. 16, No. 1, 01.01.1993, p. 19-34.

Research output: Contribution to journalArticle

TY - JOUR

T1 - Sharp bounds for the maximum of the chi-square index in a class of contingency tables with given marginals

AU - Kalantari, Bahman

AU - Lari, I.

AU - Rizzi, A.

AU - Simeone, B.

PY - 1993/1/1

Y1 - 1993/1/1

N2 - Our research is motivated by the problem of finding a joint frequency distribution with given marginals which is 'as far as possible' from the independent distribution with the same marginals. If the distance between these two distributions is measured by Pearson's chi-square index, the problem can be formulated as maximizing a quadratic convex separable function subject to transportation constraints. We present three heuristics for this problem: two of them are greedy heuristics; the third one amounts to solve a linear relaxation of the problem, and yields also an upper bound of the optimum; thus one can estimate the relative error E of any given heuristic. The lower bounds given by the three heuristics may be improved via Frank-Wolfe's algorithm. Numerical experiments on 600 randomly generated test problems with up to 50 rows and 100 columns show that the above heuristics provide sharp bounds on the optimum (often one has E < 0.01). Even more interestingly, these bounds become sharper and sharper as the problem size increases.

AB - Our research is motivated by the problem of finding a joint frequency distribution with given marginals which is 'as far as possible' from the independent distribution with the same marginals. If the distance between these two distributions is measured by Pearson's chi-square index, the problem can be formulated as maximizing a quadratic convex separable function subject to transportation constraints. We present three heuristics for this problem: two of them are greedy heuristics; the third one amounts to solve a linear relaxation of the problem, and yields also an upper bound of the optimum; thus one can estimate the relative error E of any given heuristic. The lower bounds given by the three heuristics may be improved via Frank-Wolfe's algorithm. Numerical experiments on 600 randomly generated test problems with up to 50 rows and 100 columns show that the above heuristics provide sharp bounds on the optimum (often one has E < 0.01). Even more interestingly, these bounds become sharper and sharper as the problem size increases.

KW - Association indexes

KW - Contingency tables

KW - Fréchet class

KW - Heuristics

KW - Quadratic transportation

UR - http://www.scopus.com/inward/record.url?scp=38249003741&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=38249003741&partnerID=8YFLogxK

U2 - 10.1016/0167-9473(93)90242-L

DO - 10.1016/0167-9473(93)90242-L

M3 - Article

AN - SCOPUS:38249003741

VL - 16

SP - 19

EP - 34

JO - Computational Statistics and Data Analysis

JF - Computational Statistics and Data Analysis

SN - 0167-9473

IS - 1

ER -