Exact and approximate discrete optimization algorithms for finding useful disjunctions of categorical predicates in data analysis

Endre Boros, Vladimir Menkov

Research output: Contribution to journalArticlepeer-review

4 Scopus citations

Abstract

We discuss a discrete optimization problem that arises in data analysis from the binarization of categorical attributes. It can be described as the maximization of a function F(l1 (x), l22(x)), where l 1(x) and l2(x) are linear functions of binary variables x ε+ {0, 1}n, and F: ℝ2 ℝ. Though this problem is NP-hard, in general, an optimal solution x* of it can be found, under some mild monotonicity conditions on F, in pseudo-polynomial time. We also present an approximation algorithm which finds an approximate binary solution xε, for any given ε > 0, such that F(l 1 (x*), l2(x*)) - F(l1 (x ε) l2(xε) < ε at the cost of no more than O(n log n + 2C/√εn) operations. Though in general C depends on the problem instance, for the problems arising from [en]binarization of categorical variables it depends only on F, and for all functions considered we have C≤1/√2.

Original languageEnglish (US)
Pages (from-to)43-58
Number of pages16
JournalDiscrete Applied Mathematics
Volume144
Issue number1-2
DOIs
StatePublished - Nov 30 2004

All Science Journal Classification (ASJC) codes

  • Discrete Mathematics and Combinatorics
  • Applied Mathematics

Keywords

  • Binary optimization
  • Feature generation
  • Machine learning

Fingerprint

Dive into the research topics of 'Exact and approximate discrete optimization algorithms for finding useful disjunctions of categorical predicates in data analysis'. Together they form a unique fingerprint.

Cite this