TY - JOUR

T1 - Exact and approximate discrete optimization algorithms for finding useful disjunctions of categorical predicates in data analysis

AU - Boros, Endre

AU - Menkov, Vladimir

N1 - Funding Information:
This research was partially supported by the National Science Foundation, Grant IIS-0118635, and by the Office of Naval Research, Grant N00014-92-J-1375. The authors are also grateful for the partial support by the Alexandria Project Laboratory at Rutgers University.

PY - 2004/11/30

Y1 - 2004/11/30

N2 - We discuss a discrete optimization problem that arises in data analysis from the binarization of categorical attributes. It can be described as the maximization of a function F(l1 (x), l22(x)), where l 1(x) and l2(x) are linear functions of binary variables x ε+ {0, 1}n, and F: ℝ2 ℝ. Though this problem is NP-hard, in general, an optimal solution x* of it can be found, under some mild monotonicity conditions on F, in pseudo-polynomial time. We also present an approximation algorithm which finds an approximate binary solution xε, for any given ε > 0, such that F(l 1 (x*), l2(x*)) - F(l1 (x ε) l2(xε) < ε at the cost of no more than O(n log n + 2C/√εn) operations. Though in general C depends on the problem instance, for the problems arising from [en]binarization of categorical variables it depends only on F, and for all functions considered we have C≤1/√2.

AB - We discuss a discrete optimization problem that arises in data analysis from the binarization of categorical attributes. It can be described as the maximization of a function F(l1 (x), l22(x)), where l 1(x) and l2(x) are linear functions of binary variables x ε+ {0, 1}n, and F: ℝ2 ℝ. Though this problem is NP-hard, in general, an optimal solution x* of it can be found, under some mild monotonicity conditions on F, in pseudo-polynomial time. We also present an approximation algorithm which finds an approximate binary solution xε, for any given ε > 0, such that F(l 1 (x*), l2(x*)) - F(l1 (x ε) l2(xε) < ε at the cost of no more than O(n log n + 2C/√εn) operations. Though in general C depends on the problem instance, for the problems arising from [en]binarization of categorical variables it depends only on F, and for all functions considered we have C≤1/√2.

KW - Binary optimization

KW - Feature generation

KW - Machine learning

UR - http://www.scopus.com/inward/record.url?scp=4544295363&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=4544295363&partnerID=8YFLogxK

U2 - 10.1016/j.dam.2004.06.006

DO - 10.1016/j.dam.2004.06.006

M3 - Article

AN - SCOPUS:4544295363

VL - 144

SP - 43

EP - 58

JO - Discrete Applied Mathematics

JF - Discrete Applied Mathematics

SN - 0166-218X

IS - 1-2

ER -