Decision Trees in Data Mining

In this chapter, I explain what made data so much more widely available and where Big Data emerged from. I show what can be searched for in these data and what tools are needed to mine them. The differences and similarities between classification and regression are described. The focus then moves to decision trees and the classical methods for their induction, although the presentation should not be treated as an extensive overview of this wide area of research. The most important information about decision trees is provided, and this subjective selection is intended to help in understanding the proposed global approach. Finally, related work on applying evolutionary computation to decision trees is reviewed.

Notes

In a regression, the independent features are called regressors.
We assume that the tree has at least one internal node and is not reduced to just one leaf.
ID3 stands for Iterative Dichotomiser 3.

Author information

Authors and Affiliations

Marek Kretowski, Faculty of Computer Science, Bialystok University of Technology, Bialystok, Poland