Building malware classifiers for State security agencies

  • David Esteban Useche-Peláez Escuela Colombiana de Ingeniería Julio Garavito
  • Daniel Orlando Díaz-López Escuela Colombiana de Ingeniería Julio Garavito
  • Daniela Sepúlveda-Alzate Escuela Colombiana de Ingeniería Julio Garavito
  • Diego Edison Cabuya-Padilla Comando Conjunto Cibernético
Keywords: Cuckoo sandbox, data science, machine learning, malware analysis, sandboxing

Abstract

Sandboxing is routinely used to analyze software samples and determine whether they exhibit suspicious properties or behaviors. Although sandboxing is a powerful technique for malware analysis, it requires a malware analyst to rigorously examine the results to determine the nature of the sample: goodware or malware. This article proposes two machine learning models that classify samples based on an analysis of signatures or permissions extracted with Cuckoo sandbox, Androguard and VirusTotal. It also presents a proposed architecture for an IoT sentinel that protects IoT devices using one of these machine learning models. Finally, it offers different approaches and perspectives on the use of sandboxing and machine learning by State security agencies.
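
As a rough illustration of what classifying samples by their permissions can look like, the following minimal sketch trains a Bernoulli Naive Bayes classifier on sets of requested Android permissions. Everything here is an assumption made for illustration: the toy samples are invented, the `train` and `classify` helpers are hypothetical, and Naive Bayes is a stand-in technique rather than the models proposed in the article. In practice the permission sets would come from a tool such as Androguard instead of being typed by hand.

```python
import math
from collections import Counter

# Hypothetical toy data: (set of requested permissions, label).
# Invented for illustration; not taken from the article's dataset.
TRAINING = [
    ({"INTERNET", "SEND_SMS", "READ_CONTACTS"}, "malware"),
    ({"INTERNET", "SEND_SMS", "RECEIVE_BOOT_COMPLETED"}, "malware"),
    ({"SEND_SMS", "READ_SMS", "READ_CONTACTS"}, "malware"),
    ({"INTERNET", "ACCESS_FINE_LOCATION"}, "goodware"),
    ({"INTERNET", "CAMERA"}, "goodware"),
    ({"CAMERA", "ACCESS_FINE_LOCATION"}, "goodware"),
]


def train(samples):
    """Count how often each permission appears per class."""
    vocab = sorted(set().union(*(perms for perms, _ in samples)))
    class_counts = Counter(label for _, label in samples)
    perm_counts = {label: Counter() for label in class_counts}
    for perms, label in samples:
        perm_counts[label].update(perms)
    return vocab, class_counts, perm_counts


def classify(model, perms):
    """Return the label with the highest log-posterior (Bernoulli Naive Bayes)."""
    vocab, class_counts, perm_counts = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)  # class prior
        for p in vocab:
            # Laplace-smoothed probability that a sample of this class requests p
            prob = (perm_counts[label][p] + 1) / (n + 2)
            score += math.log(prob if p in perms else 1 - prob)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


MODEL = train(TRAINING)
print(classify(MODEL, {"INTERNET", "SEND_SMS"}))
print(classify(MODEL, {"INTERNET", "CAMERA"}))
```

A signature-based variant could follow the same shape, replacing each permission set with the set of signature names reported for a sample by the sandbox.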

Author Biographies

David Esteban Useche-Peláez, Escuela Colombiana de Ingeniería Julio Garavito

Systems Engineer (c). Escuela Colombiana de Ingeniería Julio Garavito. Bogotá, Colombia

Daniel Orlando Díaz-López, Escuela Colombiana de Ingeniería Julio Garavito

Ph.D. in Computer Science. Escuela Colombiana de Ingeniería Julio Garavito. Bogotá, Colombia

Daniela Sepúlveda-Alzate, Escuela Colombiana de Ingeniería Julio Garavito

Systems Engineer (c). Escuela Colombiana de Ingeniería Julio Garavito. Bogotá, Colombia

Diego Edison Cabuya-Padilla, Comando Conjunto Cibernético

M.Sc. in Information Management. Comando Conjunto Cibernético. Bogotá, Colombia

Published
2018-12-07
How to Cite
Useche-Peláez, D., Díaz-López, D., Sepúlveda-Alzate, D., & Cabuya-Padilla, D. (2018). Construcción de clasificadores de malware para agencias de seguridad del Estado. ITECKNE, 15(2), 107-121. https://doi.org/10.15332/iteckne.v15i2.2072
Section
Research and Innovation Articles