Building malware classificators usable by State security agencies
DOI:
https://doi.org/10.15332/iteckne.v15i2.2072
Keywords:
Cuckoo sandbox, data science, machine learning, malware analysis, sandboxing
Abstract
Sandboxing is regularly used to analyze software samples and determine whether they contain suspicious properties or behaviors. Although sandboxing is a powerful technique for malware analysis, it requires a malware analyst to rigorously analyze the results to determine the nature of the sample: goodware or malware. This paper proposes two machine learning models able to classify samples based on signatures and permissions obtained through Cuckoo sandbox, Androguard, and VirusTotal. The developed models are also tested, obtaining an acceptable percentage of correctly classified samples, which makes them useful tools for a malware analyst. A proposed architecture for an IoT sentinel that uses one of the developed machine learning models is also shown. Finally, different approaches, perspectives, and challenges concerning the use of sandboxing and machine learning by security teams in State security agencies are shared.
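As a rough illustration of classifying samples from permission features, the sketch below trains a small Bernoulli naive Bayes model on sets of Android permissions. The abstract does not name the algorithms used in the paper, so naive Bayes is an assumption chosen for brevity. The training rows are invented toy data, not the paper's dataset, and the function names `train_bernoulli_nb` and `predict` are hypothetical.

```python
import math

# Toy training data (invented for illustration, not from the paper):
# each sample is the set of permissions requested by an app, plus a label.
TRAIN = [
    ({"SEND_SMS", "READ_CONTACTS", "INTERNET"}, "malware"),
    ({"SEND_SMS", "RECEIVE_BOOT_COMPLETED", "INTERNET"}, "malware"),
    ({"READ_CONTACTS", "SEND_SMS"}, "malware"),
    ({"INTERNET"}, "goodware"),
    ({"INTERNET", "CAMERA"}, "goodware"),
    ({"CAMERA", "ACCESS_FINE_LOCATION"}, "goodware"),
]


def train_bernoulli_nb(samples, labels):
    """Fit a Bernoulli naive Bayes model on sets of binary features."""
    vocabulary = sorted(set().union(*samples))
    model = {"vocabulary": vocabulary, "log_prior": {}, "prob": {}}
    for label in sorted(set(labels)):
        rows = [s for s, y in zip(samples, labels) if y == label]
        model["log_prior"][label] = math.log(len(rows) / len(samples))
        # Laplace smoothing keeps every probability strictly between 0 and 1.
        model["prob"][label] = {
            feature: (sum(feature in row for row in rows) + 1) / (len(rows) + 2)
            for feature in vocabulary
        }
    return model


def predict(model, sample):
    """Return the label with the highest log-posterior for one sample."""
    scores = {}
    for label, log_prior in model["log_prior"].items():
        score = log_prior
        for feature in model["vocabulary"]:
            p = model["prob"][label][feature]
            # Present features contribute p, absent features contribute 1 - p.
            score += math.log(p if feature in sample else 1.0 - p)
        scores[label] = score
    return max(scores, key=scores.get)


MODEL = train_bernoulli_nb([s for s, _ in TRAIN], [y for _, y in TRAIN])

if __name__ == "__main__":
    print(predict(MODEL, {"SEND_SMS", "INTERNET"}))
    print(predict(MODEL, {"CAMERA"}))
```

In practice the permission sets would come from a static analysis tool and the signatures from sandbox reports. That extraction step is left out here because the abstract does not describe it in detail.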
License
The journal ITECKNE is registered under a Creative Commons Attribution-NonCommercial 4.0 International license. Therefore, this work may be reproduced, distributed, and publicly communicated, provided that the names of the authors and Universidad Santo Tomás are acknowledged. Citing, adapting, transforming, self-archiving, republishing, and building upon the material are permitted, provided that authorship is properly acknowledged, a link to the original work is provided, and any changes made are indicated.
The journal ITECKNE does not retain the rights to the published works, and the contents are the sole responsibility of the authors, who retain their moral, intellectual, privacy, and publicity rights. However, the journal is authorized to edit, publish, reproduce, and distribute the works in both print and digital media, and to include the article in international indexes and/or databases. Likewise, the publisher is authorized to use the images, tables, and/or any graphic material presented in the article for the design of covers or posters of the journal itself.


