
Clasificación de género basada en señales de voz mediante modelos difusos y algoritmos de optimización

Gender classification based on voice signals using fuzzy models and optimization algorithms

Luis Miguel Cortés-Martinez, Helbert Eduardo Espitia-Cuchango






This paper describes a gender classification scheme based on voice signals in which 16 different fuzzy models are proposed, tested, and optimized using four bio-inspired optimization algorithms and the quasi-Newton method. The scheme considers four data sets and five voice features to define the input values of the algorithm in the optimization process. The inputs of each fuzzy model define the mean and variance of its Gaussian membership functions, and its fitness is evaluated from the input values of the optimization algorithm, with the mean squared error as the objective function to be minimized. A comparative analysis across models, algorithms, and data sets is carried out, and conclusions are drawn from the results of each optimized model.


Fuzzy logic; optimization; genetic algorithms; harmony search; differential evolution; particle swarm optimization; quasi-Newton method; gender classification
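To make the role of the objective function concrete, the following is a minimal sketch of how a parameter vector of means and variances could be scored by mean squared error. It is not the authors' model: the two-rule structure, the product inference, the parameter layout, the 0/1 class encoding, and the toy pitch values are all assumptions introduced for illustration.

```python
import math


def gaussian_membership(x, mean, variance):
    """Degree of membership of x in a Gaussian fuzzy set (variance must be > 0)."""
    return math.exp(-((x - mean) ** 2) / (2.0 * variance))


def fuzzy_output(features, params):
    """Hypothetical two-rule fuzzy model with one Gaussian set per feature and class.

    Assumed params layout, per feature: [mean_0, var_0, mean_1, var_1].
    Returns a value in [0, 1]: near 0 for class 0, near 1 for class 1.
    """
    fire_0 = 1.0
    fire_1 = 1.0
    for i, x in enumerate(features):
        mean_0, var_0, mean_1, var_1 = params[4 * i: 4 * i + 4]
        # Product inference (an assumption): rule strength is the product of memberships.
        fire_0 *= gaussian_membership(x, mean_0, var_0)
        fire_1 *= gaussian_membership(x, mean_1, var_1)
    total = fire_0 + fire_1
    if total == 0.0:
        return 0.5  # no rule fires: return the midpoint between the two classes
    # Weighted average of the rule consequents 0 and 1.
    return fire_1 / total


def mean_squared_error(params, samples, labels):
    """Objective to minimize: MSE between model outputs and 0/1 class labels."""
    errors = [(fuzzy_output(s, params) - y) ** 2 for s, y in zip(samples, labels)]
    return sum(errors) / len(errors)


# Toy data (invented for illustration): one feature, a pitch-like value in Hz.
samples = [[110.0], [120.0], [210.0], [220.0]]
labels = [0, 0, 1, 1]

good_params = [115.0, 400.0, 215.0, 400.0]  # sets centred on each class
bad_params = [215.0, 400.0, 115.0, 400.0]   # sets swapped between classes

print("MSE with well-placed sets:", mean_squared_error(good_params, samples, labels))
print("MSE with swapped sets:", mean_squared_error(bad_params, samples, labels))
```

Under these assumptions, any of the optimizers named in the abstract would search over `params` for the vector that minimizes `mean_squared_error`; the sketch only shows how one candidate vector is evaluated.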



ISSN: 1692-1798 (print)
ISSN: 2393-3483 (online)