Abstract

Deaf communities still face communication barriers, partly because current sign language recognition systems are inefficient, generalize poorly, and cannot handle regional and linguistic variations. This work proposes a novel architecture that blends attention-based spatiotemporal processing (RTC-ModeRNN-BSF) with a reinforcement threshold-controlled ModeRNN to address these problems. The model adapts its computation to the complexity of the input gesture, using between 2 and 8 attention slots, while gradually reducing exploration during training ($\varepsilon$: 0.9 → 0.1). Dual-stream memory pathways are optimized using joint log-likelihood maximization (J-Star) and computational pruning (Q-Max) to capture both immediate sequential patterns ($C_t$) and hierarchical spatiotemporal dependencies ($M_t$). Hybrid gradient descent with the AdamW optimizer ensures dependable convergence while avoiding feature memorization. The proposed system converges 47% faster than conventional techniques and reaches an average classification accuracy of 99% across American Sign Language (ASL), Indian Sign Language (ISL), and Chinese Sign Language (CSL) datasets. Furthermore, it shows notable cross-lingual adaptation, with 78.5% accuracy on unseen sign languages without retraining, and consistently maintains 93–97% performance under real-world challenges such as partial occlusion, changing lighting, and increasing signing speeds.
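
The two adaptive quantities named in the abstract, the exploration rate $\varepsilon$ (0.9 → 0.1) and the attention slot count (2 to 8), can be illustrated with a minimal sketch. The abstract gives only the endpoints, so the linear decay shape, the normalized complexity score in [0, 1], and the function names `epsilon_at` and `attention_slots` are assumptions made for illustration, not the authors' implementation.

```python
def epsilon_at(step, total_steps, eps_start=0.9, eps_end=0.1):
    """Exploration rate at a training step.

    Assumption: linear annealing; the abstract states only the
    start (0.9) and end (0.1) values, not the schedule shape.
    """
    if total_steps <= 1:
        return eps_end
    # Fraction of training completed, clamped to [0, 1].
    frac = min(max(step / (total_steps - 1), 0.0), 1.0)
    return eps_start + frac * (eps_end - eps_start)


def attention_slots(complexity, min_slots=2, max_slots=8):
    """Number of attention slots for a gesture.

    Assumption: `complexity` is a hypothetical score normalized to
    [0, 1]; the abstract does not say how complexity is measured.
    """
    c = min(max(complexity, 0.0), 1.0)
    return min_slots + round(c * (max_slots - min_slots))


if __name__ == "__main__":
    for step in (0, 50, 99):
        print(step, epsilon_at(step, total_steps=100))
    for score in (0.0, 0.5, 1.0):
        print(score, attention_slots(score))
```

Under these assumptions, simple gestures are processed with few slots and complex ones with up to eight, while the agent's exploration shrinks as training proceeds.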

Keywords

Attention, Bilingual, Memory transition, Reinforcement, Spatiotemporal, Unified threshold

Subject Area

Computer Science

Article Type

Article

First Page

1694

Last Page

1710

Creative Commons License

Creative Commons Attribution 4.0 International License
