
A Review of Human–Computer Interaction and Virtual Reality Research Fields in Cognitive InfoCommunications

1. Introduction
2. Overview of the International Conference on Cognitive Infocommunications (CogInfoCom) and Its Special Issues
3. Related Papers
3.1. Human–Computer Interaction (HCI)
3.2. Virtual Reality (VR)
4. Discussion and Conclusions
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest

References

  • Baranyi, P.; Csapo, A. Cognitive Infocommunications: CogInfoCom. In Proceedings of the 2010 11th International Symposium on Computational Intelligence and Informatics (CINTI), Budapest, Hungary, 18–20 November 2010; pp. 141–146.
  • Baranyi, P.; Csapó, Á. Definition and Synergies of Cognitive Infocommunications. Acta Polytech. Hung. 2012, 9, 67–83.
  • Sallai, G. The Cradle of Cognitive Infocommunications. Acta Polytech. Hung. 2012, 9, 171–181.
  • Izsó, L. The Significance of Cognitive Infocommunications in Developing Assistive Technologies for People with Non-Standard Cognitive Characteristics: CogInfoCom for People with Non-Standard Cognitive Characteristics. In Proceedings of the 2015 6th IEEE International Conference on Cognitive Infocommunications (CogInfoCom), Gyor, Hungary, 19–21 October 2015; pp. 77–82.
  • Katona, J.; Ujbanyi, T.; Sziladi, G.; Kovari, A. Speed Control of Festo Robotino Mobile Robot Using NeuroSky MindWave EEG Headset Based Brain-Computer Interface. In Proceedings of the 2016 7th IEEE International Conference on Cognitive Infocommunications (CogInfoCom), Wrocław, Poland, 16–18 October 2016; pp. 251–256.
  • Tariq, M.; Uhlenberg, L.; Trivailo, P.; Munir, K.S.; Simic, M. Mu-Beta Rhythm ERD/ERS Quantification for Foot Motor Execution and Imagery Tasks in BCI Applications. In Proceedings of the 2017 8th IEEE International Conference on Cognitive Infocommunications (CogInfoCom), Debrecen, Hungary, 11–14 September 2017; pp. 91–96.
  • Katona, J.; Kovari, A. Examining the Learning Efficiency by a Brain-Computer Interface System. Acta Polytech. Hung. 2018, 15, 251–280.
  • Katona, J.; Kovari, A. The Evaluation of BCI and PEBL-Based Attention Tests. Acta Polytech. Hung. 2018, 15, 225–249.
  • Sziladi, G.; Ujbanyi, T.; Katona, J.; Kovari, A. The Analysis of Hand Gesture Based Cursor Position Control during Solve an IT Related Task. In Proceedings of the 2017 8th IEEE International Conference on Cognitive Infocommunications (CogInfoCom), Debrecen, Hungary, 11–14 September 2017; pp. 413–418.
  • Csapo, A.B.; Nagy, H.; Kristjánsson, Á.; Wersényi, G. Evaluation of Human-Myo Gesture Control Capabilities in Continuous Search and Select Operations. In Proceedings of the 2016 7th IEEE International Conference on Cognitive Infocommunications (CogInfoCom), Wrocław, Poland, 16–18 October 2016; pp. 415–420.
  • Zsolt, J.; Levente, H. Improving Human-Computer Interaction by Gaze Tracking. In Proceedings of the 2012 IEEE 3rd International Conference on Cognitive Infocommunications (CogInfoCom), Kosice, Slovakia, 2–5 December 2012; pp. 155–160.
  • Török, Á.; Török, Z.G.; Tölgyesi, B. Cluttered Centres: Interaction between Eccentricity and Clutter in Attracting Visual Attention of Readers of a 16th Century Map. In Proceedings of the 2017 8th IEEE International Conference on Cognitive Infocommunications (CogInfoCom), Debrecen, Hungary, 11–14 September 2017; pp. 433–438.
  • Hercegfi, K.; Komlódi, A.; Köles, M.; Tóvölgyi, S. Eye-Tracking-Based Wizard-of-Oz Usability Evaluation of an Emotional Display Agent Integrated to a Virtual Environment. Acta Polytech. Hung. 2019, 16, 145–162.
  • Kovari, A.; Katona, J.; Costescu, C. Evaluation of Eye-Movement Metrics in a Software Debugging Task Using GP3 Eye Tracker. Acta Polytech. Hung. 2020, 17, 57–76.
  • Ujbanyi, T.; Katona, J.; Sziladi, G.; Kovari, A. Eye-Tracking Analysis of Computer Networks Exam Question besides Different Skilled Groups. In Proceedings of the 2016 7th IEEE International Conference on Cognitive Infocommunications (CogInfoCom), Wrocław, Poland, 16–18 October 2016; pp. 277–282.
  • Garai, Á.; Attila, A.; Péntek, I. Cognitive Telemedicine IoT Technology for Dynamically Adaptive EHealth Content Management Reference Framework Embedded in Cloud Architecture. In Proceedings of the 2016 7th IEEE International Conference on Cognitive Infocommunications (CogInfoCom), Wrocław, Poland, 16–18 October 2016; pp. 187–192.
  • Solvang, B.; Sziebig, G.; Korondi, P. Shop-Floor Architecture for Effective Human-Machine and Inter-Machine Interaction. Acta Polytech. Hung. 2012, 9, 183–201.
  • Torok, A. From Human-Computer Interaction to Cognitive Infocommunications: A Cognitive Science Perspective. In Proceedings of the 2016 7th IEEE International Conference on Cognitive Infocommunications (CogInfoCom), Wrocław, Poland, 16–18 October 2016; pp. 433–438.
  • Siegert, I.; Bock, R.; Wendemuth, A.; Vlasenko, B.; Ohnemus, K. Overlapping Speech, Utterance Duration and Affective Content in HHI and HCI - A Comparison. In Proceedings of the 2015 6th IEEE International Conference on Cognitive Infocommunications (CogInfoCom), Gyor, Hungary, 19–21 October 2015; pp. 83–88.
  • Markopoulos, E.; Lauronen, J.; Luimula, M.; Lehto, P.; Laukkanen, S. Maritime Safety Education with VR Technology (MarSEVR). In Proceedings of the 2019 10th IEEE International Conference on Cognitive Infocommunications (CogInfoCom), Naples, Italy, 23–25 October 2019; pp. 283–288.
  • Al-Adawi, M.; Luimula, M. Demo Paper: Virtual Reality in Fire Safety–Electric Cabin Fire Simulation. In Proceedings of the 2019 10th IEEE International Conference on Cognitive Infocommunications (CogInfoCom), Naples, Italy, 23–25 October 2019; pp. 551–552.
  • Korečko, Š.; Hudák, M.; Sobota, B.; Marko, M.; Cimrová, B.; Farkaš, I.; Rosipal, R. Assessment and Training of Visuospatial Cognitive Functions in Virtual Reality: Proposal and Perspective. In Proceedings of the 2018 9th IEEE International Conference on Cognitive Infocommunications (CogInfoCom), Budapest, Hungary, 22–24 August 2018; pp. 39–44.
  • Budai, T.; Kuczmann, M. Towards a Modern, Integrated Virtual Laboratory System. Acta Polytech. Hung. 2018, 15, 191–204.
  • Csapó, G. Sprego Virtual Collaboration Space. In Proceedings of the 2017 8th IEEE International Conference on Cognitive Infocommunications (CogInfoCom), Debrecen, Hungary, 11–14 September 2017; pp. 137–142.
  • Kvasznicza, Z. Teaching Electrical Machines in a 3D Virtual Space. In Proceedings of the 2017 8th IEEE International Conference on Cognitive Infocommunications (CogInfoCom), Debrecen, Hungary, 11–14 September 2017; pp. 385–388.
  • Bujdosó, G.; Novac, O.C.; Szimkovics, T. Developing Cognitive Processes for Improving Inventive Thinking in System Development Using a Collaborative Virtual Reality System. In Proceedings of the 2017 8th IEEE International Conference on Cognitive Infocommunications (CogInfoCom), Debrecen, Hungary, 11–14 September 2017; pp. 79–84.
  • Kovari, A. CogInfoCom Supported Education: A Review of CogInfoCom Based Conference Papers. In Proceedings of the 2018 9th IEEE International Conference on Cognitive Infocommunications (CogInfoCom), Budapest, Hungary, 22–24 August 2018; pp. 000233–000236.
  • Csapo, A.; Horváth, I.; Galambos, P.; Baranyi, P. VR as a Medium of Communication: From Memory Palaces to Comprehensive Memory Management. In Proceedings of the 2018 9th IEEE International Conference on Cognitive Infocommunications (CogInfoCom), Budapest, Hungary, 22–24 August 2018.
  • Horváth, I. Evolution of Teaching Roles and Tasks in VR/AR-Based Education. In Proceedings of the 2018 9th IEEE International Conference on Cognitive Infocommunications (CogInfoCom), Budapest, Hungary, 22–24 August 2018; pp. 355–360.
  • Kovács, A.D.; Kvasznicza, Z. Use of 3D VR Environment for Educational Administration Efficiency Purposes. In Proceedings of the 2018 9th IEEE International Conference on Cognitive Infocommunications (CogInfoCom), Budapest, Hungary, 22–24 August 2018; pp. 361–366.
  • Lampert, B.; Pongrácz, A.; Sipos, J.; Vehrer, A.; Horvath, I. MaxWhere VR-Learning Improves Effectiveness over Classical Tools of e-Learning. Acta Polytech. Hung. 2018, 15, 125–147.
  • Komlósi, L.I.; Waldbuesser, P. The Cognitive Entity Generation: Emergent Properties in Social Cognition. In Proceedings of the 2015 6th IEEE International Conference on Cognitive Infocommunications (CogInfoCom), Gyor, Hungary, 19–21 October 2015; pp. 439–442.
  • Kövecses-Gosi, V. Cooperative Learning in VR Environment. Acta Polytech. Hung. 2018, 15, 205–224.
  • Horváth, I. The IT Device Demand of the Edu-Coaching Method in the Higher Education of Engineering. In Proceedings of the 2017 8th IEEE International Conference on Cognitive Infocommunications (CogInfoCom), Debrecen, Hungary, 11–14 September 2017; pp. 379–384.
  • Horvath, I. Innovative Engineering Education in the Cooperative VR Environment. In Proceedings of the 2016 7th IEEE International Conference on Cognitive Infocommunications (CogInfoCom), Wrocław, Poland, 16–18 October 2016; pp. 359–364.
  • Edler, D.; Keil, J.; Wiedenlübbert, T.; Sossna, M.; Kühne, O.; Dickmann, F. Immersive VR Experience of Redeveloped Post-Industrial Sites: The Example of "Zeche Holland" in Bochum-Wattenscheid. KN-J. Cartogr. Geogr. Inf. 2019, 69, 267–284.
  • Boletsis, C.; Cedergren, J.E. VR Locomotion in the New Era of Virtual Reality: An Empirical Comparison of Prevalent Techniques. Adv. Hum. Comput. Interact. 2019, 2019.
  • Hruby, F.; Castellanos, I.; Ressl, R. Cartographic Scale in Immersive Virtual Environments. KN-J. Cartogr. Geogr. Inf. 2020, 1–7.
  • Lokka, I.E.; Çöltekin, A.; Wiener, J.; Fabrikant, S.I.; Röcke, C. Virtual Environments as Memory Training Devices in Navigational Tasks for Older Adults. Sci. Rep. 2018, 8, 10809.
  • Hruby, F. The Sound of Being There: Audiovisual Cartography with Immersive Virtual Environments. KN-J. Cartogr. Geogr. Inf. 2019, 69, 19–28.

Year | Special Issue | Editor(s)
2021 | Applications of Cognitive Infocommunications (CogInfoCom) | J. Katona
2021 | Digital Transformation Environment for Education in the Space of CogInfoCom | Gy. Molnar
2020 | Special Issue on Digital Transformation Environment for Education in the Space of CogInfoCom | Gy. Molnar
2019 | Special Issue on Cognitive Infocommunications | P. Baranyi
2019 | Special Issue on Cognitive Infocommunications | P. Baranyi
2018 | Joint Special Issue on TP Model Transformation and Cognitive Infocommunications | P. Baranyi
2018 | Special Issue on Cognitive Infocommunications | P. Baranyi
2015 | CogInfoCom Enabled Research and Applications in Engineering | B. Solvang, W.D. Solvang
2014 | Knowledge Bases for Cognitive Infocommunications Systems | P. Baranyi, H. Fujita
2014 | Multimodal Interfaces in Cognitive Infocommunication Systems | P. Baranyi, A. Csapo
2014 | Speechability of CogInfoCom Systems | A. Esposito, K. Vicsi
2013 | Special Issue on Cognitive Infocommunications | P. Baranyi
2012 | CogInfoCom 2012 | H. Charaf
2012 | Cognitive Infocommunications | P. Baranyi, G. Sallai, A. Csapo
2012 | Cognitive Infocommunications | P. Baranyi, H. Hashimoto, G. Sallai
Authors | Work | Year | Area of applicability of the results | HCI/HMI Component
J. Katona et al. | [ ] | 2016 | human-robot interaction, mobile robots, velocity control | EEG-based BCI
M. Tariq et al. | [ ] | 2017 | medical signal detection, robotics, neurophysiology, patient rehabilitation | EEG-based BCI
J. Katona et al. | [ , ] | 2018 | education, observe the level of vigilance, cognitive actions | EEG-based BCI
G. Sziladi et al. | [ ] | 2017 | gesture recognition, human-computer interaction, mouse controllers (computers), controlling systems | Gesture Control
B. A. Csapo et al. | [ ] | 2016 | gesture recognition, haptic interfaces, image motion analysis, motion control, auditory control | Gesture Control
J. Zsolt et al. | [ ] | 2012 | human-computer interaction, emotional recognition, iris detection | Eye/Gaze tracking
A. Torok et al. | [ ] | 2017 | data visualization, user interfaces, task analysis, cognition | Eye/Gaze tracking
K. Hercegfi et al. | [ ] | 2019 | human-robot interaction, virtual reality, virtual agent | Eye/Gaze tracking
A. Kovari et al. | [ ] | 2020 | programming, debugging, education | Eye/Gaze tracking
T. Ujbanyi et al. | [ ] | 2016 | computer network, visualization, education | Eye/Gaze tracking
A. Garai et al. | [ ] | 2016 | medical computing, telemedicine, cloud computing, health care, embedded systems | Body-sensors
B. Solvang et al. | [ ] | 2012 | human-machine interaction, inter-machine interaction, manufacturing equipment | Shop-Floor architecture
Authors | Work | Year | Area of applicability of the results | VR Application
E. Markopoulos et al. | [ ] | 2019 | computer based training, marine engineering, ergonomics, maritime safety training | MarSEVR (Maritime Safety Education with VR)
M. Al-Adawi et al. | [ ] | 2019 | simulation, fire safety, computer-based training, industrial training, occupational safety | Electric Cabin Fire Simulation
Š. Korečko et al. | [ ] | 2018 | visuospatial cognitive functions, computer games, cognition, neurophysiology | CAVE system
T. Budai | [ ] | 2018 | virtual laboratory, design, simulation, education, learning management system | MaxWhere 3D VR Framework
G. Csapo | [ ] | 2017 | computer science education, spreadsheet programs, collaboration, problem solving | MaxWhere 3D VR Framework
Z. Kvasznicza | [ ] | 2017 | electrical engineering computing/training, education, computer animation, electric machines, mechatronics | 3D VR educational environment of a pilot project
G. Bujdoso | [ ] | 2017 | computer science education, computer aided instruction, collaborative work, iVR system, inventive thinking | MaxWhere 3D VR Framework
A. Kovari | [ ] | 2018 | engineering education, learning, problem solving, cognition, mathematics computing | -
A. Csapo | [ ] | 2018 | comprehensive memory management, cognition, AI-enhanced CogInfoCom | MaxWhere 3D VR Framework
I. Horvath | [ ] | 2018 | computer aided instruction, teaching, user interfaces, e-learning platform, digital workflows | MaxWhere 3D VR Framework
A. D. Kovacs et al. | [ ] | 2019 | educational administrative data processing, teaching, cooperation, collaboration | MaxWhere 3D VR Framework
B. Lampert et al. | [ ] | 2018 | education, VR-learning, workflow, digital content sharing | MaxWhere 3D VR Framework
I. L. Komlosi et al. | [ ] | 2016 | cognitive entity generation, social cognition, information processing, digital culture, knowledge management | MaxWhere 3D VR Framework
V. Kovecses-Gosi | [ ] | 2018 | cooperative learning, teaching methodology, digital culture, interactive learning-teaching | MaxWhere 3D VR Framework
I. Horvath | [ ] | 2017 | computer aided instruction, edu-coaching, educational, informatics | MaxWhere 3D VR Framework
I. Horvath | [ ] | 2016 | engineering education, innovation management, computer aided instruction, visualization ICT | Virtual Collaboration Arena (VirCA)
D. Edler et al. | [ ] | 2019 | 3D cartography, multimedia cartography, urban transformation, navigation, constructivism | 3D iVR based Unreal Engine 4 (UE4)
C. Boletsis et al. | [ ] | 2019 | VR locomotion techniques, human-computer interaction, user experience | 3D iVR based Unreal Engine 4 (UE4)
F. Hruby et al. | [ ] | 2020 | VR, scale, immersion, immersive virtual environments | highly immersive VR-system (HIVE)
I. E. Lokka et al. | [ ] | 2018 | VR navigational tasks, memory training, older adults, cognitive training | Mixed Virtual Environment (MixedVR)
F. Hruby | [ ] | 2019 | immersion, spatial presence, immersive virtual environments, audiovisual cartography | Immersive Virtual Environments (IVE), GeoIVE

Share and Cite

Katona, J. A Review of Human–Computer Interaction and Virtual Reality Research Fields in Cognitive InfoCommunications. Appl. Sci. 2021, 11, 2646. https://doi.org/10.3390/app11062646


A Collaborative Model of Feedback in Human-Computer Interaction

Feedback plays an important role in human-computer interaction. It provides the user with evidence of closure, thus satisfying the communication expectations that users have when engaging in a dialogue. In this paper we present a model identifying five feedback states that must be communicated to the user to fulfill the communication expectations of a dialogue. The model is based on a linguistic theory of conversation, but is applied to a graphical user interface. An experiment is described in which we test users' expectations and their behavior when those expectations are not met. The model subsumes some of the temporal requirements for feedback previously reported in the human-computer interaction literature.

Keywords: Human-computer dialogues, feedback, conversational dialogues, states of understanding, collaborative view of conversations.

INTRODUCTION

Adequate feedback is a necessary component of both human-human and human-computer interaction. In human conversations, we use language, gestures, and body language to inform our conversational partners that we have heard and understood their communication. These communicative events fulfill a very important role in a conversation; they satisfy certain communication expectations of the dialogue participant. To understand these expectations, imagine what happens when they are missing. In a conversation, if you do not hear a short "uh hum" or see a nod of the head, you might think that the other person did not understand, and as a result you might interrupt the conversation to question the listener's attentiveness: "hey, are you awake?"

This idea of communication expectations, also called "psychological closure" [16, 25], is a common human behavioral characteristic that also exists when we are interacting with a computer. For example, if the user types a command but receives no response from the user interface, the user might repeatedly press the return key to make sure that the system has "heard" (or received) the command. At times it is not even clear "who is waiting for whom" [17].

Many studies of feedback have been reported in the literature. Some deal with response time guidelines for feedback, others have addressed performance penalties caused by delayed responses [26], while others deal with the form feedback should take. What is lacking in all of these studies is a behavioral model that explains users' communication expectations and their behavior. In this paper we present such a model of feedback, based on principles of conversation derived from linguistic theory and on graphical user interface design guidelines. We also describe an experiment in which we test some of the expectations proposed by the model.

Feedback, in most of the human-computer interaction literature, refers to communication from the system to the user as a direct result of a user's action [24]. Most user interface design guidelines also stress the importance of this type of feedback; for example, [1] recommends "keep the user informed" and use feedback to confirm "that the operation is being carried out, and (eventually) that it's finished." Gaines [10] states that the response to a user's action "should be sufficient to identify the type of activity taking place."

However, feedback can also be used to communicate the state of the system independently of the user's action [1]. The system, at times, can be busy processing incoming events, such as mail messages or alarm notifications that the user has preset days in advance. In these cases, the system must let the user know its current state of processing so that the user does not feel frustrated or locked out of the dialogue.

Foley and van Dam [8] describe feedback in terms of interpersonal conversations, such as responses to questions or "signs of attentiveness." Both of these conversational devices are classified as forms of feedback.

What is common in all of these views of feedback is that it serves a behavioral purpose in interaction. This purpose represents the communication expectations that humans have of a conversational participant, even when that participant is a computer system.

Collaborative View of Conversations

Our model is based on Clark et al. [5-7], who view a conversation between two humans as a collaborative process. Previously, conversations were seen as turns taken by two participants, where the turns were related by a series of adjacency pair relations. The collaborative view extends this turn-taking model by looking at the cooperation between conversational partners to agree on common ground, sometimes without taking extra turns.

The collaborative model states that conversations are really negotiations between the two participants over the contents of the conversation. One participant, the speaker, makes a presentation, and the other participant, the listener, accepts it. A contribution to a conversation occurs only after the presentation is accepted; it is defined to be the pair of presentation and acceptance turns.

This theory of contributions to conversations has already been applied in the study of human-computer dialogues. Payne [19, 20] analyzed the interaction in MacDraw based on presentation/acceptance trees. Brennan and Hulteen have defined a model of adaptive speech feedback [3] based on the states of understanding principles. A similar model is used in Apple Computer's speech input system [15], PlainTalk, where the different states of feedback are identified with icons labeled sleeping, listening, hearing, and working.

States of Understanding

States of understanding (SOU) are the states that a listener believes he or she is in after receiving a presentation from the speaker [6]. Signaling SOU is one of the many ways participants in conversations fulfill their goals in the collaborative process. The listener provides evidence of his or her SOU, thus allowing the speaker to adapt according to the specific SOU.

The states are defined as follows. When speaker A issues an utterance, listener B believes he or she is in one of the following states:

State 0: B is not aware of the utterance.
State 1: B is aware of the utterance, but did not hear it.
State 2: B heard the utterance, but did not understand it.
State 3: B fully understood the utterance.

Evidence of understanding (i.e., identification of the state) must be provided by the listener, thus allowing the other participant to adapt accordingly. For example, if after presenting a question the speaker receives evidence that the listener is in state 1, the speaker might repeat or rephrase the question to help the listener reach state 3. The two partners follow the principle of least collaborative effort [7]. This principle states that speaker and hearer will try to minimize the collaborative effort it takes to reach a level of understanding. If an SOU is not clearly communicated by the listener, the speaker will seek further evidence that the communication was received. This is a form of repair behavior guided by the SOU principles.

The SOU can be communicated without a verbal utterance. For example, a confused facial expression can be enough to indicate that the listener is in state 2. Also, several signals together can be used to indicate a single state. A facial expression together with a hand gesture, for example, can be interpreted as a single indication of a state.

PROPOSED MODEL

Human behavior in a human-computer dialogue, when a communication expectation is not met, is similar to the repair behavior found in human-human dialogues. For example, consider the use of the performance meter on a workstation. Many workstation users keep one at the bottom of their screen to get an indication of the amount of work the CPU is currently performing. This information is used to identify whether the system is locked or just slow because it is overloaded. When users initiate an action but obtain no response within some expected amount of time, they face one of two questions: did the computer receive my action, or should I repeat it? The performance meter allows the user to determine if the system is busy, and thus may have received the action but is just slow in the response. If the system is not busy, the action was lost or ignored, in which case the user must repeat the action if the results are desired. In this example, the user interface of the application in use did not meet the user's communication expectations, and thus the user had to resort to an extra "device" (the performance meter) to obtain the information needed to match those expectations.

Based on the states of understanding principles and on graphical user interface design guidelines, such as [1], we have identified a number of feedback states needed in a human-computer dialogue to meet the human's communication expectations.

Our model contains five simulated states of understanding (SSOU): ready, processing, reporting, busy-no-response, and busy-delayed-response. Each state is intended to produce conversational behaviors on the part of the user similar to those produced by the SOU in human-human conversations. The model prescribes the type of feedback to be provided based on communication expectations, but not the specific form in which it must be provided. The states are called simulated to emphasize that this is not a model of understanding or of some other cognitive process. The model represents the communication expectations from a user's point of view. It does not address the communication expectations that the computer could or should have in a human-computer dialogue; see [17] for a discussion of this issue.

A user interface is a collection of concurrent dialogues between the user and the computing system. For each one of these dialogues, the user interface must provide some form of evidence about the state of the dialogue. Each dialogue is always in one (and only one) of the states shown in Figure 2. In the performance meter example above, the application had possibly several dialogues with the user, and the performance meter had at least another one. In addition, the state of the dialogue can be indicated using more than one signal, as is often done in human conversations. For example, a cursor change and a progress bar are used at the same time in some situations to indicate that a particular dialogue is processing a user request.

The ready, processing, and reporting states are known as the internal loop and represent feedback as responses to users' actions. The other two states are collectively known as the busy states and represent feedback as an indicator of system state. The notation used in Figure 2 is a modification of that presented in [12]. The internal loop has a history component. When a transition is made to one of the busy states, the internal loop "remembers" the state in which it was. When the transition is made back into the loop, it returns to the state "remembered." This is used as a shorthand to avoid explicit portrayal of transitions from every node in the internal loop to both busy states and back.
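To make this structure concrete, the following is a minimal sketch in Python (ours, not the authors'; all names are illustrative) of a dialogue that is always in exactly one SSOU and that remembers its internal-loop state while busy:

```python
from enum import Enum, auto

class SSOU(Enum):
    """The five simulated states of understanding from the model."""
    READY = auto()
    PROCESSING = auto()
    REPORTING = auto()
    BUSY_NO_RESPONSE = auto()
    BUSY_DELAYED_RESPONSE = auto()

# The internal loop responds to user actions; the busy states indicate system state.
INTERNAL_LOOP = {SSOU.READY, SSOU.PROCESSING, SSOU.REPORTING}
BUSY_STATES = {SSOU.BUSY_NO_RESPONSE, SSOU.BUSY_DELAYED_RESPONSE}

class Dialogue:
    """One of possibly many concurrent dialogues; always in exactly one state."""

    def __init__(self) -> None:
        self.state = SSOU.READY
        self._history = None  # internal-loop state remembered while busy

    def enter_busy(self, busy_state: SSOU) -> None:
        # Remember where the internal loop was (the history component).
        assert busy_state in BUSY_STATES
        if self.state in INTERNAL_LOOP:
            self._history = self.state
        self.state = busy_state

    def leave_busy(self) -> None:
        # Return to the remembered internal-loop state.
        self.state = self._history or SSOU.READY
        self._history = None
```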

Ready State

The system must provide evidence that it is in the ready state when it can receive and process user actions. When evidence of this state is presented, the user might initiate the next action and will expect to see some form of feedback indicating that the action has been received by the system. The behavioral role of this state is to inform the user that the system is ready to accept the next action.

Processing State

In the processing state, the system communicates to the user that an action was received in the ready state, and that something is being processed before providing the results of the action. Note that no results are displayed in this state; that is done in the reporting state. The behavioral role here is to inform the user that his or her action was received and is being processed.

Sometimes, the processing state appears to be skipped if the results can be calculated so quickly that there is no apparent delay. An example of this situation is dragging an object with the mouse. The object being dragged follows the mouse location continuously, giving the appearance of no time spent in the processing state.

On the other hand, the processing state sometimes requires a separate notification. For example, copying a very large number of files in the Macintosh Finder can produce feedback of both processing and reporting. The system shows the processing state before it starts copying the files by displaying the message "Preparing to copy files…" This is a case where the processing feedback is separate from the reporting feedback because of the length of time spent in the processing state.

Reporting State

In the reporting state, the system informs the user of the results of the action initiated in the ready state. Once this information is given to the user, the system is expected to go back to the ready state where it will be ready to accept the next user action. The behavioral role here is to inform the user of the results of a user action, thus providing closure to the action.

Busy States

Our model includes two busy states that are used to provide feedback about system state. The busy-no-response state is a state in which the system is "unaware" of any possible action the user might initiate. While in this state, all user initiated actions are ignored and lost. In the other busy state, busy-delayed-response, all user initiated actions are saved and processed later. This state has been called "type-ahead" in command line interfaces.

It is important that the busy states be signalled to the user to maintain a "sense of presence" [17] of the interface. It has been recommended that some form of "placebo" [9] be provided to assure the user that the system has not crashed. In our model we have subclassified the busy states into two separate states because of the different behavioral role each fulfills in the dialogue.

If the user is presented with evidence that the system is in the busy-no-response state, the user will opt to wait until the system returns to the ready state, since all actions are ignored by the system. This is the traditional busy state in most user interfaces. On the other hand, if the system is in busy-delayed-response and the user is aware of it, then the user might take advantage of the state and perform some actions even though no immediate response is provided. This has been seen in command line interfaces for years, but it also occurs in graphical interfaces.

In the busy-no-response state, the behavioral role is to communicate to the user that nothing can be done at this point; any user actions will result in wasted effort on the user's part. In the busy-delayed-response state, however, the behavioral role of the feedback is to inform the user that actions will be accepted but processed after a delay. In effect, it is as if the system were telling the user "I am listening, but will respond after a short delay... go ahead, keep working." The analogue in human conversations is when a participant gives the other participant consent to continue talking even though no feedback will be provided. This might occur, for example, when their visual communication is interrupted and thus feedback cannot be provided using the usual mechanisms. In this case, feedback and responses might be provided only after a delay.

Unfortunately, in today's interfaces these two states normally are not signalled differently to the user. As a result, taking advantage of the "type-ahead" features is left as a "goodie" that only advanced users learn. For example, printing a document on the Macintosh using the default choices of the Print dialogue box can be done with just two keystrokes: command-P followed by the return key. Most of the time, the dialogue box will not be displayed in its entirety. This is a nice shortcut that uses a busy-delayed-response state, but only advanced users know about it.

Failure to represent the busy-delayed-response state adequately can cause undesirable side effects. Consider the case when the user interface indicates that it is in the ready state, but it is really in the busy-delayed-response state. If the user initiates the next action, the action is stored and processed after the delay. But if the context of the interface changes during the delay, the user's action might have unintended side effects. There are many examples of this in today's desktop applications. The Sun (OpenWindows) File Manager allows the user to delete files by dragging their iconic representation to a trash can. After a file has been deleted, there is a small delay in user event processing, without any indication of it. The user sees the interface in a ready state and initiates the next action. During the short delay, all icons in the window are redrawn to fill the gap where the deleted file was located. This change in the location of icons, combined with the delay in processing, causes the icon under the cursor when the event occurred to be a different one when the event is processed, resulting in the user grabbing the wrong icon.

The problem in this example is that the system indicates that it is in the ready state when it really is in the busy-delayed-response state, and the context changes before user actions are processed. The solution to this problem is simple: if a context change will occur, the system should signal the state as busy-no-response and flush the event queue when coming back into the ready state.
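A minimal sketch of this fix, assuming a toolkit with an inspectable pending-event queue (the names events, set_state, and do_relayout are our stand-ins, not any real API; SSOU is the enum from the earlier sketch):

```python
import queue

# Stand-in for the toolkit's queue of pending, not-yet-processed user events.
events: queue.Queue = queue.Queue()

def on_context_change(set_state, do_relayout) -> None:
    """Before a context change, advertise busy-no-response and flush pending
    events so that none of them is applied to a stale context."""
    set_state(SSOU.BUSY_NO_RESPONSE)  # actions will now be ignored and lost
    do_relayout()                     # e.g., redraw icons to fill the gap
    while not events.empty():         # flush the event queue
        events.get_nowait()
    set_state(SSOU.READY)             # safe to accept new actions again
```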

Repair Behavior and Effort

In general, when a simulated state of understanding is not communicated effectively to the user, the user will enter a repair dialogue to find evidence of the correct state. The performance meter case above is one instance of this repair behavior: the user is not receiving enough feedback about whether the system is ready or not, and thus decides to rely on the performance meter as a source of feedback. Looking at the performance meter is a repair dialogue resulting from the need for feedback when the system provides the wrong evidence, or no evidence at all.

A repair dialogue may or may not be disruptive to the user. The level of disruption of the current user's goals depends on the task at hand. What is always true is that the user will have to spend more overall effort to accomplish his or her goals when the system is not cooperating.

Another interesting behavioral aspect of this model is the separation of the busy-no-response state from the busy-delayed-response state. The current practice is to not signal the busy-delayed-response state.

When both busy states are signalled as a single state, the user will rarely take advantage of the busy-delayed-response state, since the information the system is providing indicates that the system is busy. Furthermore, at times the user will seek extra evidence to determine which state the system is in. That is, the user covers the collaborative gap left by the lack of appropriate feedback.

When busy-delayed-response has the same feedback as the ready state, the user will spend extra effort and possibly even obtain unwanted results, because he or she will act at times when the system is not in the correct state.

If the system does provide accurate indication of the busy-delayed-response state, the user is more likely to take advantage of "type-ahead", or "drag-ahead" in the case of a graphical user interface. This will occur even if no feedback for the dragging action is provided during the delayed period. In this case, the user is attempting to reach the goal of the interaction without requiring the full feedback that could be provided by the system.

The busy states are a good example of where the user's repair behavior and extra effort will be spent overcoming "miscommunication" from the system. The user fills the conversational gap left by the system with extra effort exhibited in the form of repair behavior. The experiment described below studies the repair behavior and the extra effort caused by this behavior when users participate in a dialogue without good communication of the SSOU.

Related Work in the Context of Our Model

Our model incorporates many findings reported in the literature dealing with feedback and response times. The internal loop in the model includes the three states emphasized by proponents of direct manipulation interfaces [13, 14, 23]. User actions in the ready state require short and quick responses from the system in the form of the reporting state. Because most actions are short, there is little need for feedback in the processing state.

Most response time studies [10, 11, 16, 17] are also related to this internal loop. Depending on the complexity of the action requested by the user, the time requirements for the response vary. If the user moves the mouse, he or she expects the cursor on the screen to move accordingly. Such short actions, sometimes called "reflex" actions, require a response of less than 0.1 seconds [18]. In these cases, this is the timing constraint for a complete pass through the internal loop of the model. For more complex actions, for example actions that take more than 10 seconds, a progress bar should be displayed. This progress bar indicates the amount of work being done inside either the processing or the reporting state, as in the Macintosh file copying example.
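These timings suggest a simple duration-based rule for choosing the form of feedback. The sketch below is our illustration of that reading; the cutoffs come from the guidelines cited above, not from the model itself, and the intermediate "busy cursor" band is our assumption:

```python
def feedback_for(expected_seconds: float) -> str:
    """Illustrative mapping from expected response time to feedback form."""
    if expected_seconds < 0.1:
        # Reflex action: the result itself arrives fast enough to be the feedback.
        return "none"
    elif expected_seconds <= 10.0:
        # Signal the processing state, e.g., with a watch cursor.
        return "busy cursor"
    else:
        # Long operation: show how much work is done in processing/reporting.
        return "progress bar"
```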

A closely related model, Brennan and Hulteen's speech feedback model [3], differs from our model's structure only in that it does not account for our busy-delayed-response state, possibly because this is not a common state in a speech understanding system. The other states can be directly translated between the two models; some of their states are further decompositions of our states.

EXPERIMENT

We designed an experiment to explore the effect different forms of feedback have on users' behavior in a direct manipulation task. Based on our model, we tried to answer three main questions: Do users engage in repair behavior when the state of the system is not communicated correctly? Does this repair behavior amount to a significant extra effort on the user's part? And how do users take advantage of the busy-delayed-response state?

Thirty participants were recruited from NRL summer coop students, George Washington University computer science students, and U.S. Naval Academy computer science students. All participants had comparable experience with graphical user interfaces (9 use MS-Windows, 19 use X Windows), and they all use the mouse on a daily basis (4.9 hours a day on average). All participants had high school degrees; 10 had completed undergraduate studies, and 5 had graduate degrees.

The task to be performed by each participant involved moving geometrical icons with the mouse from their original location to a target location. The shapes used for the icons were circles, triangles, squares, and diamonds. The target locations were boxes with one of the shapes drawn on the front face of the box. In Figure 3, all four boxes are shown with two shapes, a circle and a triangle. Participants were asked to drag the shapes over to the corresponding box. When an object was dragged over a box, the box was highlighted, independent of whether it was a correct assignment or not. When the object was dropped on a box, it disappeared, whether it was the right box or not.

The dependent measures used in the experiment were: number of interaction techniques (ITs), number of assignments, and number of incomplete actions. All of these measures were computed as totals and as rates per time unit (seconds). ITs were all mouse-down actions, including clicks and drags. Assignments were subdivided into valid assignments and invalid assignments (assignments to the wrong box). Incomplete actions were ITs that did not result in an assignment. Time and distance moved were captured for possible future analysis but were not used in this experiment.
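For concreteness, here is a small sketch of how these measures could be computed from a logged stream of mouse actions; the data layout and all names are our assumptions, not the instrumented task's actual code:

```python
from dataclasses import dataclass

@dataclass
class MouseAction:
    """One interaction technique (IT): any mouse-down, click or drag."""
    t: float                  # time of the action, in seconds
    assigned: bool            # did it end with an object dropped on a box?
    correct_box: bool = False # was the drop on the matching box?

def measures(actions: list[MouseAction], duration: float) -> dict:
    """Compute the dependent measures described above as totals and rates."""
    its = len(actions)  # every logged mouse-down counts as one IT
    valid = sum(a.assigned and a.correct_box for a in actions)
    invalid = sum(a.assigned and not a.correct_box for a in actions)
    incomplete = its - valid - invalid  # ITs that produced no assignment
    return {
        "ITs_per_s": its / duration,
        "valid_assignments": valid,
        "invalid_assignments": invalid,
        "incomplete_per_s": incomplete / duration,
    }
```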

The experiment consisted of three sessions of about five minutes and forty seconds each. In each session, the participant was presented with objects that appeared at a random location on the display. New objects appeared approximately every 3 seconds. The participants were instructed to assign the objects to their corresponding boxes as quickly and accurately as possible, with both criteria having equal importance.

Based on pilot data, the event rate was kept slow so that all participants would be able to complete the task. It was our desire to study the repair behavior exhibited by participants without causing frustration by making the task too difficult to complete. We were not concerned with performance effects in this study.

Three of the SSOU were used in the experiment (Table 1). In the ready state, all user actions were accepted and processed as soon as they occurred. In the busy-no-response state, all user actions were ignored. In the busy-delayed-response state, all user actions were recorded but processed after a delay of approximately 2 seconds. User actions were played back after the delay, exactly as they had been performed (see the sketch after Table 1).

Table 1. States used in the experiment.

State | User Action | System Response
Ready | Accepted | Yes, without a delay
Busy-No-Response | Ignored | None
Busy-Delayed-Response | Accepted | Yes, after a 2 second delay
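The delayed playback could be implemented along the following lines. This is a simplified, blocking sketch of our reading of the procedure, not the original experiment software; a real implementation would be event-driven:

```python
import time

DELAY = 2.0  # seconds, as in the busy-delayed-response condition

def replay_after_delay(recorded, apply) -> None:
    """Replay (timestamp, event) pairs in order, with their original relative
    spacing, once the delay has elapsed. `apply` stands in for the real
    event handler."""
    if not recorded:
        return
    time.sleep(DELAY)                             # hold all responses for the delay
    prev = recorded[0][0]
    for timestamp, event in recorded:
        time.sleep(max(0.0, timestamp - prev))    # preserve relative timing
        prev = timestamp
        apply(event)                              # process exactly as performed
```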

Each session started in the ready state, then went to the busy-delayed-response state, back to ready, next to busy-no-response, and then the whole cycle was repeated. Each phase was approximately 12 seconds long, and participants were presented with approximately 4 objects per phase. There were a total of 29 phases, with the first one and the last one being ready phases, yielding a total of 7 busy-delayed-response, 7 busy-no-response, and 15 ready phases.

As the indication of state, we chose to use mouse cursor changes, because this is a common use of mouse cursors in all the graphical user interfaces with which our subjects were familiar. Treatments differed only in the type of feedback provided to indicate the state of processing. The user interface was always in one of the three states shown in Table 1. Indication of the state was given by changing the mouse cursor, as shown in Table 2. The watch used in treatment B for the busy states was not reused in treatment C, to avoid transfer effects in the recognition of the cursor.

Table 2. Cursors used to indicate state in each treatment.

Treatment | Ready | Busy-No-Response | Busy-Delayed-Response
A | Arrow | Arrow | Arrow
B | Arrow | Watch | Watch
C | Arrow | Stop Sign | Hour Glass

Each participant was assigned randomly to one of three groups. Each group was presented with the three different treatments, counter-balanced according to a Latin square design [4]. All participants received written instructions explaining the task to be performed. A three minute practice session was given, during which participants saw only the ready state. All user and system actions were recorded for analysis. At the end of all three treatments, the participants were given a questionnaire to obtain demographic information (e.g., age, education).

To avoid biasing the participants towards a specific behavior, participants were not told of the busy states that were to occur in the three experimental conditions. They neither saw these states in the practice session nor knew the different cursors associated with the different treatments.

RESULTS AND DISCUSSION

Performance in the task was not significantly different over any of the independent variables. This was expected, as explained earlier, because we maintained a relatively slow event rate. On average, participants assigned 96.8% of the objects presented to them. The only significant performance-related result was the number of incorrectly assigned objects (objects assigned to the wrong box). The mean number of incorrect assignments for all participants was only 0.32 objects per session, yet there was an interaction effect between session and state of processing, F(2,139)=3.89, p=0.0227. In the ready states of the first session, participants incorrectly assigned more objects than in the ready states of the second session, F(1)=8.08, p=0.0051.

Repair Behavior

Our model suggests that if no information about state is provided to the user when the system is in busy-delayed-response or busy-no-response, then the user will perform extra actions to elicit the state of the system. These extra actions in our experiment take the form of incomplete actions: clicks and drags that do not result in the assignment of objects. Treatment A did not provide any feedback for the two busy states. As a result, treatment A should produce some form of repair behavior from the user.

Treatment (F(2,224)=31.89, p=0.0001) and state (F(2,224)=78.03, p=0.0001) were significant for the number of incomplete actions per second (Figure 4), and the interaction effect between these two factors was also significant, F(4,224)=12.31, p=0.0001. There were almost no incomplete actions in the ready phases for all three treatments (means were < 0.0001 for all three treatments). Treatment A produced more incomplete actions per second than treatments B and C in the busy-no-response state, F(1)=94.56, p=0.0001. Treatment A also had significantly more incomplete actions per second in the busy-delayed-response state, F(1)=15.64, p=0.0001.

A very similar effect is found in the total number of interaction techniques (ITs) per second (Figure 5). This total includes incomplete actions and all the assignments performed. Treatment (F(2,224)=47.30, p=0.0001) and state (F(2,224)=69.74, p=0.0001) had significant effects on total ITs, and there was an interaction effect between these two factors, F(4,224)=14.67, p=0.0001. There was no significant difference among the treatments in the ready state. However, there was a significant difference in the busy-delayed-response state caused by treatment. Treatment A produced significantly more ITs than the other two treatments, F(1)=34.37, p=0.0001. Treatment B had fewer ITs than treatment C, but the difference was not significant.

These differences in incomplete actions and interaction techniques provide significant evidence that participants who are not given correct feedback about the state of the system engage in repair behavior. Treatment A produced the largest amount of repair behavior in the two busy states. Participants in this group received no indication from the system that the state had changed. They not only had to figure out what was happening once they clicked for the first time in that state, but also had to continue clicking to determine when either busy state was over. Anyone who has ever faced a situation in which the computer does not respond to their actions understands this basic behavior.

Behavior in busy-delayed-response

We found evidence that users can take advantage of the busy-delayed-response state once they understand that the system is responding after a delay. The participants had no a priori knowledge of the busy states or the meaning of the cursors. So in treatment A they had to figure out what was happening in the busy states without help from the system. Treatment B provided the same feedback, the wrist-watch, for both busy states. Treatment C provided differing feedback, but its meaning was not specified ahead of time. So, subjects in treatment C also had to do some experimentation to understand the states.

State of processing had a significant effect, F(1,139)=1256.22, p=0.0001, on the number of assignments per second (Figure 6). There were significantly more assignments per second in the ready state than in the busy-delayed-response state. We also found an interaction effect between state and treatment, F(2,139)=11.59, p=0.0001. Treatment A had more assignments per second in the busy-delayed-response state than C, F(1)=4.09, p=0.0450, and B, F(1)=22.63, p=0.0001. Treatment B had significantly fewer assignments per second than treatment C, F=7.475, p=0.0071. There was no significant difference due to treatment in the ready state. Session had a significant effect, F=3.100, p=0.0482; there were more assignments in the first session than in the other two, F(1)=6.11, p=0.0146. This represents a slight learning effect, but there was no interaction with any of the other variables.

Participants in all treatments were successful in assigning objects in all states. Performance in the busy-delayed-response state was slower than in the ready state. Nevertheless, users were able to perform assignments.

Surprisingly, participants in treatment A assigned the highest number of objects in the busy-delayed-response state. Perhaps this is merely an artifact of the smaller number of assignments for treatment A in the ready state. It is also possible that subjects in treatment A had more trouble identifying the beginning of each new ready state and therefore did not make full use of it, hence making fewer overall assignments in the ready state.

Although the difference was not statistically significant, participants in treatment B did assign fewer objects during the busy-delayed-response state than those in treatment C. A possible explanation is that participants in treatment B assumed a more passive role in the busy-delayed-response state because they received the same feedback for both busy states; they may have interpreted both states as busy-no-response. Further data would be needed to fully explain this result.

CONCLUSIONS

We have taken a model of human conversation and reformulated some of its principles to cover feedback in human-computer interaction. This is similar to the approach described in [2, 9, 17, 19]. Here, a theory of collaborative conversations served as the basis for the definition of states of feedback, each representing different communication expectations users have when interacting with a computer.

The experiment showed that failure to properly identify an SSOU produces repair behavior by users, and that this repair behavior can account for a significant amount of their effort. The study also shows that a different behavior is associated with each of the two busy states. It provides evidence that if a delayed state is identified, users do indeed take advantage of this "type-ahead" state.

The model presented, coupled with the feedback timings reported in the literature, gives the user interface designer the feedback requirements for the design of new interaction techniques and dialogues. More importantly, our model does so in a style-independent manner, so it is not specific to one platform or one style of interaction. As is common in our field, adherence to the principles of this model does not guarantee a good interface. But based on the empirical evidence and on our observations of commercial applications, violation of the principles embodied in the model does produce disruption of the human-computer dialogue, as evidenced by the repair behavior. In this light, the model provides feedback design guidelines for interaction techniques and human-computer dialogues.

The model prescribes the feedback states for each dialogue in a human-computer interface. But each user interface can have several human-computer dialogues, with each dialogue possibly having multiple cues communicating state information. The user integrates several cues into a single communicative event to identify the SSOU of the system. The model presented here does not address how this combination is done. It seems that some hierarchical combination of cues takes place, but more work is needed to extend the model in that direction.

Finally, the goal of our research is to identify principles from human conversations that are desirable in human-computer dialogues [21, 22]. The feedback model presented here provides a behavioral description of direct manipulation interactions and their feedback requirements. This model is the lower level of a human-computer dialogue framework, currently under development, that studies dialogue at the feedback level and at the turn-taking level.

ACKNOWLEDGMENTS

We would like to thank Jim Ballas, Astrid Schmidt-Nielsen, and Greg Trafton for their many helpful comments on the design of this experiment. We also would like to thank the Computer Science Department at the U.S. Naval Academy and Rudy Darken at NRL for allowing us to run most of the study at their facilities. The GWU HCI research group provided many helpful ideas for the design and the analysis of the data. This research was funded in part by the Economic Development Administration of the Government of Puerto Rico, and by NRL.

2. Brennan, S.E., Conversation as Direct Manipulation: An iconoclastic view, in The Art of Human-Computer Interface Design , B. Laurel, Editor. 1990, Addison-Wesley Publishing Company, Inc.: Reading, Massachusetts.

3. Brennan, S.E. and Hulteen, E.A. Interaction and Feedback in a Spoken Language System, in AAAI-93 Fall Symposium on Human-Computer Collaboration: Reconciling Theory, Synthesizing Practice, (1993), AAAI Technical Report FS93-05.

4. Bruning, J.L. and Kintz, B.L. Computational Handbook of Statistics. Scott, Foresman and Company, Glenview, Illinois, 1987.

5. Clark, H.H. and Brennan, S.E., Grounding in Communication, in Shared Cognition: Thinking as Social Practice , J. Levine, L.B. Resnick, and S.D. Behrend, Editor. 1991, APA Books: Washington, D. C.

6. Clark, H.H. and Schaefer, E.F. Collaborating on contributions to conversations. Language and Cognitive Processes, 2, 1 (1987), pp. 19-41.

7. Clark, H.H. and Wilkes-Gibbs, D. Referring as a collaborative process. Cognition, 11, (1986), pp. 1-39.

8. Foley, J.D. and van Dam, A. Fundamentals of Interactive Computer Graphics. Addison-Wesley Systems Programming Series. Addison-Wesley Publishing Company, Reading, Massachusetts, 1982.

9. Foley, J.D. and Wallace, V.L. The Art of Natural Graphic Man-Machine Conversation. Proceedings of the IEEE, 62, 4 (1974), pp. 462-471.

10. Gaines, B.R. The technology of interaction-dialogue programming rules. IJMMS, 14, (1981), pp. 133-150.

11. Gallaway, G.R. Response Times To User Activities in Interactive Man/Machine Computer Systems, in Proceedings of the Human Factors Society-25th Annual Meeting , (1981), pp. 754-758.

12. Harel, D. On Visual Formalisms. CACM, 31, 5 (1988), pp. 514-530.

13. Hutchins, E.L., Hollan, J.D., and Norman, D.A., Direct Manipulation Interfaces, in User Centered System Design: New Perspectives on Human-Computer Interaction , D.A. Norman and S.W. Draper, Editor. 1986, Lawrence Erlbaum Associates: Hillsdale, NJ.

14. Jacob, R.J.K. Direct Manipulation, in Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics , (Atlanta, GA, 1986), IEEE, pp. 384-388.

15. Lee, K.-F. The Conversational Computer: An Apple Perspective, in Proceedings of Eurospeech , (1993), pp. 1377-1384.

16. Miller, R.B. Response time in man-computer conversational transactions, in Proceedings of Fall Joint Computer Conference , (1968), pp. 267-277.

17. Nickerson, R.S. On Conversational Interaction with Computers, in User Oriented Design of Interactive Graphics Systems: Proceedings of the ACM SIGGRAPH Workshop. , (1976), ACM Press, pp. 681-683.

18. Nielsen, J. Usability Engineering . AP Professional, Cambridge, MA, 1993.

19. Payne, S.J. Looking HCI in the I, in Human-Computer Interaction - INTERACT '90 , (1990), Elsevier Science Publishers B.V., pp. 185-191.

20. Payne, S.J. Display-based action at the user interface. IJMMS, 35, (1991), pp. 275-289.

21. Pérez, M.A. Conversational Dialogue in Graphical User Interfaces: Interaction Technique Feedback and Dialogue Structure, in Proceedings Companion of the ACM CHI'95 Conference on Human Factors in Computing Systems , (Denver, Colorado, 1995), Addison-Wesley, pp. 71-72.

22. Pérez, M.A. and Sibert, J.L. Focus on Graphical User Interfaces, in Proceedings of the International Workshop on Intelligent User Interfaces , (Orlando, Florida, 1993), ACM Press, pp. 255-257.

23. Shneiderman, B. The future of interactive systems and the emergence of direct manipulation. BIT, 1, (1982), pp. 237-256.

24. Shneiderman, B. Designing the User Interface: Strategies for Effective Human-Computer Interaction. Addison-Wesley Publishing Co., Reading, Massachusetts, 1987.

25. Simes, D.K. and Sirsky, P.A., Human Factors: An Exploration of the Psychology of Human-Computer Dialogues, in Advances in Human-Computer Interaction , H.R. Hartson, Editor. 1988, Ablex Publishing Corporation: Norwood, New Jersey.

26. Teal, S.L. and Rudnicky, A.I. A Performance Model of System Delay and User Strategy Selection, in Proceedings of ACM CHI'92 Conference on Human Factors in Computing Systems , (Monterey, California, 1992), Addison-Wesley, pp. 295-305.


An Exploration into Human–Computer Interaction: Hand Gesture Recognition Management in a Challenging Environment

Original Research | Open Access | Published: 12 June 2023 | Volume 4, article number 441 (2023)


Victor Chang (ORCID: 0000-0002-8012-5852), Rahman Olamide Eniola, Lewis Golightly and Qianwen Ariel Xu


Abstract

Scientists are developing hand gesture recognition systems to improve authentic, efficient, and effortless human–computer interaction without additional gadgets, particularly for the speech-impaired community, which relies on hand gestures as its only mode of communication. Unfortunately, the speech-impaired community has been underrepresented in the majority of human–computer interaction research, such as natural language processing and other automation fields, which makes it more difficult for its members to interact with systems and people through these advanced technologies. The algorithm of the proposed system operates in two phases. The first phase is Region of Interest (ROI) segmentation, based on a color space segmentation technique with a pre-set color range that separates the pixels of the region of interest (the hand) from the background (pixels outside the desired area of interest). The second phase feeds the segmented images into a Convolutional Neural Network (CNN) model for image classification, for which we utilized the Python Keras package. The system demonstrated the need for image segmentation in hand gesture recognition: the accuracy of the optimal model is 58 percent, roughly 13 percentage points higher than the accuracy obtained without image segmentation.


Introduction

British Sign Language recognition is a project based on the notions of image processing and machine learning classification. There has been much research in recent years on gesture recognition utilizing various machine learning and deep learning approaches, often without fully explaining the methods used to obtain the results. This study focuses on lowering the cost and increasing the resilience of the suggested system by using a mobile phone camera, while also detailing the steps taken to reach its conclusions.


Research Questions

This study aims to clarify and explain the following six research questions (RQs).

RQ1: What are the image processing approaches for improving picture quality and generalization of the project?

RQ2: What image segmentation techniques are available for separating the foreground (hand gesture) from the background?

RQ3: What machine learning and deep learning approaches are available for image classification and hand gesture recognition?

RQ4: What hardware and/or software are required?

RQ5: What are the benefits of the proposed approaches over currently existing methods? The result of the comparison between our method and other approaches can be used to determine what aspects of our techniques need to be improved for future study.

RQ6: What ethical issues does the initiative raise?

Research Contributions

This study aims to make a contribution to the understanding of various approaches utilized to enhance picture quality during an imaging job. Specifically, we investigate image processing techniques such as erosion, resizing, and normalizing, as well as segmentation features like HSV color separation and thresholding. In addition, this study explores Machine Learning approaches used in picture classification projects, particularly for hand gesture recognition, and considers hitherto unutilized Machine Learning methods as potential alternatives. Then, this study evaluates the feasibility of the project based on the available materials and quantifies the model's performance in comparison to prior studies. Finally, we detect and address any ethical concerns that may arise in hand gesture recognition due to the potential impact of advanced algorithms on people. Overall, this study seeks to contribute to the field of hand gesture recognition and image processing, with the goal of improving human–computer interaction and addressing potential ethical issues.

Related Literature

Hand Gesture Recognition (HGR) is a complicated process that includes many components, such as image processing, segmentation, pattern matching, machine learning, and even deep learning. The approach to hand gesture recognition may be divided into several phases: data collection, image processing, hand segmentation, feature extraction, and gesture classification. Furthermore, while static hand gesture recognition tasks use single frames of imagery as inputs, dynamic sign languages utilize video, which provides continuous frames of varying imagery [5]. The technique used for data collection distinguishes computer vision-based approaches from sensor- and wearable-based systems. This section discusses the methods and strategies used by researchers in static computer vision-based gesture recognition.

Human–Computer Interaction and Hand Gesture Recognition

The recent technological breakthrough in computational capabilities has resulted in the development of powerful computing devices that affect people's everyday lives. Humans can now engage with a wide range of apps and platforms created to solve most day-to-day challenges. With the advancement of information technology in our civilization, we may anticipate greater integration of computer systems into our society. These settings will impose new requirements on human–computer interaction, including easy and robust platforms. Interaction with these technologies becomes easier when they are used naturally (i.e., similar to how people communicate with one another through speech or gestures). Another driver is the recent evolution of computer user interfaces, which has influenced modern developments in devices and methodologies of human–computer interaction. The keyboard, the natural option for text-based user interfaces, is one of the most common human–computer interaction devices [15].

Human–Computer Interaction/Interfacing (HCI), also known as Man–Machine Interaction or Interfacing, has emerged gradually with the emergence of advanced computers [11, 12, 19]. HCI is a field of research involving the creation, analysis, and deployment of interactive computer systems for human use, together with the investigation of the phenomena associated with the subject [11]. Indeed, the logic is self-evident: even the most advanced systems are useless unless they can be operated effectively by humans. This foundational argument summarizes the two crucial elements to consider when building HCI: functionality and usability [19]. A system's functionality is described as the collection of activities or services it delivers to its clients. Nevertheless, the significance of functionality is apparent only if it becomes feasible for people to employ it effectively. The usability of a system, on the other hand, refers to the extent and depth to which the system can be utilized effectively and adequately to achieve specific objectives for the user. The real value of a computer is attained when the system's functionality and usability are adequately balanced [11, 12].

Hand Gesture Recognition (HGR) is a critical component of Human–Computer Interaction (HCI), which studies computer technology designed to understand human commands. Interacting with these technologies is made simpler when they are conducted in a natural manner (i.e., just as humans interact with each other using voice or gestures). Nonetheless, owing to the influence of illumination and complicated backgrounds, most visual hand gesture detection systems mainly function in limited settings. Hand gestures are a kind of body language communicated via the center of the palm, finger position, and hand shape. Hand gestures are divided into two types, dynamic and static, as shown in Fig. 1 below. A static gesture relates to the fixed form of the hand, while a dynamic hand gesture consists of a sequence of hand motions, like waving. Hand motions also vary within a gesture: a handshake, for instance, differs from one individual to another and depends on time and location. The main distinction between posture and gesture is that the former focuses on the form of the hand, while the latter focuses on the hand motion.

Figure 1: Features of hand gesture recognition

The fundamental objective of gesture recognition research is to develop a technology capable of recognizing distinct human gestures and utilizing them to communicate information or control devices [28]. As a result, it incorporates monitoring hand movement and translating such motion into crucial instructions. Furthermore, Hand Gesture Recognition methods for HCI systems can be classified into two types: wearable-based and computer vision-based recognition [30]. The wearable-based recognition approach collects hand gesture data using several sensor types. These devices are mounted on the hand and record its position and movement; the data are then analyzed for gesture recognition [30, 38]. Wearable devices allow gesture recognition in different ways, including data gloves, EMG sensors, and Wii controllers. Wearable-based hand gesture identification systems have a variety of drawbacks and ethical challenges, covered later in this paper.

In contrast, computer vision-based solutions are a widespread, appropriate, and adaptable approach that employs a camera to capture imagery for hand gesture recognition and enables contactless communication between people and computers [30, 38]. Moreover, the vision-based recognition technique uses different image processing techniques to obtain hand position and movement data. This method detects gestures based on the shape, position, features, color, and movement of the hand (Fig. 2). However, vision-based recognition has certain limitations in that it is affected by lighting and cluttered surroundings [38].

Figure 2: Computer vision-based gesture recognition

Image Processing

The human eye can perceive and grasp the things in a photograph. Accurate algorithms and considerable training are necessary to make computers comprehend images the way people do [13, 14, 16]. Image data account for about 75 percent of the information acquired by an individual. When we receive and use visual information, we refer to this as vision, cognizance, or recognition; when a computer collects and processes visual data, this is called image processing and recognition. The median and Gaussian filters are two prevalently used filtering techniques for minimizing distortion in collected images [5]. Zhang et al. [46] adopted the median filter approach to remove noise from the gesture image and generate a more suitable image for subsequent processing. Piao et al. [33] presented the Gaussian and bilateral filter strategies to de-noise the image and create a more enhanced image. On the other hand, Treece [41] proposed a unique filter claimed to have better edge- and detail-retaining capabilities than the median filter, noise-reducing performance comparable to the Gaussian filter, and suitability for a wide range of signal and noise kinds. Scholars have also researched other filtering algorithms. For example, Khare and Nagwanshi [21] presented a review of nonlinear filter methods that may be utilized for image enhancement. They conducted a thorough investigation and performance comparison of the Histogram Adaptive Fuzzy (HAF) filter and other filters based on PSNR (Peak Signal-to-Noise Ratio).
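As an illustration of the two filtering techniques discussed above, the following minimal Python sketch applies OpenCV's median and Gaussian filters to a gesture image; the file path and kernel sizes are illustrative assumptions, not values reported in the cited studies.

```python
import cv2

# Load a sample gesture image (the path is a hypothetical placeholder).
img = cv2.imread("gestures/A_01.jpg")

# Median filter: replaces each pixel with the median of its 5x5 neighborhood,
# effective against salt-and-pepper noise while preserving edges.
median = cv2.medianBlur(img, 5)

# Gaussian filter: weighted average with a 5x5 Gaussian kernel (sigma derived
# from the kernel size when 0 is passed), effective against Gaussian noise.
gaussian = cv2.GaussianBlur(img, (5, 5), 0)

cv2.imwrite("median.jpg", median)
cv2.imwrite("gaussian.jpg", gaussian)
```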

Morphological transformation is another image processing procedure often used to eliminate undesirable content from an image. A notable example is Hassanpour et al. (2015), who use morphological transformations to improve the quality of different medical images. Moreover, morphological transformation has been utilized in many studies to extract features and contour areas critical for recognition or classification tasks [6, 39, 43]. Lastly, Histogram Equalization is another image processing technique for image enhancement that has received considerable attention. Xie et al. [44] investigated the basic concept of histogram equalization for image enhancement and showed that histogram equalization can enhance image effect and contrast. Abdullah-Al-Wadud et al. [1] addressed a dynamic histogram equalization (DHE) method that splits the histogram relying on local minima and allocates specific grey-level ranges to each partition before equalizing them individually, controlling the impact of classical HE so that it enhances the image without sacrificing detail. Abdullah-Al-Wadud et al. [1] asserted that the DHE technique outperforms other current methods by improving contrast without introducing adverse effects or unacceptable artifacts.
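For concreteness, the sketch below applies classical histogram equalization with OpenCV. DHE itself is not part of OpenCV, so CLAHE is shown as a readily available adaptive variant that likewise limits over-enhancement; the file name and parameters are assumptions.

```python
import cv2

# Classical histogram equalization operates on a single channel, so the
# image is loaded as grayscale (file name is a hypothetical placeholder).
gray = cv2.imread("gestures/A_01.jpg", cv2.IMREAD_GRAYSCALE)
equalized = cv2.equalizeHist(gray)

# CLAHE (Contrast Limited Adaptive Histogram Equalization): an adaptive
# variant that, like DHE, constrains over-amplification of contrast.
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
adaptive = clahe.apply(gray)
```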

Image Segmentation

The segmentation stage entails splitting images into numerous separate regions to isolate the Region of Interest (ROI) from the rest of the imagery. Scholars have proposed different methods for image segmentation, discussed below. Skin color segmentation is typically done in different color spaces, depending on the image type and content. Muhammad and Abu-Bakar [27] suggested a color space blend of HSV and YCgCr for skin detection and segmentation that responds well to various skin color tones while being less sensitive to pixels in the background that look like skin.

Shaik et al. [ 37 ] conducted a thorough literature review, displaying various color spaces used for skin color identification, and discovered that RGB color space is not favored for color-based identification and color assessment due to the blending of color (chrominance) with the level of intensity (luminance) data and its non-uniform features. Furthermore, Shaik et al. [ 37 ] argued that Luminance and Hue-based strategies actively discriminate color and intensity level even under bad lighting conditions, a claim backed by their experimental results, demonstrating the YCbCr color space performance in the segmentation and detection of skin color in color images.

Saini and Chand [35] addressed the application and retrieval of skin pixels in the RGB color model in their publication, and they demonstrated the need for changing color models by monitoring the impacts of variables such as noise and illumination conditions. Furthermore, they discussed various color models that are commonly utilized in research, such as the HSI, HSV, TSL, and YUV color spaces. Saini and Chand [35] also speculated that the presence of illumination, shadows, and interference could impact the appearance of skin color and make segmentation and detection difficult. As a result, an RGB-based skin segmentation method for retrieving skin pixels was introduced in their research study, along with a computerized method for automatically transitioning between color models in different color spaces, such as RGB into HSV or vice versa, to obtain the most noticeable image pixels.

Other methods, aside from skin color-based segmentation, have been extensively researched in the literature. Phung et al. [32] examined the pixel-wise skin segmentation technique based on color pixel classification, revealing that the Bayesian classifier based on the histogram technique and the multi-layer perceptron performed better than other methods, including the piece-wise linear and the Gaussian classifiers. Additionally, they argued that the Bayesian classifier combined with the histogram method is practical for the skin color pixel classification problem due to the low dimension of the feature space and the availability of an enormous training set. They note, nonetheless, that the Bayesian classifier consumes far more memory than the MLP and other algorithms. Concerning color representations, their investigation using a Bayesian classifier demonstrates that the selection of color model does not affect pixel-wise skin segmentation. They concluded, nevertheless, that using chrominance channels alone reduces segmentation results and that there are considerable efficiency differences across various chrominance options.

Gesture Recognition and Machine Learning Algorithms

Various machine learning and deep learning algorithms have recently been utilized for hand gesture recognition and classification of static and dynamic gestures. Different machine learning algorithms have been utilized for static hand gesture recognition [7, 25, 29]. Liu et al. [25] introduced Hu moments and support vector machine (SVM)-based approaches. First, Hu invariant moments are retrieved into a seven-dimensional vector; second, an SVM classifier is utilized to determine a decision boundary between the gesture classes. On the other hand, Feng and Yuan [7] and Nagashree et al. [29] retrieved the histogram of oriented gradients (HOG) for feature extraction and used the Support Vector Machine (SVM) classifier, which is extensively utilized for classification, to train these relevant attributes. At testing time, a decision is made using the previously learned SVMs, and gesture recognition rates are compared under distinct illumination scenarios. The findings reveal that the HOG feature extraction and multivariate SVM classification approaches achieve substantial recognition accuracy, and the system is more resistant to lighting variations.

An Artificial Neural Network (ANN) is a computer processing technology with functional properties like those of human neural systems. For ANN-based hand gesture recognition, we studied several works in the literature [8, 17, 31].

Oyedotun and Khashman [31] suggested using a deep convolutional neural network to solve the challenge of recognizing all 24 hand gestures from Thomas Moeslund's gesture recognition repository. They demonstrated that more biologically oriented DNNs, such as the convolutional neural network and the stacked de-noising autoencoder, can grasp the complicated hand gesture identification challenge with reduced misclassification. Islam et al. [17] reported a static hand gesture recognition approach based on CNN. Data augmentation techniques such as re-scaling, resizing, shearing, translation, and width and height shifting were applied to the pictures used to train the model. Flores et al. [8] suggested techniques for recognizing the static hand gesture alphabet of Peruvian Sign Language (LSP). They used image processing methods to remove or minimize noise, boost contrast under different lighting conditions, segment the hand from the image background, and ultimately recognize and trim the area holding the hand gesture. They used convolutional neural networks (CNN) to categorize the 24 hand gestures, creating two CNN designs with varying numbers of layers and attributes for every layer. The testing revealed that the initial CNN, before data augmentation, has a lower accuracy than the one after.

The timing mismatch makes it impossible to match two different gestures in dynamic gesture recognition using Euclidean space. Nevertheless, scholars have developed sophisticated techniques and algorithms for detecting and identifying dynamic hand gestures in real time [23, 24, 40]. The authors of [40] created a method that uses image entropy and density clustering to extract key frames from hand gesture video for additional feature extraction, potentially improving identification efficiency. Furthermore, they presented a pattern fusion technique to improve feature representation and enhance system performance.

Lai and Yanushkevich [24] suggested a hybrid of convolutional neural networks (CNN) and recurrent neural networks (RNN) for automatic hand gesture identification utilizing depth and skeletal data. In their study, recurrent neural networks functioned effectively in recognizing sequences of motion for every skeleton joint given the skeleton details, while CNN was used to retrieve spatial data from depth images. Köpüklü et al. [23] suggested a two-level hierarchical system of a detector and a classifier that allows offline-working convolutional neural network (CNN) models to run online effectively using a template-matching technique. They used several CNN designs and compared them in terms of offline classification accuracy, number of parameters, and computing efficiency. They employed the Levenshtein distance as an assessment tool to evaluate the single-time activations of the identified gestures.

Applications of Hand Gesture Recognition

Hand gesture recognition has several applications in various industries, such as virtual worlds, robotics, intelligent surveillance, sign language translation, and healthcare systems. The section below delves more into a few of the application areas.

Applications in Healthcare

Over the years, interactive systems have expanded significantly as several research efforts have proved their value and influence in different sectors, including drug production, medicine, and healthcare, after effective experiments. Given the success of interactive systems in medicine, there has been a desire for such systems to be used widely in the healthcare and medical fields. Therefore, countries have been interested in building many interactive systems, including medical robots, notably by providing financing and scholarship opportunities to promote research and innovation in the Human–Computer Interaction sector [36].

The continual progress of this kind of innovation is now recognized and welcomed in practically all industries. The concept of human–computer interaction systems will benefit the healthcare and medical industries, which extensively utilize these novel concepts. Yearly, new variations, designs, aesthetics, and degrees of maneuverability are developed, particularly human-inspired or humanoid robotics that can think and behave like people. The continued advancement of this technology is lauded and has significantly influenced the medical and healthcare fields [36].

Undoubtedly, the prospect of computerized assistance will result in a significant boost in the quality of care. However, the performance and viability of such systems will be determined by how effective the interactive systems are in the medical and healthcare sector and how valuable they are to patients. Equally, the importance of doctors' confidence in the effectiveness of the new technology systems to be deployed in the healthcare profession cannot be overstated [36].

It is essential to maintain an aseptic environment during surgery. However, during surgery, the physician must also see the patient's clinical visual information via the computer, which must remain sterile. The existing human–computer interfaces are therefore difficult for staff to handle during an operation; they raise the workload and the number of operational workers needed in the theatre. As a result, ensuring a speedy, accurate, and safe procedure becomes challenging.

The drawback mentioned above can be overcome using the hand gesture recognition approach. A notable example of a Hand Gesture Recognition application is the optimization of the surgical process utilizing the Kinect for Windows hardware and SDK created by a team at Microsoft Research Cambridge. The device allows clinicians to adjust, relocate, or enlarge scans, magnetic resonance imaging (MRI) images, and other medical data using simple hand motions (Douglas).

Wachs et al. [42] created a gesture-based system for sterilized viewing of radiological images, another notable example of Hand Gesture Recognition (HGR) research and application in a surgical theatre. Sterilized human–machine interaction is critical since it is how the physician handles clinical data while preventing contamination of the patient, the surgical theatre, and the accompanying doctors. The gesture-based technology might substitute the touchscreen displays already used in many medical operating rooms, which must be enclosed to prevent contamination from accumulating or spreading and need flat surfaces that must be thoroughly cleaned after each treatment, but sometimes are not. With healthcare infection rates currently at alarmingly high levels, hand gesture recognition technology is a viable option [9].

Another noteworthy application of HGR technology is Sathiyanarayanan and Rajan's [36] MYO diagnostics system, which is applicable in interpreting Electromyography (EMG) patterns (graphs), bytes of vector data, and electrical data of the complex anatomy within our hands. To identify hand movement, the system employs complex algorithms whose outputs are interpreted as instructions. The system allows for collecting massive amounts of data and investigating a series of EMG lines to identify medical issues and hand motions.

Applications in Robotics

Van den Bergh et al. [3] created an automated hand gesture detection system using the readily available Kinect sensor. Using this sensor enables complicated three-dimensional motions to be captured while remaining resistant to disrupting objects or humans in the environment. The technology is embedded into an interactive robot (based on ROS), enabling automated interaction with the robot through hand gestures. The robot's direction is determined by translating pointing motions into objectives.

Robot technology has advanced considerably in recent years. However, there is a hurdle in developing the robot's capacity to comprehend its surroundings, for which the sensor is crucial [4]. Using hand gesture recognition algorithms to manage the robot's behavior efficiently and effectively is advisable, and this has become a new research hotspot in robot vision and control. A notable example is the Smartpal service robot, which utilizes Kinect technologies and allows users to operate the robot with their gestures, simulating the users' actions [4]. We believe that, as time progresses, using hand gestures to instruct robots to perform different tasks will no longer be unrealistic.

Applications in Gaming and Virtual Environments (VE)

There has been a trend in the gaming industry toward hand gesture recognition systems, where gestures are used as instructions for video games rather than the traditional approach of pressing keys on a keypad or using a controller. It is essential for these modern interfaces to distinguish accidental movements from deliberate gestures so that the user can have a more natural experience. Kang et al. [18] provided a unique approach to gesture identification that blends gesture detection and classification; the system distinguishes between purposeful and unintended motions within a particular visual sequence [18].

Methodology

In this study, we review previous literature in this area with a focus on human–computer interaction and hand gesture recognition. We then select a dataset of images for analysis. In the image enhancement and segmentation step, the quality of the raw images is improved by reducing background noise, applying color space conversions, and isolating the main image from its background. After that, machine learning algorithms such as CNN are employed to learn the attributes and conduct hand gesture recognition. We then observe the results of the algorithms and analyze them in two parts, focusing on image enhancement and hand gesture recognition. Finally, we provide a discussion of the results, including challenges and limitations, ethical considerations, and opportunities for future work in this area (Fig. 3).

Figure 3: Research framework

British Sign Language

British Sign Language (BSL) is a visual language used by persons with hearing or speech disabilities to convey meaning via word-level gestures, nonmanual characteristics such as facial expression and body posture, and fingerspelling (spelling words using hand movements) [26]. Figures 4 and 5 below depict the BSL alphabet with two hands and one hand, respectively. We utilized the one-hand alphabet in this project because many features of the two-hand BSL alphabet make identification difficult.

Figure 4: Two-hand British Sign Language

Figure 5: One-hand British Sign Language

In this paper, we worked on five one-hand BSL letters: A, I, L, R, and V. For each of the five BSL letters utilized in this study, we developed a data collection of 90 distinct signs. A single signer performs each hand sign five times under varied lighting and timing circumstances. Moreover, to improve real-world performance and prevent over-fitting, we employed a validation dataset obtained under different conditions from the training dataset.

The images used in this project were captured with an iPhone 11 Pro Max camera, which has triple 12 MP Ultra-Wide, Wide, and Telephoto lenses, the Ultra-Wide lens offering an f/2.4 aperture and a 120° field of view. We collected two photos for the hand segmentation (color separation) task, one with and the other without a background. The images were separated and stored in separate folders. Each picture was processed and segmented before being augmented to expand the dataset for each sign. As part of the augmentation, the photos were randomly resized and rotated. The dataset was then subdivided into folders for each letter.
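A minimal sketch of how such an augmented dataset could be prepared with the Keras package used in this study is shown below; the directory layout, image size, and augmentation ranges are illustrative assumptions rather than the exact settings of this project.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmentation settings are illustrative: small random rotations and zooms
# approximate the "randomly resized and rotated" augmentation described above.
datagen = ImageDataGenerator(
    rescale=1.0 / 255,   # normalize pixel values to [0, 1]
    rotation_range=15,   # random rotation, in degrees
    zoom_range=0.1,      # random zoom (resize) factor
)

# Assumed directory layout: one sub-folder per BSL letter (A, I, L, R, V).
train_data = datagen.flow_from_directory(
    "dataset/train",
    target_size=(64, 64),
    batch_size=16,
    class_mode="categorical",
)
```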

The first step in every image processing system is processing the raw pictures. Image processing is essential to keep all images uniform and consistent, which increases the accuracy and efficacy of the subsequent segmentation and feature extraction methods. Consequently, background noise should be reduced to enhance the images, and color space conversions should be applied to better emphasize each image's region of interest. All the images should be converted to different color spaces to determine the optimal color space for color separation and thereby enable image segmentation.

Recently, many efforts have focused on the image segmentation process, a critical stage in image processing that analyzes the digital image by splitting it into several sections and is used to separate the distinct elements of an image into foreground and background based on various criteria, like grey-level values or texture [20, 34]. Image segmentation is acknowledged as the first and most fundamental procedure in numerous computer vision tasks, including hand gesture recognition, medical imaging, robotic vision, and geographical imaging. Scholars have previously examined several segmentation approaches and algorithms in the literature. These solutions overcome several limitations of traditional hand gesture recognition systems. However, no one method can be deemed superior for all types of images; each strategy is only appropriate for a particular image and purpose.

Thresholding, region growing, region merging and splitting, clustering, edge detection, and model-based approaches are the six types of image segmentation methods. All splitting approaches are based on two fundamental properties of intensity values: discontinuity and similarity. Discontinuity-based segmentation relies on sudden shifts in intensity levels in the picture; in this method, we are primarily interested in recognizing distinct spots. The other strategy is centered on pixels that are comparable within a region according to predefined parameters used to split the image, and it comprises procedures such as thresholding, region growing, and region splitting and merging.

Color Model

A color model is a mathematical abstraction representing colors as tuples of numbers with three or four values or color components. The collection of generated colors is referred to as a "color space" when the color model is coupled with an appropriate description of how the components are to be interpreted and viewed. Color spaces may also be used to explain how human color vision might be simulated, which is useful in a range of applications, including computer vision, image analysis, and graphic design. Moreover, color space and color model are substantially equivalent terms in some contexts. There are many color space bases, including the luminance color models (YUV, YCbCr, and YIQ), hue color models (HSI, HSV, and HSL), and the RGB color models (RGB, normalized RGB) [2, 22]. By default, the Python OpenCV library loads pictures in BGR format. However, a BGR picture can be converted into any other color space using transformation functions. RGB color space is the most fundamental kind of picture representation, yet in some applications, such as OpenCV, using alternative color spaces is more convenient.

Kolkur et al. [22] stated that the choice of color space is the first step in skin color segmentation. Furthermore, they acknowledged that a suitable specified threshold for recognizing skin pixels in a given image might be provided by a combination of one or more color spaces. The appropriate color space is usually determined by the skin recognition application.

RGB Color Model

A natural image's default color model is RGB (Red, Green, and Blue). An image is represented by an m × n × 3 array of color pixels in this form, where each pixel is a triplet of three colors, red, green, and blue, at spatial position (m, n), as described by Hema and Kannan [16] (Appendix 1). The three color elements can be thought of as a stack of three separate layers: pixels in an image have a red layer, a blue layer, and a green layer, resulting in an RGB image [16]. All these color components can be seen as a three-dimensional model. When all three color channels have a value of 0 in additive color mixing, no illumination is emitted and the final color is black. The resultant color is white when all three color channels are set to their peak value, i.e., 255. TV screens are excellent illustrations of how RGB color mixing is used. This color space is more susceptible to noise than others since it combines illumination and chromatic elements [2].

HSV Color Space

HSV (Hue, Saturation, Value) color space is much closer than RGB to the way people describe and interpret colors. Humans see hue as the dominating color. The quantity of white light mixed with a color is referred to as saturation, and value represents the brightness or intensity. Hue corresponds to tint, saturation to shade, and value to tone. An HSV color space may be seen as a geometric cylinder, with the angular dimension representing Hue (H), beginning with primary red at 0°, progressing to primary green at 120° and primary blue at 240°, and eventually curving back to red at 360°. Saturation (S) refers to the distance from the center axis of the HSV cylinder; a saturation value heading towards the outside border indicates that the colorfulness of the color described by the hue is reaching its peak. The Value (V) runs along the central vertical axis of the HSV color space, extending from black at the bottom, with brightness or value 0, to white at the top, with lightness or value 1 (Appendix 1). Figure 9 below shows that the color components in this space can be easily separated, unlike in the RGB color space discussed above. Therefore, we selected the HSV color space as the optimal color model for color-space-based segmentation.

Color Image Segmentation Using HSV Space

The detection of skin surface color is an excellent example of the application of color-based image segmentation for recognizing a particular item based on its color space. A popular image segmentation step is determining skin color via distinct color spaces. Segmenting the image's foreground object to identify and recognize the hand area is the first step in this project's three-phase technique, shown in the flowchart in Fig. 6 below. The first stage is to choose a Region of Interest (ROI) from the provided picture; the second step is to alter the HSV values inside the ROI to extract a mask; and the last step is to select the ROI using the image mask. As shown in Fig. 6, segmenting the picture in either the BGR or RGB color space is highly challenging and will not produce the optimal result, so we converted the image to HSV and investigated the segmentation potential.

Figure 6: HSV-based segmentation flowchart

The flowchart works as follows. First, an RGB image is transformed into an HSV image using the HSV color space conversion algorithm. Next, the resultant HSV model components (Hue, Saturation, and Value) are separated into constituent values, as shown in Fig. 6, which are then represented as ranges using the Python library. Finally, the optimal HSV values for a particular image of a specific hand gesture are determined by interactively changing the values of each Hue, Saturation, and Value component.
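A minimal sketch of this HSV-based segmentation pipeline, using OpenCV, might look as follows; the skin-tone bounds shown are illustrative assumptions, since the paper tunes them interactively per image set.

```python
import cv2
import numpy as np

def segment_hand(image_bgr, lower_hsv, upper_hsv):
    """Keep only the pixels inside the given HSV range (the hand region)."""
    hsv = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, lower_hsv, upper_hsv)            # binary mask of in-range pixels
    return cv2.bitwise_and(image_bgr, image_bgr, mask=mask)  # masked ROI, background removed

img = cv2.imread("gestures/A_01.jpg")  # hypothetical file path
# Example skin-tone range in HSV; in practice these bounds are tuned interactively.
lower = np.array([0, 40, 60], dtype=np.uint8)
upper = np.array([25, 255, 255], dtype=np.uint8)
segmented = segment_hand(img, lower, upper)
```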

Hand Gesture Recognition Algorithm

Deep Learning is quickly emerging as a prominent sub-field of machine learning owing to its exceptional performance over many data sources. Convolutional neural networks are an excellent way of using deep learning algorithms to categorize images, and the Keras Python package makes creating a CNN straightforward. A CNN utilizes a multi-layer architecture that includes an input layer, an output layer, and hidden layers composed of numerous convolutional layers, pooling layers, and fully connected layers.

Convolution Layer

Convolution is a mathematical operation on two functions that yields a third function expressing how the shape of one is modified by the other. Convolutional neural networks comprise several layers of artificial neurons. Artificial neurons are mathematical functions that compute the weighted sum of many inputs and output an activation value, an approximate replica of their biological counterparts. When an image is fed into a ConvNet, each layer creates many activation values, which are passed on to the next layer. Typically, the first layer extracts basic features, such as horizontal or vertical edges. This output is sent to the subsequent layer, which recognizes more complicated characteristics, like corners or combined edges. As we get deeper into the network, it recognizes more sophisticated features, such as objects and characters.

Pooling Layer

Like the Convolutional Layer, the Pooling Layer is responsible for lowering the spatial dimension of the Convolved Feature. By decreasing the dimensions, this reduces the computing power necessary to analyze the data. Pooling is classified into two types: average pooling and max pooling. Max pooling returns a pixel's highest value inside the kernel-covered picture region. Additionally, max pooling acts as a noise suppressant: it discards noisy activations, performing de-noising along with dimension reduction. Average pooling returns the mean of all values in the kernel-covered section of the picture; it is simply a noise suppression strategy that reduces dimension. As a result, we may conclude that max pooling outperforms average pooling.
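To make the layer stack concrete, the following is a minimal Keras CNN sketch for the five BSL letters; the layer sizes and input shape are illustrative assumptions, not the exact architecture trained in this study.

```python
from tensorflow.keras import layers, models

# Minimal CNN for five BSL letters; sizes are illustrative assumptions.
model = models.Sequential([
    layers.Input(shape=(64, 64, 3)),
    layers.Conv2D(32, (3, 3), activation="relu"),   # low-level features (edges)
    layers.MaxPooling2D((2, 2)),                    # downsample / suppress noise
    layers.Conv2D(64, (3, 3), activation="relu"),   # higher-level features
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(5, activation="softmax"),          # one output per BSL letter
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```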

The hand gesture recognition system is separated into two tasks, which are explained at length below.

Image Enhancement and Segmentation Phase

The first task is image segmentation, which includes removing the image's background contents and producing an image without the background. Before beginning image segmentation, we must first perform image processing, as described in our methodology above.

Image Enhancement

To start, we examine a sample image using its histogram, as shown in Fig. 7 below. The hand gesture image was loaded using the OpenCV library, which automatically loads the image in BGR order. The difference between the RGB and BGR color spaces can be seen in Fig. 7.

Figure 7: Histogram of a sample image
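The following sketch shows how such a per-channel histogram can be computed, assuming OpenCV and Matplotlib; the file path is an illustrative placeholder.

```python
import cv2
from matplotlib import pyplot as plt

img_bgr = cv2.imread("gestures/A_01.jpg")           # OpenCV loads images as BGR
img_rgb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB)  # convert for display/analysis

# Per-channel intensity histogram (256 bins over [0, 256)).
for i, color in enumerate(("r", "g", "b")):
    hist = cv2.calcHist([img_rgb], [i], None, [256], [0, 256])
    plt.plot(hist, color=color)
plt.xlabel("Intensity value")
plt.ylabel("Pixel count")
plt.show()
```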

Figure 8 depicts the intensity values and counts of the RGB color model from one of our dataset sample images. In addition, we produced a clearer histogram based on the RGB and HSV color spaces for better comprehension. We attempted to improve the image before moving on to the segmentation, the task being to evaluate whether the image enhancement outcome was favorable. First, we employed several denoising algorithms to filter the picture and decrease noise. Then, we applied the median and mean filter approaches to the sample picture and found no difference between the original and filtered images, as illustrated in Fig. 8 below.

Figure 8: RGB and HSV histograms of a sample hand gesture

Finally, we employed the Histogram Equalization approach to improve the image quality and discovered that it did not provide the best results. Figure 9 compares the original and the obtained images after Histogram Equalization. As seen in the figure, equalization did not lead to an improved image. Therefore, we examined the histogram of the multi-channel image in the RGB and HSV color spaces to understand the impact of equalization better, as shown in Fig. 9 below.

Figure 9: RGB and HSV histograms after equalization of a sample hand gesture (a, left; b, right)

When figures (a) and (b) are compared to figures (c) and (d), image improvement based on histogram equalization proves counterproductive; it was therefore removed from the system. We moved on to the picture segmentation portion of the research after carefully analyzing the selected sample image from the hand gesture collection. In the picture segmentation phase, we attempted two thresholding algorithms and one color space segmentation approach, which are explained further below. We attempted to segment a hand gesture image by binarizing it and iteratively searching for the optimal threshold to segment the image and eliminate the background.

Figure 9 above shows the results obtainable from image segmentation using different thresholding functions. The Yen and Otsu thresholds retain most of the information in the image. The section below tries both the Yen and Otsu thresholding methods for image segmentation. Figure 9a shows the outcome of the Otsu threshold, whereas Fig. 9b shows the Yen threshold.
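A minimal sketch of the two thresholding methods, using scikit-image's threshold_otsu and threshold_yen on a grayscale gesture image (the file path is an illustrative placeholder), is shown below.

```python
import cv2
from skimage.filters import threshold_otsu, threshold_yen

gray = cv2.imread("gestures/A_01.jpg", cv2.IMREAD_GRAYSCALE)

# Each method picks a global threshold from the grayscale histogram;
# pixels above the threshold are treated as foreground (the hand).
otsu_t = threshold_otsu(gray)
yen_t = threshold_yen(gray)

binary_otsu = gray > otsu_t
binary_yen = gray > yen_t
print(f"Otsu threshold: {otsu_t}, Yen threshold: {yen_t}")
```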

We tried different color segmentation approaches to separate the background from the foreground. The first approach is image segmentation based on the upper and lower blue ranges in the HSV space, and the result is shown below. Our final approach to hand segmentation iteratively selects the best Hue, Saturation, and Value ranges for the sample image. This method proved the most effective and yielded the best result, as illustrated in the image below.

As we can see in the image above, the region of interest most needed for the hand gesture recognition task is highlighted, and the background has been removed. We can now loop through our images and apply the image segmentation algorithm we created based on the HSV color space. In the next stage, we discuss the Machine Learning results obtained from the project.

Hand Gesture Recognition

The first step in the machine learning stage of the project is to load all image datasets into our system and carefully explore the dataset to understand the training and validation data distribution (Fig.  10 ).

Figure 10: Training and validation classes

For the hand gesture prediction, we tried classifying the unsegmented images first and then compared the results of the two models. We observed that the performance of our model improved markedly because of image segmentation: we achieved an accuracy of 45 percent using the unsegmented images and an accuracy of 58 percent using the segmented images. However, the system did not achieve the optimal result due to the quality of the sample images and the number of images used. The images and table below give a summary of the models used in this project. The author has also identified areas for improvement, highlighted in this project's conclusion section.

The accuracy and loss for both training and validation images, before and after image segmentation, are given in Figs. 11 and 12. Moreover, Table 1 summarizes the confusion matrix obtained by predicting the set of images under different lighting conditions.
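A sketch of how such a confusion matrix could be produced with scikit-learn is shown below; the `model` and `val_data` names refer to the hypothetical sketches above, and the validation generator is assumed to have been created with shuffle=False so that label order matches the predictions.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, classification_report

# val_data: a flow_from_directory generator over the validation set,
# assumed to be created with shuffle=False so labels align with predictions.
val_data.reset()
probs = model.predict(val_data)
y_pred = np.argmax(probs, axis=1)
y_true = val_data.classes  # ground-truth labels from the directory structure

print(confusion_matrix(y_true, y_pred))
print(classification_report(y_true, y_pred, target_names=["A", "I", "L", "R", "V"]))
```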

Figure 11: Accuracy and loss before image segmentation

Figure 12: Accuracy and loss after image segmentation

This project extensively studied computer image processing and analyzed various literature and techniques. As a result, the author is familiar with the different approaches and algorithms required for image classification and hand gesture recognition. The discoveries made in this research project are summarized below. While it is critical to analyze our data and pre-process images for better results, no single approach works on all images and image types. Therefore, the data scientist or image processor must be able to analyze the image accurately to choose the optimal enhancement for the image and the task. As seen from the results above, the image-enhancement processes were unproductive here and did not provide a satisfactory outcome.

Recommendations

Various factors influence image quality. The camera type used to take the photographs, the direction and angle of capture, the lighting conditions, the skin tone, and other characteristics are all likely to have an impact on the image processing and hand gesture identification job. Unlike a straightforward Machine Learning classification or regression task, where it is easier to fix the features for the prediction of the target class, deep learning classification has several hyper-parameters and criteria that the picture must meet to be usable. For example, the image's shape and size must be as specified by the algorithm of choice. Nonetheless, biased and racially skewed results from hand gesture detection systems may have risky and uncontrolled consequences. Furthermore, we have to weigh discriminatory tendencies against the various benefits in order to provide a fair model.

Ethical Considerations

If the true goal of a hand gesture recognition system is to deal with sign language efficiently, then all the multiple and diverse elements of sign language should be considered, which means that sign language must be researched in full and sign language recognition systems fully implemented. In this sense, scholars working on sign language must take an interdisciplinary approach with the assistance of the speech-impaired community and experts. Numerous studies gave scant attention to the effect of including or excluding specific words from the task of hand gesture recognition.

The implications of this omission may be detrimental to users who rely on such systems. Another oversight in hand gesture recognition systems is the omission or under-representation of some demographics or groups in the training dataset, which results in a biased system that overgeneralizes from the training data. This could result in deploying machine learning systems that fail to perform efficiently when used by an underrepresented user. Since the image's luminance level significantly affects vision-based hand gesture recognition systems, systems can malfunction when employed in lighting conditions not included in the training data.

The project is a rudimentary static gesture recognition system that cannot perform dynamic gesture recognition tasks. Since the dataset utilized is not diverse (the author's own hand gesture photographs), the system may fail when applied to a different dataset. The pictures obtained were taken using a simple camera and are of relatively low quality. The technology is sensitive to lighting conditions and may not perform optimally on a picture with varying lighting conditions. Through the study, the research team uncovered other methodologies that are worth examining. First, several picture-enhancing algorithms must be researched and applied in future work. Second, feature extraction, such as the silhouette image-based technique, should be studied in future research. Third, optimizing, selecting, and weighting the extracted characteristics will be investigated to simplify computations. In addition, the algorithm design element will examine recognition accuracy and robustness, ease of use, and operational efficiency. Finally, future studies should incorporate more modern technology and approaches, such as tracking to allow dynamic gesture detection, and learn the newest technology in the sector to enhance the system's performance.

The hand gesture recognition project in this article was produced using skin color segmentation in the HSV color space. The algorithm exploits the robustness of skin color segmentation in HSV space to counteract the impact of changing illumination conditions on gesture detection. In addition, several image-enhancement procedures were performed on the image prior to hand segmentation. The hand gesture orientation was generalized after the region of interest was segmented, using the data generator function for batch gradient descent; these steps mitigate the effect of variations in gesture orientation on gesture recognition. The generalization ability of the algorithm is further improved during the gesture recognition stage by integrating embedded deep sparse auto-encoders in the classifier. The experimental findings reveal that, following segmentation, the suggested technique is robust and considerably preferable to the alternative method in classification performance and recognition consistency.
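For readers unfamiliar with the pipeline summarized above, the following OpenCV sketch shows HSV skin-color segmentation followed by morphological clean-up; the threshold values are illustrative assumptions that depend on skin tone and lighting, not the paper's actual parameters.

```python
# A minimal HSV skin-segmentation sketch (thresholds are assumptions).
import cv2
import numpy as np

frame = cv2.imread("hand_gesture.jpg")               # BGR input
hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)

# Rough skin range in OpenCV's HSV (H spans 0-179); tune per dataset.
lower = np.array([0, 40, 60], dtype=np.uint8)
upper = np.array([25, 255, 255], dtype=np.uint8)
mask = cv2.inRange(hsv, lower, upper)

# Morphological opening removes speckle noise; closing fills holes.
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)

# The masked region of interest is what the classifier would consume.
roi = cv2.bitwise_and(frame, frame, mask=mask)
cv2.imwrite("segmented_hand.jpg", roi)
```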

Data Availability

The data used to support the findings of this study are available from the authors upon request. The data are not publicly available due to the presence of sensitive biological information that may compromise the privacy and confidentiality of research participants.


Acknowledgements

Part of this work is supported by VC Research (VCR 0000198).

Author information

Authors and Affiliations

Aston University, Aston St, Birmingham, B4 7ET, UK

Victor Chang & Qianwen Ariel Xu

Teesside University, Campus Heart, Southfield Rd, Middlesbrough, TS1 3BX, UK

Rahman Olamide Eniola & Lewis Golightly


Corresponding author

Correspondence to Victor Chang.

Ethics declarations

Conflict of Interest

The authors confirm that there are no conflicts of interest with anyone involved.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the topical collection “Emerging Technologies and Services for post-COViD-19” guest edited by Victor Chang, Gary Wills, Flavia Delicato and Mitra Arami.

Appendix 1: Color Space

Figure 13: RGB color space

Figure 14: RGB and BGR color space

Figure 15: HSV color space

Figure 16: HSV color space of a sample image

Appendix 2: Hand Representations

Figure 17: Wearable data glove

Figure 18: Median filter of a sample hand gesture

Figure 19: Effect of histogram equalization on a sample hand gesture

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

Reprints and permissions

About this article

Chang, V., Eniola, R.O., Golightly, L. et al. An Exploration into Human–Computer Interaction: Hand Gesture Recognition Management in a Challenging Environment. SN COMPUT. SCI. 4 , 441 (2023). https://doi.org/10.1007/s42979-023-01751-y


Received: 23 December 2022

Accepted: 21 February 2023

Published: 12 June 2023

DOI: https://doi.org/10.1007/s42979-023-01751-y


Keywords

  • Hand recognition
  • Human–computer interaction
  • Machine learning
  • Convolutional neural network (CNN)


Personalized Human-Computer Interaction as an Information Source for Ride-Hailing Platforms: Behavior Intention Perspective

Asia Pacific Journal of Marketing and Logistics

ISSN: 1355-5855

Article publication date: 13 August 2024

Purpose

This paper adds risk perception and personalized human-computer interaction to the technology acceptance model, and further analyzes the impact of personalized unmanned ride-hailing on users' behavior intention.

Design/methodology/approach

The study model was tested on a sample of 299 social media users from China, and structural equation modeling (SEM) was applied to build the theoretical framework.
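To make the method concrete, a TAM-style structural model of this kind could be specified and fitted in Python with the semopy package, as sketched below; the construct names and indicator columns are hypothetical stand-ins, not the authors' actual instrument.

```python
# A minimal SEM sketch using semopy (constructs and data are hypothetical).
import pandas as pd
from semopy import Model

DESC = """
# Measurement model: latent constructs =~ observed survey items
# (PEOU: ease of use, PU: usefulness, RISK: risk perception,
#  PHCI: personalized HCI, BI: behavior intention)
PEOU =~ peou1 + peou2 + peou3
PU =~ pu1 + pu2 + pu3
RISK =~ risk1 + risk2 + risk3
PHCI =~ phci1 + phci2 + phci3
BI =~ bi1 + bi2 + bi3
# Structural model: regressions among the latent variables
PEOU ~ PHCI
PU ~ PHCI + PEOU
BI ~ PEOU + PU + RISK + PHCI
"""

survey = pd.read_csv("survey_responses.csv")  # hypothetical 299-row sample
model = Model(DESC)
model.fit(survey)
print(model.inspect())  # path coefficients, standard errors, p-values
```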

Findings

Our results show that perceived ease of use has a greater positive impact on behavior intention than perceived usefulness. In addition, we find that the impact of risk perception on behavior intention is manifested in several ways, including people's risk perception of the new technology and their risk perception of data leakage. Finally, we find that users' personalized human-computer interaction has a positive effect on their perceived ease of use, perceived usefulness, and behavior intention.

Originality/value

Our study contributes to illuminating the pivotal role of tailoring the human-computer interface to individual preferences and needs on ride-hailing platforms, from the perspective of behavior intention.

Keywords

  • Ride-hailing platforms
  • Personalized human-computer interaction
  • Risk perception
  • Behavior intention
  • Structural equation modeling
  • Perceived usefulness

Acknowledgements

This research is supported in part by the National Natural Science Foundation of China (No. 72302115); National Natural Science Foundation of China (No. 72071040, 71931006); the Natural Science Foundation of Jiangsu Province (No. BK20230901); the China Postdoctoral Science Foundation (No. 2023M731682); Fundamental Research Funds for the Central Universities (No. 30922011203).

Li, J. , Ling, R. , Sun, F. , Zhou, J. and Cai, H. (2024), "Personalized human-computer interaction as an information source for ride-hailing platforms: behavior intention perspective", Asia Pacific Journal of Marketing and Logistics , Vol. ahead-of-print No. ahead-of-print. https://doi.org/10.1108/APJML-04-2024-0460

Emerald Publishing Limited

Copyright © 2024, Emerald Publishing Limited



Computer Science > Human-Computer Interaction

Title: CPS-TaskForge: Generating Collaborative Problem Solving Environments for Diverse Communication Tasks

Abstract: Teams can outperform individuals; could adding AI teammates further bolster performance of teams solving problems collaboratively? Collaborative problem solving (CPS) research commonly studies teams with two agents (human-human or human-AI), but team research literature finds that, for complex tasks, larger teams are more effective. Progress in studying collaboration with more than two agents, through textual records of team interactions, is hindered by a major data challenge: available CPS corpora are predominantly dyadic, and adapting pre-existing CPS tasks to more agents is non-trivial. We address this data challenge by developing a CPS task generator, CPS-TaskForge, that can produce environments for studying CPS under a wide array of conditions, and releasing a CPS task design checklist grounded in the theoretical PISA 2015 CPS framework to help facilitate the development of CPS corpora with more agents. CPS-TaskForge takes the form of a resource management (tower defense) game, and different CPS tasks can be studied by manipulating game design parameters. We conduct a case study with groups of 3-4 humans to validate production of diverse natural language CPS communication in a game instance produced by CPS-TaskForge. We discuss opportunities for advancing research in CPS (both with human-only and human-AI teams) using different task configurations. We will release data and code.
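To illustrate what manipulating game design parameters might look like in practice, here is a purely hypothetical configuration object; none of these parameter names come from the paper.

```python
# A purely hypothetical sketch: a tower-defense CPS task expressed as a
# config object, so that varying parameters yields distinct CPS tasks.
from dataclasses import dataclass

@dataclass
class TowerDefenseTaskConfig:
    n_players: int = 4                  # team size under study (e.g., 3-4 humans)
    shared_resources: bool = True       # force negotiation over a common budget
    information_asymmetry: bool = True  # each player sees only part of the map
    wave_count: int = 10                # task length / difficulty
    chat_channel: str = "text"          # communication medium that is logged

# Two configs differing in one parameter define two distinct CPS tasks.
symmetric = TowerDefenseTaskConfig(information_asymmetry=False)
asymmetric = TowerDefenseTaskConfig()
```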
Subjects: Human-Computer Interaction (cs.HC)
Cite as: arXiv:2408.08853 [cs.HC]



STRATEGIC ROLE OF ARTIFICIAL INTELLIGENCE (AI) ON HUMAN RESOURCE MANAGEMENT (HR) EMPLOYEE PERFORMANCE EVALUATION FUNCTION

Dr. Ernest Jebolise Chukwuka

2024, International Journal of Entrepreneurship and Business Innovation

ABSTRACT: Purpose – This research paper aims to create a realistic understanding of the favorable and unfavorable experiences that employees have as a result of adopting artificial intelligence (AI) or retaining the old manual HR methods. It explains the difficulties and benefits associated with developing human resources in light of the use of artificial intelligence versus the old manual HR methods. Design/Methodology/Approach – The researcher employed a qualitative and exploratory research methodology. Exploratory research is the primary element of the qualitative method, adapted here to examine the literature, theories, motivations, and viewpoints needed to answer the research question. The research used data from secondary sources. Findings – The study found that some firms spend over two million hours annually conducting manual HR performance reviews and evaluations. This is a significant amount of time spent on a process that is unreliable because it relies on people's opinions and prior performance. Real-time, AI-driven assessments not only enable immediate incentives and praise for good performance, but also ensure accuracy throughout the entire process and raise an alarm if targets are not met on time or performance standards decline. From the extensive review of literature, it was found that artificial intelligence has a positive and significant influence on the HR function of employee performance evaluation. Practical Implications – The study recommends a more robust top-level AI design and implementation within the entrepreneurial ecosystem and a robust application of artificial intelligence to the HR function of employee performance evaluation. Originality/value – This research makes the unique contribution of establishing a qualitative finding that will revolutionize the entrepreneurial ecosystem for more employee productivity and satisfaction.

Related Papers

INTERNATIONAL JOURNAL OF BUSINESS AND ENTREPRENEURSHIP RESEARCH

Dr. Ernest J E B O L I S E CHUKWUKA

Artificial intelligence (AI) is one of the rapidly expanding disciplines gaining greater attention in the corporate sector. AI is already being used in a wide range of industries, from everyday life to commerce and entrepreneurship. This study examines the strategic impact of AI on entrepreneurial creativity and management. The study adopted qualitative and expository analysis through an extensive review of the literature as its methodology. The corporate sector may come to depend on quicker, less expensive, and more accurate marketing strategies as a result of the use of AI. By utilizing AI in marketing strategies, an entrepreneur may increase audience response and establish a strong competitive advantage over other online firms. In addition to marketing, AI may revitalize businesses through creative ideas, and it provides solutions for challenging jobs, which aids the rapid expansion of businesses. As a result, we discuss the development of the business sector, how entrepreneurs use AI topology, and the variety of roles AI plays in business. The study concludes that artificial intelligence significantly and positively affects entrepreneurial creativity and management, while also having a significant negative effect on human creativity. This conclusion reflects the finding that, as AI takes over more work from humans, it can lead to reduced creativity and redundancy for some, while inspiring greater creativity and innovation in others seeking to keep their jobs. The study recommends a more robust top-level AI design and implementation within the entrepreneurial ecosystem.


International Journal of Entrepreneurship and Business Innovation

Chukwudi Ifekanandu

The study examined the relationship between Innovation Strategies and Sales Performance of food and beverage firms in Rivers State. The study adopted descriptive survey design using a quantitative approach; the population consisted of 25 Food and Beverages Firms in Rivers State. Sequel to the population of the study, which is 25 food and beverages firms, the study adopted a census study with a focus on the managerial staff (production manager, quality control manager, marketing manager and procurement manager). The questionnaire was distributed in the frame of four (4) copies per firm. A total of one hundred (100) copies of the questionnaire were distributed. The reliability of the instrument was determined using Cronbach's alpha test. The data collected for this study were analyzed through descriptive and inferential statistics. The Spearman Rank Order Correlation Technique was employed to test the various hypotheses formulated. It was confirmed that innovation strategies via its dimensions showed a significant relationship with sales performance of food and beverage firms in Rivers State. The study concluded that innovation strategies are significant predictors of sales performance of food and beverage firms in Rivers State. The study recommends amongst others that food and beverage firms in Rivers State that are experiencing low sales performance should embark on incremental innovation as this would increase their level of performance and that they should adopt radical innovation strategy by making some major adjustments to their existing products as this would attract more customers to the firm and increase their sales. KEYWORDS: Innovation Strategies, incremental innovation, radical innovation and Sales Performance.

International Journal of Commerce and Management Studies

Ritesh Sule

This paper delves into the transformative impact of digital technologies on Human Resource (HR) practices, emphasizing the integration of advanced tools such as artificial intelligence (AI), cloud computing, big data analytics, and the Internet of Things (IoT). It explores how these technologies are reshaping traditional HR functions, including recruitment, onboarding, performance management, and employee engagement. The paper highlights the benefits of digital HR systems in improving operational efficiency, enhancing employee experiences, and supporting strategic business objectives. Furthermore, it examines the implementation challenges and strategies for successful digital transformation in HR, underscoring the critical role of continuous feedback, data-driven decision-making, and personalized employee development.

Paul Nemashakwe

Scholars, policy makers and analysts have agreed that the future development of any country rests on the shoulders of Small and Medium Enterprises (SMEs). Although SMEs play an important role in developing countries such as Zimbabwe, 85% are expected to fail within the first three years. Many reasons have been outlined as the causes of such a high failure rate with the most notable ones being a dearth in managerial capacity and an inappropriate leadership model. Zimbabwean SMEs have failed to drive economic growth despite the implementation of Western-initiated leadership models. This is why scholars have argued against the applicability of these models and advocated for the establishment and institutionalisation of indigenous leadership models. The current study sought to develop and validate an Afrocentric Effective Leadership (AEL) model for Zimbabwean SMEs. Quantitative research was carried out employing a survey strategy where data was collected using a questionnaire from 241 p...

Dr. Mogbojuri O Babatunde

Mac-Kingsley Ikegwuru

British Journal of Management and Marketing Studies

Temitope J . Owolabi , Oluyemi Adeosun

Manufacturing firms are regularly filling management and top leadership positions. Any organization's future performance depends on the rigor of succession planning and management's commitment to it. With an increased employee turnover rate, organizations risk losing a wealth of knowledge held by departing employees, which new employees cannot acquire from reading manuals or handover notes. One way employee retention can be guaranteed in any organization is through employees' perception of the possibility of taking up higher, more challenging roles in the future. The study identifies ineffective succession planning strategy as a factor responsible for the increase in employee turnover in the organization. Using both qualitative and quantitative methods, 218 respondents were purposively selected to examine how improving the succession planning strategy could reduce the rate of employee turnover. The study therefore recommends embarking on advocacy to gain management's commitment to the development of potential employees for future roles.

Journal of Informatics Education and Research

Tushar Savale

Associations are forced to modify their capacity the executive's practices in order to remain serious as the globe goes through a period of noteworthy advanced development. This exploratory study attempts to address ten specific objectives while researching the various effects of technological development on ability across the board in the unique environment of Aurangabad City, India. The article thoroughly describes every single computerized tool, stage, and methodology used by organizations in Aurangabad City to enhance the board's functionality. It delves into the ways that computerized change has upended and re-envisioned traditional ability acquisition tactics, highlighting the approaching information driven recruiting. This study looks at how technological progress has ushered in a new era of learning and development while also altering the methods used to retrain and up skill employees. The study analyses the major impact of technological progress on the prevalence of remote work, virtual groups, and flexible work practices inside associations. It examines how technological advancements have changed the workplace environment and prompted teamwork and dedication at higher levels. The investigation acknowledges the emerging fields of expertise that are crucial for investigating the complex situation. It looks at the regular integration of HR innovations into the executive's procedures and the resulting increases in effectiveness. The analysis delves into the groundbreaking effects of digitization on authority practices, acceptable forms of correspondence, and organizations' real formation. This investigation focuses on the challenges, limitations, and inherent risks associations encounter when they embark on their advanced transformation effort. Finally, it offers organizations in Aurangabad City personalized advice on how to strengthen their use of board practices in the face of rapid change.

Adigun Uthman Opeyemi

Asian And Pacific Economic Review

Chaitanya Kalidindi

This systematic literature review explores the transformative impact of digitalization on recruitment and selection processes in organizations. The study investigates the current state of digital integration and its implications on organizational performance. Utilizing the Input-Mediator-Outcome (IMO) framework, the review examines the role of digital tools and technological, organizational, and environmental factors as mediators and assesses recruitment and organizational performance as outcomes. A rigorous methodology, adhering to PRISMA guidelines, was employed to select 41 relevant articles from major academic databases. Findings indicate widespread adoption of digital technologies such as AI, machine learning, and blockchain, revolutionizing job design, candidate sourcing, screening, interviewing, decision-making, and onboarding. Technological, organizational, and environmental factors influence successful digital integration, impacting recruitment and overall organizational performance. The study emphasizes the need for a balanced approach, considering both technological advancements and human-centric elements. Limitations and implications for researchers and practitioners are discussed, emphasizing strategic implementation, professional development, effective change management, ethical considerations, user-friendly platforms, Explainable AI (XAI) integration, and continuous monitoring and evaluation.



IMAGES

  1. Human-computer interaction and related research fields
  2. Human Computer Interaction Essay Example
  3. (PDF) Human Computer Interaction: Analysis and Journey through Eras
  4. (PDF) An assessment of human-computer interaction research in
  5. (PDF) HUMAN COMPUTER INTERACTION
  6. (PDF) Human Computer Interaction Research Through the Lens of a

COMMENTS

  1. A Review Paper on Human Computer Interaction

    Research experiments in human computer interaction involves the young age group of people that are educated and technically knowledgeable. This paper focuses on the mental model in Human Computer ...

  2. A Systematic Review of Human-Computer Interaction and Explainable

    Human-Computer Interaction (HCI) is a field that has been combining AI and human-computer engagement over the past several years in order to create an interactive intelligent system for user interaction. ... which is a linking point of HCI and XAI, was gained through a literature review conducted in this research. The literature survey ...

  3. A Systematic Review on Human and Computer Interaction

    As technology continues to advance at an unprecedented pace, the interaction between humans and computers has become an integral part of our daily lives. This study provides a comprehensive review of the evolving landscape of human-computer interaction (HCI) research, focusing on the key concepts, methodologies, and advancements in this interdisciplinary field. The review begins by presenting ...

  4. A Review on Human-Computer Interaction (HCI)

    Abstract: Human-Computer Interaction (HCI), has risen to prominence as a cutting-edge research area in recent years. Human-computer interaction has made significant contributions to the development of hazard recognition over the last 20 years, as well as spawned a slew of new research topics, including multimodal data analysis in hazard recognition experiments, the development of efficient ...

  5. (PDF) Human-Computer Interaction: Enhancing User Experience in

    Researchers and designers in the field of human-computer interaction (HCI) can produce interactive systems that are simple, effective, entertaining, and satisfying for users by comprehending human ...

  6. Human-Engaged Computing: the future of Human-Computer Interaction

    Debates regarding the nature and role of Human-Computer Interaction (HCI) have become increasingly common. This is because HCI lacks a clear philosophical foundation from which to derive a coherent vision and consistent aims and goals. This paper proposes a conceptual framework for ongoing discussion that can give more meaningful and pertinent direction to the future of HCI; we call the ...

  7. Human-Computer Interaction

    Human-computer interaction (HCI) is a flourishing field of research about the interaction between human beings and computer technologies. HCI theories and research seek to understand the various rules that human beings employ to communicate with computers and explore creative ways of designing computer-based interfaces to improve usability ...

  8. Implications of Human-Computer Interaction Research

    Trust plays an important role in human-computer interaction. It helps people overcome risk and uncertainty. With the rapid growth of computer and networking technology, human-computer trust has been paid attention to. This paper studies the factors that ...

  9. A Systematic Review of Human-Computer Interaction (HCI) Research in

    There are many complexities associated with Extended Reality for Human-Computer Interaction (XR-HCI). ... Search calls for papers; Journal Suggester; Open access publishing; We're here to help ... A Systematic Review of Human-Computer Interaction (HCI) Research in Medical and Other Engineering Fields.

  10. (PDF) HUMAN COMPUTER INTERACTION

    The paper provides an outline on how the new era of human computer interaction leading to cognition-based communications, and how Virtual and Augmented realities can tailor the user needs and ...

  11. A Review of Human-Computer Interaction and Virtual Reality Research

    The aim of this paper is to provide an overview of human-computer interaction (HCI) and virtual reality (VR) research fields in CogInfoCom during the eight-year period from 2012 to 2020 based on the International Conference on Cognitive Infocommunications and its special issues.

  12. Collaborative Model of Feedback in Human-Computer Interaction

    Human-Computer Interaction Laboratory Naval Research Laboratory, Code 5512 ... Feedback plays an important role in human-computer interaction. It provides the user with evidence of closure, thus satisfying the communication expectations that users have when engaging in a dialogue. In this paper we present a model identifying five feedback ...

  13. Human-Computer Interaction as Cognitive Science

    Human-computer interaction and cognitive science share historical interdisciplinary roots in human factors, but the two fields have largely diverged. Many attempts have been made to apply cognitive science to human-computer interaction, but the reverse is curiously not the case. This paper outlines ways in which human-computer interaction can ...

  14. Human-Computer Interaction: A Systematic Review

    This paper presents an overview of the significance of Human-Computer Interaction (HCI) in modern technology and its influence on different fields. The study of HCI involves understanding how people interact with technology and the various techniques and methodologies used to make technology more user-friendly. The paper aims to explore the applications of HCI in different areas such as ...

  15. PDF Three Faces of Human-Computer Interaction

    Microsoft Research Human-computer interaction is considered a core element of computer science. Yet it has not coalesced; many researchers who identify their focus as human-computer interaction reside in other fields. I examine the origins and evolution of three HCI research foci: computer operation, information systems management, and

  16. A Review Paper on Human Computer Interaction

    The main purpose of practical research in human-computer interaction is to disclose unknown perception about behavior of humans and its relationship to technology. Resilience is just a set of routines that allow us to recover from obstacles. The term resilience has been applied to almost everything from the economy, real estate, events, sports ...

  17. Human-Computer Interaction

    Yate Ge, Yi Dai, Run Shan, Kechun Li, Yuanda Hu, Xiaohua Sun. Comments: This is the preprint version of a paper accepted for presentation at the IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC), 2024. Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)

  18. (PDF) HUMAN-COMPUTER INTERACTION: ENHANCING USER ...

    HCI, or human-computer interaction, is crucial to the development of interactive systems, as its. primary goal is the production of user -friendly interfaces. Hum an-computer interaction ( HCI) is ...

  19. A Review of Augmented Reality-Based Human-Computer Interaction

    Gestures basically convey information through physical movements of some human body parts, such as face, body, hands, legs, and feet. Generally, the hand is utilized for recognitions of gestures compared with other body parts, so that hand gesture is usually very important for an interaction medium in augmented reality-based human-computer interaction applications.

  20. A Review Paper on Human Computer Interaction

    This paper focuses on the emotional intelligence of a user to become more user like, fidelity prototyping and the development and design of an automated system that perform such task is still being accomplished. The advancement in the development of computer technology has led to the idea of human computer interaction. Research experiments in human computer interaction involves the young age ...

  21. An Exploration into Human-Computer Interaction: Hand Gesture

    Unfortunately, the speech-impaired community has been underrepresented in the majority of human-computer interaction research, such as natural language processing and other automation fields, which makes it more difficult for them to interact with systems and people through these advanced systems. ... In this paper, we worked on 5 one hand ...

  22. PDF ROLE OF HUMAN-COMPUTER INTERACTION

    This research paper embarks on a journey through the landscape of HCI, aiming to unravel its historical development, theoretical foundations, practical applications, and its ever-growing ... Applications and case studies exemplify the diverse and transformative role of Human-Computer Interaction (HCI) across various domains. In healthcare, HCI ...

  23. Research on the Application of Human-Computer Interaction

    Abstract: As modern technology advances and the computer's function becomes complete, along with the development of machine learning, virtual reality, and artificial intelligence, how humans interact with the device starts to be one of the decisive factors of productivity in the current society. In this paper, the authors will mainly look at five aspects of human-computer interaction (HCI ...

  24. Personalized human-computer interaction as an information source for

    This paper adds risk perception and personalized human-computer interaction to the technology acceptance model, and further analyzes the impact of personalized unmanned ride hailing on users' behavior intention.,This study model was tested using a sample of 299 social media users from China and we apply structural equation modeling (SEM) to ...

  25. [2408.08853v1] CPS-TaskForge: Generating Collaborative Problem Solving

    Teams can outperform individuals; could adding AI teammates further bolster performance of teams solving problems collaboratively? Collaborative problem solving (CPS) research commonly studies teams with two agents (human-human or human-AI), but team research literature finds that, for complex tasks, larger teams are more effective. Progress in studying collaboration with more than two agents ...

  26. Strategic Role of Artificial Intelligence (Ai) on Human Resource

    Among them, research on AI applications for HRM is currently ongoing (Martinez & Casillas, 2013). An organization's ability to exploit its human resources as a competitive advantage is a key factor in its success. Combining these human resources with their operational capabilities allows organizations to make the most of them (Muller, 2016).