Assistive in-home robots have the potential to enable older adults to age in place by offloading mentally or physically demanding tasks to a robot. However, one challenge for in-home robots is that each individual has differing needs, preferences, and home environments, which can all change over time. Learning from Demonstration (LfD) is one solution that enables non-expert users to communicate their differing and changing preferences to a robot, but LfD has not been evaluated with a population of older adults. In a human-subjects experiment where participants teach a robot via LfD, we characterize disparities between older and younger adult participants in terms of robot performance, usability, and participant perceptions. We find that older adults are significantly more critical of the robot’s performance and find the LfD process less usable than younger adults. Based on participant performance and feedback, we present design guidelines that will enable roboticists to increase LfD accessibility across demographics.
IJHCS ’24
Towards the design of user-centric strategy recommendation systems for collaborative Human–AI tasks
Lakshita Dodeja, Pradyumna Tambwekar, Erin Hedlund-Botti, and 1 more author
International Journal of Human-Computer Studies, 2024
Humans are increasingly employing Artificial Intelligence to collaboratively solve complicated tasks in domains such as search and rescue and manufacturing. Efficient teamwork can be achieved by understanding user preferences and recommending strategies for solving the task at hand. Prior work has focused on personalization of recommendation systems for relatively well-understood tasks in the context of e-commerce or social networks. In this paper, we seek to understand the important factors to consider while designing user-centric strategy recommendation systems for decision-making. We conducted a human-subjects experiment (n=60) to measure the preferences of users with different personality types towards different strategy recommendation systems. We conducted our experiment across four types of strategy recommendation modalities that have been established in prior work: (1) Single strategy recommendation, (2) Multiple similar recommendations, (3) Multiple diverse recommendations, and (4) All possible strategies recommendations. While these strategy recommendation schemes have been explored independently in prior work, our study is novel in that we employ all of them simultaneously and in the context of strategy recommendations, providing an in-depth overview of how different strategy recommendation systems are perceived. We found that certain personality traits, such as conscientiousness, notably impact the preference towards a particular type of system (p < 0.01). Finally, we report an interesting relationship between usability, alignment, and perceived intelligence wherein greater perceived alignment of recommendations with one’s own preferences leads to higher perceived intelligence (p < 0.01) and higher usability (p < 0.01).
2023
IJHCI ’23
Explainable Artificial Intelligence: Evaluating the Objective and Subjective Impacts of xAI on Human-Agent Interaction
Andrew Silva, Mariah Schrum, Erin Hedlund-Botti, and 2 more authors
International Journal of Human–Computer Interaction, 2023
Intelligent agents must be able to communicate intentions and explain their decision-making processes to build trust, foster confidence, and improve human-agent team dynamics. Recognizing this need, academia and industry are rapidly proposing new ideas, methods, and frameworks to aid in the design of more explainable AI. Yet, there remains no standardized metric or experimental protocol for benchmarking new methods, leaving researchers to rely on their own intuition or ad hoc methods for assessing new concepts. In this work, we present the first comprehensive (n = 286) user study testing a wide range of approaches for explainable machine learning, including feature importance, probability scores, decision trees, counterfactual reasoning, natural language explanations, and case-based reasoning, as well as a baseline condition with no explanations. We provide the first large-scale empirical evidence of the effects of explainability on human-agent teaming. Our results will help to guide the future of explainability research by highlighting the benefits of counterfactual explanations and the shortcomings of confidence scores for explainability. We also propose a novel questionnaire to measure explainability with human participants, inspired by relevant prior work and correlated with human-agent teaming metrics.
THRI ’23
Concerning Trends in Likert Scale Usage in Human-Robot Interaction: Towards Improving Best Practices
Mariah Schrum, Muyleng Ghuy, Erin Hedlund-Botti, and 3 more authors
ACM Transactions on Human-Robot Interaction, 2023
As robots become more prevalent, the importance of the field of human-robot interaction (HRI) grows accordingly. As such, we should endeavor to employ the best statistical practices in HRI research. Likert scales are commonly used metrics in HRI to measure perceptions and attitudes. Due to misinformation or honest mistakes, many HRI researchers do not adopt best practices when analyzing Likert data. We conduct a review of psychometric literature to determine the current standard for Likert scale design and analysis. Next, we conduct a survey of five years of the International Conference on Human-Robot Interaction (HRIc) (2016 through 2020) and report on incorrect statistical practices and design of Likert scales. During these years, only 4 of the 144 papers applied proper statistical testing to correctly designed Likert scales. We additionally conduct a survey of best practices across several venues and provide a comparative analysis to determine how Likert practices differ across the field of human-robot interaction. We find that a venue’s impact score negatively correlates with the number of Likert-related errors and with acceptance rate, and that the total number of papers accepted per venue positively correlates with the number of errors. We also find statistically significant differences between venues for the frequency of misnomer and design errors. Our analysis suggests there are areas for meaningful improvement in the design and testing of Likert scales. Based on our findings, we provide guidelines and a tutorial for developing and analyzing Likert scales and associated data. We also detail a list of recommendations to improve the accuracy of conclusions drawn from Likert data.
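For concreteness, the kind of analysis the tutorial advocates can be sketched in a few lines: score a multi-item Likert scale as a composite per participant and compare groups with a test suited to the data, rather than treating a single ordinal item as interval data. The sketch below uses made-up data; the group sizes, item counts, and the conservative choice of a Mann-Whitney U test are illustrative assumptions, not an excerpt of the paper's guidelines.

# Minimal sketch: composite scoring of a 5-item, 7-point Likert scale and a
# non-parametric group comparison. Data and test choice are illustrative only.
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)

# Hypothetical responses: two conditions, 30 participants each, 5 items rated 1-7.
condition_a = rng.integers(1, 8, size=(30, 5))
condition_b = rng.integers(1, 8, size=(30, 5))

# Composite scale score per participant = sum over the scale's items.
scores_a = condition_a.sum(axis=1)
scores_b = condition_b.sum(axis=1)

# Conservative non-parametric comparison of the two groups.
stat, p = mannwhitneyu(scores_a, scores_b, alternative="two-sided")
print(f"Mann-Whitney U = {stat:.1f}, p = {p:.3f}")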
RSS ’23
Investigating the Impact of Experience on a User’s Ability to Perform Hierarchical Abstraction
Nina Marie Moorman, Nakul Gopalan, Aman Singh, and 5 more authors
As high-speed, agile robots become more commonplace, these robots will have the potential to better aid and collaborate with humans. However, due to the increased agility and functionality of these robots, close collaboration with humans can create safety concerns that alter team dynamics and degrade task performance. In this work, we aim to enable the deployment of safe and trustworthy agile robots that operate in proximity with humans. We do so by 1) proposing a novel human-robot doubles table tennis scenario to serve as a testbed for studying agile, proximate human-robot collaboration and 2) conducting a user study to understand how attributes of the robot (e.g., robot competency or capacity to communicate) impact team dynamics, perceived safety, and perceived trust, and how these latent factors affect human-robot collaboration (HRC) performance. We find that robot competency significantly increases perceived trust (p < .001), extending skill-to-trust assessments in prior studies to agile, proximate HRC. Furthermore, interestingly, we find that when the robot vocalizes its intention to perform a task, it results in a significant decrease in team performance (p = .037) and perceived safety of the system (p = .009).
HRI Pioneers ’23
Investigating Learning from Demonstration in Imperfect and Real World Scenarios
Erin Hedlund-Botti, and Matthew Gombolay
In Companion of the 2023 ACM/IEEE International Conference on Human-Robot Interaction, Stockholm, Sweden, Mar 2023
As the world’s population is aging and there are growing shortages of caregivers, research into assistive robots is increasingly important. Due to differing needs and preferences, which may change over time, end-users will need to be able to communicate their preferences to a robot. Learning from Demonstration (LfD) is one method that enables non-expert users to program robots. While a powerful tool, prior research in LfD has made assumptions that break down in real-world scenarios. In this work, we investigate how to learn from suboptimal and heterogeneous demonstrators, how users react to failure with LfD, and the feasibility of LfD with a target population of older adults.
HRI ’23
Impacts of Robot Learning on User Attitude and Behavior
Nina Moorman, Erin Hedlund-Botti, Mariah Schrum, and 2 more authors
In Proceedings of the 2023 ACM/IEEE International Conference on Human-Robot Interaction, Stockholm, Sweden, Mar 2023
With an aging population and a growing shortage of caregivers, the need for in-home robots is increasing. However, it is intractable for robots to have all functionalities pre-programmed prior to deployment. Instead, it is more realistic for robots to engage in supplemental, on-site learning about the user’s needs and preferences. Such learning may occur in the presence of or involve the user. We investigate the impacts on end-users of in situ robot learning through a series of human-subjects experiments. We examine how different learning methods influence both in-person and remote participants’ perceptions of the robot. While we find that the degree of user involvement in the robot’s learning method impacts perceived anthropomorphism (p=.001), we find that it is the participants’ perceived success of the robot that impacts the participants’ trust in (p<.001) and perceived usability of the robot (p<.001) rather than the robot’s learning method. Therefore, when presenting robot learning, the performance of the learning method appears more important than the degree of user involvement in the learning. Furthermore, we find that the physical presence of the robot impacts perceived safety (p<.001), trust (p<.001), and usability (p<.014). Thus, for tabletop manipulation tasks, researchers should consider the impact of physical presence on experiment participants.
2022
CoRL ’22
Reciprocal MIND MELD: Improving Learning From Demonstration via Personalized, Reciprocal Teaching
Mariah Schrum, Erin Hedlund-Botti, and Matthew Gombolay
In Proceedings of The 6th Conference on Robot Learning, Dec 2022
Endowing robots with the ability to learn novel tasks via demonstrations will increase the accessibility of robots for non-expert, non-roboticists. However, research has shown that humans can be poor teachers, making it difficult for robots to effectively learn from humans. If the robot could instruct humans how to provide better demonstrations, then humans might be able to effectively teach a broader range of novel, out-of-distribution tasks. In this work, we introduce Reciprocal MIND MELD, a framework in which the robot learns the way in which a demonstrator is suboptimal and utilizes this information to provide feedback to the demonstrator to improve upon their demonstrations. We additionally develop an Embedding Predictor Network which learns to predict the demonstrator’s suboptimality online without the need for optimal labels. In a series of human-subject experiments in a driving simulator domain, we demonstrate that robotic feedback can effectively improve human demonstrations in two dimensions of suboptimality (p < .001) and that robotic feedback translates into better learning outcomes for a robotic agent on novel tasks (p = .045).
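The Embedding Predictor Network described above can be pictured with a small sketch: a network that regresses a demonstrator's personalized embedding directly from a window of their demonstration data, so that suboptimality can be estimated online without optimal labels. The architecture, dimensions, and training target below are illustrative assumptions, not the authors' implementation.

# Illustrative sketch (assumed names and sizes, not the paper's code): predict a
# per-demonstrator embedding from a window of (state, action) demonstration data.
import torch
import torch.nn as nn

window, state_dim, action_dim, embed_dim = 20, 4, 2, 8

predictor = nn.Sequential(
    nn.Flatten(),                                      # (batch, window * (state_dim + action_dim))
    nn.Linear(window * (state_dim + action_dim), 64),
    nn.ReLU(),
    nn.Linear(64, embed_dim),                          # predicted personalized embedding
)

# Dummy training pair: demonstration windows and the embeddings an offline-trained
# model assigned to these demonstrators (both random placeholders here).
demo_window = torch.randn(32, window, state_dim + action_dim)
target_embedding = torch.randn(32, embed_dim)

optimizer = torch.optim.Adam(predictor.parameters(), lr=1e-3)
loss = nn.functional.mse_loss(predictor(demo_window), target_embedding)
optimizer.zero_grad()
loss.backward()
optimizer.step()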
HRI ’22
MIND MELD: Personalized Meta-Learning for Robot-Centric Imitation Learning
Mariah Schrum*, Erin Hedlund-Botti*, Nina Moorman, and 1 more author
In 2022 17th ACM/IEEE International Conference on Human-Robot Interaction (HRI), Mar 2022
Learning from demonstration (LfD) techniques seek to enable users without computer programming experience to teach robots novel tasks. There are generally two types of LfD: human- and robot-centric. While human-centric learning is intuitive, it suffers from performance degradation due to covariate shift. Robot-centric approaches, such as Dataset Aggregation (DAgger), address covariate shift but can struggle to learn from suboptimal human teachers. To create a more human-aware version of robot-centric LfD, we present Mutual Information-driven Meta-learning from Demonstration (MIND MELD). MIND MELD meta-learns a mapping from suboptimal and heterogeneous human feedback to optimal labels, thereby improving the learning signal for robot-centric LfD. The key to our approach is learning an informative personalized embedding using mutual information maximization via variational inference. The embedding then informs a mapping from human-provided labels to optimal labels. We evaluate our framework in a human-subjects experiment, demonstrating that our approach improves corrective labels provided by human demonstrators. Our framework outperforms baselines in terms of ability to reach the goal (p<.001), average distance from the goal (p=.006), and various subjective ratings (p=.008).
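To make the label-mapping idea concrete, one way such a mapping could be set up is sketched below: a learnable per-demonstrator embedding conditions a small network that maps a human's corrective label to an improved label. The architecture, dimensions, and plain supervised loss are assumptions for illustration; the paper's mutual-information and variational objective is omitted.

# Illustrative sketch of the core idea (assumed architecture and sizes, not the
# paper's code): map a suboptimal human label, conditioned on a learned
# per-person embedding, to an improved label.
import torch
import torch.nn as nn

class LabelMapper(nn.Module):
    def __init__(self, label_dim=2, embed_dim=8, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(label_dim + embed_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, label_dim),  # predicted "optimal" corrective label
        )

    def forward(self, human_label, person_embedding):
        return self.net(torch.cat([human_label, person_embedding], dim=-1))

num_people, label_dim, embed_dim = 10, 2, 8
embeddings = nn.Embedding(num_people, embed_dim)   # one embedding per demonstrator
mapper = LabelMapper(label_dim, embed_dim)
optimizer = torch.optim.Adam(
    list(mapper.parameters()) + list(embeddings.parameters()), lr=1e-3
)

# Dummy batch: person ids, their (suboptimal) labels, and supervisory labels that
# in the paper come from calibration tasks, not random tensors.
person_ids = torch.randint(0, num_people, (32,))
human_labels = torch.randn(32, label_dim)
optimal_labels = torch.randn(32, label_dim)

pred = mapper(human_labels, embeddings(person_ids))
loss = nn.functional.mse_loss(pred, optimal_labels)  # MI/variational term omitted
optimizer.zero_grad()
loss.backward()
optimizer.step()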
2021
HRI ’21
The Effects of a Robot’s Performance on Human Teachers for Learning from Demonstration Tasks
Erin Hedlund, Michael Johnson, and Matthew Gombolay
In Proceedings of the 2021 ACM/IEEE International Conference on Human-Robot Interaction, Boulder, CO, USA, Mar 2021
Learning from Demonstration (LfD) algorithms seek to enable end-users to teach robots new skills through human demonstration of a task. Previous studies have analyzed how robot failure affects human trust, but not in the context of the human teaching the robot. In this paper, we investigate how human teachers react to robot failure in an LfD setting. We conduct a study in which participants teach a robot how to complete three tasks, using one of three instruction methods, while the robot is pre-programmed to either succeed or fail at the task. We find that when the robot fails, people trust the robot less (p < .001) and themselves less (p=.004) and they believe that others will trust them less (p < .001). Human teachers also have a lower impression of the robot and themselves (p < .001) and find the task more difficult when the robot fails (p < .001). Motion capture was found to be a less difficult instruction method than teleoperation (p=.016), while kinesthetic teaching gave the teachers the lowest impression of themselves compared to teleoperation (p=.017) and motion capture (p < .001). Importantly, a mediation analysis showed that people’s trust in themselves is heavily mediated by what they think that others – including the robot – think of them (p < .001). These results provide valuable insights for improving the human-robot relationship for LfD.