Feat: NOVA architecture for predictive coding #4

Merged
merged 6 commits into from
Dec 13, 2024
Merged
2 changes: 2 additions & 0 deletions .gitignore
```
@@ -1,5 +1,7 @@
.cursorrules

.venv/
.env

# Python
__pycache__/
```
108 changes: 108 additions & 0 deletions docs/01_nova_predcod_use_cases.md
@@ -0,0 +1,108 @@
# NOVA Architecture & Predictive Coding: Integration with Health Literacy Use Cases

## 1. Technical Foundation

### 1.1 NOVA Architecture

The NOVA architecture is a framework for developing virtual humans built on three interconnected processing layers, each operating on a different timescale. This layered structure mirrors how human cognition and interaction span several timescales, from rapid reflexive reactions to slower, reflective learning across many interactions.

The first layer, the **Reactive Layer**, operates on an ultra-fast timescale of 50-300 milliseconds. This layer handles the direct, automatic reactions essential for natural interaction, including immediate recognition of emotion in facial expressions or tone of voice, responses to sudden movements, and processing of basic social signals. It is this layer that makes the virtual human feel 'human-like' in direct interaction.

The middle layer, the **Responsive Layer**, functions on a timescale of 300-1000 milliseconds and is responsible for context-aware, intelligent responses. This layer integrates information from various sources, manages dialogue, and ensures responses fit within the broader context of the conversation. It acts as the system's 'working memory', ensuring coherent and relevant interactions.

The third layer, the **Reflective Layer**, works on a longer timescale of more than 1000 milliseconds and is responsible for deep analysis and learning. This layer analyzes patterns across multiple interactions, adapts the virtual human's behavior based on experiences, and integrates cultural and contextual knowledge. This layer enables long-term adaptation and personalization of the interaction.
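As a rough illustration of how the three timescales could drive dispatch, the sketch below maps a latency budget to the layer that should handle it. The `Layer` enum and `layer_for_budget` function are hypothetical names, not part of any NOVA codebase. Because the windows share their 300 ms and 1000 ms boundaries, assigning each boundary to the faster layer is an assumption.

```python
from enum import Enum


class Layer(Enum):
    """NOVA processing layers with their timescale windows in milliseconds."""
    REACTIVE = (50, 300)
    RESPONSIVE = (300, 1000)
    REFLECTIVE = (1000, None)  # open-ended: "more than 1000 ms"

    @property
    def min_ms(self) -> int:
        return self.value[0]

    @property
    def max_ms(self):
        return self.value[1]


def layer_for_budget(budget_ms: float) -> Layer:
    """Route work to the fastest layer whose window covers the latency budget.

    Assumption: the shared 300 ms and 1000 ms boundaries belong to the
    faster of the two adjacent layers.
    """
    if budget_ms < 0:
        raise ValueError("latency budget must be non-negative")
    if budget_ms <= Layer.REACTIVE.max_ms:
        return Layer.REACTIVE
    if budget_ms <= Layer.RESPONSIVE.max_ms:
        return Layer.RESPONSIVE
    return Layer.REFLECTIVE
```

For example, a facial-expression event with a 200 ms budget would go to `Layer.REACTIVE`, a dialogue turn with an 800 ms budget to `Layer.RESPONSIVE`, and an end-of-session review to `Layer.REFLECTIVE`.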

### 1.2 Predictive Coding & Free Energy Framework

The theoretical foundation for our approach to virtual humans draws on two influential concepts from cognitive science: Karl Friston's Free Energy Principle and Andy Clark's Prediction Machine perspective. These theories offer insight into how human cognition works and how that understanding can be applied to virtual human interactions.

#### 1.2.1 Theoretical Foundation

The Free Energy Principle, developed by Karl Friston, posits that self-organizing biological systems act to minimize variational free energy, an upper bound on surprise; under common simplifying assumptions, this amounts to minimizing prediction error. On this view, the brain continually tries to anticipate its incoming sensory information. Andy Clark has developed this idea further in his work on predictive processing, where he describes the brain as a "prediction machine" that continuously generates predictions and revises them in light of new input.
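To make "minimizing prediction error" concrete, here is a minimal single-level sketch in the style of introductory predictive-coding tutorials. It assumes one Gaussian hidden cause `mu` with prior mean `m`, prior precision `pi_p`, and sensory precision `pi_s`; gradient descent on the sum of precision-weighted squared prediction errors settles at the Bayesian posterior mean `(pi_s * s + pi_p * m) / (pi_s + pi_p)`. This is a toy model, not Friston's full variational formulation, and `infer_hidden_state` is a hypothetical name.

```python
def infer_hidden_state(s, prior_mean, prior_precision, sensory_precision,
                       lr=0.1, steps=200):
    """Estimate a hidden cause mu from a single observation s.

    Descends F(mu) = 0.5*pi_s*(s - mu)**2 + 0.5*pi_p*(mu - m)**2, the sum of
    precision-weighted squared prediction errors for a one-level Gaussian model.
    """
    # Plain gradient descent on this quadratic is stable only if
    # lr * (pi_s + pi_p) < 2.
    if lr * (prior_precision + sensory_precision) >= 2:
        raise ValueError("learning rate too large for these precisions")
    mu = prior_mean
    for _ in range(steps):
        eps_s = s - mu           # sensory prediction error
        eps_p = mu - prior_mean  # deviation from the prior expectation
        grad = -sensory_precision * eps_s + prior_precision * eps_p  # dF/dmu
        mu -= lr * grad
    return mu


def posterior_mean(s, prior_mean, prior_precision, sensory_precision):
    """Closed-form fixed point of the descent above (Gaussian Bayes)."""
    return ((sensory_precision * s + prior_precision * prior_mean)
            / (sensory_precision + prior_precision))
```

With `s = 2.0`, a prior mean of `0.0`, prior precision `1.0` and sensory precision `4.0`, the estimate converges to about `1.6`: the more reliable observation pulls the belief most of the way from the prior.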

A crucial concept here is Active Inference: the idea that we do not just passively predict, but also act, both to gather evidence that tests our predictions and to bring our sensory input in line with what we expect. This may help explain why people tend to feel more comfortable in predictable situations and actively seek confirmation of their expectations.

#### 1.2.2 Application to Virtual Humans

These theoretical insights have direct implications for virtual human design. To be effective, virtual humans must exhibit consistent and predictable behavior that aligns with human expectation patterns. This doesn't mean they should be rigid or robotic, but rather that their behavior should fit within the prediction model that people unconsciously build during interaction.

#### 1.2.3 Practical Implications

In practice, this translates into several concrete design principles. Virtual humans must be consistent in their response times, allowing users to develop a feel for the natural rhythm of interaction. Their emotional reactions must be predictable within the context of the conversation, without abrupt or inexplicable changes. Moreover, the system must actively adapt to individual users' expectations and preferences, creating a personalized interaction that feels increasingly natural.
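One way to keep emotional reactions free of abrupt or inexplicable changes is to smooth the displayed intensity and cap how far it may move per update. The sketch assumes emotions are scalar intensities in [0, 1] updated once per conversational turn; `smooth_expression`, `alpha`, and `max_step` are hypothetical, and exponential smoothing with a rate limit is just one possible choice.

```python
def smooth_expression(targets, alpha=0.3, max_step=0.2, start=0.0):
    """Map target emotion intensities (0..1) to displayed intensities.

    Each update moves a fraction `alpha` of the way toward the target
    (exponential smoothing), but never by more than `max_step`, so the
    displayed expression changes gradually and predictably.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    shown = []
    current = start
    for target in targets:
        proposal = current + alpha * (target - current)
        step = max(-max_step, min(max_step, proposal - current))
        current += step
        shown.append(current)
    return shown
```

A sudden jump in the target from 0 to 1 is spread over several turns (0.2, 0.4, 0.58, ...), so the user sees a gradual shift rather than a discontinuity.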

## 2. Integration with Use Cases

The theoretical foundations of NOVA and predictive coding find their practical application in three concrete use cases for healthcare. Each of these cases demonstrates how the layered architecture and predictive processing can contribute to improving health literacy.

### 2.1 Use Case 1: The STEP Platform

The STEP platform faces a challenge characteristic of many modern health platforms: an abundance of information that paradoxically can reduce accessibility. Users are confronted with an extensive offering from 147 health organizations, which can be overwhelming without proper guidance.

The integration of NOVA and predictive coding offers an elegant solution here. At the reactive level, the system immediately recognizes signals of overwhelm or frustration from the user. The responsive layer translates these signals into context-aware suggestions, while the reflective layer analyzes patterns in user behavior to increasingly personalize the experience.

Imagine: a user searches for information about fatigue. Instead of an overwhelming list of options, the virtual human guides the search process naturally. The system learns from each interaction and adapts to the user's preferences and comprehension level, making information increasingly relevant and accessible.
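A minimal sketch of adapting to comprehension level, assuming each resource carries a hypothetical reading-difficulty score from 1 (plain language) to 5 (technical) and the user gives simple "too hard" / "too easy" feedback. `Resource`, `update_level`, and `rank_for_user` are illustrative names; a real STEP integration would need its own difficulty metadata and richer signals.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    title: str
    level: float  # hypothetical difficulty: 1 (plain language) .. 5 (technical)


def update_level(estimate, feedback, step=0.5, lo=1.0, hi=5.0):
    """Nudge the user's estimated comprehension level from explicit feedback.

    feedback is one of the hypothetical labels 'too_hard', 'too_easy', 'ok'.
    """
    if feedback == "too_hard":
        estimate -= step
    elif feedback == "too_easy":
        estimate += step
    elif feedback != "ok":
        raise ValueError(f"unknown feedback: {feedback!r}")
    return max(lo, min(hi, estimate))


def rank_for_user(resources, estimate, top_k=3):
    """Return the top_k resources whose difficulty is closest to the estimate."""
    return sorted(resources, key=lambda r: abs(r.level - estimate))[:top_k]
```

Instead of returning every fatigue-related item, the guide would show the few whose difficulty best matches the user, and the estimate shifts as feedback accumulates.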

### 2.2 Use Case 2: Conversation Coach

The challenge of effective communication between healthcare providers and patients is complex and multidimensional. Here, the power of the layered NOVA architecture comes into its own. The virtual human functions as an advanced conversation coach that helps both patients and healthcare providers improve their communication skills.

The reactive layer monitors emotional signals in real-time during conversation simulations. This enables the system to respond immediately to subtle signs of discomfort, confusion, or stress. The responsive layer dynamically adapts the conversation, while the reflective layer identifies deeper patterns in communication styles and adjusts coaching accordingly.

This approach is particularly effective because the system not only reacts to what is said but also anticipates possible miscommunication. The predictive coding framework ensures that the virtual human learns which interventions are most effective for different users and situations.
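Learning which interventions work best for which users can be framed as a multi-armed bandit problem. The sketch below swaps in plain epsilon-greedy selection, which this document does not specify; the reward signal (a score in [0, 1] observed after each intervention) and the intervention names are hypothetical. Keeping one selector per user or situation type would give a crude form of the per-user adaptation described above.

```python
import random


class InterventionSelector:
    """Epsilon-greedy choice among coaching interventions.

    Each intervention keeps a running mean of a hypothetical reward in [0, 1],
    e.g. whether the trainee's next turn improved after the intervention.
    """

    def __init__(self, interventions, epsilon=0.1, seed=None):
        if not interventions:
            raise ValueError("need at least one intervention")
        self.epsilon = epsilon
        self.counts = {name: 0 for name in interventions}
        self.values = {name: 0.0 for name in interventions}
        self.rng = random.Random(seed)

    def choose(self):
        untried = [name for name, count in self.counts.items() if count == 0]
        if untried:
            return untried[0]  # try every intervention once first
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.counts))  # explore
        return max(self.values, key=self.values.get)    # exploit best so far

    def update(self, name, reward):
        self.counts[name] += 1
        n = self.counts[name]
        self.values[name] += (reward - self.values[name]) / n  # incremental mean
```

The `epsilon` parameter trades off trying less-used interventions against repeating the one that has worked best so far.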

### 2.3 Use Case 3: Prevention Promoter

The challenge of preventive care - where, for example, only 63.6% participate in population screening - requires a sophisticated approach that goes beyond mere information provision. Here, the power of the predictive processing framework becomes clearly evident.

The virtual human uses all three processing layers to identify and address barriers to participation. At the reactive level, immediate concerns and anxieties are recognized. The responsive layer provides personalized information that addresses the user's specific concerns. The reflective layer analyzes patterns in decision-making and helps develop long-term strategies for behavioral change.

The system learns to recognize, for example, which factors lead different users to hesitate or delay preventive care. By combining these insights with predictive coding principles, the virtual human can proactively anticipate concerns and remove barriers before they become obstacles.

## 3. Research Benefits & Theoretical Foundation

The strength of our approach lies in the integration of various theoretical frameworks that reinforce and complement each other. This synthesis provides not only a solid scientific foundation but also concrete implementation tools.

### 3.1 Integration of Theoretical Perspectives

The Predictive Processing Framework by Clark and Friston forms the theoretical backbone of our approach. This framework describes how the human brain functions as a prediction machine constantly refining its internal models. In virtual humans, we translate this into a hierarchical system of predictions operating simultaneously at different levels.

At the most basic level, the system predicts the user's next words or gestures. At a higher level, intentions and emotional states are predicted, while at the highest level, even long-term behavioral patterns are modeled. These layered predictions are weighted based on their reliability - what Friston calls 'precision-weighting'. A trembling voice, for example, receives more weight as an indicator of emotional state than a single word choice.
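Precision-weighting across cues can be illustrated with inverse-variance weighting, assuming each cue yields an independent Gaussian estimate of the same quantity (say, a distress score) together with its precision. `fuse_cues` is a hypothetical helper, and the numbers below are invented for illustration.

```python
def fuse_cues(cues):
    """Combine independent Gaussian estimates of the same quantity.

    cues: list of (estimate, precision) pairs, where precision = 1 / variance.
    Returns (fused_estimate, fused_precision). Higher-precision cues pull the
    fused estimate harder, which is the sense of precision-weighting used here.
    """
    total_precision = sum(precision for _, precision in cues)
    if total_precision <= 0:
        raise ValueError("total precision must be positive")
    fused = sum(estimate * precision for estimate, precision in cues) / total_precision
    return fused, total_precision
```

With a trembling voice suggesting distress 0.8 at precision 4.0 and a single word choice suggesting 0.2 at precision 1.0, `fuse_cues([(0.8, 4.0), (0.2, 1.0)])` gives a fused estimate of about 0.68 with precision 5.0, dominated by the voice cue.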

The Action-Oriented Predictive Processing aspect is particularly relevant for our virtual humans. The system doesn't just predict passively but actively generates interventions that help verify or adjust predictions. If the system suspects a user is becoming overwhelmed by information, it can test this by suggesting a pause and observing the response.
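The "suggest a pause and observe" test can be sketched as a Bayesian update on the binary hypothesis "the user is overwhelmed", plus a rule that only probes when the expected information gain is worth the interruption. The likelihoods (0.8 and 0.3) and the gain threshold are hypothetical placeholders. Expected information gain is only the epistemic part of active inference's expected free energy, so this is a simplification rather than a full active-inference agent.

```python
import math


def posterior_overwhelmed(prior, accepted, p_accept_if_overwhelmed=0.8,
                          p_accept_if_not=0.3):
    """Bayes' rule for 'user is overwhelmed' after a suggested pause.

    `accepted` is whether the user took the pause. The two likelihoods are
    hypothetical placeholders, not measured values.
    """
    if accepted:
        like_o, like_n = p_accept_if_overwhelmed, p_accept_if_not
    else:
        like_o, like_n = 1 - p_accept_if_overwhelmed, 1 - p_accept_if_not
    joint_o = like_o * prior
    return joint_o / (joint_o + like_n * (1 - prior))


def entropy_bits(p):
    """Shannon entropy of a Bernoulli(p) belief, in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def expected_info_gain(prior, p_accept_if_overwhelmed=0.8, p_accept_if_not=0.3):
    """Expected reduction in uncertainty (bits) from suggesting a pause."""
    p_accept = p_accept_if_overwhelmed * prior + p_accept_if_not * (1 - prior)
    post_yes = posterior_overwhelmed(prior, True, p_accept_if_overwhelmed, p_accept_if_not)
    post_no = posterior_overwhelmed(prior, False, p_accept_if_overwhelmed, p_accept_if_not)
    expected_posterior_entropy = (p_accept * entropy_bits(post_yes)
                                  + (1 - p_accept) * entropy_bits(post_no))
    return entropy_bits(prior) - expected_posterior_entropy


def should_probe(prior, min_gain_bits=0.1):
    """Only interrupt the user with a probe when it is expected to be informative."""
    return expected_info_gain(prior) >= min_gain_bits
```

When the system is unsure (prior 0.5), suggesting a pause is informative and is worth trying; when it is already fairly confident (prior 0.95), the probe would add little, so the system can act on its belief directly.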

The insights from the Computers Are Social Actors (CASA) research of Nass and colleagues form a crucial addition to this predictive framework. This research shows that people automatically and unconsciously apply social rules in their interactions with computers. This isn't a bug but a feature we can utilize: by incorporating minimal but carefully chosen social cues, we can facilitate natural and effective interactions without falling into the 'uncanny valley'.

The practical application of the Free Energy Principle shows in how our virtual humans handle uncertainty. The system continually strives to minimize 'free energy', a quantity that bounds surprise and, informally, measures the discrepancy between expectations and observations. It does so not only by making better predictions but also by actively steering the interaction so that it becomes more predictable for the user.

### 3.2 Empirical Validation

The effectiveness of this theoretical integration can be measured using various metrics that illuminate different aspects of the interaction. Response times at different levels provide insight into the efficiency of the processing layers. The accuracy of predictions, measured by how well the system anticipates user behavior, validates the predictive processing aspect.
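A sketch of the first two metrics, assuming latencies are logged per layer in milliseconds and predictions are logged as discrete labels alongside what the user actually did. The budgets reuse the upper bounds of the layer windows from Section 1.1; checking the 95th percentile against them is an assumption, and all names are hypothetical.

```python
import statistics

# Upper bounds of the layer windows from Section 1.1; the reflective layer
# has no upper bound, so it is not checked against a budget.
LAYER_BUDGET_MS = {"reactive": 300, "responsive": 1000}


def p95(samples):
    """95th percentile of at least two latency samples (inclusive method)."""
    return statistics.quantiles(samples, n=20, method="inclusive")[-1]


def latency_report(latencies_ms):
    """Map each layer name to (p95 latency, whether it meets the budget)."""
    report = {}
    for layer, samples in latencies_ms.items():
        value = p95(samples)
        budget = LAYER_BUDGET_MS.get(layer)
        report[layer] = (value, budget is None or value <= budget)
    return report


def prediction_accuracy(predicted, observed):
    """Share of turns where the predicted user action matched the observed one."""
    if not predicted or len(predicted) != len(observed):
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(p == o for p, o in zip(predicted, observed))
    return hits / len(predicted)
```

Reporting a tail percentile rather than the mean makes occasional slow responses visible, which matters because users notice inconsistent timing more than a slightly slower average.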

User satisfaction is measured not only through explicit feedback but also through analysis of interaction patterns and engagement metrics. Behavioral change, particularly relevant for healthcare applications, is evaluated through longitudinal studies examining both direct and indirect indicators of improved health literacy.

This empirical approach enables us to validate the theoretical foundations while optimizing the practical effectiveness of the interventions. Through continuous monitoring and analysis, we can refine and adapt the system to the specific needs of different user groups.

## 4. Research References

### Theoretical Framework
- Friston, K. (2010). The free-energy principle: A unified brain theory? *Nature Reviews Neuroscience*, 11(2), 127-138.
- Clark, A. (2013). Whatever next? Predictive brains, situated agents, and the future of cognitive science. *Behavioral and Brain Sciences*, 36(3), 181-204.
- Clark, A. (2015). *Surfing Uncertainty: Prediction, Action, and the Embodied Mind*. Oxford University Press.
- Hohwy, J. (2013). *The Predictive Mind*. Oxford University Press.

### Human-Computer Interaction & Social Response
- Nass, C., Steuer, J., & Tauber, E. R. (1994). Computers are social actors. *Proceedings of the SIGCHI Conference on Human Factors in Computing Systems*, 72-78.
- Reeves, B., & Nass, C. (1996). *The Media Equation: How People Treat Computers, Television, and New Media Like Real People and Places*. Cambridge University Press.

### Virtual Humans & Healthcare
- Bickmore, T. W., & Picard, R. W. (2005). Establishing and maintaining long-term human-computer relationships. *ACM Transactions on Computer-Human Interaction*, 12(2), 293-327.
- Bickmore, T. W., Pfeifer, L. M., & Jack, B. W. (2009). Taking the time to care: Empowering low health literacy hospital patients with virtual nurse agents. *Proceedings of the SIGCHI Conference on Human Factors in Computing Systems*, 1265-1274.

### Architecture & Implementation
- Architecture of Emotional-Cognitive Agents:
  - Gratch, J., & Marsella, S. (2004). A domain-independent framework for modeling emotion. *Cognitive Systems Research*, 5(4), 269-306.
  - Sun, R. (2006). The CLARION cognitive architecture: Extending cognitive modeling to social simulation. In *Cognition and Multi-Agent Interaction* (pp. 79-99).

### Health Literacy & Technology
- Mackert, M., Mabry-Flynn, A., Champlin, S., Donovan, E. E., & Pounders, K. (2016). Health literacy and health information technology adoption: The potential for a new digital divide. *Journal of Medical Internet Research*, 18(10), e264.
- Parker, R. M., & Kindig, D. A. (2006). Beyond the Institute of Medicine health literacy report: Are the recommendations being taken seriously? *Journal of General Internal Medicine*, 21(8), 891-892.