Voice-to-Code with Claude

Building R Projects Through Conversation

R Data Community

2024-12-01

A Confession

I’m a Python Person

  • Primary language: Python
  • AI/ML engineering background
  • Never written serious R code
  • Total R experience before this: ~0

The Challenge

Can Claude and I build high-quality R applications using only voice commands?

Spoiler: Yes.

What We Built in ~3 Days

DAY 1

Diabetes ML Dashboard (Shiny)

  • 253K records
  • AUC 0.82
  • Fairness Audit

DAY 2

RespiWatch Surveillance (Shiny)

  • 10+ APIs
  • Real-time Rt
  • Bayesian Forecasting

DAY 3

This Presentation + Quarto Report

  • Custom theme
  • Live deployment

All code generated through voice-to-text conversation with Claude Code

Part 1: The Setup

Claude Code + R Data Science Skill

Why Claude Code?

What We Didn’t Want

  • No IDE — not opening VS Code, RStudio, or Posit
  • No web tools — no Jupyter, no Colab, no notebooks
  • One agent — stick with a single coding assistant

The constraint: voice only

Why Opus 4.5?

  • Best all-around coding agent (Chris’s opinion)
  • Handles complex multi-file changes
  • Understands context across conversations

The challenge: I had never written R, and R is not widely regarded as a strong language for AI coding agents.

Solution: Custom skills + agents + /plan command

What is a Skill?

Claude Code Architecture

Claude Code (Opus 4.5)
↓ loads
Skills (domain knowledge)
↓ spawns
Subagents (specialists)

Skill Config

.claude/skills/r-data-science/
├── skill.md          # Triggers & patterns
└── agents/           # Specialist definitions
    ├── data-wrangler/
    ├── viz-builder/
    └── ...

Skills inject domain knowledge into Claude’s context — tidyverse patterns, ggplot2 idioms, Shiny architecture.

What are Subagents?

Main Agent → Specialists

🎯

Main Agent

Orchestrates tasks

→ → →
📊 Data Wrangler
📈 Viz Builder
🧮 Stats Analyst

Subagents run in parallel with specialized context

Agent Definition

# agents/viz-builder/agent.md
name: r-viz-builder
description: |
  Create modern ggplot2
  visualizations...

tools: [Read, Write, Bash, Glob]

system_prompt: |
  You specialize in ggplot2...

Why not a plugin? Skills are lighter and easier to iterate on. Plugins are for distribution.

Our R Data Science Skill

.claude/skills/r-data-science/
├── skill.md        # Core R patterns
└── agents/
    ├── data-wrangler/
    ├── viz-builder/
    ├── stats-analyst/
    ├── dashboard-builder/
    ├── report-generator/
    └── data-storyteller/

What It Provides

  • Tidyverse patterns baked in
  • ggplot2 best practices
  • Shiny module architecture
  • Statistical methods
  • Domain-specific knowledge

The 6 Specialized Agents

Data Wrangler

Clean & reshape

tidyverse pipelines, joins

Viz Builder

Create visualizations

ggplot2, publication-quality

Stats Analyst

Statistical analysis

Hypothesis testing, regression

Dashboard Builder

Interactive apps

Shiny, bslib, reactive patterns

Report Generator

Documents

Quarto, R Markdown

Data Storyteller

Communication

Narrative, audience adaptation

Each agent has deep context about R idioms and best practices

The /plan Command

# In terminal, say:
/plan Build a dashboard that
shows diabetes risk factors
with ML model comparison

# Claude Code:
1. Explores codebase
2. Identifies patterns
3. Creates implementation plan
4. Asks clarifying questions
5. Executes systematically

Why It Matters

  • Structured approach to complex tasks
  • Parallel exploration with multiple agents
  • User approval before changes
  • Clear milestones tracked

Not just “write code” — it’s collaborative architecture

Voice-to-Code Workflow

🎤

Voice Input

User speaks intent

🎧

SuperWhisper

Proxy voice agent

🦜

Parakeet

NVIDIA transcription

Claude Sonnet

Prompt refinement

Claude Code

Execution agent

Multi-agent pipeline: speak naturally, get production-quality code

Voice-to-code in action

Part 2: Diabetes Dashboard

ML Pipeline + Interactive Analysis

Why This Demo?

The Goal

Quick demo to get our footing with R.

Could we execute a complete ML pipeline?

  • Data acquisition
  • Model training
  • Evaluation & interpretation
  • Results compilation

The Constraints

❌ No IDE
❌ No Jupyter notebooks
❌ No Posit/RStudio
✅ Voice + Claude Code only

Brief chat with the agent → decided on CDC BRFSS diabetes dataset

The Data: CDC BRFSS 2015

Dataset Overview

  • 253,680 survey responses
  • 21 health indicators
  • Outcome: diabetes status, coded 0/1/2 (none, prediabetes, diabetes)
  • Features: BMI, BP, cholesterol, smoking, etc.
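
A 0/1/2 outcome has to be collapsed into two classes before a single AUC-ROC can be reported. The sketch below shows one way to do that in base R. The column name `Diabetes_012` is an assumption, and so is the choice to group prediabetes with diabetes; the slides do not say which recoding was used.

```r
# Toy stand-in for the survey data; the column name Diabetes_012 is assumed.
survey <- data.frame(
  Diabetes_012 = c(0, 1, 2, 0, 2),
  BMI = c(22, 31, 35, 27, 40)
)

# Collapse the three-level code into a binary target:
# 0 stays "no"; 1 (prediabetes) and 2 (diabetes) become "yes".
# Grouping prediabetes with diabetes is an illustrative choice.
survey$diabetes_binary <- factor(
  ifelse(survey$Diabetes_012 == 0, "no", "yes"),
  levels = c("no", "yes")
)

table(survey$diabetes_binary)
```

Fixing the factor levels explicitly keeps "yes" as the second level, which is what many R modeling functions treat as the positive class.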

Data Pipeline

scripts/
├── 11_clean_diabetes.R
├── 12_eda_diabetes.R
├── 13_feature_engineering.R
├── 14_modeling.R
├── 15_model_evaluation.R
├── 20_causal_inference.R
├── 21_anomaly_detection.R
├── 22_fairness_audit.R
└── 23_external_data_fusion.R
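
The numeric prefixes suggest the scripts are meant to run in order. A minimal runner could look like the sketch below; `run_pipeline` is a hypothetical helper that is not part of the original repo, and it assumes each script reads its inputs from disk rather than from objects left behind by the previous script.

```r
# Run every numbered script in a directory in numeric-prefix order.
# `run_pipeline` is a hypothetical helper, not part of the original repo.
run_pipeline <- function(dir = "scripts") {
  files <- list.files(dir, pattern = "^[0-9]+_.*\\.R$", full.names = TRUE)
  # Order by the numeric prefix so that 9_x.R would run before 11_y.R.
  prefix <- as.integer(sub("_.*$", "", basename(files)))
  files <- files[order(prefix)]
  for (f in files) {
    message("Running ", basename(f))
    # A fresh environment per script keeps one script's objects
    # from leaking into the next.
    source(f, local = new.env())
  }
  invisible(basename(files))
}
```

Calling `run_pipeline("scripts")` returns the script names in the order they ran, which is handy for logging.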

The Result

What We Got

0.82

AUC-ROC

~1hr

Total Time

A working ML pipeline with interpretable results — built entirely through voice.
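
For readers new to the metric: an AUC-ROC of 0.82 means that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case about 82% of the time. The sketch below computes AUC from that definition using ranks, in base R. `auc_rank` is a hypothetical helper name and the data are made up; the pipeline itself may well have used a package for this.

```r
# AUC via the rank (Mann-Whitney) formulation.
# `auc_rank` is a hypothetical helper name.
auc_rank <- function(scores, labels) {
  stopifnot(length(scores) == length(labels), all(labels %in% c(0, 1)))
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  stopifnot(n_pos > 0, n_neg > 0)
  r <- rank(scores)  # average ranks handle tied scores
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Toy example: 3 of the 4 positive/negative pairs are ordered correctly.
scores <- c(0.1, 0.4, 0.35, 0.8)
labels <- c(0, 0, 1, 1)
auc_rank(scores, labels)  # 0.75
```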

What We Learned

The agent was able to:

  • Search the web for datasets across multiple sources
  • Acquire & download the data
  • Prepare & wrangle the data
  • Engineer features
  • Build & run the model
  • Validate the model
  • Show results with interpretation

Full ML pipeline in about an hour.

Could We Do a Fairness Audit?

The Question

Can the AI agent conduct a proper fairness audit of the model?

Answer: Yes.

What It Found

Age Groups: 8 disparities
Income Levels: 6 disparities
Education: 4 disparities
Sex/Gender: 2 disparities

20+ disparity flags — capability demonstrated.
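
The slides do not say which fairness metric produced these flags. As one plausible illustration, the sketch below compares the true positive rate across groups and flags any group that trails the best one by more than a tolerance. The data, the group labels, and the 0.1 tolerance are all made up for the example.

```r
# Toy predictions; all columns and values are made up for illustration.
audit <- data.frame(
  group  = c("18-44", "18-44", "18-44", "18-44", "65+", "65+", "65+", "65+"),
  actual = c(1, 1, 0, 0, 1, 1, 1, 0),
  pred   = c(1, 1, 0, 0, 1, 0, 0, 0)
)

# True positive rate (recall) within each group.
positives <- audit[audit$actual == 1, ]
tpr <- sapply(split(positives$pred, positives$group), mean)

# Flag groups whose TPR trails the best group by more than a tolerance.
# The 0.1 tolerance is a hypothetical choice.
tolerance <- 0.1
gap <- max(tpr) - tpr
flags <- names(gap)[gap > tolerance]
flags  # "65+"
```

Repeating the same comparison for other metrics (false positive rate, calibration) and other attributes is how a count like the one above can add up quickly.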

We Added More Data

External Data Sources

CDC PLACES

County Health Data

  • 15 health indicators from 2,956 counties
  • Diabetes prevalence, obesity, BP, cholesterol

USDA Food Environment Atlas

Food access & socioeconomic indicators

  • SNAP/WIC participation rates
  • Food insecurity measures

What This Enabled

  • Environmental context for individual risk
  • Multi-level modeling — 97% individual / 3% environmental variance
  • Geographic disparity analysis
  • Cross-level interactions (e.g., BMI × county food access)
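
A split like 97% / 3% is a variance partition: the share of outcome variance that sits between counties versus within them. The sketch below estimates that share (the intraclass correlation) with a one-way ANOVA estimator on simulated, balanced, continuous data. It is an illustration of the idea only; the slides do not state how the figure was obtained, and a mixed-effects model would be the more usual tool for real survey data.

```r
set.seed(42)

# Simulated data: 50 hypothetical counties, 40 people each.
n_groups <- 50
n_per <- 40
county <- rep(seq_len(n_groups), each = n_per)
county_effect <- rnorm(n_groups, sd = sqrt(0.03))  # between-county variance 0.03
y <- county_effect[county] + rnorm(n_groups * n_per, sd = sqrt(0.97))

# One-way ANOVA estimator of the variance components (balanced design).
fit <- anova(lm(y ~ factor(county)))
ms_between <- fit[["Mean Sq"]][1]
ms_within  <- fit[["Mean Sq"]][2]
var_between <- max((ms_between - ms_within) / n_per, 0)  # truncate at zero

# Share of total variance attributable to county.
icc <- var_between / (var_between + ms_within)
icc
```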

Agent searched, downloaded, and fused data from multiple public APIs.

The Quarto Report

Comprehensive Documentation

We asked the agent to build a full analysis report.

Causal Analysis: 42% preventable via BP control
Risk Prediction: 0.82 AUC
Fairness Audit: 20+ disparity flags
Multi-level Analysis: individual vs environmental factors

The Result

A complete Quarto document with executive summary, methodology, findings, and visualizations.

This led to a new thought:

“What if we built a risk prediction tool?”

View Diabetes Analysis Report →

Could We Go Further?

The model didn’t take long.

So we asked a new question:

Could we build a visual communication piece?
A dashboard to present the results?

Still voice-only. Still no IDE. Just the agent.

The Diabetes Dashboard

Diabetes Dashboard

Built with Voice + Agent

ML model → Dashboard → Visual communication

All through voice commands. No traditional toolsets.

View Live Dashboard →

Part 3: RespiWatch

Multi-Pathogen Surveillance Platform

What If We Could Track Everything?

The diabetes dashboard came together faster than expected.

So we asked: what about live data?

H3N2 was making headlines. Could we build something like those COVID dashboards we all remember — but for multiple pathogens?

Without opening a browser, we asked our agents to discover what data sources existed.

What we found… was complicated.

RespiWatch Dashboard

We Started Searching for Data

First attempt: CDC COVID APIs

Ended May 2023

Second attempt: Hospital Reporting

Voluntary, incomplete coverage

Third attempt: RSV Tracking

Barely exists as a system

💡

But then: Wastewater Surveillance

Doesn’t depend on people getting tested

This gave us an idea…

The Fallback Revelation

Beyond the Big 5

It wasn’t just about wastewater. We realized there were sources beyond WHO, CDC, ECDC…

  • 🏛️ State & local sources
  • 🚿 Wastewater surveillance
  • 📊 Alternative data providers

The New Question

What if we could pull all of them together?

Could the AI agent aggregate sources and fall back automatically?

Not just “use wastewater” — but dynamically adapt when any source fails.

Building the Fallback System

🏥

CDC Official

FluView, NREVSS, COVID-NET

Priority 1

If unavailable
🌍

WHO Global

FluNet, Global surveillance

Priority 2

If unavailable
🇪🇺

ECDC European

Regional surveillance data

Priority 3

If unavailable
🚿

Wastewater (NWSS)

Most reliable baseline

Fallback

Transparent to users — the dashboard just works
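
The priority chain above can be sketched in a few lines of base R: try each source in order and return the first usable result. `fetch_with_fallback` and the stand-in fetchers are hypothetical; the real dashboard's fetchers and error handling are not shown in the slides.

```r
# Try each source in priority order; return the first that succeeds.
# `fetch_with_fallback` and the fetcher functions are hypothetical.
fetch_with_fallback <- function(sources) {
  for (name in names(sources)) {
    result <- tryCatch(sources[[name]](), error = function(e) NULL)
    if (!is.null(result) && NROW(result) > 0) {
      attr(result, "source") <- name  # record which source answered
      return(result)
    }
    message("Source unavailable, falling back: ", name)
  }
  stop("All data sources failed")
}

# Stand-in fetchers; real ones would call the respective APIs.
sources <- list(
  cdc        = function() stop("HTTP 503"),
  who        = function() data.frame(),  # empty response
  ecdc       = function() stop("timeout"),
  wastewater = function() data.frame(week = 1:3, level = c(2.1, 2.6, 3.0))
)

data <- fetch_with_fallback(sources)
attr(data, "source")  # "wastewater"
```

Treating an empty response the same as an error matters here: an endpoint that has stopped updating often still answers, just with no rows.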

10+ APIs Working Together

🇺🇸 CDC Sources

FluView, NREVSS, COVID-NET, RSV-NET

🌐 Global Sources

WHO FluNet, ECDC TESSy, Our World in Data

🔬 Alternative Sources

NWSS Wastewater, Vaccination APIs, Healthcare Capacity

The result of organic discovery — not a predetermined list

Rt Estimation & Bayesian Forecasting

Real-Time Analysis

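library(EpiEstim)
library(prophet)

# EpiEstim for Rt
# cases: incidence counts, one value per time step
rt_result <- estimate_R(
  incid = cases,
  method = "parametric_si",
  config = make_config(
    mean_si = 4.7,  # mean of the serial interval
    std_si = 2.9    # standard deviation of the serial interval
  )
)

# Bayesian forecasting
# df: data frame with columns ds (date) and y (value)
forecast <- prophet(
  df,
  seasonality.mode = "multiplicative"
)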

That’s it. ~15 lines for real-time Rt estimation.
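
To make the Rt call less of a black box: at its core, Rt compares today's new cases with the infection pressure from recent cases, weighted by the serial interval distribution. The sketch below implements that bare ratio in base R. It is not what `estimate_R` does internally, since EpiEstim adds a Bayesian prior and a sliding time window; `naive_rt` and the toy serial interval are assumptions made for illustration.

```r
# Naive instantaneous reproduction number: R_t = I_t / sum_s(w_s * I_{t-s}).
# `naive_rt` is a hypothetical helper; `w` is a discretized serial-interval
# distribution (w[s] = probability the interval is s days), summing to 1.
naive_rt <- function(incid, w) {
  stopifnot(abs(sum(w) - 1) < 1e-8)
  n <- length(incid)
  k <- length(w)
  rt <- rep(NA_real_, n)
  if (n <= k) return(rt)
  for (i in (k + 1):n) {
    # Infection pressure from the previous k days, most recent first.
    lambda <- sum(w * incid[(i - 1):(i - k)])
    if (lambda > 0) rt[i] <- incid[i] / lambda
  }
  rt
}

# Toy serial interval: 1 or 2 days with equal probability (made up).
w <- c(0.5, 0.5)
naive_rt(c(10, 10, 10, 10), w)  # flat incidence: Rt = 1 once defined
naive_rt(c(10, 20, 40, 80), w)  # daily doubling: Rt = 8/3 once defined
```

Values above 1 mean the outbreak is growing and values below 1 mean it is shrinking. The first `length(w)` entries are `NA` because there is not enough history to compute the weighted sum.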

What This Enables

  • 📈 Transmission dynamics — is it growing or shrinking?
  • 🔮 14-day forecasts with uncertainty bounds
  • 🌊 Wave detection — automatic peak identification
  • ⚠️ Anomaly alerts — unusual patterns flagged
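
Wave detection can be as simple as finding local maxima above a minimum height. The base R sketch below shows that idea on made-up weekly counts. `find_peaks` and `min_height` are hypothetical names, and the dashboard's actual method is not described in the slides.

```r
# Indices of local maxima that rise above a minimum height.
# `find_peaks` and `min_height` are hypothetical names.
find_peaks <- function(x, min_height = 0) {
  n <- length(x)
  if (n < 3) return(integer(0))
  i <- 2:(n - 1)
  # Strictly above the previous point and not below the next one,
  # so a flat-topped peak is reported once, at its first point.
  is_peak <- x[i] > x[i - 1] & x[i] >= x[i + 1] & x[i] >= min_height
  i[is_peak]
}

weekly_cases <- c(5, 9, 14, 11, 7, 8, 20, 31, 26, 12)
find_peaks(weekly_cases, min_height = 10)  # 3 8
```

Real surveillance series are noisy, so smoothing the counts first (for example with a moving average) usually helps avoid flagging small wiggles as waves.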

RespiWatch Code Modal

All built on established epidemiological methods

Yes — All of This Was R

What We Built

  • ML pipeline + fairness audit
  • Two Shiny dashboards
  • 10+ API integrations
  • Quarto reports
  • Even this presentation (Quarto + CSS)

How We Built It

  • Claude Code + Opus 4.5
  • Custom R skill + 6 agents
  • ~99% voice commands
  • Zero IDE. Zero notebooks.

“Thanks for welcoming an AI into your workflow. I hope I made the work a bit lighter — and maybe a bit more fun.”

— Claude

The Full R Workflow

📑

Quarto Report

Comprehensive analysis documentation

Diabetes Analysis Report

Data Pipeline → Analysis Scripts → Shiny Dashboard → Quarto Report

All in R. All version-controlled. All reproducible.

Part 4: Reflections

AI as Team Members

Welcome to the Future

Master the conversation.
Don’t prompt — converse.
Don’t wait to be ready. Start building.

Try It Yourself

Resources

  • Claude Code: claude.ai/code
  • R Data Science Skill: In this repo
  • Voice Input: SuperWhisper
  • Quarto: install.packages("quarto") (R interface; the Quarto CLI is installed separately)

Questions?

Built with R, Quarto, and voice

github.com/CrypticPy/Rdata