Building R Projects Through Conversation
R Data Community
2024-12-01
Can Claude and I build high-quality R applications using only voice commands?
Spoiler: Yes.
Diabetes ML Dashboard (Shiny)
RespiWatch Surveillance (Shiny)
This Presentation + Quarto Report
All code generated through voice-to-text conversation with Claude Code
The constraint: voice only
The challenge: I had never written R before, and R isn't a language AI coding tools are known for.
Solution: Custom skills + agents + /plan command
.claude/skills/r-data-science/
├── skill.md     # Triggers & patterns
└── agents/      # Specialist definitions
    ├── data-wrangler/
    ├── viz-builder/
    └── ...
Skills inject domain knowledge into Claude’s context — tidyverse patterns, ggplot2 idioms, Shiny architecture.
Main agent orchestrates tasks; subagents run in parallel, each with specialized context.
# agents/viz-builder/agent.md
name: r-viz-builder
description: |
  Create modern ggplot2
  visualizations...
tools: [Read, Write, Bash, Glob]
system_prompt: |
  You specialize in ggplot2...
Why not a plugin? Skills are lighter, easier to iterate. Plugins are for distribution.
.claude/skills/r-data-science/
├── skill.md     # Core R patterns
└── agents/
    ├── data-wrangler/
    ├── viz-builder/
    ├── stats-analyst/
    ├── dashboard-builder/
    ├── report-generator/
    └── data-storyteller/
data-wrangler: clean & reshape (tidyverse pipelines, joins)
viz-builder: create visualizations (ggplot2, publication quality)
stats-analyst: statistical analysis (hypothesis testing, regression)
dashboard-builder: interactive apps (Shiny, bslib, reactive patterns)
report-generator: documents (Quarto, R Markdown)
data-storyteller: communication (narrative, audience adaptation)
Each agent has deep context about R idioms and best practices
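As a flavor of the idiom level these agents work at, here is a minimal sketch of the kind of output a data-wrangler plus viz-builder hand-off produces. The dataset and column names (survey_data, age_group, diabetes) are illustrative, not taken from the actual project.

library(dplyr)
library(ggplot2)

# Summarise diabetes prevalence by age group, then plot it
risk_by_age <- survey_data |>
  filter(!is.na(diabetes), !is.na(age_group)) |>
  group_by(age_group) |>
  summarise(prevalence = mean(diabetes == "Yes"), .groups = "drop")

ggplot(risk_by_age, aes(age_group, prevalence)) +
  geom_col(fill = "steelblue") +
  scale_y_continuous(labels = scales::percent) +
  labs(x = NULL, y = "Diabetes prevalence",
       title = "Prevalence rises with age") +
  theme_minimal()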
# In terminal, say:
/plan Build a dashboard that
shows diabetes risk factors
with ML model comparison
# Claude Code:
1. Explores codebase
2. Identifies patterns
3. Creates implementation plan
4. Asks clarifying questions
5. Executes systematically
Not just “write code” — it’s collaborative architecture
User speaks intent → proxy voice agent → NVIDIA transcription → prompt refinement → execution agent
Multi-agent pipeline: speak naturally, get production-quality code

Quick demo to get our footing with R.
Could we execute a complete ML pipeline?
Brief chat with the agent → decided on CDC BRFSS diabetes dataset
scripts/
├── 11_clean_diabetes.R
├── 12_eda_diabetes.R
├── 13_feature_engineering.R
├── 14_modeling.R
├── 15_model_evaluation.R
├── 20_causal_inference.R
├── 21_anomaly_detection.R
├── 22_fairness_audit.R
└── 23_external_data_fusion.R
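As a hedged sketch of what a script like 14_modeling.R contains, here is the general shape of a tidymodels workflow for this kind of task. The column names, split, and model choice are assumptions for illustration, not the script's actual contents.

library(tidymodels)

# Illustrative: split the cleaned data, fit a logistic regression, evaluate AUC
set.seed(42)
split <- initial_split(diabetes_clean, prop = 0.8, strata = diabetes)
train <- training(split)
test  <- testing(split)

wf <- workflow() |>
  add_recipe(recipe(diabetes ~ ., data = train) |>
               step_normalize(all_numeric_predictors())) |>
  add_model(logistic_reg() |> set_engine("glm"))

model <- fit(wf, data = train)
preds <- augment(model, test)

# event_level depends on how the outcome factor is coded
roc_auc(preds, truth = diabetes, .pred_Yes, event_level = "second")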
Headline metrics: AUC-ROC and total build time.
A working ML pipeline with interpretable results — built entirely through voice.
The agent built the full ML pipeline, from cleaning through evaluation, in about an hour.
Can the AI agent conduct a proper fairness audit of the model?
Answer: Yes.
20+ disparity flags — capability demonstrated.
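The mechanics of the audit are simple: compute the same performance metrics within each demographic group and flag large gaps. A minimal sketch, assuming a predictions data frame like the one in the modeling sketch above; the grouping variable and the flag threshold are illustrative.

library(dplyr)
library(yardstick)

# Per-group AUC and false-negative rate, with a simple disparity flag
by_group <- preds |>
  group_by(race_ethnicity) |>
  summarise(
    auc = roc_auc_vec(diabetes, .pred_Yes, event_level = "second"),
    fnr = 1 - sens_vec(diabetes, .pred_class, event_level = "second"),
    .groups = "drop"
  ) |>
  mutate(flag = fnr > min(fnr) + 0.05)  # flag groups with notably higher miss rates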
County Health Data
Food access & socioeconomic indicators
Agent searched, downloaded, and fused data from multiple public APIs.
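Conceptually, the fusion step is a keyed join once each source has been pulled down. A minimal sketch; the URL is a placeholder, and diabetes_by_county, the FIPS keys, and the derived flag are illustrative names rather than the agent's actual code.

library(readr)
library(dplyr)

# Placeholder URL standing in for the county health / food-access source
county_health <- read_csv("https://example.org/county_health_indicators.csv")

fused <- diabetes_by_county |>
  left_join(county_health, by = c("state_fips", "county_fips")) |>
  mutate(low_food_access = food_access_index < 0.5)  # illustrative derived indicator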
We asked the agent to build a full analysis report.
A complete Quarto document with executive summary, methodology, findings, and visualizations.
This led to a new thought:
“What if we built a risk prediction tool?”
The model didn’t take long.
So we asked a new question:
Could we build a visual communication piece?
A dashboard to present the results?
Still voice-only. Still no IDE. Just the agent.
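A minimal sketch of the Shiny + bslib shape such a dashboard takes; the inputs, IDs, and predict_risk() helper are hypothetical stand-ins, not the app's actual code.

library(shiny)
library(bslib)

ui <- page_sidebar(
  title   = "Diabetes Risk Explorer",
  sidebar = sidebar(
    sliderInput("bmi", "BMI", min = 15, max = 50, value = 27),
    selectInput("age_group", "Age group", choices = c("18-44", "45-64", "65+"))
  ),
  card(card_header("Predicted risk"), textOutput("risk"))
)

server <- function(input, output, session) {
  output$risk <- renderText({
    # predict_risk() stands in for the model fitted earlier in the pipeline
    scales::percent(predict_risk(bmi = input$bmi, age_group = input$age_group))
  })
}

shinyApp(ui, server)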

ML model → Dashboard → Visual communication
All through voice commands. No traditional toolsets.
The diabetes dashboard came together faster than expected.
So we asked: what about live data?
H3N2 was making headlines. Could we build something like those COVID dashboards we all remember — but for multiple pathogens?
Without opening a browser, we asked our agents to discover what data sources existed.
What we found… was complicated.

The existing sources each had gaps: one reporting program ended in May 2023, another is voluntary with incomplete coverage, and a third barely exists as a system. Only one signal doesn't depend on people getting tested at all.
This gave us an idea…
It wasn’t just about wastewater. We realized there were sources beyond WHO, CDC, ECDC…
What if we could pull all of them together?
Could the AI agent aggregate and fallback automatically?
Not just “use wastewater” — but dynamically adapt when any source fails.
Priority 1: FluView, NREVSS, COVID-NET
Priority 2: FluNet, global surveillance
Priority 3: Regional surveillance data
Fallback: Most reliable baseline
Transparent to users — the dashboard just works
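Under the hood the pattern is straightforward: try each source in priority order and fall through on failure. A minimal sketch in base R; the fetch_*() functions are placeholder wrappers for the real API calls.

# Placeholder wrappers, one per surveillance source
sources <- list(
  fluview  = fetch_fluview,   # Priority 1
  flunet   = fetch_flunet,    # Priority 2
  regional = fetch_regional,  # Priority 3
  baseline = fetch_baseline   # Fallback: most reliable baseline
)

get_cases <- function(pathogen) {
  for (name in names(sources)) {
    result <- tryCatch(sources[[name]](pathogen), error = function(e) NULL)
    if (!is.null(result)) {
      result$source <- name  # record which source actually answered
      return(result)
    }
  }
  stop("No surveillance source available for ", pathogen)
}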
FluView, NREVSS, COVID-NET, RSV-NET
WHO FluNet, ECDC TESSy, Our World in Data
NWSS wastewater, vaccination APIs, healthcare capacity
The result of organic discovery — not a predetermined list
library(EpiEstim)
library(prophet)

# EpiEstim for Rt (parametric serial interval: mean 4.7 days, sd 2.9)
rt_result <- estimate_R(
  incid  = cases,
  method = "parametric_si",
  config = make_config(
    mean_si = 4.7,
    std_si  = 2.9
  )
)

# Bayesian forecasting with Prophet
fit      <- prophet(df, seasonality.mode = "multiplicative")
future   <- make_future_dataframe(fit, periods = 14)  # illustrative 14-day horizon
forecast <- predict(fit, future)
That's it: under 20 lines for real-time Rt estimation and forecasting.

All built on established epidemiological methods
“Thanks for welcoming an AI into your workflow. I hope I made the work a bit lighter — and maybe a bit more fun.”
— Claude
Data Pipeline → Analysis Scripts → Shiny Dashboard → Quarto Report
All in R. All version-controlled. All reproducible.
Master the conversation.
Don’t prompt — converse.
Don’t wait to be ready. Start building.
install.packages("quarto")
github.com/CrypticPy/Rdata