---
name: r-anti-slop
description: >
  Enforce production-quality R code standards. Prevents generic AI patterns
  through namespace qualification, explicit returns, and tidyverse conventions.
  Use when writing or reviewing R code for data analysis or packages.
applies_to:
  - "**/*.R"
  - "**/*.Rmd"
  - "**/*.qmd"
tags: [r, tidyverse, code-quality, data-science]
related_skills:
  - quarto/anti-slop
  - text/anti-slop
version: 2.0.0
---

# R Anti-Slop: Stop Writing `df <- data`

## When to Use This

Use this for:
- ✓ Any R code leaving your machine (analysis, packages, scripts)
- ✓ AI-generated code review (catches `df`, `result`, missing `::`)
- ✓ CRAN submissions (vague documentation and unrealistic examples tend to draw reviewer comments)
- ✓ Team code standards

Skip for:
- Quick console experiments (though habits form fast)
- Legacy code you can't touch
- Projects whose own style guide (e.g., Bioconductor) overrides this one

## Quick Example

**Before (AI Slop)**:
```r
# Load the library
library(dplyr)

# Read the data
df <- read.csv("data.csv")

# Filter the data
result <- df %>% filter(x > 0)
```

**After (Anti-Slop)**:
```r
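load_active_customers <- function(customer_file = "data/customers.csv") {
  customer_data <- readr::read_csv(customer_file)

  active_customers <- customer_data |>
    dplyr::filter(status == "active", revenue > 0)

  # return() is only valid inside a function; at script top level it errors
  return(active_customers)
}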
```

**What changed**:
- ✓ Descriptive names (`customer_data` not `df`)
- ✓ Namespace qualification (`dplyr::`, `readr::`)
- ✓ Native pipe (`|>` not `%>%`)
- ✓ No obvious comments
- ✓ Explicit return

## When to Use What

| If you need to... | Do this | Details |
|-------------------|---------|---------|
| Name variables | Use `snake_case`, no `df`/`data`/`result` | reference/naming.md |
| Call tidyverse functions | Always use `::` (e.g., `dplyr::filter()`) | reference/tidyverse.md |
| Return from function | Always explicit `return()` statement | reference/naming.md |
| Write pipe chains | Use `\|>`, break at 8+ operations | reference/tidyverse.md |
| Document functions | Specific `@param`, `@return`, no circular text | reference/documentation.md |
| Handle missing data | Explicit strategy + report data loss | reference/statistical-rigor.md |
| Validate data | Check assumptions with `stopifnot()` | reference/statistical-rigor.md |
| Format code | Use `styler::style_file()` | reference/tidyverse.md |
| Check code quality | Use `lintr::lint()` | reference/tidyverse.md |
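
The missing-data and validation rows of the table can be combined in one step. The following is a minimal sketch, assuming a hypothetical `clinic_visits` data frame and a hypothetical helper `drop_missing_bp()` (neither comes from the reference files). It uses base R only so it runs without extra packages; a dplyr version would follow the same shape.

```r
# Hypothetical example data: two of five visits lack a blood pressure reading.
clinic_visits <- data.frame(
  patient_id = c(101, 102, 103, 104, 105),
  systolic_bp = c(118, NA, 135, 142, NA)
)

# Drop rows with missing systolic_bp and report how many rows were lost.
drop_missing_bp <- function(visits) {
  # Validate input assumptions before touching the data
  stopifnot(
    is.data.frame(visits),
    "systolic_bp" %in% names(visits)
  )

  complete_visits <- visits[!is.na(visits$systolic_bp), , drop = FALSE]
  rows_dropped <- nrow(visits) - nrow(complete_visits)

  # Report data loss explicitly instead of dropping rows silently
  message(sprintf(
    "Dropped %d of %d rows with missing systolic_bp (%.1f%%)",
    rows_dropped, nrow(visits), 100 * rows_dropped / nrow(visits)
  ))

  # Validate the result after the transformation
  stopifnot(!anyNA(complete_visits$systolic_bp))
  return(complete_visits)
}

complete_visits <- drop_missing_bp(clinic_visits)
```

Listwise deletion is only one possible strategy; the point of the sketch is that the strategy is stated in code and the data loss is reported.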

## Core Workflow

### 5-Step Quality Check

1. **Namespace qualification** - All external functions use `::`
   ```r
   # Good
   dplyr::filter(customer_data, revenue > 0)
   # Bad
   filter(customer_data, revenue > 0)
   ```

2. **Explicit returns** - Every function has `return()`
   ```r
   # Good
   increment_count <- function(visit_count) {
     updated_count <- visit_count + 1
     return(updated_count)
   }
   # Bad
   increment_count <- function(visit_count) {
     visit_count + 1
   }
   ```

3. **Naming conventions** - All objects use `snake_case`
   ```r
   # Good
   customer_lifetime_value <- calculate_clv(purchase_history)
   # Bad
   df <- calculate_clv(purchase_history)
   customerLifetimeValue <- calculate_clv(purchase_history)
   ```

4. **Documentation quality** - No generic descriptions
   ```r
   # Good
   #' @param deaths Data frame with `age_group` and `count` columns
   # Bad
   #' @param data The data
   ```

5. **Code formatting** - Run styler and lintr
   ```r
   styler::style_file("script.R")
   lintr::lint("script.R")
   ```

## Quick Reference Checklist

Before committing R code, verify:

- [ ] All external functions qualified with `::`
- [ ] All functions have explicit `return()`
- [ ] All objects use `snake_case`
- [ ] No generic names (`df`, `data`, `result`, `temp`)
- [ ] Pipes (`|>`) are preceded by a space and end the line
- [ ] Long pipelines (8+ ops) broken into named steps
- [ ] Complex operations have WHY comments
- [ ] Data validated after transformations
- [ ] Seeds set before random operations
- [ ] Uncertainty reported (SE, CI) for statistical models
- [ ] No `attach()` calls
- [ ] No right-hand assignment (`->`)
- [ ] Roxygen documentation is specific
- [ ] Examples are realistic and run
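
The seed and uncertainty items can be illustrated together. This is a sketch under stated assumptions: the `simulated_trial` data, the dose-response relationship, and the seed value are all invented for illustration. It relies only on `lm()`, `summary()`, and `confint()` from the stats package.

```r
# Set the seed before any random operation so the simulation is reproducible.
set.seed(20240501)

# Hypothetical simulated data: response increases linearly with dose.
simulated_trial <- data.frame(dose_mg = rep(c(0, 10, 20, 40), each = 25))
simulated_trial$response <- 2 + 0.05 * simulated_trial$dose_mg +
  rnorm(nrow(simulated_trial), sd = 1)

dose_model <- lm(response ~ dose_mg, data = simulated_trial)

# Report point estimates together with standard errors and 95% intervals.
coefficient_table <- summary(dose_model)$coefficients
confidence_limits <- confint(dose_model, level = 0.95)

dose_estimates <- data.frame(
  term = rownames(coefficient_table),
  estimate = unname(coefficient_table[, "Estimate"]),
  std_error = unname(coefficient_table[, "Std. Error"]),
  conf_low = unname(confidence_limits[, 1]),
  conf_high = unname(confidence_limits[, 2])
)

print(dose_estimates)
```

Reporting the interval alongside the estimate keeps a reader from treating the point estimate as exact.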

## Common Workflows

### Workflow 1: Clean Up AI-Generated R Script

**Context**: AI generated an analysis script with generic patterns.

**Steps**:

1. **Run detection script**
   ```bash
   Rscript toolkit/scripts/detect_slop.R analysis.R --verbose
   ```

2. **Fix high-priority issues first**
   ```r
   # Replace df, data, result with descriptive names
   # Before
   df <- readr::read_csv("data.csv")
   result <- df %>% filter(x > 0)

   # After
   customer_data <- readr::read_csv("data/customers.csv")
   active_customers <- customer_data |> dplyr::filter(status == "active")
   ```

3. **Add namespace qualification**
   ```r
   # Before
   data %>% filter(x > 0) %>% summarize(mean(y))

   # After
   data |>
     dplyr::filter(x > 0) |>
     dplyr::summarize(mean_y = mean(y))
   ```

4. **Add explicit returns**
   ```r
   # Before
   calculate_rate <- function(numerator, denominator) {
     numerator / denominator
   }

   # After
   calculate_rate <- function(numerator, denominator) {
     rate <- numerator / denominator
     return(rate)
   }
   ```

5. **Break long pipes**
   ```r
   # Before (12 operations in one chain)
   result <- data |>
     filter(...) |> mutate(...) |> group_by(...) |>
     summarize(...) |> arrange(...) # |> ... 7 more ops

   # After
   clean_data <- data |>
     dplyr::filter(!is.na(value)) |>
     dplyr::mutate(category = categorize(value))

   summary_stats <- clean_data |>
     dplyr::group_by(category) |>
     dplyr::summarize(mean_val = mean(value))
   ```

6. **Format and validate**
   ```r
   styler::style_file("analysis.R")
   lintr::lint("analysis.R")
   ```

**Expected outcome**: Detection score drops from 60+ to below 20 when `detect_slop.R` is re-run

---

### Workflow 2: Fix Generic Package Documentation

**Context**: R package has generic roxygen documentation.

**Steps**:

1. **Identify generic patterns**
   ```r
   # Bad
   #' Process Data
   #'
   #' @description This function processes the data.
   #' @param data The data.
   #' @return The result.
   ```

2. **Make description specific**
   ```r
   # Good
   #' Calculate age-adjusted mortality rates
   #'
   #' Computes mortality rates per 100,000 population, standardized to the
   #' 2000 US Census age distribution using direct standardization.
   ```

3. **Describe parameter structure**
   ```r
   # Good
   #' @param deaths Data frame with columns `age_group` and `count`.
   #' @param population Data frame with columns `age_group` and `pop_size`.
   ```

4. **Specify return value**
   ```r
   # Good
   #' @return A tibble with columns:
   #'   \describe{
   #'     \item{county}{County FIPS code}
   #'     \item{rate}{Age-adjusted rate per 100,000}
   #'     \item{se}{Standard error of the rate}
   #'   }
   ```

5. **Add realistic examples**
   ```r
   # Good
   #' @examples
   #' counties <- data.frame(
   #'   county = c("A", "B"),
   #'   deaths = c(150, 200),
   #'   population = c(50000, 80000)
   #' )
   #'
   #' adjust_rates(counties, rate_per = 100000)
   #' #> # A tibble: 2 x 3
   #' #>   county  rate    se
   #' #> 1 A       312.  25.4
   #' #> 2 B       258.  18.2
   ```

**Expected outcome**: Documentation that teaches, not restates

---

### Workflow 3: Prepare Package for CRAN

**Context**: Final checks before CRAN submission.

**Steps**:

1. **Run all quality checks**
   ```r
   # Standard checks
   devtools::check()

   # Anti-slop checks
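   r_source_files <- list.files("R", pattern = "\\.R$", full.names = TRUE)
   for (source_file in r_source_files) {
     # shQuote() protects paths that contain spaces
     system2("Rscript", c("toolkit/scripts/detect_slop.R", shQuote(source_file)))
   }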
   ```

2. **Fix documentation**
   - Check all `@param` descriptions are specific
   - Verify `@examples` run and are realistic
   - Ensure `@return` describes structure

3. **Validate code quality**
   ```r
   # Format all files
   styler::style_dir("R/")

   # Check lints
   lintr::lint_package()
   ```

4. **Check CRAN-specific requirements**
   - Validate URLs in DESCRIPTION and documentation
   - Check examples run in < 5 seconds
   - Verify package structure meets CRAN standards

**Expected outcome**: Clean `R CMD check` with no slop patterns
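
For the example-timing item in step 4, one way to spot slow examples locally is to time the example body with `system.time()`. The sketch below is illustrative only: the bootstrap workload and the seed are invented, and the 5-second threshold is the figure quoted in step 4.

```r
set.seed(42)

# Time a hypothetical example body: 500 bootstrap means of mtcars$mpg.
example_timing <- system.time({
  bootstrap_means <- vapply(
    seq_len(500),
    function(replicate_index) {
      resampled_mpg <- sample(mtcars$mpg, replace = TRUE)
      return(mean(resampled_mpg))
    },
    numeric(1)
  )
})

elapsed_seconds <- example_timing[["elapsed"]]

# Fail loudly if the example would exceed the 5-second budget.
stopifnot(elapsed_seconds < 5)
```

Timings vary by machine, so a local pass is a hint rather than a guarantee for the check servers.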

## Mandatory Rules Summary

### 1. Namespace Qualification
**ALWAYS use `::` for external packages**

Exceptions (don't need `::`):
- Base R: `mean()`, `sum()`, `log()`, etc.
- stats: `lm()`, `glm()`, `t.test()`, etc.
- utils: `head()`, `tail()`, `str()`, etc.
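
To see why the exceptions are safe while other bare names are not, it can help to ask R which namespace a bare name resolves to. A small sketch, assuming a fresh session with no tidyverse packages attached; `owning_namespace()` is a hypothetical helper written for this illustration.

```r
# Report which namespace defines the function that a bare name resolves to.
owning_namespace <- function(function_name) {
  resolved_function <- get(function_name, mode = "function")
  namespace_name <- environmentName(environment(resolved_function))
  return(namespace_name)
}

bare_name_owners <- vapply(
  c("lm", "head", "filter"),
  owning_namespace,
  character(1)
)

print(bare_name_owners)
```

In a fresh session the bare name `filter` resolves to the stats package, not dplyr, so `filter(customer_data, revenue > 0)` would call a time-series function. Once `library(dplyr)` is attached the same bare name resolves differently, which is the ambiguity that `dplyr::filter()` removes.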

### 2. Explicit Returns
**ALWAYS use `return()` - never implicit**

### 3. Naming: snake_case
**All objects use `snake_case`**
- Variables: `customer_data` not `customerData` or `df`
- Functions: `calculate_rate` not `calculateRate`
- Arguments: `input_data` not `inputData`
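
As a rough illustration of what the rule accepts, the names above can be checked against a regular expression. This is a simplified sketch: `is_snake_case()` is a hypothetical helper, and it is not how `lintr` or `detect_slop.R` implement the check.

```r
# TRUE for names made of lowercase letters and digits separated by single
# underscores, starting with a letter.
is_snake_case <- function(object_names) {
  snake_case_pattern <- "^[a-z][a-z0-9]*(_[a-z0-9]+)*$"
  matches_pattern <- grepl(snake_case_pattern, object_names)
  return(matches_pattern)
}

candidate_names <- c("customer_data", "customerData", "calculate_rate", "inputData")
naming_check <- setNames(is_snake_case(candidate_names), candidate_names)

print(naming_check)
```

A pattern check like this cannot catch generic names: `df` is valid `snake_case` and still fails the naming rules.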

### 4. Native Pipe
**Prefer `|>` over `%>%`** (unless R < 4.1)
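
One reason the native pipe is a reasonable default is that the parser rewrites it into an ordinary function call, so it needs no package at run time. A minimal sketch, assuming R 4.1 or later; `monthly_revenue` is made-up data.

```r
# The parser turns `lhs |> f()` into `f(lhs)` before evaluation.
piped_call <- quote(monthly_revenue |> sum())
nested_call <- quote(sum(monthly_revenue))
calls_are_identical <- identical(piped_call, nested_call)

# Hypothetical data to show the pipe in use.
monthly_revenue <- c(1200, 950, 1430)
total_revenue <- monthly_revenue |> sum()

print(calls_are_identical)
print(total_revenue)
```

Because the right-hand side must be a function call, `monthly_revenue |> sum` (without parentheses) is a syntax error, unlike with `%>%`.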

### 5. No Generic Names
**Never use**: `df`, `data`, `result`, `temp`, `x`, `n` (`x` and `n` are acceptable only where they follow standard math notation)

## Tidyverse Philosophy

Follow the [Tidyverse Style Guide](https://style.tidyverse.org/) as the primary reference, along with the four principles of the tidy tools manifesto:

1. **Design for humans** - Code should be readable and intuitive
2. **Reuse existing data structures** - Work with tibbles and data frames
3. **Compose simple functions with pipes** - Build complexity through composition
4. **Embrace functional programming** - Functions are first-class objects

See **reference/tidyverse.md** for complete tidyverse conventions.

## Resources & Advanced Topics

### Reference Files

- **[reference/naming.md](reference/naming.md)** - Complete naming conventions and forbidden patterns
- **[reference/tidyverse.md](reference/tidyverse.md)** - Pipe conventions, formatting, ggplot2 standards
- **[reference/documentation.md](reference/documentation.md)** - Roxygen2, vignettes, README quality
- **[reference/statistical-rigor.md](reference/statistical-rigor.md)** - Validation, uncertainty, reproducibility
- **[reference/forbidden-patterns.md](reference/forbidden-patterns.md)** - Complete antipattern catalog

### Related Skills

- **text/anti-slop** - For cleaning prose in documentation
- **quarto/anti-slop** - For cleaning vignettes and documentation

### Tools

- `styler::style_file()` - Auto-format code
- `lintr::lint()` - Check code quality
- `Rscript toolkit/scripts/detect_slop.R` - Detect AI patterns

## Integration with Posit Skills

This skill focuses on **code quality and avoiding generic patterns**.

Use together with Posit skills for complete coverage:

| Task | Use This Skill | + Posit Skill |
|------|----------------|---------------|
| Write error messages | r/anti-slop (quality) | + r-lib/cli (structure) |
| Write tests | r/anti-slop (code quality) | + r-lib/testing (test patterns) |
| Prepare for CRAN | r/anti-slop (no slop) | + r-lib/cran-extrachecks (requirements) |
| Document lifecycle | r/anti-slop (doc quality) | + r-lib/lifecycle (deprecation) |
