# User Guide

This guide walks you through the complete GeoLift workflow for measuring marketing campaign effectiveness using causal inference.

## Business Value & Positioning

### What Problem We Solve

GeoLift answers the most critical marketing question: **"How much incremental revenue did my campaign actually generate?"** It separates true campaign impact from natural market fluctuations using rigorous causal inference.

### Why This Matters for Your Business

- **Prove ROI**: Get statistical confidence (not just correlation) in your marketing returns
- **Optimize Spend**: Identify which regional campaigns deliver the best incremental results
- **Avoid False Positives**: Stop attributing natural growth to your campaigns
- **Support Budget Decisions**: Provide rigorous evidence to leadership and finance teams
- **Competitive Advantage**: Make data-driven decisions while competitors rely on assumptions

### Where GeoLift Fits in Your Measurement Stack

**Primary Use Cases:**

- Regional advertising campaigns (TV, radio, outdoor, digital geo-targeting)
- Store rollouts and expansion strategies
- Local market tests and pilot programs
- Geo-targeted promotional campaigns

**Complements (doesn't replace):**

- **Media Mix Modeling (MMM)**: GeoLift provides ground truth for MMM calibration
- **Attribution**: Adds causal rigor to last-touch and multi-touch attribution
- **A/B Testing**: Use when randomized testing isn't feasible
- **Brand Studies**: Provides sales impact to complement brand awareness metrics

**Integration Points:**

- Exports to your existing BI tools (Tableau, Power BI, etc.)
- API integration with analytics platforms
- Compatible with Google Analytics and Adobe Analytics data
- Works with your existing data warehouse

### Investment & ROI

**Implementation:**

- **Timeline**: 2-4 weeks from contract to first analysis
- **Resources**: 1 analyst + IT support for data integration
- **Training**: 2-day workshop for your team

**Expected Returns:**

- **Immediate**: Stop wasting budget on ineffective campaigns (typically 10-30% savings)
- **Short-term**: Optimize active campaigns for better performance (15-25% lift improvement)
- **Long-term**: Build institutional knowledge for better campaign planning (compound returns)
- **Typical ROI**: 5-10x within the first year through improved optimization

---

## Overview

GeoLift measures the true sales lift and ROI from your regional marketing campaigns using advanced causal inference methods. It follows a proven 3-step workflow to ensure reliable results.

## The 3-Step Workflow

### Step 1: Find a Fair Comparison (Donor Evaluation)

Before measuring impact, we need to identify control markets that behave similarly to your test markets.
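"Behaves similarly" can be made concrete before running the full donor evaluation: markets whose pre-period series track the treatment market closely are better control candidates. The sketch below is illustrative only — the market names and numbers are made up, and it uses plain pandas rather than any GeoLift API:

```python
import numpy as np
import pandas as pd

# Synthetic weekly pre-period sales for one treatment market and three
# hypothetical donor candidates (all names and values are invented).
rng = np.random.default_rng(42)
weeks = pd.date_range("2023-01-02", periods=20, freq="W")
trend = np.linspace(100, 120, len(weeks))  # shared upward market trend

data = pd.DataFrame({
    "treatment_502": trend + rng.normal(0, 2, len(weeks)),
    "donor_501": trend + rng.normal(0, 2, len(weeks)),        # tracks closely
    "donor_505": 0.5 * trend + rng.normal(0, 2, len(weeks)),  # smaller market, same trend
    "donor_601": rng.normal(110, 10, len(weeks)),             # unrelated noise
}, index=weeks)

# Rank candidates by pre-period correlation with the treatment market:
# high correlation suggests a fair comparison, low correlation a poor one.
correlations = (
    data.drop(columns="treatment_502")
        .corrwith(data["treatment_502"])
        .sort_values(ascending=False)
)
print(correlations)
```

A proper donor evaluation weighs more than raw correlation (levels, trends, geography), but a quick correlation screen like this is a useful sanity check on your own data.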
#### Data Requirements

Your dataset should include:

- **Time Series Data**: At least 12-24 weeks of pre-campaign data
- **Geographic Units**: Markets, DMAs, states, or regions
- **Outcome Metrics**: Sales, conversions, or other KPIs
- **Treatment Assignment**: Which markets received the campaign

#### Running Donor Evaluation

```python
# Use the donor evaluator from recipes
import sys
import os
sys.path.append(os.path.join(os.path.dirname(__file__), 'recipes'))

from donor_evaluator import DonorEvaluator

evaluator = DonorEvaluator()
evaluator.load_data('campaign_data.csv')

# Find best control markets
donor_results = evaluator.evaluate_donors(
    treatment_markets=[502, 503, 504],
    pre_period_start='2023-01-01',
    pre_period_end='2023-05-31',
    outcome_column='sales'
)

# Review donor quality
print(donor_results.summary())
donor_results.plot_donor_map()
```

#### What to Look For

- **High Correlation**: Control markets should track closely with treatment markets pre-campaign
- **Similar Trends**: Parallel movement in the pre-period
- **Geographic Diversity**: Avoid clustering all controls in one region

### Step 2: Check if the Test is Strong Enough (Power Analysis)

Power analysis determines whether your experiment can detect meaningful effects.
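The intuition can be sketched with the standard two-sided z-test approximation: the minimum detectable effect scales with the noise in your outcome and shrinks with the square root of the campaign length. This is a textbook back-of-envelope formula, not GeoLift's internal power computation, and `minimum_detectable_effect` is an illustrative helper, not part of the package:

```python
from statistics import NormalDist

def minimum_detectable_effect(noise_sd, n_periods, alpha=0.05, power=0.80):
    """Smallest true per-period lift detectable at the given significance
    level and power, assuming the lift estimate's standard error shrinks
    with the square root of the number of campaign periods (a simplification)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = NormalDist().inv_cdf(power)
    standard_error = noise_sd / n_periods ** 0.5
    return (z_alpha + z_power) * standard_error

# Example: weekly residual noise of 500 units, 12-week campaign.
mde = minimum_detectable_effect(noise_sd=500, n_periods=12)
print(f"MDE ≈ {mde:,.0f} units per week")  # roughly 400 units/week
```

Note that doubling the campaign length only cuts the MDE by about 30% (a factor of 1/√2), which is why the power calculator reports a recommended minimum duration rather than assuming longer is always practical.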
```python
from geolift.analyzer import PowerCalculator

power_calc = PowerCalculator()

# Calculate minimum detectable effect
power_results = power_calc.calculate_power(
    treatment_markets=[502, 503, 504],
    control_markets=donor_results.best_donors,
    baseline_data=your_data,
    campaign_duration_weeks=12,
    alpha=0.05,  # Significance level
    power=0.80   # Desired power
)

print(f"Minimum Detectable Effect: {power_results.mde:.1%}")
print(f"Recommended Campaign Duration: {power_results.min_duration} weeks")
```

#### Power Analysis Outputs

- **Minimum Detectable Effect (MDE)**: Smallest lift you can reliably detect
- **Recommended Duration**: How long to run the campaign for reliable results
- **Sample Size Requirements**: Number of markets needed

### Step 3: Measure the Lift (GeoLift Analysis)

Run the main causal inference analysis to measure campaign impact.

```python
from geolift.analyzer import GeoLiftAnalyzer

analyzer = GeoLiftAnalyzer()
analyzer.load_data('campaign_data.csv')

# Configure analysis
config = {
    'treatment_start_date': '2023-06-01',
    'treatment_end_date': '2023-08-31',
    'treatment_markets': [502, 503, 504],
    'control_markets': donor_results.best_donors,
    'outcome_column': 'sales',
    'inference_method': 'bootstrap',  # or 'placebo', 'jackknife'
    'confidence_level': 0.95
}

# Run analysis
results = analyzer.run_analysis(**config)
```

## Understanding Your Results

### Key Metrics Explained

#### Causal Impact

- **Absolute Lift**: Raw units of incremental impact
- **Relative Lift**: Percentage increase over baseline
- **Confidence Intervals**: Range of plausible effect sizes

#### Statistical Significance

- **P-value**: Probability of seeing an effect at least this large if the campaign had no true impact
- **Confidence Level**: How certain we are about the effect estimate
- **Statistical Power**: Ability to detect true effects

#### Business Impact

- **Incremental ROI**: Return on marketing investment
- **Cost Per Incremental Unit**: Efficiency of campaign spend
- **Payback Period**: Time to recover campaign investment

### Interpreting Results

```python
# Print comprehensive summary
print(results.summary())

# Key business metrics
print(f"Campaign generated {results.absolute_lift:,.0f} incremental units")
print(f"Relative lift of {results.relative_lift:.1%}")
print(f"ROI of {results.roi:.1f}x")
print(f"P-value: {results.p_value:.3f}")
```

### Visual Diagnostics

Generate plots to validate your analysis:

```python
# Time series plot showing treatment vs synthetic control
results.plot_time_series()

# Pre/post comparison
results.plot_lift_analysis()

# Diagnostic plots for model validation
results.plot_diagnostics()

# Geographic visualization
results.plot_geo_map()
```

## Data Preparation Best Practices

### Data Quality Requirements

- **Completeness**: No missing values in key periods
- **Consistency**: Same measurement methodology throughout
- **Granularity**: Weekly or daily data preferred over monthly
- **Baseline Period**: At least 12 weeks of pre-campaign data

### Common Data Issues

- **Seasonality**: Account for holidays and seasonal patterns
- **External Events**: Note major market disruptions
- **Data Breaks**: Ensure consistent measurement methodology
- **Outliers**: Identify and handle extreme values appropriately

### Data Validation

```python
from geolift.data_handler import DataValidator

validator = DataValidator()
validation_report = validator.validate_dataset(
    data=your_data,
    required_columns=['date', 'geo', 'sales', 'treatment'],
    date_column='date',
    geo_column='geo'
)

print(validation_report.summary())
```

## Configuration Options

### Analysis Parameters

```yaml
# config.yaml
analysis:
  treatment_start_date: "2023-06-01"
  treatment_end_date: "2023-08-31"
  pre_period_weeks: 24
  outcome_column: "sales"

inference:
  method: "bootstrap"  # bootstrap, placebo, jackknife
  n_bootstrap: 1000
  confidence_level: 0.95

validation:
  min_pre_period_weeks: 12
  max_missing_data_pct: 0.05
  outlier_threshold: 3.0
```

### Advanced Options

```python
# Custom donor selection
analyzer.set_custom_donors(
    donor_markets=[501, 505, 506, 507],
    donor_weights=[0.4, 0.3, 0.2, 0.1]
)

# Multiple treatment cohorts
analyzer.analyze_multiple_cohorts(
    cohort_1={'markets': [502, 503], 'start_date': '2023-06-01'},
    cohort_2={'markets': [504, 505], 'start_date': '2023-07-01'}
)
```

## Reporting and Export

### Generate Business Reports

```python
# HTML report for stakeholders
results.export_html_report(
    filename='campaign_results.html',
    include_technical_details=False
)

# Detailed CSV export for analysis
results.export_csv_report('detailed_results.csv')

# Executive summary
results.export_executive_summary('exec_summary.pdf')
```

### Custom Reporting

```python
# Create custom summary
summary_data = {
    'campaign_name': 'Q3 Brand Campaign',
    'lift_estimate': results.absolute_lift,
    'lift_ci_lower': results.confidence_interval[0],
    'lift_ci_upper': results.confidence_interval[1],
    'p_value': results.p_value,
    'roi': results.roi,
    'campaign_cost': 50000,
    'incremental_revenue': results.incremental_revenue
}

# Export to your preferred format
import pandas as pd
pd.DataFrame([summary_data]).to_csv('campaign_summary.csv')
```

## Troubleshooting Common Issues

### Poor Pre-Period Fit

**Problem**: The synthetic control doesn't match treatment markets well before the campaign.

**Solutions**:

- Extend pre-period data
- Remove outlier periods
- Try different donor selection criteria

### Low Statistical Power

**Problem**: The test cannot detect meaningful effects.

**Solutions**:

- Extend campaign duration
- Include more treatment markets
- Use more sensitive outcome metrics

### Implausible Results

**Problem**: Effect sizes seem too large or too small.

**Solutions**:

- Check data quality and definitions
- Validate treatment assignment
- Review external factors during the campaign period

## Next Steps

- **Need technical details?** → See [API Reference](API_REFERENCE.md)
- **Want to understand the math?** → Check [Advanced Topics](ADVANCED_TOPICS.md)
- **Having specific issues?** → Review [FAQ](FAQ.md)
- **Ready for production?** → See deployment guides in Advanced Topics