# API Reference

Technical reference for GeoLift classes and functions.

## Core Classes

### GeoLiftAnalyzer

Main class for running causal inference analyses.

```python
from geolift.analyzer import GeoLiftAnalyzer

analyzer = GeoLiftAnalyzer()
```

#### Methods

##### `load_data(filepath, **kwargs)`

Load dataset for analysis.

**Parameters:**

- `filepath` (str): Path to CSV file
- `date_column` (str, optional): Name of date column. Default: 'date'
- `geo_column` (str, optional): Name of geographic unit column. Default: 'geo'
- `outcome_column` (str, optional): Name of outcome variable. Default: 'sales'

**Returns:** None

**Example:**

```python
analyzer.load_data('data.csv', date_column='week', outcome_column='revenue')
```

##### `run_analysis(**config)`

Execute the complete GeoLift analysis.

**Parameters:**

- `treatment_start_date` (str): Start date of treatment (YYYY-MM-DD)
- `treatment_end_date` (str): End date of treatment (YYYY-MM-DD)
- `treatment_markets` (list): List of treated market IDs
- `control_markets` (list, optional): List of control market IDs
- `outcome_column` (str): Name of outcome variable
- `inference_method` (str): 'bootstrap', 'placebo', or 'jackknife'
- `confidence_level` (float): Confidence level (0.0-1.0). Default: 0.95

**Returns:** `GeoLiftResults` object

**Example:**

```python
results = analyzer.run_analysis(
    treatment_start_date='2023-06-01',
    treatment_end_date='2023-08-31',
    treatment_markets=[502, 503],
    outcome_column='sales',
    inference_method='bootstrap'
)
```

### DonorEvaluator

Class for identifying optimal control markets.

```python
# The donor evaluator is available as a standalone script in recipes/
import sys
import os
sys.path.append(os.path.join(os.path.dirname(__file__), 'recipes'))
from donor_evaluator import DonorEvaluator

evaluator = DonorEvaluator()
```

#### Methods

##### `evaluate_donors(treatment_markets, pre_period_start, pre_period_end, **kwargs)`

Find best control markets for treatment units.

**Parameters:**

- `treatment_markets` (list): List of treatment market IDs
- `pre_period_start` (str): Start of pre-treatment period (YYYY-MM-DD)
- `pre_period_end` (str): End of pre-treatment period (YYYY-MM-DD)
- `outcome_column` (str): Name of outcome variable
- `min_correlation` (float, optional): Minimum correlation threshold. Default: 0.7
- `max_donors` (int, optional): Maximum number of donors. Default: 10

**Returns:** `DonorResults` object

### PowerCalculator

Class for power analysis and experimental design.

```python
from geolift.analyzer import PowerCalculator

power_calc = PowerCalculator()
```

#### Methods

##### `calculate_power(treatment_markets, control_markets, baseline_data, **kwargs)`

Calculate statistical power and minimum detectable effect.

**Parameters:**

- `treatment_markets` (list): List of treatment market IDs
- `control_markets` (list): List of control market IDs
- `baseline_data` (DataFrame): Historical data for power calculation
- `campaign_duration_weeks` (int): Planned campaign duration
- `alpha` (float, optional): Significance level. Default: 0.05
- `power` (float, optional): Desired statistical power. Default: 0.80

**Returns:** `PowerResults` object

## Result Classes

### GeoLiftResults

Contains analysis results and methods for interpretation.

#### Attributes

- `absolute_lift` (float): Absolute treatment effect
- `relative_lift` (float): Relative treatment effect (percentage)
- `confidence_interval` (tuple): Lower and upper confidence bounds
- `p_value` (float): Statistical significance p-value
- `roi` (float): Return on investment
- `incremental_revenue` (float): Total incremental revenue

#### Methods

##### `summary()`

Print comprehensive results summary.

##### `plot_time_series()`

Generate time series plot showing treatment vs synthetic control.

##### `plot_lift_analysis()`

Create pre/post treatment comparison plots.

##### `export_html_report(filename, **kwargs)`

Export results to HTML report.

**Parameters:**

- `filename` (str): Output filename
- `include_technical_details` (bool, optional): Include statistical details. Default: True

### DonorResults

Contains donor evaluation results.

#### Attributes

- `best_donors` (list): List of optimal control market IDs
- `donor_weights` (dict): Weights for each donor market
- `correlation_matrix` (DataFrame): Correlation between treatment and potential donors

#### Methods

##### `plot_donor_map()`

Visualize donor markets on geographic map.

##### `summary()`

Print donor evaluation summary.

## Utility Functions

### Data Validation

```python
from geolift.data_handler import DataValidator

validator = DataValidator()
report = validator.validate_dataset(data, required_columns, date_column, geo_column)
```

### Configuration Management

```python
from geolift.config_manager import ConfigManager

config = ConfigManager()
config.load_from_yaml('config.yaml')
settings = config.get_analysis_config()
```

## CLI Commands

### Basic Analysis

```bash
# Run complete analysis
python -m geolift analyze \
  --data data.csv \
  --treatment-start 2023-06-01 \
  --treatment-markets 502,503 \
  --output results/
```

### Power Analysis

```bash
# Calculate power
python -m geolift power \
  --data data.csv \
  --treatment-markets 502,503 \
  --duration 12 \
  --mde 0.05
```

### Donor Evaluation

```bash
# Find optimal donors
python -m geolift donors \
  --data data.csv \
  --treatment-markets 502,503 \
  --pre-start 2023-01-01 \
  --pre-end 2023-05-31
```

## Configuration Schema

### YAML Configuration

```yaml
# Complete configuration example
data:
  filepath: "data/campaign_data.csv"
  date_column: "date"
  geo_column: "geo_id"
  outcome_column: "sales"

treatment:
  start_date: "2023-06-01"
  end_date: "2023-08-31"
  markets: [502, 503, 504]

analysis:
  pre_period_weeks: 24
  inference_method: "bootstrap"
  confidence_level: 0.95
  n_bootstrap: 1000

donors:
  auto_select: true
  min_correlation: 0.7
  max_donors: 10
  exclude_markets: [501, 599]

output:
  directory: "results/"
  formats: ["html", "csv"]
  include_plots: true
```

## Error Handling

### Common Exceptions

```python
from geolift.exceptions import (
    InsufficientDataError,
    InvalidConfigurationError,
    AnalysisError
)

try:
    results = analyzer.run_analysis(**config)
except InsufficientDataError as e:
    print(f"Not enough data: {e}")
except InvalidConfigurationError as e:
    print(f"Configuration error: {e}")
except AnalysisError as e:
    print(f"Analysis failed: {e}")
```

## Advanced Usage

### Custom Synthetic Control

```python
from sparsesc import fit, estimate_effects

# Direct SparseSC usage for advanced users
sc_results = fit(
    features=X,
    targets=Y,
    treated_units=treated_units,
    control_units=control_units
)

effects = estimate_effects(
    sc_results,
    post_treatment_data=Y_post
)
```

### Batch Processing

```python
from geolift.analyzer import GeoLiftAnalyzer

# Process multiple campaigns.
# config1 and config2 are configuration dicts defined elsewhere;
# each must include a 'data_path' entry.
campaigns = [
    {'name': 'Q1_Campaign', 'config': config1},
    {'name': 'Q2_Campaign', 'config': config2}
]

results = {}
for campaign in campaigns:
    # Use a fresh analyzer per campaign so no state carries over
    analyzer = GeoLiftAnalyzer()
    analyzer.load_data(campaign['config']['data_path'])
    results[campaign['name']] = analyzer.run_analysis(**campaign['config'])
```

### Custom Inference Methods

```python
# Implement custom inference
class CustomInference:
    def __init__(self, method='custom'):
        self.method = method

    def calculate_pvalues(self, effects, null_distribution):
        # Custom p-value calculation: compute `p_values` from
        # `effects` and `null_distribution` here (placeholder)
        return p_values

# Use with analyzer
analyzer.set_inference_method(CustomInference())
```

For more examples and advanced usage patterns, see [Advanced Topics](ADVANCED_TOPICS.md).
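
The `calculate_pvalues` stub under Custom Inference Methods leaves the actual computation open. As a minimal sketch of one way to fill it in, the class below computes two-sided empirical p-values: for each effect, the share of null draws that are at least as extreme in absolute value, with a +1 correction so that no p-value is exactly zero. It assumes `effects` and `null_distribution` are flat sequences of numbers. The class name `EmpiricalInference` is hypothetical and not part of GeoLift, and whether `set_inference_method` expects exactly this interface is an assumption based on the stub.

```python
from typing import List, Sequence


class EmpiricalInference:
    """Hypothetical stand-in for the CustomInference stub (not a GeoLift class)."""

    def __init__(self, method: str = "empirical"):
        self.method = method

    def calculate_pvalues(
        self, effects: Sequence[float], null_distribution: Sequence[float]
    ) -> List[float]:
        # Assumption: both inputs are flat sequences of numbers.
        if len(null_distribution) == 0:
            raise ValueError("null_distribution must not be empty")

        null_abs = [abs(x) for x in null_distribution]
        n = len(null_abs)

        # Two-sided empirical p-value per effect:
        # (1 + number of null draws at least as extreme) / (1 + number of draws)
        return [
            (1 + sum(1 for x in null_abs if x >= abs(e))) / (n + 1)
            for e in effects
        ]


inference = EmpiricalInference()
p_values = inference.calculate_pvalues(
    effects=[2.5, 0.1],
    null_distribution=[-1.0, 0.5, 0.2, -0.3, 1.2, -0.8, 0.4, 0.9, -0.6],
)
print(p_values)  # [0.1, 1.0]
```

With nine null draws, the smallest attainable p-value is 1/10, so the resolution of this approach depends directly on how many placebo or bootstrap draws are available.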