# TimeCopilot fev Experiments
This project demonstrates the evaluation of a foundation model ensemble built using the TimeCopilot library on the fev benchmark.
TimeCopilot is an open‑source AI agent for time series forecasting that provides a unified interface to multiple forecasting approaches, from foundation models to classical statistical, machine learning, and deep learning methods, along with built‑in ensemble capabilities for robust and explainable forecasting.
## Model Description

This ensemble leverages TimeCopilot's MedianEnsemble feature, which combines two state-of-the-art foundation models.
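To make the combination rule concrete, here is a minimal sketch of the median-ensemble idea in plain Python. This is an illustration of the concept only, not TimeCopilot's actual API: each model forecasts the same horizon, and the ensemble takes the element-wise median across models at every future time step.

```python
# Minimal sketch of the median-ensemble idea (not TimeCopilot's API):
# take the element-wise median across model forecasts at each step.
from statistics import median

def median_ensemble(forecasts: list[list[float]]) -> list[float]:
    """Combine per-model forecasts by taking the median at each horizon step."""
    return [median(step_values) for step_values in zip(*forecasts)]

# Two hypothetical model outputs over a 3-step horizon:
model_a = [10.0, 12.0, 14.0]
model_b = [11.0, 11.0, 15.0]
combined = median_ensemble([model_a, model_b])
```

With more than two models, the median makes the ensemble robust to a single model producing outlier forecasts, which is the property the README refers to as "robust forecasting".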
## Setup
### Prerequisites
- Python 3.11+
- uv package manager
- AWS CLI configured (for distributed evaluation)
- Modal account (for distributed evaluation)
### Installation

```bash
# Install dependencies
uv sync
```
## Evaluation Methods
### 1. Local Evaluation
Run the evaluation sequentially on your local machine:

```bash
uv run -m src.evaluate_model --num-tasks 2
```
Remove the `--num-tasks` parameter to run on all tasks. Results are saved to `timecopilot.csv` in `fev` format.
### 2. Distributed Evaluation (Recommended)
#### 2.1 Evaluate ensemble
Evaluate all dataset configurations in parallel using Modal:

```bash
# Run distributed evaluation on Modal cloud
uv run modal run --detach -m src.evaluate_model_modal
```
This creates one GPU job per dataset configuration, significantly reducing evaluation time.
**Infrastructure:**
- GPU: A10G per job
- CPU: 8 cores per job
- Timeout: 3 hours per job
- Storage: S3 bucket for data and results
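The one-job-per-configuration fan-out described above can be illustrated with the standard library; Modal's function decorators replace this plumbing in the real setup, and `evaluate_task` here is a hypothetical placeholder, not the project's actual entry point.

```python
# Illustrative sketch of the fan-out pattern (one job per dataset
# configuration), using the standard library rather than Modal's API.
from concurrent.futures import ThreadPoolExecutor

def evaluate_task(task_name: str) -> dict:
    # Placeholder: a real job would load the task, forecast, and score it.
    return {"task": task_name, "status": "done"}

def run_all(task_names: list[str]) -> list[dict]:
    # Launch every task in parallel and gather results in submission order.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(evaluate_task, task_names))
```

The key design point mirrored here is that tasks are independent, so wall-clock time is bounded by the slowest single dataset configuration rather than the sum of all of them.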
#### 2.2 Collect Results
Download and consolidate results from the distributed evaluation:

```bash
# Download all results from S3 and create consolidated CSV
uv run python -m src.download_results
```
Results are saved to `timecopilot.csv` in `fev` format.
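The consolidation step amounts to merging many per-task CSVs that share one header into a single file. A hedged sketch with the standard library follows; the function name and file layout are assumptions for illustration, not the project's actual implementation:

```python
# Sketch of consolidating per-task result CSVs (sharing one header)
# into a single CSV, as the download step produces.
import csv

def consolidate(result_files: list[str], out_path: str) -> int:
    """Merge rows from each per-task CSV into one file; return row count."""
    rows: list[dict] = []
    header = None
    for path in result_files:
        with open(path, newline="") as f:
            reader = csv.DictReader(f)
            # Take the column names from the first file encountered.
            header = header or reader.fieldnames
            rows.extend(reader)
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=header)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

This assumes every per-task file uses the same columns; if a job failed and produced no file, it would simply be absent from `result_files` and the gap would show up as missing rows in the consolidated CSV.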