# TimeCopilot fev Experiments
This project demonstrates the evaluation of a foundation model ensemble built using the TimeCopilot library on the fev benchmark.
TimeCopilot is an open‑source AI agent for time series forecasting that provides a unified interface to multiple forecasting approaches, from foundation models to classical statistical, machine learning, and deep learning methods, along with built‑in ensemble capabilities for robust and explainable forecasting.
## Model Description

This ensemble leverages TimeCopilot's MedianEnsemble feature, which combines two state-of-the-art foundation models.
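To make the combination rule concrete, here is a minimal sketch of the median-ensemble idea in plain Python. This is an illustration of the concept only, not TimeCopilot's actual API: each model forecasts the same horizon, and the ensemble takes the element-wise median across models at every future time step.

```python
# Minimal sketch of the median-ensemble idea (not TimeCopilot's API):
# take the element-wise median across model forecasts at each step.
from statistics import median

def median_ensemble(forecasts: list[list[float]]) -> list[float]:
    """Combine per-model forecasts by taking the median at each horizon step."""
    return [median(step_values) for step_values in zip(*forecasts)]

# Two hypothetical model outputs over a 3-step horizon:
model_a = [10.0, 12.0, 14.0]
model_b = [11.0, 11.0, 15.0]
combined = median_ensemble([model_a, model_b])
```

With more than two models, the median makes the ensemble robust to a single model producing outlier forecasts, which is the property the README refers to as "robust forecasting".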
## Setup
### Prerequisites
- Python 3.11+
- uv package manager
- AWS CLI configured (for distributed evaluation)
- Modal account (for distributed evaluation)
### Installation

```bash
# Install dependencies
uv sync
```
## Evaluation Methods
### 1. Local Evaluation
Run the evaluation sequentially on your local machine:

```bash
uv run -m src.evaluate_model --num-tasks 2
```
Remove the `--num-tasks` parameter to run on all tasks. Results are saved to `timecopilot.csv` in `fev` format.
### 2. Distributed Evaluation (Recommended)
#### 2.1 Evaluate ensemble
Evaluate all dataset configurations in parallel using Modal:

```bash
# Run distributed evaluation on Modal cloud
uv run modal run --detach -m src.evaluate_model_modal
```
This creates one GPU job per dataset configuration, significantly reducing evaluation time.
**Infrastructure:**
- GPU: A10G per job
- CPU: 8 cores per job
- Timeout: 3 hours per job
- Storage: S3 bucket for data and results
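The one-job-per-configuration fan-out described above can be illustrated with the standard library; Modal's function decorators replace this plumbing in the real setup, and `evaluate_task` here is a hypothetical placeholder, not the project's actual entry point.

```python
# Illustrative sketch of the fan-out pattern (one job per dataset
# configuration), using the standard library rather than Modal's API.
from concurrent.futures import ThreadPoolExecutor

def evaluate_task(task_name: str) -> dict:
    # Placeholder: a real job would load the task, forecast, and score it.
    return {"task": task_name, "status": "done"}

def run_all(task_names: list[str]) -> list[dict]:
    # Launch every task in parallel and gather results in submission order.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(evaluate_task, task_names))
```

The key design point mirrored here is that tasks are independent, so wall-clock time is bounded by the slowest single dataset configuration rather than the sum of all of them.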
#### 2.2 Collect Results
Download and consolidate results from the distributed evaluation:

```bash
# Download all results from S3 and create consolidated CSV
uv run python -m src.download_results
```
Results are saved to `timecopilot.csv` in `fev` format.
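The consolidation step amounts to merging many per-task CSVs that share one header into a single file. A hedged sketch with the standard library follows; the function name and file layout are assumptions for illustration, not the project's actual implementation:

```python
# Sketch of consolidating per-task result CSVs (sharing one header)
# into a single CSV, as the download step produces.
import csv

def consolidate(result_files: list[str], out_path: str) -> int:
    """Merge rows from each per-task CSV into one file; return row count."""
    rows: list[dict] = []
    header = None
    for path in result_files:
        with open(path, newline="") as f:
            reader = csv.DictReader(f)
            # Take the column names from the first file encountered.
            header = header or reader.fieldnames
            rows.extend(reader)
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=header)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

This assumes every per-task file uses the same columns; if a job failed and produced no file, it would simply be absent from `result_files` and the gap would show up as missing rows in the consolidated CSV.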