# tomoro-evals

[Documentation](https://tomoro-ai.github.io/tomoro-evals/)
# How to run the project

## Create virtual environment and activate it

```bash
uv venv
source .venv/bin/activate
uv pip install -e .
```

## Database Setup (Required for ETL Pipeline)

Before running the ETL pipeline (the `main()` function) in the notebooks, you need to set up PostgreSQL:
### Prerequisites

1. **Install PostgreSQL** (if not already installed):

   ```bash
   brew install postgresql@14
   ```

2. **Start the PostgreSQL service**:

   ```bash
   brew services start postgresql@14
   ```

3. **Create the database and user**:

   ```bash
   psql -d postgres -c "CREATE DATABASE cii;"
   psql -d postgres -c "CREATE USER app WITH PASSWORD 'password';"
   psql -d postgres -c "GRANT ALL PRIVILEGES ON DATABASE cii TO app;"
   ```

4. **Create the database tables**:

   ```bash
   psql -d cii -U app -h localhost -f cronjob/customer_transactions/schema.sql
   ```

5. **Verify the setup**:

   ```bash
   psql -d cii -U app -h localhost -c "\dt"
   ```
The ETL pipeline expects:

- Database name: `cii`
- Username: `app`
- Password: `password`
- Host: `localhost`
- Port: `5432`

These settings are configured in the `DSN` variable in the notebook.
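For reference, those settings combine into a libpq-style keyword/value connection string. A minimal sketch of what the notebook's `DSN` variable is expected to look like:

```python
# Minimal sketch: a libpq keyword/value connection string built from the
# settings listed above (the notebook defines the actual DSN variable)
DSN = "dbname=cii user=app password=password host=localhost port=5432"
```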
### Alternative: Run PostgreSQL with Docker (Recommended for Isolation)

If you prefer not to install PostgreSQL locally, you can run it in a Docker container that auto-loads the schema.

#### 1. Start a fresh container

```bash
docker rm -f pg-cii 2>/dev/null || true
docker volume rm pgdata 2>/dev/null || true

docker run -d \
  --name pg-cii \
  -e POSTGRES_USER=app \
  -e POSTGRES_PASSWORD=password \
  -e POSTGRES_DB=cii \
  -p 5432:5432 \
  -v pgdata:/var/lib/postgresql/data \
  -v "$(pwd)/cronjob/customer_transactions/schema.sql":/docker-entrypoint-initdb.d/001-schema.sql:ro \
  postgres:16
```

The `schema.sql` file is executed only the first time the named volume `pgdata` is initialized.
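If you want to block until the server inside the container is ready before connecting (useful in scripts), a small poll works; `pg_isready` ships with the Postgres image:

```bash
# Poll until the Postgres server inside the container accepts connections
until docker exec pg-cii pg_isready -U app -d cii; do
  sleep 1
done
```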
#### 2. Check container & logs

```bash
docker ps --filter name=pg-cii
docker logs pg-cii | tail -n 30
```

#### 3. Inspect tables

```bash
docker exec -it pg-cii psql -U app -d cii -c "\dt"
```

#### 4. Set DSN (current shell)

```bash
export CII_PG_DSN="dbname=cii user=app password=password host=localhost port=5432"
```

If using a notebook:

```python
import os

os.environ["CII_PG_DSN"] = "dbname=cii user=app password=password host=localhost port=5432"
```
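To confirm the connection works end to end, a quick smoke test along these lines can help (this assumes `psycopg2` is available in the environment; swap in whichever libpq-compatible driver the project actually uses):

```python
import os

import psycopg2  # assumption: any libpq-compatible driver accepts this DSN

dsn = os.environ["CII_PG_DSN"]
with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
    cur.execute("SELECT 1;")
    print(cur.fetchone())  # expected: (1,)
```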
#### 5. Rebuild after changing `schema.sql`

```bash
docker rm -f pg-cii && docker volume rm pgdata && \
docker run -d --name pg-cii \
  -e POSTGRES_USER=app -e POSTGRES_PASSWORD=password -e POSTGRES_DB=cii \
  -p 5432:5432 \
  -v pgdata:/var/lib/postgresql/data \
  -v "$(pwd)/cronjob/customer_transactions/schema.sql":/docker-entrypoint-initdb.d/001-schema.sql:ro \
  postgres:16
```
#### 6. Stop / start later

```bash
docker stop pg-cii
docker start pg-cii
```

#### 7. Apply schema manually (if needed on an existing container)

```bash
cat cronjob/customer_transactions/schema.sql | docker exec -i pg-cii psql -U app -d cii
```
#### 8. Simple backup / restore

```bash
# Backup
docker exec -t pg-cii pg_dump -U app -d cii > backup.sql

# Restore (fresh volume)
docker rm -f pg-cii && docker volume rm pgdata
# Start the container again (see step 1; omit the schema bind mount if restoring)
cat backup.sql | docker exec -i pg-cii psql -U app -d cii
```
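If the dump gets large, the same pattern works compressed:

```bash
# Compressed backup
docker exec -t pg-cii pg_dump -U app -d cii | gzip > backup.sql.gz

# Compressed restore (into a running, empty database)
gunzip -c backup.sql.gz | docker exec -i pg-cii psql -U app -d cii
```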
### Using uv with Docker Postgres

All Python commands can run inside the uv-managed environment while PostgreSQL runs in Docker.

```bash
uv sync  # install dependencies
export CII_PG_DSN="dbname=cii user=app password=password host=localhost port=5432"
uv run python cronjob/customer_transactions/agent_run.py
```
Optionally, add a console-script entry point in `pyproject.toml` so the agent can be launched by name (this assumes `agent_run.py` is importable as a module and exposes a `main()` function; adjust the module path to match the actual package layout):

```toml
[project.scripts]
agent = "cronjob.customer_transactions.agent_run:main"
```

Then:

```bash
uv run agent
```
# Usage

The commands below assume an activated virtual environment. If you haven't activated your environment and you are using `uv`, prefix the commands with `uv run`.
## Online Usage

### run reranking evaluation for Langfuse traces

```bash
uv run langfuse_trace_evaluation.py
```
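The evaluation presumably authenticates to Langfuse via the SDK's standard environment variables; this is an assumption, so check the script for the configuration it actually reads:

```bash
# Standard Langfuse SDK environment variables (values are placeholders)
export LANGFUSE_PUBLIC_KEY="pk-lf-..."
export LANGFUSE_SECRET_KEY="sk-lf-..."
export LANGFUSE_HOST="https://cloud.langfuse.com"
```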
## Offline Usage

Evals Hub can be used offline for development purposes or as part of a CI/CD pipeline. Use the `evals-hub` CLI tool to run benchmarks offline; the main entry point is the `run-benchmark` command.

<details>
<summary><b>View options for the evals-hub command</b></summary>

```bash
evals-hub --help
```
```
Usage: evals-hub COMMAND

Commands:
  run-benchmark
  --help -h   Display this message and exit.
  --version   Display application version.

Parameters:
  * --config  [required]
```
</details>

<details>
<summary><b>View options for the evals-hub run-benchmark command</b></summary>

```bash
evals-hub run-benchmark --help
```

```
Usage: evals-hub run-benchmark [ARGS] [OPTIONS]

Parameters:
  * TASK-NAME --task-name  [choices: retrieval, reranking, classification, nli] [required]
  * DATASET.NAME --dataset.name  [required]
    DATASET.SPLIT --dataset.split
    DATASET.HF-SUBSET --dataset.hf-subset
  * MODEL.CHECKPOINT --model.checkpoint  [required]
    METRICS.MAP --metrics.map  Identifier for MAP metric
    METRICS.MRR --metrics.mrr  Identifier for MRR metric
    METRICS.NDCG --metrics.ndcg  Identifier for NDCG metric
    METRICS.RECALL --metrics.recall  Identifier for Recall metric
    METRICS.PRECISION --metrics.precision  Identifier for Precision metric
    METRICS.MICRO-AVG-F1 --metrics.micro-avg-f1  Identifier for micro average F1 metric
    METRICS.MACRO-AVG-F1 --metrics.macro-avg-f1  Identifier for macro average F1 metric
    METRICS.ACCURACY --metrics.accuracy  Identifier for accuracy metric
    EVALUATION.TOP-K --evaluation.top-k  [default: 10]
    EVALUATION.BATCH-SIZE --evaluation.batch-size  [default: 16]
    EVALUATION.SEED --evaluation.seed  [default: 42]
    EVALUATION.MAX-LENGTH --evaluation.max-length
    EVALUATION.SAMPLES-PER-LABEL --evaluation.samples-per-label
    EVALUATION.N-EXPERIMENTS --evaluation.n-experiments  [default: 10]
  * OUTPUT.RESULTS-FILE --output.results-file  [required]
```
</details>

Benchmarks can be run in a few different ways:

- options defined in a YAML config file
- options passed directly on the command line
- options defined in a YAML config file, selectively overridden from the command line (see the example below)
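For instance, to run from a config file but override a single option (using the `--evaluation.top-k` parameter from the help above):

```bash
evals-hub run-benchmark --config reranking_config.yaml --evaluation.top-k 20
```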
**Benchmark configured entirely from a YAML file**

### run reranking

```bash
evals-hub run-benchmark --config reranking_config.yaml
```
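As a rough illustration, such a config maps the CLI parameters onto YAML keys. The key names below are assumptions inferred from the CLI help, and the dataset/model values are placeholders; consult the repository's actual config files for the exact schema:

```yaml
# Hypothetical reranking_config.yaml (illustrative only)
task_name: reranking
dataset:
  name: org/dataset-name      # placeholder dataset identifier
  split: test
model:
  checkpoint: org/model-name  # placeholder model checkpoint
evaluation:
  top_k: 10
  batch_size: 16
  seed: 42
output:
  results_file: results/reranking.json
```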
### run nli

```bash
evals-hub run-benchmark --config nli_config.yaml
```

### run classification

```bash
evals-hub run-benchmark --config classification_config.yaml
```

### run patent landscape evaluation

```bash
evals-hub run-benchmark --config pl_eval_config.yaml
```
## Troubleshooting SSL errors

### SSL errors when connecting to a Hugging Face dataset

Set the environment variable for the Python `requests` library in `.env`:

```bash
REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt
```

You may also need to import SSL certificates if you have not done so before.
# Development Setup

## Install the git hook scripts

```bash
pre-commit install
```

## Run tests

```bash
uv run pytest -v
```
## SQL Code Quality

### Lint all SQL files in a directory

```bash
uv run sqlfluff lint --dialect postgres cronjob/
```

### Format/fix SQL files

```bash
uv run sqlfluff format --dialect postgres cronjob/
```
## Serve documentation locally

```bash
uv run mkdocs serve -f docs/mkdocs.yml
```

Then open http://127.0.0.1:8000/ in your browser.
## Refresh & upgrade the lockfile

```bash
uv sync --upgrade
```
## Integration tests

By default, integration tests are ignored in the pytest configuration because evaluation runs take a long time and require GPU resources. However, it is sometimes useful to run them to verify that results are correct against public benchmarks.
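Such exclusion is typically expressed in `pyproject.toml` along these lines (an illustrative sketch; the repository's actual pytest configuration may differ):

```toml
[tool.pytest.ini_options]
addopts = "--ignore=tests/integration"
```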
To run them:

```bash
uv run pytest tests/integration
```
## Run pre-commit hooks locally

```bash
source .venv/bin/activate
pre-commit run --all-files
```
## Show outdated packages

```bash
uv tree --outdated --depth 1
```