AI & ML interests

AI4Science

Recent Activity

cgeorgiaw  updated a Space 12 days ago
LeMaterial/LeMat-GenBench
cgeorgiaw  published a Space 12 days ago
LeMaterial/LeMat-GenBench
thomwolf  authored a paper about 2 months ago
Robot Learning: A Tutorial
View all activity

cgeorgiaw 
posted an update 18 days ago
cgeorgiaw 
posted an update 3 months ago
view post
Post
5948
🚀🚀🚀 The largest ever dataset of co-folded 3D protein-ligand structures just dropped on HF!!

Meet SAIR (Structurally Augmented IC₅₀ Repository): 5M+ AI-generated complexes with experimentally measured drug potency data from SandboxAQ. 🚀🚀🚀

Check it out and explore here: SandboxAQ/SAIR

·
cgeorgiaw 
posted an update 4 months ago
cgeorgiaw 
posted an update 6 months ago
cgeorgiaw 
posted an update 6 months ago
view post
Post
1608
Snooping on HF is the best because sometimes you just discover that someone (in this case, Earth Species Project) is about to drop terabytes of sick (high quality animal sounds) data...

EarthSpeciesProject/NatureLM-audio-training
cgeorgiaw 
posted an update 6 months ago
view post
Post
520
Just dropped two bigger physics datasets (both on photonics)!

NUMBA 1: SIB-CL
This dataset of Surrogate- and Invariance-Boosted Contrastive Learning (SIB-CL) datasets for two scientific problems:
- PhC2D: 2D photonic crystal density-of-states (DOS) and bandstructure data.
- TISE: 3D time-independent Schrödinger equation eigenvalue and eigenvector solutions.

NUMBA2: 2D Photonic Topology
Symmetry-driven analysis of 2D photonic crystals: 10k random unit cells across 11 symmetries, 2 polarizations, 5 contrasts. Includes time-reversal breaking cases for 4 symmetries at high contrast.

Check them out: cgeorgiaw/sib-cl & cgeorgiaw/2d-photonic-topology
clefourrier 
posted an update 7 months ago
view post
Post
1998
Always surprised that so few people actually read the FineTasks blog, on
✨how to select training evals with the highest signal✨

If you're serious about training models without wasting compute on shitty runs, you absolutely should read it!!

An high signal eval actually tells you precisely, during training, how wel & what your model is learning, allowing you to discard the bad runs/bad samplings/...!

The blog covers in depth prompt choice, metrics, dataset, across languages/capabilities, and my fave section is "which properties should evals have"👌
(to know on your use case how to select the best evals for you)

Blog: HuggingFaceFW/blogpost-fine-tasks
  • 2 replies
·