SeFMol: An efficient structure-based molecular generation tool

1. What SeFMol Does

SeFMol is designed for structure-based drug design, where the goal is to generate 3D ligand molecules directly inside a protein binding pocket. Unlike methods that treat ligands as rigid during training, SeFMol is inspired by semi-flexible docking and allows molecular conformations to be adjusted during the denoising process, making the generated candidates more compatible with pocket geometry and interaction patterns.

What makes SeFMol different

Generates 3D molecules conditioned on a protein pocket.
Uses reinforcement learning to steer semi-flexible conformational optimization.
Supports molecular property guidance such as QED, SA, LogP, TPSA, HBA, HBD, Fsp3, and ROTB.

What users can expect

Candidate molecules with strong docking performance (for example, favorable AutoDock Vina scores).
Better control over drug-like physicochemical properties.
Fast sampling suitable for practical lead exploration and early candidate triage.

Research-use statement: generated molecules are computational hypotheses. They should be further reviewed with docking validation, medicinal chemistry assessment, ADMET analysis, and experimental testing.

2. Method Core

Two-stage rigid training

Pretraining: 1,000,000 target-free molecules from Molecule3D.
Fine-tuning: 100,000 protein-ligand pairs from CrossDocked2020.
Property guidance uses 8 RDKit-calculated properties: QED, SA, LogP, TPSA, HBA, HBD, Fsp3, and ROTB.

SFRL: semi-flexible RL optimization

The denoising process is formulated as a Markov Decision Process (MDP).
A policy denoiser optimizes molecular states step by step inside the target pocket.
KL regularization constrains policy drift from the pretrained denoiser.
PPO-style clipping and a value function are used to stabilize optimization.
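The SFRL ingredients listed above (a per-step policy, KL regularization toward the pretrained denoiser, PPO-style clipping, and a value function) follow a common RL fine-tuning pattern. The sketch below is a generic, framework-free PPO clipped surrogate with a KL penalty and a value loss. It illustrates the technique only and is not SeFMol's actual objective. The function name `ppo_kl_loss`, the coefficients `clip_eps`, `kl_coef`, and `vf_coef`, and the per-step inputs are all assumptions chosen for illustration.

```python
import math
from typing import Sequence


def ppo_kl_loss(
    logp_new: Sequence[float],    # log-prob of each denoising action under the policy being trained
    logp_old: Sequence[float],    # log-prob under the policy that generated the trajectory
    logp_ref: Sequence[float],    # log-prob under the frozen pretrained denoiser
    advantages: Sequence[float],  # per-step advantage estimates
    returns: Sequence[float],     # per-step return targets for the value function
    values: Sequence[float],      # per-step value predictions V(s_t)
    clip_eps: float = 0.2,
    kl_coef: float = 0.1,
    vf_coef: float = 0.5,
) -> float:
    """Generic PPO clipped surrogate + KL-to-reference penalty + value loss.

    Hypothetical sketch of the technique only; SeFMol's real objective may differ.
    """
    n = len(logp_new)
    if n == 0:
        raise ValueError("empty trajectory")
    policy_loss = kl_penalty = value_loss = 0.0
    for lp, lo, lr, adv, ret, v in zip(logp_new, logp_old, logp_ref, advantages, returns, values):
        ratio = math.exp(lp - lo)  # importance ratio pi_new / pi_old
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic bound: keep the smaller of the unclipped and clipped objectives
        policy_loss += -min(ratio * adv, clipped * adv)
        # Log-ratio on the sampled action, approximating KL(pi_new || pi_ref)
        kl_penalty += lp - lr
        value_loss += (ret - v) ** 2
    return (policy_loss + kl_coef * kl_penalty + vf_coef * value_loss) / n
```

Clipping caps how much a single update can profit from moving the policy, while the KL term pulls it back toward the pretrained denoiser; together they play the stabilizing role the method description assigns to them.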

Default property vector in the paper: [QED, SA, LogP, TPSA, HBA, HBD, Fsp3, ROTB] = [1.0, 1.0, 1.0, 50.0, 3.0, 2.0, 0.5, 2.0]. SeFMol also uses a fast sampling strategy that reduces denoising steps from 1000 to 50, giving about 20× acceleration.
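One common way to cut a 1000-step reverse process down to 50 steps is to denoise only on an evenly spaced subset of timesteps, as in DDIM-style step skipping. The sketch below only builds such a schedule. Whether SeFMol uses even spacing or a different scheme is an assumption here, and `strided_timesteps` is a hypothetical helper.

```python
from typing import List


def strided_timesteps(num_train_steps: int = 1000, num_sample_steps: int = 50) -> List[int]:
    """Return descending timesteps for an accelerated denoising pass.

    Assumes evenly spaced skipping; the actual SeFMol schedule may differ.
    """
    if not 0 < num_sample_steps <= num_train_steps:
        raise ValueError("num_sample_steps must be in (0, num_train_steps]")
    # Evenly spaced indices in [0, num_train_steps); 1000 -> 50 gives 0, 20, ..., 980
    steps = [(i * num_train_steps) // num_sample_steps for i in range(num_sample_steps)]
    return steps[::-1]  # denoise from the noisiest kept step down to t = 0
```

With the defaults, the schedule has 50 entries spaced 20 apart, so the model is called 1000 / 50 = 20 times less often, which matches the reported ~20× sampling acceleration.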

3. User Workflow

Figure. Overview of SeFMol: a reinforcement-learning-steered semi-flexible diffusion model for pocket-conditioned molecular generation.

Step 1: Upload Target Structure
Provide the target protein structure and make sure the binding pocket is biologically meaningful and structurally complete.

Step 2: Set Property Guidance
Choose the number of samples and, optionally, specify target molecular properties to guide generation.

Step 3: Run Generation
Launch SeFMol inference to generate 3D molecules under the pocket and property constraints.

Step 4: Inspect and Prioritize
Review structures, docking-related scores, and property values to identify promising candidates.
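For Step 4, one way to avoid ranking on a single score is to keep only candidates that no other candidate beats on every objective (the Pareto front). The sketch below assumes each molecule has already been scored and stored as a dict. The field names, the choice of objectives (lower Vina score, higher QED, higher SA), and the helper names are hypothetical and not part of SeFMol.

```python
from typing import Any, Dict, List

Candidate = Dict[str, Any]

# Hypothetical objectives: (field name, True if higher is better)
OBJECTIVES = [("vina", False), ("qed", True), ("sa", True)]


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is at least as good as b on every objective and strictly better on one."""
    no_worse = all((a[k] >= b[k]) if hi else (a[k] <= b[k]) for k, hi in OBJECTIVES)
    better = any((a[k] > b[k]) if hi else (a[k] < b[k]) for k, hi in OBJECTIVES)
    return no_worse and better


def pareto_front(cands: List[Candidate]) -> List[Candidate]:
    """Return candidates not dominated by any other (O(n^2), fine for ~100 molecules)."""
    return [c for c in cands if not any(dominates(o, c) for o in cands if o is not c)]


# Illustrative, made-up scores
example_candidates = [
    {"id": "m1", "vina": -9.0, "qed": 0.50, "sa": 0.70},
    {"id": "m2", "vina": -8.0, "qed": 0.80, "sa": 0.65},
    {"id": "m3", "vina": -7.0, "qed": 0.40, "sa": 0.60},  # worse than m1 on all three
]
print([c["id"] for c in pareto_front(example_candidates)])
```

Here m1 and m2 trade affinity against drug-likeness, so both survive, while m3 is dropped. A Pareto filter avoids choosing arbitrary weights; a weighted score is a reasonable alternative once priorities are clear.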

4. Input Requirements

Required input

Protein structure / pocket information used as the spatial condition for generation.
Use a clean and chemically meaningful structure whenever possible.
Binding-site geometry should be relevant to the design objective.

Optional guidance

Property targets for QED, SA, LogP, TPSA, HBA, HBD, Fsp3, and ROTB.
Sample count to control how many candidate molecules are generated.
If you are unsure where to start, use the default property vector from the paper.

In the reported experiments, SeFMol sampled 100 molecules per protein pocket. For practical use, a smaller batch is suitable for quick testing, while larger batches are better for broader candidate exploration.
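Because property targets are optional and ordered, a small helper can merge user overrides into the default vector before a run. The names `PROPERTY_ORDER`, `DEFAULT_TARGETS`, and `build_condition` are hypothetical; how the SeFMol code or web form actually receives these values is not specified in this guide.

```python
from typing import Dict, List, Optional

# Order and default values follow the paper's condition vector
PROPERTY_ORDER = ["QED", "SA", "LogP", "TPSA", "HBA", "HBD", "Fsp3", "ROTB"]
DEFAULT_TARGETS = dict(zip(PROPERTY_ORDER, [1.0, 1.0, 1.0, 50.0, 3.0, 2.0, 0.5, 2.0]))


def build_condition(overrides: Optional[Dict[str, float]] = None) -> List[float]:
    """Merge optional user targets into the defaults, preserving the fixed property order."""
    overrides = overrides or {}
    unknown = set(overrides) - set(PROPERTY_ORDER)
    if unknown:
        raise ValueError(f"unknown property names: {sorted(unknown)}")
    merged = {**DEFAULT_TARGETS, **overrides}
    return [float(merged[name]) for name in PROPERTY_ORDER]
```

For example, `build_condition({"TPSA": 70.0})` returns `[1.0, 1.0, 1.0, 70.0, 3.0, 2.0, 0.5, 2.0]`, which changes a single property and leaves the rest at the paper's defaults.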

5. Parameter Definitions and Practical Ranges

Parameter	Meaning	Reference Value / Range	Interpretation Guidance
QED	Quantitative estimate of drug-likeness	default 1.0; SR criterion > 0.25	Higher values usually indicate a more drug-like overall profile.
SA	Synthetic accessibility	default 1.0; SR criterion > 0.59	Useful for checking whether generated molecules remain practically synthesizable.
LogP	Hydrophobicity balance	default 1.0; commonly -0.4 to 5.6	High values may help permeability but can reduce solubility.
TPSA	Topological polar surface area	default 50.0; often < 90, SR criterion ≤ 140	Important for polarity, membrane transport, and exposure behavior.
HBA	Hydrogen-bond acceptors	default 3.0; usually ≤ 10	Helps tune intermolecular interaction patterns and polarity.
HBD	Hydrogen-bond donors	default 2.0; usually ≤ 5	Useful when balancing binding interactions and developability.
Fsp3	Fraction of sp3-hybridized carbons (3D saturation)	default 0.5; typically > 0.47, SR criterion ≥ 0.42	Higher values often improve 3D character and scaffold richness.
ROTB	Rotatable bonds	default 2.0; usually ≤ 10	Lower values often help conformational stability.
num_samples	Number of generated molecules	paper evaluation: 100 per pocket	More samples improve coverage, but also increase screening workload.

Default condition vector: [QED, SA, LogP, TPSA, HBA, HBD, Fsp3, ROTB] = [1.0, 1.0, 1.0, 50.0, 3.0, 2.0, 0.5, 2.0]. This is a good baseline setting for first-time users.

In the paper, Success Rate (SR) is defined as the proportion of molecules satisfying nine joint constraints: Vina Dock < -8.18, QED > 0.25, SA > 0.59, -0.4 ≤ LogP ≤ 5.6, TPSA ≤ 140, Fsp3 ≥ 0.42, HBA ≤ 10, HBD ≤ 5, and ROTB ≤ 10.
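The SR definition translates directly into a filter. The sketch below assumes the properties and the Vina docking score have already been computed for each molecule and stored under hypothetical lowercase keys; only the nine thresholds come from the definition above.

```python
from typing import Dict, Iterable


def passes_sr(m: Dict[str, float]) -> bool:
    """Check the nine joint SR constraints (key names are hypothetical)."""
    return (
        m["vina_dock"] < -8.18
        and m["qed"] > 0.25
        and m["sa"] > 0.59          # higher SA = more accessible, per the "> 0.59" criterion
        and -0.4 <= m["logp"] <= 5.6
        and m["tpsa"] <= 140
        and m["fsp3"] >= 0.42
        and m["hba"] <= 10
        and m["hbd"] <= 5
        and m["rotb"] <= 10
    )


def success_rate(mols: Iterable[Dict[str, float]]) -> float:
    """Fraction of molecules that satisfy all constraints at once (0.0 for an empty set)."""
    mols = list(mols)
    return sum(passes_sr(m) for m in mols) / len(mols) if mols else 0.0
```

Because the constraints are joint, a molecule with an excellent docking score still fails SR if any single property is out of range.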

6. Reported Performance

Headline result	Reported value
Avg. Vina Score	-7.23
Success Rate (SR)	11.53%
Sampling Time	0.81 s
Completion	98.3%

Additional reported indicators	Value	Notes
Fast sampling	1000 → 50 steps	About 20× acceleration during sampling.
Test scale	100 protein pockets	Used in the benchmark evaluation.
Sampling per pocket	100 molecules	Used for model comparison.
Interaction-pattern JSD	0.1401	Best reported value, tied with TargetDiff.
Case studies	CDK2 / ROCK1	SeFMol reproduced known interactions and explored new ones.
Generalization	AlphaFold structures	Also showed favorable Vina score distributions on predicted proteins.

These are paper-reported results intended to describe the method’s performance. Actual web runs may vary across targets, pocket quality, and parameter settings.

The paper also notes a trade-off: SeFMol improves affinity and property control, but may sacrifice some diversity compared with more exploratory settings.

7. Recommended Practice

Use biologically meaningful and structurally clean binding pockets.
Start with the default property vector, then adjust one or two properties at a time.
Do not screen candidates by Vina score alone; combine affinity, QED, SA, TPSA, and Fsp3.
Cluster generated molecules before detailed review to reduce redundant chemotypes.
Pair SeFMol with downstream docking, ADMET, and synthesis-feasibility tools for better triage.
Keep records of inputs, parameters, and output files for reproducibility.
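The clustering recommendation above can be carried out with, for example, Taylor-Butina clustering on Tanimoto similarity. To stay dependency-free, this sketch represents each fingerprint as a set of on-bit indices (in practice these might come from an RDKit Morgan fingerprint, and RDKit also ships a Butina implementation in `rdkit.ML.Cluster.Butina`). The 0.6 similarity cutoff and the function names are assumptions.

```python
from typing import List, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0  # treat empty fingerprints as dissimilar
    inter = len(a & b)
    return inter / (len(a) + len(b) - inter)


def butina_clusters(fps: List[Set[int]], sim_cutoff: float = 0.6) -> List[List[int]]:
    """Taylor-Butina clustering; returns clusters as index lists, centroid first."""
    n = len(fps)
    neighbors = [
        [j for j in range(n) if j != i and tanimoto(fps[i], fps[j]) >= sim_cutoff]
        for i in range(n)
    ]
    # Visit molecules with the most close neighbors first (ties broken by index)
    order = sorted(range(n), key=lambda i: (-len(neighbors[i]), i))
    assigned = [False] * n
    clusters: List[List[int]] = []
    for i in order:
        if assigned[i]:
            continue
        # New cluster: this centroid plus its neighbors not yet claimed by another cluster
        members = [i] + [j for j in neighbors[i] if not assigned[j]]
        for j in members:
            assigned[j] = True
        clusters.append(members)
    return clusters
```

Reviewing the centroid of each cluster first gives one representative per chemotype, which keeps manual review from being dominated by near-duplicates.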

8. Reference

Xudong Zhang, Sanqing Qu, Fan Lu, Jianmin Wang, Zhixin Tian, Shangding Gu, Yanping Zhang, Alois Knoll, Shaorong Gao, Guang Chen, Changjun Jiang, Steering Semi-Flexible Molecular Diffusion Model for Structure-Based Drug Design with Reinforcement Learning.

Code and resources: https://github.com/ispc-lab/SeFMol
