Improve OptiSim selection method, documentation, and examples #292

Description

@mademind122333-jp

1) Performance / Efficiency Issue

The current OptiSim selection method becomes very slow on large datasets. Most of the time is spent in the setup phase, where finding the optimal radius dominates the runtime.

Steps to reproduce:

Load a large dataset (e.g., 10,000+ points).
Call optisim_selection() or get_initial_selection().

Expected behavior: Selection completes quickly.
Actual behavior: Setup phase takes very long.

Possible solutions:

Use binary search for radius determination.
Use KDTree or BallTree for distance checks.
Use approximate or sampling-based methods for very large datasets.
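
A minimal sketch of the binary-search idea above, assuming the radius search can be framed as a sphere-exclusion pass whose selection size generally shrinks as the radius grows. The helpers `sphere_exclusion` and `find_radius` are hypothetical names written for illustration in pure Python; they are not the library's implementation. The selection size is only approximately monotone in the radius, so the bisection is a heuristic. The brute-force distance check inside `sphere_exclusion` is the step that a KDTree or BallTree neighbor query could replace.

```python
import math
import random


def sphere_exclusion(points, radius):
    """Greedy pass: keep a point only if it lies farther than `radius`
    from every point kept so far. Returns the kept indices."""
    kept = []
    for i, p in enumerate(points):
        if all(math.dist(p, points[j]) > radius for j in kept):
            kept.append(i)
    return kept


def find_radius(points, n_select, tol=0.05, max_iter=30):
    """Bisect on the radius until the selection size is within
    tol * n_select of the target, or max_iter passes have run."""
    lo = 0.0
    # Twice the largest distance from points[0] bounds the data diameter.
    hi = 2.0 * max(math.dist(points[0], p) for p in points)
    best_radius, best = hi, sphere_exclusion(points, hi)
    for _ in range(max_iter):
        mid = (lo + hi) / 2.0
        kept = sphere_exclusion(points, mid)
        # Remember the radius whose selection size is closest to the target.
        if abs(len(kept) - n_select) < abs(len(best) - n_select):
            best_radius, best = mid, kept
        if abs(len(kept) - n_select) <= tol * n_select:
            break
        if len(kept) > n_select:
            lo = mid  # too many points kept: radius is too small
        else:
            hi = mid  # too few points kept: radius is too large
    return best_radius, best


# Hypothetical demo data: 300 random points in the unit square.
rng = random.Random(0)
points = [(rng.random(), rng.random()) for _ in range(300)]
radius, picked = find_radius(points, n_select=20)
print(f"radius={radius:.3f}, selected={len(picked)}")
```

Each bisection step costs one full selection pass, so the number of passes is capped by `max_iter` instead of growing with the number of candidate radii.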

2) Documentation Issue

The README currently describes installation and citation but lacks a clear example of how to use the OptiSim method and its parameters.

Suggested solution:

Add a usage example with a code snippet showing optisim_selection() or get_initial_selection().

Proposed code snippet:

# Example: OptiSim selection
from selector import optisim_selection
import numpy as np

# Example dataset: 1000 points, each with 5 features
points = np.random.rand(1000, 5)

# Select 50 diverse points using OptiSim
selected_points = optisim_selection(points, n_select=50)

print("Number of selected points:", len(selected_points))

Explanation:

points: Your dataset
n_select: Number of diverse points to select
optisim_selection(): OptiSim method call
Output: Selected points list and count

3) Testing Issue

Current tests do not cover large datasets to verify OptiSim performance and correctness.

Suggested solution:

Add unit tests for datasets of 5,000+ points to check runtime and correctness.
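
A rough sketch of what such a test could look like, in pure Python so it runs without extra dependencies. `select_diverse` is a hypothetical stand-in (a greedy max-min selector), not OptiSim itself; in the real test suite it would be replaced by the actual OptiSim call. The 30-second budget is an arbitrary assumption and would need tuning for the CI hardware.

```python
import math
import random
import time


def select_diverse(points, n_select):
    """Stand-in selector (greedy max-min); swap in the real OptiSim call here."""
    selected = [0]
    # Distance from every point to its nearest selected point so far.
    nearest = [math.dist(p, points[0]) for p in points]
    while len(selected) < n_select:
        # Pick the point that is farthest from everything selected so far.
        idx = max(range(len(points)), key=nearest.__getitem__)
        selected.append(idx)
        for i, p in enumerate(points):
            d = math.dist(p, points[idx])
            if d < nearest[i]:
                nearest[i] = d
    return selected


def test_large_dataset_selection(n_points=5000, n_select=50, budget_s=30.0):
    rng = random.Random(0)  # fixed seed keeps the test reproducible
    points = [[rng.random() for _ in range(5)] for _ in range(n_points)]

    start = time.perf_counter()
    selected = select_diverse(points, n_select)
    elapsed = time.perf_counter() - start

    # Correctness: requested count, no duplicates, valid indices.
    assert len(selected) == n_select
    assert len(set(selected)) == n_select
    assert all(0 <= i < n_points for i in selected)
    # Performance: stay within the (assumed) time budget.
    assert elapsed < budget_s
```

Keeping the runtime check as a loose upper bound avoids flaky failures on slow machines while still catching large regressions.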

4) Web Server / Feature Issue

The web server allows selecting a diverse subset but does not provide options to tune OptiSim radius or optimization parameters.

Suggested solution:

Add an interactive slider or input for radius control so users can customize selection.
