Description:
1) Performance / Efficiency Issue
The current OptiSim selection method becomes very slow on large datasets. Most of the time is spent in the setup phase, where the search for the optimal radius dominates the runtime.
Steps to reproduce:
Load a large dataset (e.g., 10,000+ points).
Call optisim_selection() or get_initial_selection().
Expected behavior: Selection completes quickly.
Actual behavior: Setup phase takes very long.
Possible solutions:
Use binary search for radius determination.
Use KDTree or BallTree for distance checks.
Use approximate or sampling-based methods for very large datasets.
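The first two suggestions can be combined. The sketch below is not the library's OptiSim implementation: it uses a simplified greedy sphere-exclusion rule as a stand-in, and the names select_with_radius and find_radius are hypothetical. It assumes SciPy is available and only illustrates bisecting on the radius while a KDTree answers the "which points lie within this radius" queries.

```python
import numpy as np
from scipy.spatial import KDTree


def select_with_radius(points, radius, tree):
    """Greedy exclusion: keep a point, then discard every point within `radius` of it."""
    excluded = np.zeros(len(points), dtype=bool)
    selected = []
    for idx in range(len(points)):
        if excluded[idx]:
            continue
        selected.append(idx)
        # The tree returns the neighbors inside the sphere without a full distance scan.
        excluded[tree.query_ball_point(points[idx], r=radius)] = True
    return selected


def find_radius(points, n_select, tol=0.05, max_iter=30):
    """Bisect on the radius until the selection size is within `tol` of n_select."""
    tree = KDTree(points)  # built once and reused for every trial radius
    lo = 0.0
    # The bounding-box diagonal is an upper bound on any pairwise distance.
    hi = float(np.linalg.norm(points.max(axis=0) - points.min(axis=0)))
    best, best_radius = select_with_radius(points, hi, tree), hi
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        selected = select_with_radius(points, mid, tree)
        if abs(len(selected) - n_select) < abs(len(best) - n_select):
            best, best_radius = selected, mid
        if abs(len(selected) - n_select) <= tol * n_select:
            break
        if len(selected) > n_select:
            lo = mid  # too many points kept: the radius is too small
        else:
            hi = mid  # too few points kept: the radius is too large
    return best, best_radius
```

The number of selected points is not strictly monotone in the radius for a greedy rule like this, so the bisection is a heuristic; that is why the sketch keeps the closest result seen so far instead of trusting the final iteration.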
2) Documentation Issue
The README currently covers installation and citation, but it lacks a clear example of how to use the OptiSim method and its parameters.
Suggested solution:
Add a usage example with a code snippet showing optisim_selection() or get_initial_selection().
Proposed code snippet:
# Example: OptiSim selection
from selector import optisim_selection
import numpy as np

# Example dataset: 1000 points, each with 5 features
points = np.random.rand(1000, 5)
# Select 50 diverse points using OptiSim
selected_points = optisim_selection(points, n_select=50)
print("Number of selected points:", len(selected_points))
Explanation:
points: Your dataset
n_select: Number of diverse points to select
optisim_selection(): OptiSim method call
Output: The list of selected points and its length
3) Testing Issue
The current tests do not include large datasets, so neither the performance nor the correctness of OptiSim is verified at scale.
Suggested solution:
Add unit tests for datasets of 5,000+ points to check runtime and correctness.
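One possible shape for such a test is sketched below. The helper check_selector is hypothetical, and it assumes the selector under test accepts (points, n_select) and returns integer indices into points; the real OptiSim call may differ and would need a small adapter. The random_baseline function is only a stand-in so the sketch runs on its own, and the 60-second budget is an arbitrary placeholder.

```python
import time

import numpy as np


def check_selector(select_fn, n_points=5000, n_features=5, n_select=50, max_seconds=60.0):
    """Run a selector on a synthetic dataset and check basic correctness and runtime."""
    rng = np.random.default_rng(42)  # fixed seed keeps the test reproducible
    points = rng.random((n_points, n_features))

    start = time.perf_counter()
    selected = np.asarray(select_fn(points, n_select))
    elapsed = time.perf_counter() - start

    assert len(selected) == n_select, "wrong number of points selected"
    assert len(np.unique(selected)) == n_select, "duplicate indices in selection"
    assert selected.min() >= 0 and selected.max() < n_points, "index out of range"
    assert elapsed < max_seconds, f"selection took {elapsed:.1f} s"
    return elapsed


def random_baseline(points, n_select):
    """Stand-in selector (assumption): picks indices uniformly at random."""
    rng = np.random.default_rng(0)
    return rng.choice(len(points), size=n_select, replace=False)


elapsed = check_selector(random_baseline)
print(f"baseline finished in {elapsed:.4f} s")
```

Under pytest, the same helper could be wrapped in a test function parametrized over several dataset sizes, with the time budget set from measurements on the target hardware.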
4) Web Server / Feature Issue
The web server allows users to select a diverse subset, but it provides no options for tuning the OptiSim radius or other optimization parameters.
Suggested solution:
Add an interactive slider or input field for the radius so that users can customize the selection.