This project provides Python scripts to import and map the Vaccine Ontology (VO) into a Neo4j graph database. It leverages the Neosemantics (n10s) library for RDF import and performs subsequent mapping to align the ontology with domain-specific nodes and relationships.
Before you begin, ensure you have the following installed:
- Python 3.6 or higher: Required to run the Python scripts.
- Neo4j Graph Database: You need a running instance of Neo4j.
- Neo4j Python Driver: This project uses the official Neo4j Python driver. You can install it using pip:
pip install neo4j
- dotenv: For managing environment variables. Install using pip:
pip install python-dotenv
- Neosemantics (n10s): This Neo4j extension is used for importing the OWL ontology.
- Clone the repository (if you have the code in a repository):
git clone <repository_url>
cd <repository_directory>
This project uses a .env file to store sensitive information like your Neo4j connection URI, username, and password.
- Create a .env file in the root directory of your project.
- Add your Neo4j connection details to the .env file. Replace the placeholders with your actual credentials:
URI=bolt://localhost:7687 # Replace with your Neo4j URI
USERNAME=neo4j # Replace with your Neo4j username
PASSWORD=your_password # Replace with your Neo4j password
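A minimal sketch of how a script might read and validate these values, assuming the three variable names shown above. `Neo4jSettings`, `load_settings`, and `connect` are hypothetical names used for illustration and are not taken from the project's code. The sketch validates a plain mapping (for example the result of python-dotenv's `dotenv_values(".env")`) rather than the process environment, because `load_dotenv()` does not override existing environment variables by default and Windows already defines `USERNAME`.

```python
from typing import Mapping, NamedTuple, Optional


class Neo4jSettings(NamedTuple):
    """Connection details read from the .env file (hypothetical helper)."""
    uri: str
    username: str
    password: str


def load_settings(values: Mapping[str, Optional[str]]) -> Neo4jSettings:
    """Check that the three keys used in this README are present and non-empty."""
    required = ("URI", "USERNAME", "PASSWORD")
    missing = [key for key in required if not values.get(key)]
    if missing:
        raise ValueError("Missing .env entries: " + ", ".join(missing))
    return Neo4jSettings(values["URI"], values["USERNAME"], values["PASSWORD"])


def connect(settings: Neo4jSettings):
    """Open a driver and fail fast if the URI or credentials are wrong."""
    # Imported here so the helpers above work without the driver installed.
    from neo4j import GraphDatabase

    driver = GraphDatabase.driver(
        settings.uri, auth=(settings.username, settings.password)
    )
    driver.verify_connectivity()
    return driver
```

With python-dotenv installed, a call such as `connect(load_settings(dotenv_values(".env")))` would tie the pieces together, where `dotenv_values` is imported from `dotenv`.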
To enable the import of the OWL ontology, you need to configure Neosemantics in your Neo4j instance. Follow these steps:
- Download Neosemantics: Download the latest stable release JAR file of Neosemantics from the official GitHub releases page: https://github.com/neo4j-labs/neosemantics/releases. Look for a file named something like neosemantics-{version}.jar, and pick the release that matches your Neo4j version.
- Place the JAR file in the plugins directory: Locate your Neo4j installation directory. Inside it, you will find a plugins directory. Copy the downloaded Neosemantics JAR file into this directory.
- Configure neo4j.conf: Open the neo4j.conf file located in the conf directory of your Neo4j installation.
- Add the following lines to the neo4j.conf file:
  - Enable unmanaged extensions for Neosemantics:
    dbms.unmanaged_extension_classes=n10s.endpoint=/rdf
  - Set the import directory: This allows Neo4j to access files in the specified import directory. While this project imports from a remote URL, it is good practice to configure it.
    dbms.directories.import=import
    Note: Ensure that the import directory exists within your Neo4j installation directory. You might need to create it if it doesn't exist.
  On Neo4j 5.x, both settings use the server. prefix instead of dbms. (server.unmanaged_extension_classes and server.directories.import).
- Restart Neo4j: After making these changes, you need to restart your Neo4j server for the configuration to take effect.
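Once Neo4j has restarted, it can help to confirm that the plugin actually loaded before running the import. The following is a small sketch, not part of the project's scripts: `n10s_procedure_count` is a hypothetical helper, and it assumes a Neo4j version recent enough to support the `SHOW PROCEDURES` command (older servers expose `CALL dbms.procedures()` instead). The `session` argument is expected to be a session from the official Neo4j Python driver.

```python
# Counts the procedures whose name starts with "n10s.".
N10S_CHECK_QUERY = (
    "SHOW PROCEDURES YIELD name "
    "WHERE name STARTS WITH 'n10s.' "
    "RETURN count(name) AS n10s_count"
)


def n10s_procedure_count(session) -> int:
    """Return how many n10s procedures the server exposes.

    A result of 0 suggests the JAR was not picked up, for example because
    it is in the wrong directory or Neo4j was not restarted.
    """
    record = session.run(N10S_CHECK_QUERY).single()
    return record["n10s_count"] if record else 0
```

A typical call would be `with driver.session() as session: print(n10s_procedure_count(session))`.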
- Activate your Python environment (if applicable): If you are working within a virtual environment, make sure to activate it. For example, if you used venv:
source neo4j-env/Scripts/activate # On Windows (from a Bash shell such as Git Bash)
source neo4j-env/bin/activate # On macOS and Linux
(This step is also mentioned in the code comments.)
- Run the main script (__main__.py if you structure your project that way, or directly run the provided scripts): Execute the two Python scripts, in this order, to start the import and mapping process:
python import_to_neo4j.py
and then
python ontology_mapping.py
Together, the scripts perform the following actions:
- Import Ontology: Downloads the Vaccine Ontology (VO) from the specified GitHub URL and imports it into Neo4j using Neosemantics.
- Map Ontology: Maps the imported ontology nodes to domain-specific nodes (like Vaccine and Pathogen) and creates VO_REPRESENTATION relationships.
- Update Resource Properties: Transforms complex IAO and UBPROP codes on Resource nodes into more human-readable properties.
You can observe the progress and any potential errors in the console output.
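The import step described above can be sketched as three Cypher statements run in order: a uniqueness constraint on `Resource.uri`, graph configuration, and the fetch itself. This is an illustrative outline under stated assumptions, not the project's actual code: `build_import_statements` and `run_import` are hypothetical names, the ontology URL is left as a parameter because the README does not state it, the constraint uses the `FOR ... REQUIRE` syntax of newer Neo4j versions, and `n10s.rdf.import.fetch` with default graph configuration is assumed (the project may use different options).

```python
from typing import Any, Dict, List, Tuple

Statement = Tuple[str, Dict[str, Any]]


def build_import_statements(
    ontology_url: str, rdf_format: str = "RDF/XML"
) -> List[Statement]:
    """Return the Cypher statements for an n10s import, in execution order."""
    return [
        # n10s expects Resource.uri to be unique before any import.
        # Older Neo4j versions use "ON (r:Resource) ASSERT ..." instead.
        (
            "CREATE CONSTRAINT n10s_unique_uri IF NOT EXISTS "
            "FOR (r:Resource) REQUIRE r.uri IS UNIQUE",
            {},
        ),
        # Default graph configuration (assumption: no custom options).
        ("CALL n10s.graphconfig.init()", {}),
        # Fetch and import the ontology file from the given URL.
        (
            "CALL n10s.rdf.import.fetch($url, $format)",
            {"url": ontology_url, "format": rdf_format},
        ),
    ]


def run_import(session, ontology_url: str) -> None:
    """Run each statement and wait for it to finish before the next one."""
    for query, params in build_import_statements(ontology_url):
        session.run(query, params).consume()
```

Keeping the statements in a plain list makes the order explicit and lets the sequence be inspected or logged without a database connection.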
The provided Python code contains the following key functions:
- import_ontology_complete(driver): Imports the complete Vaccine Ontology (VO) from a remote GitHub URL into Neo4j using Neosemantics. It handles constraint creation, graph configuration, and error handling.
- map_ontology(driver): Orchestrates the mapping of imported ontology nodes to domain-specific nodes (Vaccine, Pathogen, etc.) by calling individual mapping functions.
- map_vaccine_nodes_gemini_way(session): Maps Vaccine nodes using their c_vo_id property and also links them to the general VO_0000001 Resource node.
- update_resource_properties(session): Updates the properties of Resource nodes by converting IAO and UBPROP codes to more readable names.
- map_vaccine_nodes(session): Maps Vaccine nodes to their corresponding VO Resource representations, remapping properties and creating VO_REPRESENTATION relationships.
- map_pathogen_nodes(session): Maps Pathogen nodes to their corresponding Taxonomy Resource representations based on their c_taxon_id.
- map_relationships_vo_aligned(session, relationship_queries): Executes Cypher queries to create relationships aligned with VO concepts. This function is defined but not actively used with specific queries in the provided code.
- import_data(): The main function that establishes the Neo4j connection, calls the ontology import and mapping functions, and handles overall execution.
- execute_queries(driver, queries): A helper function that executes a dictionary of Cypher queries. It is used in the import_data() function defined at the end of the script, which seems to be a different version focusing on CSV import; the main execution calls the earlier import_data() function.
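To make the mapping and property-renaming ideas above concrete, here is a rough sketch. It is not the project's implementation: all names (`obo_uri`, `link_vaccines`, `readable_properties`, `READABLE_NAMES`) are hypothetical, and several details are assumptions. It assumes `c_vo_id` holds identifiers such as `VO_0000001`, that imported Resource nodes carry OBO PURL URIs, that the relationship points from the Vaccine node to the Resource node, and that imported property keys may carry an n10s-style `prefix__` in front of the code.

```python
from typing import Any, Dict, Mapping

OBO_BASE = "http://purl.obolibrary.org/obo/"

# Only two well-known IAO annotation codes are listed here; the real
# mapping in the project may be longer and may include UBPROP codes.
READABLE_NAMES = {
    "IAO_0000115": "definition",
    "IAO_0000119": "definition_source",
}

# Assumed direction: (Vaccine)-[:VO_REPRESENTATION]->(Resource).
LINK_VACCINES_QUERY = """
MATCH (v:Vaccine)
WHERE v.c_vo_id IS NOT NULL
MATCH (r:Resource)
WHERE r.uri = $base + v.c_vo_id
MERGE (v)-[:VO_REPRESENTATION]->(r)
RETURN count(*) AS linked
"""


def obo_uri(term_id: str) -> str:
    """Build an OBO PURL from an id written as VO_0000001 or VO:0000001."""
    return OBO_BASE + term_id.replace(":", "_")


def link_vaccines(session) -> int:
    """Create the relationships and return how many pairs were matched."""
    record = session.run(LINK_VACCINES_QUERY, {"base": OBO_BASE}).single()
    return record["linked"]


def readable_properties(
    props: Mapping[str, Any], names: Mapping[str, str] = READABLE_NAMES
) -> Dict[str, Any]:
    """Rename coded property keys; unknown keys are kept unchanged."""
    renamed = {}
    for key, value in props.items():
        code = key.split("__")[-1]  # drop a "prefix__" part if present
        renamed[names.get(code, key)] = value
    return renamed
```

For example, `readable_properties({"ns0__IAO_0000115": "text"})` yields a dictionary with the key `definition`, while a key that is not in the mapping passes through untouched.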
The script loads Neo4j connection details from a .env file for security and ease of configuration.
As mentioned in the code comments, if you are working in a Python virtual environment, you might need to activate it before running the script. The comment provides examples for both Windows and macOS/Linux.
Contributions to this project are welcome. If you find any issues or have suggestions for improvements, please feel free to open an issue or submit a pull request.