Overview
This course uses drug-drug interaction (DDI) network clustering as a case study to teach general artificial intelligence (AI) skills. It is not intended to be a specialized bioinformatics course; instead, bioinformatics serves as a realistic, data-rich setting for introducing core AI concepts and problem-solving strategies.
Within this framework, students learn how to apply AI methods to real-world problems, with a particular focus on the effective use of large language models (LLMs). LLMs are used to support tasks such as data analysis, code generation, result interpretation, and iterative problem solving. Through this process, students develop practical AI skills that are transferable across disciplines, rather than domain-specific expertise in bioinformatics.
Learning Goals
- Understand basic AI concepts and responsible AI use.
- Develop introductory programming and data analysis skills.
- Apply machine learning algorithms to a realistic scientific problem.
- Use LLMs for programming support, debugging, and reflection.
- Critically evaluate AI-generated results instead of accepting them directly.
Course Structure
Foundations of AI
Introduction to AI concepts, AI history, major application scenarios, and ethical issues in AI-assisted learning.
Computational Skills
Introduction to Python, common scientific libraries, and AI-assisted programming tools. Students learn how to combine coding practice with guided use of LLMs.
Project-Based Learning
Students complete four connected assignments centered on DDI network clustering and compare traditional computational workflows with LLM-assisted approaches.
Assignments
Assignment 1: Clustering DDI Networks Using Drug Information
Required Tasks
- Compute drug similarity from multiple perspectives (i.e., structural, pharmacological, phenotypic, and therapeutic similarity).
- Apply spectral clustering to group 18 antibiotics into six clusters based on different similarity measures.
- Evaluate clustering quality using Edge Purity and visualize the clustering results with Cytoscape.
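The evaluation step can be sketched in a few lines of Python. This overview does not define Edge Purity, so the version below rests on an assumption: every DDI edge carries an interaction type (for example synergy or antagonism), edges are grouped by the pair of clusters they connect, and purity is the share of edges that match the majority type of their group. The drug names, labels, and edges are made-up toy values, not course data.

```python
from collections import Counter, defaultdict


def edge_purity(edges, labels):
    """Share of edges whose type matches the majority type of their cluster pair.

    edges  -- list of (drug_a, drug_b, interaction_type) tuples
    labels -- dict mapping each drug to its cluster id

    Assumed definition: the course materials may define Edge Purity differently.
    """
    groups = defaultdict(Counter)
    for a, b, kind in edges:
        # Unordered cluster pair, so (0, 1) and (1, 0) land in the same group.
        key = tuple(sorted((labels[a], labels[b])))
        groups[key][kind] += 1

    total = sum(sum(counts.values()) for counts in groups.values())
    if total == 0:
        raise ValueError("no edges to evaluate")
    majority = sum(max(counts.values()) for counts in groups.values())
    return majority / total


# Hypothetical toy network: four drugs in two clusters.
toy_labels = {"drug_a": 0, "drug_b": 0, "drug_c": 1, "drug_d": 1}
toy_edges = [
    ("drug_a", "drug_c", "synergy"),
    ("drug_a", "drug_d", "synergy"),
    ("drug_b", "drug_c", "synergy"),
    ("drug_b", "drug_d", "antagonism"),  # the only edge that breaks the majority
    ("drug_a", "drug_b", "antagonism"),
    ("drug_c", "drug_d", "antagonism"),
]

purity = edge_purity(toy_edges, toy_labels)
print(f"edge purity: {purity:.3f}")  # 5 of 6 edges match their group's majority
```

A value of 1.0 would mean that every pair of clusters is connected by a single interaction type. Under this assumed definition, the score makes clusterings from different similarity measures directly comparable.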
Optional Tasks
- Explore different numbers of clusters to determine the optimal number of clusters.
- Try different clustering algorithms (e.g., K-means, hierarchical clustering).
- Explore additional drug similarity features through literature review or interaction with LLMs.
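For the similarity computations in this assignment, including any additional features explored in the optional tasks, one common starting point is a set-based measure. The sketch below uses the Tanimoto (Jaccard) coefficient. It assumes each drug is described by a set of discrete features, such as fingerprint bit positions or annotation codes. The feature sets are hypothetical placeholders, and the course may use other similarity measures.

```python
def tanimoto(features_a, features_b):
    """Tanimoto (Jaccard) coefficient: shared features / all features."""
    a, b = set(features_a), set(features_b)
    union = a | b
    if not union:
        # Two empty feature sets: treated as dissimilar here by convention.
        return 0.0
    return len(a & b) / len(union)


def similarity_matrix(features):
    """Build a symmetric drug-by-drug similarity matrix.

    features -- dict mapping drug name to an iterable of discrete features
    Returns (names, matrix), with rows and columns in sorted name order.
    """
    names = sorted(features)
    matrix = [
        [tanimoto(features[x], features[y]) for y in names]
        for x in names
    ]
    return names, matrix


# Hypothetical feature sets (e.g., indices of set fingerprint bits).
toy_features = {
    "drug_a": {1, 2, 3},
    "drug_b": {2, 3, 4},
    "drug_c": {7},
}

names, sim = similarity_matrix(toy_features)
for name, row in zip(names, sim):
    print(name, [round(value, 2) for value in row])
```

The resulting matrix is symmetric with ones on the diagonal (for non-empty feature sets), which is the shape a clustering routine that accepts a precomputed similarity matrix typically expects.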
Assignment 2: Integrating Multi-Source Drug Information
Required Tasks
- Integrate the clustering results from Assignment 1 using a consensus clustering algorithm.
- Integrate the four similarity matrices from Assignment 1 using a similarity matrix fusion method, and perform clustering based on the fused similarity matrix.
- Evaluate the performance of the two ensemble clustering approaches using edge purity.
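The two integration routes can be illustrated with deliberately simple stand-ins. The course does not name a specific consensus algorithm or fusion method here, so both choices below are assumptions. Consensus is approximated by a co-association matrix: the fraction of clusterings in which two drugs share a cluster. Fusion is approximated by an element-wise mean of the similarity matrices. Either output matrix can then be clustered like any other similarity matrix. All numbers are toy values.

```python
def co_association(labelings):
    """Fraction of clusterings in which each pair of drugs shares a cluster.

    labelings -- list of label lists, each ordered over the same drugs
    """
    n = len(labelings[0])
    if any(len(labels) != n for labels in labelings):
        raise ValueError("all labelings must cover the same drugs")
    runs = len(labelings)
    return [
        [sum(labels[i] == labels[j] for labels in labelings) / runs
         for j in range(n)]
        for i in range(n)
    ]


def average_fusion(matrices):
    """Element-wise mean of equally sized similarity matrices.

    A simple stand-in for a dedicated fusion method.
    """
    n = len(matrices[0])
    count = len(matrices)
    return [
        [sum(m[i][j] for m in matrices) / count for j in range(n)]
        for i in range(n)
    ]


# Three hypothetical clusterings of the same four drugs.
toy_labelings = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],  # cluster ids differ, but the grouping matches the first run
]
consensus = co_association(toy_labelings)

# Two hypothetical 2x2 similarity matrices from different data sources.
fused = average_fusion([
    [[1.0, 0.2], [0.2, 1.0]],
    [[1.0, 0.6], [0.6, 1.0]],
])

print(consensus[0])
print(fused[0])
```

The co-association matrix depends only on which drugs are grouped together, not on the cluster ids themselves, so clusterings from different runs can be combined without first aligning their labels.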
Optional Tasks
- Conduct an ablation study to evaluate how removing one type of drug information (e.g., chemical structure) affects clustering performance, thereby identifying the most important drug feature or feature combination.
- Assign three new antibiotics (kanamycin, penicillin G, and roxithromycin) to the clustered DDI network, and predict their interactions with the 18 antibiotics included in the clustering analysis.
Assignment 3: Network-Based Similarity
Required Tasks
- Compute drug similarity based on DDI information.
- Analyze the correlation between network-based similarity and the four similarity measures introduced in Assignment 1.
- Explore the characteristics of DDI network clustering and discuss its potential applications in predicting the mechanisms of action of compounds.
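One way to make "similarity based on DDI information" concrete is to compare interaction profiles, so that two drugs count as similar when they interact with the remaining drugs in the same way. The sketch below assumes a symmetric matrix of numeric interaction scores and uses the Pearson correlation of two rows, skipping the entries that involve the two drugs themselves. The score matrix is invented for illustration, and the course may define network-based similarity differently.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no correlation signal
    return cov / sqrt(var_x * var_y)


def profile_similarity(scores, i, j):
    """Correlate the interaction profiles of drugs i and j.

    scores -- symmetric matrix of DDI scores (hypothetical scale:
              positive = synergy, negative = antagonism)
    Only interactions with third-party drugs are compared.
    """
    others = [k for k in range(len(scores)) if k not in (i, j)]
    return pearson([scores[i][k] for k in others],
                   [scores[j][k] for k in others])


# Invented interaction scores for five drugs.
toy_scores = [
    [0.0, 0.1, 1.0, -1.0, 0.5],
    [0.1, 0.0, 2.0, -2.0, 1.0],
    [1.0, 2.0, 0.0, 0.3, -0.4],
    [-1.0, -2.0, 0.3, 0.0, 0.2],
    [0.5, 1.0, -0.4, 0.2, 0.0],
]

sim_01 = profile_similarity(toy_scores, 0, 1)  # proportional profiles
sim_23 = profile_similarity(toy_scores, 2, 3)  # opposing profiles
print(round(sim_01, 3), round(sim_23, 3))
```

The same `pearson` helper can serve the correlation analysis: flatten the upper triangle of the network-based similarity matrix and of one of the other similarity matrices into two lists, then correlate them.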
Optional Tasks
- Apply the clustering integration methods introduced in Assignment 2 to integrate multi-species DDI information and obtain more robust clustering results.
Assignment 4: LLM-Assisted DDI Network Clustering
Required Tasks
- Use LLMs (e.g., ChatGPT, DeepSeek) to perform clustering based on chemical structure, mechanism of action, bacterial growth curves, and ATC codes.
- Perform multi-source integration for clustering.
- Perform network-based clustering using DDI topology.
- Compare LLM performance with that of traditional algorithms.
- Integrate clustering results from multiple LLMs using ensemble clustering algorithms.
- Use retrieval-augmented generation (RAG) to incorporate external knowledge bases into the clustering process.
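When comparing an LLM-produced clustering with one from a traditional algorithm, it helps to quantify how much the two groupings agree before interpreting the differences. The sketch below uses the Rand index, which is an assumption of this example rather than a metric named by the course. It counts the drug pairs that both clusterings treat the same way, either together in both or apart in both. The labels are hypothetical.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two clusterings agree.

    A pair counts as an agreement when both clusterings put the two items
    in the same cluster, or both put them in different clusters.
    """
    if len(labels_a) != len(labels_b) or len(labels_a) < 2:
        raise ValueError("need two labelings of equal length >= 2")
    agree = 0
    total = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        agree += same_a == same_b
        total += 1
    return agree / total


# Hypothetical cluster assignments for the same five drugs.
llm_labels = ["x", "x", "y", "y", "z"]   # e.g., parsed from an LLM response
traditional_labels = [0, 0, 1, 2, 2]     # e.g., from spectral clustering

agreement = rand_index(llm_labels, traditional_labels)
print(f"Rand index: {agreement:.2f}")  # 8 of the 10 pairs agree
```

Because only pairwise co-membership matters, the cluster names chosen by the LLM do not need to match the numeric ids produced by the algorithm. Agreement alone does not say which clustering is better, so it complements rather than replaces the edge purity evaluation.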
Optional Tasks
How We Use LLMs in This Course
LLMs are introduced as assistive tools, not as substitutes for student reasoning. Students must understand the problem, decompose tasks, design the workflow, and evaluate results.
- Students do independently: understand the assignment, design the analysis flow, interpret outputs, and assess correctness.
- LLMs can support: code generation, debugging, brainstorming, and improving prompt quality.
- Students must not: submit AI-generated outputs without checking and revising them.
Example Prompt
This simplified example shows the style of prompt used in Assignment 4.
Teaching Materials
Lecture Slides
- Module 1: Introduction to AI and responsible use of LLMs
- Module 2: Python basics and AI-assisted programming
- Module 3: Project-based learning and DDI network clustering
Datasets
Teaching Notes
- At the early stage, instructors provide structured but limited guidance.
- As the course progresses, students are encouraged to work more independently.
- Teachers act as facilitators rather than direct problem solvers.
- Incorrect LLM outputs can be used as learning opportunities.
Frequently Asked Questions
Do students need a background in biology?
No. The biological concepts are simplified, and the main goal is to develop general AI skills.
Do instructors need deep expertise in bioinformatics?
No. The framework is designed so that instructors can adopt the materials with limited domain-specific knowledge.
Can this course be adapted to another dataset?
Yes. The workflow can be adapted to other application contexts, especially those involving structured data, similarity analysis, and clustering.