Artificial Intelligence: Practice and Applications

Overview

This course uses drug-drug interaction (DDI) network clustering as a case study to teach general artificial intelligence (AI) skills. It is not intended to be a specialized bioinformatics course. Instead, bioinformatics serves as a realistic and data-rich setting for introducing core AI concepts and problem-solving strategies.

Within this framework, students learn how to apply AI methods to real-world problems, with a particular focus on the effective use of large language models (LLMs). LLMs are used to support tasks such as data analysis, code generation, result interpretation, and iterative problem solving. Through this process, students develop practical AI skills that are transferable across disciplines, rather than domain-specific expertise in bioinformatics.

LLMs are used as supportive tools. Students remain responsible for understanding the task, designing the workflow, evaluating outputs, and identifying errors.

Learning Goals

Understand basic AI concepts and responsible AI use.
Develop introductory programming and data analysis skills.
Apply machine learning algorithms to a realistic scientific problem.
Use LLMs for programming support, debugging, and reflection.
Critically evaluate AI-generated results instead of accepting them directly.

Course Structure

Module 1

Foundations of AI

Introduction to AI concepts, AI history, major application scenarios, and ethical issues in AI-assisted learning.

Module 2

Computational Skills

Introduction to Python, common scientific libraries, and AI-assisted programming tools. Students learn how to combine coding practice with guided use of LLMs.

Module 3

Project-Based Learning

Students complete four connected assignments centered on DDI network clustering and compare traditional computational workflows with LLM-assisted approaches.

Assignments

Assignment 1

Clustering DDI Networks Using Drug Information

Required Tasks

Compute drug similarity from multiple perspectives (i.e., structural, pharmacological, phenotypic, and therapeutic similarity).
Apply spectral clustering to group 18 antibiotics into six clusters based on different similarity measures.
Evaluate clustering quality using edge purity and visualize the clustering results with Cytoscape.
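
The clustering step can be sketched as follows. This is a minimal illustration, not the course's reference solution: it assumes the similarity matrix is already computed, symmetric, and non-negative, and it uses a made-up 6-drug matrix with three clusters instead of the 18 antibiotics and six clusters of the assignment. The drug names and the helper `cluster_from_similarity` are hypothetical.

```python
import numpy as np
from sklearn.cluster import SpectralClustering


def cluster_from_similarity(similarity, n_clusters, seed=0):
    """Spectral clustering on a precomputed, symmetric similarity matrix."""
    model = SpectralClustering(
        n_clusters=n_clusters,
        affinity="precomputed",  # the input is used directly as the affinity matrix
        assign_labels="kmeans",
        random_state=seed,       # fixed seed so repeated runs give the same labels
    )
    return model.fit_predict(similarity)


# Toy data (made up for illustration): 6 drugs forming 3 obvious pairs.
drugs = ["drugA", "drugB", "drugC", "drugD", "drugE", "drugF"]
similarity = np.full((6, 6), 0.1)
for i, j in [(0, 1), (2, 3), (4, 5)]:
    similarity[i, j] = similarity[j, i] = 0.9
np.fill_diagonal(similarity, 1.0)

labels = cluster_from_similarity(similarity, n_clusters=3)
clusters = {name: int(label) for name, label in zip(drugs, labels)}
print(clusters)
```

The cluster numbers themselves are arbitrary; only the grouping matters. For the assignment, the same call would be repeated once per similarity measure with `n_clusters=6`.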

Optional Tasks

Explore different numbers of clusters to determine the optimal clustering number.
Try different clustering algorithms (e.g., K-means, hierarchical clustering).
Explore additional drug similarity features through literature review or interaction with LLMs.
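
One possible way to explore the number of clusters is to scan several values and score each partition. The sketch below uses the silhouette score on the distance `1 - similarity`; this criterion is an assumption chosen for illustration, not the course's prescribed method, and edge purity could be scanned in the same loop. The toy matrix and the helper `scan_cluster_numbers` are hypothetical.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.metrics import silhouette_score


def scan_cluster_numbers(similarity, candidates, seed=0):
    """Return {k: silhouette score} for spectral clustering with k clusters."""
    distance = 1.0 - similarity        # assumes similarities lie in [0, 1]
    np.fill_diagonal(distance, 0.0)    # precomputed distances need a zero diagonal
    scores = {}
    for k in candidates:
        labels = SpectralClustering(
            n_clusters=k, affinity="precomputed", random_state=seed
        ).fit_predict(similarity)
        scores[k] = silhouette_score(distance, labels, metric="precomputed")
    return scores


# Toy data (made up): 6 drugs forming 3 obvious pairs.
similarity = np.full((6, 6), 0.1)
for i, j in [(0, 1), (2, 3), (4, 5)]:
    similarity[i, j] = similarity[j, i] = 0.9
np.fill_diagonal(similarity, 1.0)

scores = scan_cluster_numbers(similarity, candidates=[2, 3, 4])
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

With only 18 drugs, scores for neighboring values of k can be close, so the selected number is best treated as a suggestion to inspect rather than a definitive answer.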

Assignment 2

Integrating Multi-Source Drug Information

Required Tasks

Integrate the clustering results from Assignment 1 using a consensus clustering algorithm.
Integrate the four similarity matrices from Assignment 1 using a similarity matrix fusion method, and perform clustering based on the fused similarity matrix.
Evaluate the performance of the two ensemble clustering approaches using edge purity.

Optional Tasks

Conduct an ablation study to evaluate how removing one type of drug information (e.g., chemical structure) affects clustering performance, thereby identifying the most important drug feature or feature combination.
Assign three new antibiotics (kanamycin, penicillin G, and roxithromycin) to the clustered DDI network, and predict their interactions with the 18 antibiotics included in the clustering analysis.

Assignment 3

Network-Based Similarity

Required Tasks

Compute drug similarity based on DDI information.
Analyze the correlation between network-based similarity and the four similarity measures introduced in Assignment 1.
Explore the characteristics of DDI network clustering and discuss its potential applications in predicting the mechanisms of action of compounds.
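
As a rough sketch of the first two tasks, the code below derives a network-based similarity from a DDI adjacency matrix and correlates it with another similarity matrix. Two choices here are assumptions rather than the course's definition: the DDI data are treated as a binary matrix (nonzero means an interaction is recorded), and similarity is the Jaccard index of the two drugs' neighbor sets. The matrices and helper names are made up.

```python
import numpy as np
from scipy.stats import spearmanr


def jaccard_neighbor_similarity(adjacency):
    """Jaccard index of neighbor sets for every pair of drugs."""
    adjacency = (np.asarray(adjacency) != 0).astype(int)  # binarized copy
    np.fill_diagonal(adjacency, 0)                        # ignore self-interactions
    shared = adjacency @ adjacency.T                      # common neighbors
    degree = adjacency.sum(axis=1)
    union = degree[:, None] + degree[None, :] - shared
    similarity = np.divide(
        shared, union, out=np.zeros(shared.shape, dtype=float), where=union > 0
    )
    np.fill_diagonal(similarity, 1.0)
    return similarity


def upper_triangle_correlation(matrix_a, matrix_b):
    """Spearman correlation over the off-diagonal upper triangle."""
    rows, cols = np.triu_indices_from(matrix_a, k=1)
    rho, p_value = spearmanr(matrix_a[rows, cols], matrix_b[rows, cols])
    return rho, p_value


# Toy DDI network (made up): drugs 0 and 1 each interact with drugs 2 and 3.
ddi = np.array([
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
])
network_sim = jaccard_neighbor_similarity(ddi)

# Toy "structural" similarity (made up) to correlate against.
structural_sim = np.array([
    [1.00, 0.80, 0.20, 0.30],
    [0.80, 1.00, 0.10, 0.25],
    [0.20, 0.10, 1.00, 0.70],
    [0.30, 0.25, 0.70, 1.00],
])
rho, p_value = upper_triangle_correlation(network_sim, structural_sim)
print(network_sim, rho)
```

Only the upper triangle is used because the matrices are symmetric and the diagonal is trivially 1; including either would inflate the apparent correlation.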

Optional Tasks

Apply the clustering integration methods introduced in Assignment 2 to integrate multi-species DDI information and obtain more robust clustering results.

Assignment 4

LLM-Assisted DDI Network Clustering

Required Tasks

Use LLMs (e.g., ChatGPT, DeepSeek) to perform clustering based on chemical structure, mechanism of action, bacterial growth curves, and ATC codes.
Use LLMs to integrate multiple information sources for clustering.
Use LLMs to perform network-based clustering from DDI topology.
Compare LLM performance with traditional algorithms.
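
One way to quantify how closely an LLM's grouping agrees with an algorithm's grouping is the adjusted Rand index (ARI). This is an additional agreement measure suggested here as an assumption; it complements rather than replaces the edge purity evaluation used in the earlier assignments. The dictionaries below are made up, and `compare_partitions` is a hypothetical helper.

```python
from sklearn.metrics import adjusted_rand_score


def compare_partitions(clusters_a, clusters_b):
    """ARI between two {drug name: cluster label} dictionaries."""
    shared = sorted(set(clusters_a) & set(clusters_b))  # drugs present in both
    labels_a = [clusters_a[name] for name in shared]
    labels_b = [clusters_b[name] for name in shared]
    return adjusted_rand_score(labels_a, labels_b)


# Toy results (made up): label names need not match between the two sources.
algorithm_clusters = {
    "drugA": 0, "drugB": 0, "drugC": 1, "drugD": 1, "drugE": 2, "drugF": 2,
}
llm_clusters = {
    "drugA": "x", "drugB": "x", "drugC": "y", "drugD": "z", "drugE": "z", "drugF": "z",
}
ari = compare_partitions(algorithm_clusters, llm_clusters)
print(round(ari, 3))
```

An ARI of 1 means the two groupings are identical up to renaming of clusters, and values near 0 indicate agreement no better than chance.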

Optional Tasks

Integrate clustering results from multiple LLMs using ensemble clustering algorithms.
Use retrieval-augmented generation (RAG) to incorporate external knowledge bases into the clustering process.

How We Use LLMs in This Course

LLMs are introduced as assistive tools, not as substitutes for student reasoning. Students must understand the problem, decompose tasks, design the workflow, and evaluate results.

Students do independently: understand the assignment, design the analysis flow, interpret outputs, and assess correctness.
LLMs can support: code generation, debugging, brainstorming, and improving prompt quality.
Students must not: submit AI-generated outputs without checking and revising them.

Example Prompt

This simplified example shows the style of prompt used in Assignment 4.

Write a Python program to compute the structural similarity between compounds and perform clustering analysis using spectral clustering. Please use libraries such as NumPy, RDKit, pandas, and scikit-learn. The workflow should include the following steps: 1. Read an Excel file to obtain the compound names and SMILES strings. 2. Use RDKit to convert SMILES into molecular objects and compute MACCS fingerprints. 3. Calculate pairwise Tanimoto similarity for all compounds to construct a similarity matrix. 4. Apply spectral clustering to the similarity matrix. 5. Output the final clustering results in the form of a Python dictionary.
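
When checking code that an LLM returns for this prompt, step 3 is easy to verify by hand. The sketch below computes pairwise Tanimoto similarity with NumPy alone, assuming the fingerprints are already available as rows of 0/1 values (in the prompt's workflow they would come from RDKit). The tiny 4-bit fingerprints are made up, and `tanimoto_matrix` is a hypothetical helper, not part of any library.

```python
import numpy as np


def tanimoto_matrix(fingerprints):
    """Pairwise Tanimoto similarity: shared on-bits / on-bits in either vector."""
    bits = np.asarray(fingerprints, dtype=bool).astype(int)
    shared = bits @ bits.T                       # |A and B| for every pair
    on_counts = bits.sum(axis=1)                 # |A| for every fingerprint
    union = on_counts[:, None] + on_counts[None, :] - shared
    # Convention used in this sketch: two all-zero fingerprints get similarity 1.
    return np.divide(
        shared, union, out=np.ones(shared.shape, dtype=float), where=union > 0
    )


# Toy fingerprints (made up): real MACCS fingerprints are much longer.
fingerprints = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 1],
]
similarity = tanimoto_matrix(fingerprints)
print(similarity)
```

For the first two fingerprints, two bits are shared and three bits are set in at least one of them, so the similarity is 2/3. Comparing a few such hand-computed values with the LLM-generated program's output is a quick correctness check.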

Teaching Materials

Lecture Slides

Module 1: Introduction to AI and responsible use of LLMs
Module 2: Python basics and AI-assisted programming
Module 3: Project-based learning and DDI network clustering (Download PDF)

Datasets

Drug Information (Download)
DDI Information (Download)
Bacterial Growth Curve (Download)
PPI network of E. coli (Download)
Results of network propagation (Download)
Dataset for the optional tasks in Assignment 3 (Download)

Software

Code and Examples

Edge purity (Download)
Network proximity (Download)

Teaching Notes

In the early stages, instructors provide structured but limited guidance.
As the course progresses, students are encouraged to work more independently.
Teachers act as facilitators rather than direct problem solvers.
Incorrect LLM outputs can be used as learning opportunities.

Frequently Asked Questions

Do students need a background in biology?

No. The biological concepts are simplified, and the main goal is to develop general AI skills.

Do instructors need deep expertise in bioinformatics?

No. The framework is designed so that instructors can adopt the materials with limited domain-specific knowledge.

Can this course be adapted to another dataset?

Yes. The workflow can be adapted to other application contexts, especially those involving structured data, similarity analysis, and clustering.