Artificial Intelligence: Practice and Applications

A project-based undergraduate course that introduces AI through drug–drug interaction (DDI) network clustering analysis. The course is designed for beginners and uses large language models as supportive tools for programming, analysis, and critical reflection.

Audience: First-year undergraduates
Format: Project-based learning
Core Case: DDI network clustering
Main Tools: Python, RDKit, NetworkX, LLMs

Overview

This course uses DDI network clustering as a case study for teaching general artificial intelligence (AI) skills. It is not intended to be a specialized bioinformatics course; instead, bioinformatics serves as a realistic, data-rich setting for introducing core AI concepts and problem-solving strategies.

Within this framework, students learn how to apply AI methods to real-world problems, with a particular focus on the effective use of large language models (LLMs). LLMs are used to support tasks such as data analysis, code generation, result interpretation, and iterative problem solving. Through this process, students develop practical AI skills that are transferable across disciplines, rather than domain-specific expertise in bioinformatics.

LLMs are used as supportive tools. Students remain responsible for understanding the task, designing the workflow, evaluating outputs, and identifying errors.

Learning Goals

  • Understand basic AI concepts and responsible AI use.
  • Develop introductory programming and data analysis skills.
  • Apply machine learning algorithms to a realistic scientific problem.
  • Use LLMs for programming support, debugging, and reflection.
  • Critically evaluate AI-generated results instead of accepting them directly.

Course Structure

Module 1

Foundations of AI

Introduction to AI concepts, AI history, major application scenarios, and ethical issues in AI-assisted learning.

Module 2

Computational Skills

Introduction to Python, common scientific libraries, and AI-assisted programming tools. Students learn how to combine coding practice with guided use of LLMs.

Module 3

Project-Based Learning

Students complete four connected assignments centered on DDI network clustering and compare traditional computational workflows with LLM-assisted approaches.

Assignments

Assignment 1

Clustering DDI Networks Using Drug Information

Required Tasks

  • Compute drug similarity from multiple perspectives (i.e., structural, pharmacological, phenotypic, and therapeutic similarity).
  • Apply spectral clustering to group 18 antibiotics into six clusters based on different similarity measures.
  • Evaluate clustering quality using edge purity and visualize the clustering results with Cytoscape.
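
The first two required tasks can be sketched as follows. This is a minimal illustration, not the course's reference solution: the fingerprints are invented toy bit vectors (real structural fingerprints would come from a toolkit such as RDKit), the helper name tanimoto_matrix is hypothetical, and two clusters are used instead of six so that the toy data stay small.

```python
import numpy as np
from sklearn.cluster import SpectralClustering


def tanimoto_matrix(fingerprints: np.ndarray) -> np.ndarray:
    """Pairwise Tanimoto similarity for the rows of a 0/1 fingerprint matrix."""
    fp = fingerprints.astype(float)
    shared = fp @ fp.T                       # bits set in both fingerprints
    counts = fp.sum(axis=1)                  # bits set in each fingerprint
    union = counts[:, None] + counts[None, :] - shared
    # Two empty fingerprints (union == 0) are treated as identical.
    return np.divide(shared, union, out=np.ones_like(shared), where=union > 0)


# Toy data (assumption): six "drugs" with nine fingerprint bits each.
# The last bit is shared by all drugs so that the similarity graph is connected.
toy_fingerprints = np.array([
    [1, 1, 1, 1, 0, 0, 0, 0, 1],
    [1, 1, 1, 0, 0, 0, 0, 0, 1],
    [1, 1, 0, 1, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1, 0, 1],
    [0, 0, 0, 0, 1, 1, 0, 1, 1],
])

similarity = tanimoto_matrix(toy_fingerprints)

# affinity="precomputed" tells scikit-learn that the input is already a
# similarity matrix rather than a table of raw features.
model = SpectralClustering(n_clusters=2, affinity="precomputed", random_state=0)
labels = model.fit_predict(similarity)
print(labels)
```

For the assignment itself, the same call would take an 18 x 18 similarity matrix and n_clusters=6. Cluster label numbers are arbitrary, so only the grouping they induce is meaningful.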

Optional Tasks

  • Explore different numbers of clusters to determine the optimal clustering number.
  • Try different clustering algorithms (e.g., K-means, hierarchical clustering).
  • Explore additional drug similarity features through literature review or interaction with LLMs.

Assignment 2

Integrating Multi-Source Drug Information

Required Tasks

  • Integrate the clustering results from Assignment 1 using a consensus clustering algorithm.
  • Integrate the four similarity matrices from Assignment 1 using a similarity matrix fusion method, and perform clustering based on the fused similarity matrix.
  • Evaluate the performance of the two ensemble clustering approaches using edge purity.
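
The two integration routes above can be illustrated with deliberately simple stand-ins. The course description does not name a specific consensus or fusion algorithm, so the sketch below assumes a co-association (evidence accumulation) matrix for the consensus step and a plain element-wise average for the fusion step; more elaborate fusion methods exist and may be what the assignment intends. All function names and toy values are hypothetical.

```python
import numpy as np


def co_association(label_sets):
    """Fraction of clusterings in which each pair of items shares a cluster."""
    label_sets = np.asarray(label_sets)      # shape: (n_clusterings, n_items)
    same = label_sets[:, :, None] == label_sets[:, None, :]
    return same.mean(axis=0)


def majority_groups(coassoc, threshold=0.5):
    """Link items that co-cluster in more than `threshold` of the clusterings
    and return the connected components as consensus labels."""
    n = coassoc.shape[0]
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and coassoc[i, j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


def fuse_by_average(matrices, weights=None):
    """Weighted element-wise mean of similarity matrices of identical shape."""
    stacked = np.stack([np.asarray(m, dtype=float) for m in matrices])
    return np.average(stacked, axis=0, weights=weights)


# Toy data (assumption): three clusterings of five items. The second one is
# the same partition as the first with the label numbers swapped.
clusterings = [
    [0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1],
]
coassoc = co_association(clusterings)
consensus_labels = majority_groups(coassoc)

# Toy data (assumption): two 2 x 2 similarity matrices to fuse.
structural = np.array([[1.0, 0.2], [0.2, 1.0]])
therapeutic = np.array([[1.0, 0.6], [0.6, 1.0]])
fused = fuse_by_average([structural, therapeutic])
```

The co-association matrix and the fused matrix are both ordinary similarity matrices, so either one could also be passed to the clustering step used in Assignment 1 instead of the thresholding shown here.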

Optional Tasks

  • Conduct an ablation study to evaluate how removing one type of drug information (e.g., chemical structure) affects clustering performance, thereby identifying the most important drug feature or feature combination.
  • Assign three new antibiotics (kanamycin, penicillin G, and roxithromycin) to the clustered DDI network, and predict their interactions with the 18 antibiotics included in the clustering analysis.

Assignment 3

Network-Based Similarity

Required Tasks

  • Compute drug similarity based on DDI information.
  • Analyze the correlation between network-based similarity and the four similarity measures introduced in Assignment 1.
  • Explore the characteristics of DDI network clustering and discuss its potential applications in predicting the mechanisms of action of compounds.
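
One possible way to approach the first two tasks is sketched below. It assumes, purely for illustration, that DDI data are stored as a symmetric score matrix (+1 for synergy, -1 for antagonism, 0 for no interaction) and that two drugs are similar when their rows of interaction scores correlate. The encoding, the function names, and the toy matrix are assumptions, not the course dataset.

```python
import numpy as np


def profile_similarity(ddi):
    """Pearson correlation between the interaction profiles (rows) of all drug pairs.

    Simplification: each profile includes the drug's own diagonal entry.
    """
    return np.corrcoef(np.asarray(ddi, dtype=float))


def upper_triangle(matrix):
    """Off-diagonal upper-triangle entries, one value per unordered drug pair."""
    matrix = np.asarray(matrix, dtype=float)
    rows, cols = np.triu_indices(matrix.shape[0], k=1)
    return matrix[rows, cols]


def matrix_correlation(sim_a, sim_b):
    """Pearson correlation between two similarity matrices over all drug pairs."""
    return float(np.corrcoef(upper_triangle(sim_a), upper_triangle(sim_b))[0, 1])


# Toy DDI matrix (assumption): drugs 0 and 1 interact with the others in the
# same way, as do drugs 2 and 3.
toy_ddi = np.array([
    [0, 1, -1, -1],
    [1, 0, -1, -1],
    [-1, -1, 0, 1],
    [-1, -1, 1, 0],
])
network_similarity = profile_similarity(toy_ddi)

# Hypothetical second similarity measure (for example, structural similarity)
# to compare against the network-based one.
other_similarity = np.array([
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
])
agreement = matrix_correlation(network_similarity, other_similarity)
```

Only the upper triangle is compared because the matrices are symmetric and the diagonal is trivially 1. A rank-based correlation is a common alternative when the two similarity scales are not comparable.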

Optional Tasks

  • Apply the clustering integration methods introduced in Assignment 2 to integrate multi-species DDI information and obtain more robust clustering results.

Assignment 4

LLM-Assisted DDI Network Clustering

Required Tasks

  • Use LLMs (e.g., ChatGPT, DeepSeek) to perform clustering based on chemical structure, mechanism of action, bacterial growth curves, and ATC codes.
  • Perform multi-source integration for clustering.
  • Perform network-based clustering using DDI topology.
  • Compare LLM performance with that of traditional algorithms.
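
For the comparison task, one option is to measure how closely an LLM-produced grouping agrees with an algorithmic one. The sketch below uses the adjusted Rand index as that measure; this is an illustrative choice and does not replace the edge purity metric used in the earlier assignments. The drug names, cluster dictionaries, and function names are placeholders.

```python
from collections import Counter
from math import comb


def labels_from_clusters(clusters, items):
    """Turn {cluster_name: [item, ...]} into a label list ordered like `items`."""
    lookup = {item: name for name, members in clusters.items() for item in members}
    missing = [item for item in items if item not in lookup]
    if missing:
        raise ValueError(f"items missing from clustering: {missing}")
    return [lookup[item] for item in items]


def adjusted_rand_index(labels_a, labels_b):
    """Chance-corrected agreement between two partitions of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same items")
    n = len(labels_a)
    if n < 2:
        raise ValueError("at least two items are required")
    # Pairs of items placed together in both partitions, in A, and in B.
    sum_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    maximum = (sum_a + sum_b) / 2
    if maximum == expected:      # both partitions are trivial and identical
        return 1.0
    return (sum_both - expected) / (maximum - expected)


# Placeholder data (assumption): a grouping returned by an LLM as a dictionary
# and a grouping produced by a clustering algorithm.
drugs = ["drug_A", "drug_B", "drug_C", "drug_D", "drug_E", "drug_F"]
llm_clusters = {
    "group 1": ["drug_A", "drug_B", "drug_C"],
    "group 2": ["drug_D", "drug_E", "drug_F"],
}
algorithm_clusters = {
    "c0": ["drug_A", "drug_B"],
    "c1": ["drug_C", "drug_D", "drug_E", "drug_F"],
}

llm_labels = labels_from_clusters(llm_clusters, drugs)
algorithm_labels = labels_from_clusters(algorithm_clusters, drugs)
ari = adjusted_rand_index(llm_labels, algorithm_labels)
print(round(ari, 3))
```

The conversion step also acts as a basic sanity check on LLM output: it raises an error if the model silently dropped a drug from its answer.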

Optional Tasks

  • Integrate clustering results from multiple LLMs using ensemble clustering algorithms.
  • Use retrieval-augmented generation (RAG) to incorporate external knowledge bases into the clustering process.

How We Use LLMs in This Course

LLMs are introduced as assistive tools, not as substitutes for student reasoning. Students must understand the problem, decompose tasks, design the workflow, and evaluate results.

  • Students do independently: understand the assignment, design the analysis flow, interpret outputs, and assess correctness.
  • LLMs can support: code generation, debugging, brainstorming, and improving prompt quality.
  • Students must not: submit AI-generated outputs without checking and revising them.

Example Prompt

This simplified example shows the style of prompt used in Assignment 4.

Write a Python program to compute the structural similarity between compounds and perform clustering analysis using spectral clustering. Please use libraries such as NumPy, RDKit, pandas, and scikit-learn. The workflow should include the following steps: 1. Read an Excel file to obtain the compound names and SMILES strings. 2. Use RDKit to convert SMILES into molecular objects and compute MACCS fingerprints. 3. Calculate pairwise Tanimoto similarity for all compounds to construct a similarity matrix. 4. Apply spectral clustering to the similarity matrix. 5. Output the final clustering results in the form of a Python dictionary.

Teaching Materials

Lecture Slides

  • Module 1: Introduction to AI and responsible use of LLMs
  • Module 2: Python basics and AI-assisted programming
  • Module 3: Project-based learning and DDI network clustering

Datasets

Code and Examples

Teaching Notes

  • In the early stages, instructors provide structured but limited guidance.
  • As the course progresses, students are encouraged to work more independently.
  • Instructors act as facilitators rather than direct problem solvers.
  • Incorrect LLM outputs can be used as learning opportunities.

Frequently Asked Questions

Do students need a background in biology?

No. The biological concepts are simplified, and the main goal is to develop general AI skills.

Do instructors need deep expertise in bioinformatics?

No. The framework is designed so that instructors can adopt the materials with limited domain-specific knowledge.

Can this course be adapted to another dataset?

Yes. The workflow can be adapted to other application contexts, especially those involving structured data, similarity analysis, and clustering.