[Press Release] CUHK’s collaborative studies with institutes in Southeast Asia and the UK offer new insights into the roles of large language models in public health research

Date: 23 October 2024

The Chinese University of Hong Kong (CUHK)’s Faculty of Medicine (CU Medicine) has collaborated with RMIT University Vietnam, the National University of Singapore and Imperial College London on two studies exploring the potential of large language models (LLMs), such as ChatGPT, in public health research. In the first study, researchers demonstrated that ChatGPT can assist public health practitioners in developing mathematical models to inform infection control policies, marking a significant advancement in infectious disease epidemiology. The second study, however, highlighted the challenges of using LLMs in low-resource languages such as Vietnamese, for which only a limited amount of written material is available. Poor LLM performance in these languages may result in the dissemination of inaccurate health information and pose public health risks.

LLMs could play a role in narrowing the health information digital divide and democratising access to healthcare.

Using ChatGPT’s conversational characteristics to develop a mathematical transmission model

From SARS to the COVID-19 pandemic, mathematical modelling has become increasingly influential in informing strategies to mitigate infectious disease transmission. However, not every public health practitioner has the programming knowledge or advanced mathematical expertise needed to use complex modelling tools effectively. This is where LLMs come into play.

In a study published in the Computational and Structural Biotechnology Journal, a public health practitioner used natural conversation with ChatGPT to drive an iterative process of code generation, refinement and debugging. The process yielded a mathematical transmission model that fits 10 days of prevalence data and estimates two key epidemiological parameters. The model was then validated with historical data from the 1978 influenza outbreak in a British boarding school, where it produced estimates that align with existing literature, demonstrating its reliability and real-world applicability. This rapid, accessible approach to model development expands the reach of advanced modelling techniques, offering quicker, more inclusive responses to public health challenges.
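To illustrate the kind of task described above, the following is a minimal sketch in Python, not the code produced in the study. The press release does not specify the model structure or name the two parameters, so the sketch assumes a basic SIR (susceptible-infectious-recovered) model with a transmission rate beta and a recovery rate gamma. The population size, initial conditions and function names are hypothetical, and the 10 days of prevalence data are synthetic, generated from the model itself rather than taken from any real outbreak.

```python
def simulate_sir(beta, gamma, n_days, population=1000.0,
                 initial_infected=1.0, dt=0.1):
    """Return daily prevalence I(t) for days 1..n_days from a basic SIR
    model, integrated with a fixed-step fourth-order Runge-Kutta scheme."""
    def deriv(s, i):
        new_infections = beta * s * i / population
        return -new_infections, new_infections - gamma * i

    s, i = population - initial_infected, initial_infected
    steps_per_day = round(1 / dt)
    prevalence = []
    for _ in range(n_days):
        for _ in range(steps_per_day):
            k1s, k1i = deriv(s, i)
            k2s, k2i = deriv(s + 0.5 * dt * k1s, i + 0.5 * dt * k1i)
            k3s, k3i = deriv(s + 0.5 * dt * k2s, i + 0.5 * dt * k2i)
            k4s, k4i = deriv(s + dt * k3s, i + dt * k3i)
            s += dt * (k1s + 2 * k2s + 2 * k3s + k4s) / 6
            i += dt * (k1i + 2 * k2i + 2 * k3i + k4i) / 6
        prevalence.append(i)
    return prevalence


def sum_squared_error(params, observed):
    """Squared distance between model prevalence and observed prevalence."""
    beta, gamma = params
    predicted = simulate_sir(beta, gamma, len(observed))
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))


def fit_sir(observed, beta_range=(0.1, 3.0), gamma_range=(0.05, 1.0),
            rounds=8, grid=21):
    """Estimate (beta, gamma) by least squares, using a coarse-to-fine
    grid search that zooms in around the best grid point each round."""
    (b_lo, b_hi), (g_lo, g_hi) = beta_range, gamma_range
    best = None
    for _ in range(rounds):
        b_step = (b_hi - b_lo) / (grid - 1)
        g_step = (g_hi - g_lo) / (grid - 1)
        candidates = [(b_lo + a * b_step, g_lo + c * g_step)
                      for a in range(grid) for c in range(grid)]
        best = min(candidates, key=lambda p: sum_squared_error(p, observed))
        # Narrow the search window to two grid steps either side of the best.
        b_lo, b_hi = max(best[0] - 2 * b_step, 1e-6), best[0] + 2 * b_step
        g_lo, g_hi = max(best[1] - 2 * g_step, 1e-6), best[1] + 2 * g_step
    return best


if __name__ == "__main__":
    # Synthetic "observations": 10 days of prevalence from assumed parameters.
    observed = simulate_sir(beta=1.5, gamma=0.5, n_days=10)
    beta_hat, gamma_hat = fit_sir(observed)
    print(f"beta ~ {beta_hat:.3f}, gamma ~ {gamma_hat:.3f}, "
          f"R0 ~ {beta_hat / gamma_hat:.2f}")
```

A plain grid search is used here only to keep the example dependency-free; a practitioner would more likely use an established optimisation or statistical inference library, and would need to account for noise in real surveillance data.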

Professor Kwok Kin-on, Associate Professor from the Jockey Club School of Public Health and Primary Care at CU Medicine, said: “ChatGPT makes sophisticated mathematical transmission models accessible to a wider audience, including in resource-limited settings, enhancing pandemic preparedness and public health responses. Its ability to create models through natural conversation also adds educational value to epidemiology courses, enabling students to simulate disease spread interactively.”

Disparity in language populations widens health digital divide

In another study, published in The BMJ, researchers explored the significant digital divide in LLMs that affects low-resource language populations, particularly in the context of healthcare access. In a case study, the research team sought guidance on atrial fibrillation symptoms from a health chatbot in Vietnamese but instead received information about Parkinson’s disease.

Professor Kwok Kin-on said: “The inherent bias in LLMs, caused by their greater exposure to high-resource languages, poses critical challenges in infectious disease research, particularly when dealing with low-resource languages like Vietnamese. Misinterpretations in symptom detection or disease guidance could have severe repercussions in managing outbreaks. It is vital to enhance LLMs to ensure they deliver accurate, culturally and linguistically relevant health information, especially in regions vulnerable to infectious disease outbreaks.”

Dr Arthur Tang, Senior Lecturer from the School of Science, Engineering and Technology (SSET), RMIT University Vietnam, added: “The accuracy of LLMs hinges significantly on the quantity and quality of their training datasets. LLMs typically perform better in English due to the abundance of high-quality digital training resources available in this language. Conversely, low-resource languages such as Vietnamese and Cantonese have limited digital resources, which are often of lower quality. As a result, LLMs tend to exhibit reduced performance in these languages. This disparity in LLM accuracy can exacerbate the digital divide, particularly since low-resource languages are predominantly spoken in lower- to middle-income countries.”

Professor Wilson Tam Wai-san, Associate Professor and Director of Research from the Alice Lee Centre for Nursing Studies, National University of Singapore, added: “LLMs such as ChatGPT and Gemini-Pro offer significant convenience in distributing health information. However, it is crucial to monitor their accuracy and reliability carefully, especially when prompts are entered and responses generated in low-resource languages. While providing an equitable platform to access health information is beneficial, ensuring the accuracy of this information is essential to prevent the spread of misinformation.”

To bridge the gap in artificial intelligence (AI) language inclusivity and ensure more equitable access to accurate health information across diverse linguistic communities, the researchers proposed six pillars to address the current shortcomings in LLM-driven healthcare communication. They believe these pillars are crucial to mitigating misinformation and improving healthcare outcomes globally.

Six pillars proposed to address the digital divide:

Policymakers: develop global regulatory frameworks for equitable AI governance.
Research funding agencies: increase support for projects enhancing AI language inclusivity.
Technology corporations: improve AI translation capabilities for diverse languages.
Research community: create and share open-source linguistic data and tools.
Healthcare practitioners: provide feedback to ensure culturally accurate AI solutions.
Linguistically underrepresented communities: contribute insights and experiences to inform inclusive AI development.

Research team members include (from left) Dr Arthur Tang, Prof Wilson Tam, Prof Kwok Kin-on and Mr Neo Tung.

About the research team

The two studies were conducted by a team co-led by Professor Kwok Kin-on, Dr Arthur Tang and Professor Wilson Tam. Other members of the research team include Professor Samuel Wong Yeung-shan and Ms Vivian Wei Wan-in from the Jockey Club School of Public Health and Primary Care at CU Medicine; Professor Steven Riley from Imperial College London; Mr Tom Huynh, Mr Nhat Bui and Mr Giang Nguyen from RMIT University Vietnam; Mr Neo Tung from the University of Melbourne; and Mr Huy Quang Nguyen from the Oxford University Clinical Research Unit in Vietnam.

References

1. Kin On Kwok*, Tom Huynh, Wan In Wei, Samuel Y.S. Wong, Steven Riley, Arthur Tang*. Utilizing large language models in infectious disease transmission modelling for public health preparedness. Comput Struct Biotechnol J. Volume 23, December 2024, Pages 3254-3257 (*Corresponding author)

2. Arthur Tang, Neo Tung, Huy Quang Nguyen, Kin On Kwok*, Stanley Luong, Nhat Bui, Giang Nguyen, Wilson Tam. Health information for all: do large language models bridge or widen the digital divide? BMJ. 2024 (*Corresponding author)
