Research Projects

 

1. Overview

Recent years have witnessed an explosive increase in electronic data. For example, International Data Corporation (IDC) estimated that the total amount of electronic data was 2.7 zettabytes by the end of 2012. The global size of “Big Data” in biomedicine stood at roughly 200 exabytes in 2012. Such big data have fundamentally changed business, research, and education. Understanding big data, and thereby transforming big data into smart data at the semantic level, is the key to exploiting the full potential of data and to unraveling complex phenomena. This transformation presents a number of compelling computational and analytical challenges.

 

Our research interests span a variety of aspects of algorithms and software infrastructure for Big Data and Computational Intelligence, with a particular focus on scientific domains including: (1) data-intensive analytics for pervasive healthcare monitoring to assist the aging population and patients with chronic diseases; (2) scalable deep learning from extremely complex biomedical multimedia data; and (3) user-centered knowledge discovery and decision support from large-scale clinical data to evaluate and improve the quality of health care.


Working on high-impact interdisciplinary research projects that contribute to the core areas of computer science and computer technology has played a vital role in the success of our research. Our research efforts, which have been supported by more than $1.05 million in external/federal research grants, cover the three research areas listed above, which include some of the most challenging research questions in Biomedical Informatics. Below are brief descriptions of a sample of our current and previous research projects. Please contact us at ycao@cs.uml.edu for more information.

 

 

2. Research Areas

Research Area 1: Data-intensive Analytics for Pervasive Healthcare Monitoring to Assist Aging Population and Patients with Chronic Diseases: Over the last five years, we have seen an explosive increase in new human-computer interaction devices (e.g., smartphones, Microsoft Kinect) and new sensors (e.g., wireless body sensors). These devices are equipped with inexpensive and unobtrusive sensors that can collect physiological data on chronic conditions in natural living environments. This opens an unprecedented opportunity to discover early predictors and novel biomarkers to support clinical decision making and to reduce healthcare costs. The long-term objective of this research is to investigate, develop, and validate new data-intensive analytics models and algorithms to discover insights from physiological information using inexpensive and unobtrusive sensors. However, developing new analytics models and algorithms using these devices remains an open research problem with many challenging research questions. Currently, we center our efforts on data analytics for three types of physiological data: time series data from body sensors and smartphones; human motion data from Microsoft Kinect; and electroencephalography (EEG) data from EPOC neuroheadset consumer brain-computer interfaces (BCI), with immediate applications in assisted living environments for the aging population and in-home monitoring/rehabilitation for patients with chronic diseases. Our research results have been published in top journals and conferences, such as IEEE Transactions on Biomedical Engineering (TBME), Journal of Neural Computing & Applications (NCA) by Springer, Journal of Cognitive Neurodynamics by Springer, ACM Multimedia, ACM/IEEE BodyNets, and IEEE EMBS. Real-world validation of our proposed approach is being conducted with our clinical collaborators at the University of Tennessee College of Medicine Chattanooga.

 

Research Area 2: Scalable Deep Learning from Extremely Complex Biomedical Multimedia Data: Tremendous amounts of biomedical multimedia data, such as CT images, cardiac ECHO videos, and textual patient records, are captured and recorded in digital format during daily clinical practice, medical research, and education. Important biomedical knowledge is embedded in this data. Automatic discovery of this knowledge through machine learning-based intelligent analysis is highly desirable. Unfortunately, the extremely complex nature of biomedical multimedia data (e.g., tens of millions of training samples) makes learning and analysis very challenging. This project includes two parts, both rooted in recent advances in deep learning, a very promising area of machine learning research. In the first part of this project, we are developing automated data-intensive (petabytes) content analysis techniques and software for medical images and videos captured during endoscopy procedures. We are among the pioneers in this area, and our pioneering work was awarded the ACG (American College of Gastroenterology) Governors Award for Excellence in Clinical Research for “the Best Scientific Paper”. In the second part of this project, we aim to develop an intelligent and scalable multi-modal medical retrieval system to support medical diagnosis, research, and teaching. Radiology is a case in point in our study. Our pioneering research is reshaping the future of medical multimedia analysis and retrieval, with publications in top-ranked journals and conferences, such as IEEE TBME, ACM Multimedia, IEEE ISM, IEEE CIVR, IAPR/IEEE ICPR, and IEEE ICME. Some of our proposed approaches are being evaluated and validated in clinical practice, with the support of our collaborators: Dr. J. Kalpathy-Cramer from Harvard Medical School and Piet C. De Groen, M.D., at Mayo Clinic.

 

Research Area 3: User-centered Knowledge Discovery and Decision Support from Large Scale Clinical Data to Evaluate and Improve the Quality of Health Care: With the full adoption of Health Information Technology, there will be a steady accumulation of large amounts of patient data that can be leveraged to evaluate and improve the quality, safety, and efficiency of care and to extend public health and research. Research in this area focuses on user-centered data analysis to determine the relative clinical effectiveness of different interventions. Specifically, we employ risk analysis for Acute Coronary Syndromes in chest pain patients as our application domain. The use of nuclear cardiac stress testing has been incorporated into chest pain unit (CPU) evaluation protocols for patients deemed at low to intermediate risk of acute coronary syndromes (ACS), defined as unstable angina or acute myocardial infarction (AMI). The objective of this project is to develop a computer-aided predictive model to investigate the effect of risk factors (e.g., age, sex, cardiac risk factors) on the incidence of ACS, for the purpose of developing a tool that may assist physicians in predicting ACS in chest pain patients. We have developed parallel, distributed, and scalable computer algorithms to handle real-world clinical data with a large volume of patient information and a very large number of variables. Our results have been reported in top biomedical journals and computer science conferences such as American Journal of Emergency Medicine, Annals of Emergency Medicine, ACM Multimedia, IEEE EMBC, and IEEE BioMed. We have integrated some of our proposed approaches into the clinical workflow to provide computer-aided clinical decision support for medical professionals at the Emergency Department of Erlanger Hospital, Chattanooga, Tennessee. To the best of our knowledge, this is the first ACS risk calculator that has been deployed in a real-world clinical environment.

 

3. Sample Research Projects

(Special Notes: The following sample projects represent a partial list of our past and current research projects. We cannot disclose everything here due to HIPAA and other regulatory compliance requirements. If you are interested in collaborating with us, please feel free to contact the PI, Dr. Yu Cao, at ycao@cs.uml.edu.)

 

 

 

      

  

Medical Video/Image Analysis and Retrieval for User-centered Decision Support

 

Overview

This project includes two parts. In the first part, we have developed automated data-intensive content analysis techniques and software for medical videos/images captured during endoscopy procedures [1-9]. In the second part, we aim to develop an intelligent and scalable multi-modal medical retrieval system to support medical diagnosis, research, and teaching. We are among the pioneers in these areas. Examples of current research efforts include investigating novel annotation, retrieval, and dimension reduction techniques to build a high-quality, large-scale, trusted medical archive [10-16], and developing new query-adaptive search strategies to retrieve the most relevant medical information [17, 18]. We are also developing new representations of biomedical data in a semantically rich, structured form that lends itself to automated search, retrieval, inference, and data-driven knowledge acquisition (e.g., using machine learning) [19]. Software tools and methods for biomedical data annotation/indexing are also explored [20, 21]. Real-world applications of our research include: (1) the first intelligent multimedia system to analyze, retrieve, and visualize important content in medical videos captured during endoscopy; (2) an intelligent medical search engine; and (3) an instructional video search engine.

 

References

Link to references

 

Software

ARRS GoldMiner

 

 

 

 

 

 

 

Motion Tracking, Analyzing, and Visualization

 

Overview

Motion is a fundamental component of all organismal behavior. Motion research is one of the most active research topics in graphics and visual computing, driven by a wide range of promising applications in areas such as animation production, movement analysis, and industry. The long-term goal of our motion project is to develop a 3D video tracking and analysis system that can fully recover the unconstrained movements of a wide range of organisms that do not have easily trackable natural landmarks or markers placed by the experimenter. Tracking and analyzing a moving and deforming three-dimensional organism to derive detailed and accurate locomotory kinematics remains a challenging open problem. We are now investigating efficient techniques for multi-view stereo reconstruction, motion analysis, and 2D/3D visual tracking using a flexible geometric model, as well as a new visual learning paradigm that includes both geometric and appearance models for 3D tracking and analysis [22-25]. We envision that the new techniques will enable biomechanists and ethologists to model very large datasets with high accuracy.

 

References

Link to references

 

Software

2D Fish Tracking, Analyzing, and Visualization Software 

3D Fly Tracking Software (please email us for source code)

3D Fish Tracking and Analyzing Software (coming soon)

 

 

 

 

 


             

 

 

Context Awareness Data Analysis for Body Area Sensor Networks

 

Overview

The ultimate goal of this project is to develop a new context awareness data analysis framework that supports the development of next-generation pervasive healthcare monitoring based on wireless body area sensor networks (BodyNets). One of the pilot projects is to build a patient care center, a system for periodic and opportunistic patient data collection, analysis, and exchange for people living in rural areas. In this collaborative project, I am leading the data analysis efforts. Due to noisy sensor measurements, low-bandwidth and unreliable communications between sensors, and limited sensor storage and computation speed, data analysis for BodyNets is very challenging. We herein propose a new data analysis framework to address these issues. We have obtained very promising preliminary results, which have been reported in our recent papers [26, 27].

 

References

Link to references

 

Software

Communication and Data Analysis for BodyNets (supports the BSN platform and can be compiled under TinyOS 2.x)

 

 

 

 

Risk Analysis for Acute Coronary Syndromes in Chest Pain Patients

Overview

The use of nuclear cardiac stress testing has been incorporated into chest pain unit (CPU) evaluation protocols for patients deemed at low to intermediate risk of acute coronary syndromes (ACS), defined as unstable angina or acute myocardial infarction (AMI). The objective of this project is to develop a computer-aided predictive model to investigate the effect of risk factors (e.g., age, sex, cardiac risk factors) on the incidence of ACS, for the purpose of developing a tool that may assist physicians in predicting ACS in chest pain patients.
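
As a rough illustration of what such a predictive model can look like, the sketch below scores a patient with a logistic function over the kinds of risk factors named above (age, sex, number of cardiac risk factors). The choice of logistic regression, the function name acs_risk, and every coefficient are assumptions made purely for illustration; they are not fitted to any clinical data and do not describe the model used in this project.

```python
import math

# Hypothetical coefficients for illustration only. They are NOT derived from
# clinical data and must not be used for any medical decision.
ILLUSTRATIVE_WEIGHTS = {
    "intercept": -4.0,
    "age_decades": 0.3,       # weight per decade of age
    "male": 0.5,              # added when the patient is male
    "num_risk_factors": 0.4,  # weight per cardiac risk factor
}


def acs_risk(age_years, male, num_risk_factors, weights=ILLUSTRATIVE_WEIGHTS):
    """Return a probability in (0, 1) from a simple logistic model."""
    z = (
        weights["intercept"]
        + weights["age_decades"] * (age_years / 10.0)
        + weights["male"] * (1 if male else 0)
        + weights["num_risk_factors"] * num_risk_factors
    )
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    # Two made-up patients; the second has more of every risk factor.
    print(acs_risk(45, False, 0))
    print(acs_risk(70, True, 3))
```

In a real study the weights would be estimated from patient records and validated against outcomes; the sketch only shows how individual risk factors combine into a single probability.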

References

Link to references

 

Software

Online Risk Calculator for ACS in Patients Undergoing Stress Testing

 

 

 

 

Protein Structure Classification from NMR Spectra

 

Overview

Knowledge of the three-dimensional structure of proteins is integral to understanding their functions and a necessity in the era of proteomics. The structural class of a protein lies at the top of any hierarchical characterization of its fold. The designation of class based on protein structure content has been extremely useful from both experimental and theoretical points of view. The objective of this project is to investigate effective and efficient data mining methods for classifying protein structure directly from Nuclear Magnetic Resonance (NMR) spectra using chemical shift information [28-31].

 

References

Link to references

 

Software

Protein Structure Classification from NMR Spectra

 

 

 

4. Past and Current Research Collaborators

 

Working on high-impact interdisciplinary research projects that contribute to the core areas of computer science has played a vital role in the success of our research. We have found that performing theoretical research combined with practical applications, especially where there is cross-contribution between theory and practice as well as between different domains, is extremely productive. Our past and current collaboration experience puts us in an excellent position to initiate and pursue this kind of project. Feel free to contact us at ycao@cs.uml.edu if you are interested in collaborating with us.

 

Medical Image Analysis and Retrieval

Piet C. De Groen, M.D., Mayo Clinic, Rochester, MN

Charles E. Kahn Jr, MD, MS, Medical College of Wisconsin, Milwaukee, WI

Dr. Sanqing Hu, School of Biomedical Engineering, Science & Health Systems, Drexel University, Philadelphia, PA

Dr. Wallapak Tavanapong, Department of Computer Science, Iowa State University, Ames, IA

Dr. Johnny S. Wong, Department of Computer Science, Iowa State University, Ames, IA

Dr. Alex Liu, Department of Computer Science, California State University, Fresno, CA

Rodney Kent Hutson, Jr., M.D., Chief of the Department of Radiology, Erlanger Health System

Francis M. Fesmire, MD, FACEP, Research Director, Department of Emergency Medicine, University of Tennessee College of Medicine Chattanooga, and Director of the Heart-Stroke Center, Erlanger Medical Center

Dr. Tian Zhao, Associate Professor of Computer Science, University of Wisconsin-Milwaukee

Dr. Jayashree Kalpathy-Cramer, Research Scientist, Biomedical Informatics, Oregon Health & Science University

 

Motion Tracking, Analyzing, and Visualization

Dr. Ulrike Muller, Department of Biology, California State University, Fresno, CA

Dr. Joy Goto, Department of Chemistry, California State University, Fresno, CA

Dr. Ebraheem Fontaine, Department of Mechanical Engineering, California Institute of Technology, Pasadena, CA

 

Context Awareness Data Analysis for Body Area Sensor Networks

Dr. B. Prabhakaran, Department of Computer Science, University of Texas at Dallas, Dallas, TX

Dr. Sanqing Hu, School of Biomedical Engineering, Science & Health Systems, Drexel University, Philadelphia, PA

Dr. Ming Li, Department of Computer Science, California State University, Fresno, CA

Dr. Alex Liu, Department of Computer Science, California State University, Fresno, CA

Thomas Devlin, M.D., Medical Director, Erlanger Southeast Regional Stroke Center

Gregory Heath, Ph DHSc, MPH, FACSM, FAHA, Director of Research and Professor of Medicine at UTCOMC and Assistant Provost for Research and Engagement and Guerry Professor at UTC

 

Protein Structure Classification from NMR Spectra

Dr. Krish Krishnan, Department of Chemistry, California State University, Fresno, CA

Dr. Charles Yan, Department of Computer Science, Utah State University, Logan, UT

 

 

 

 

 

5. People

Interacting with students and other researchers is always my favorite activity. My past tutoring and advising experience has convinced me that different people learn in different ways and respond best to different approaches. My educational goal is to help students prepare for their professional careers. By assisting students in identifying goals, prioritizing tasks, executing plans, and evaluating results, I aim to help them become future leaders in science and engineering. We are always looking for self-motivated students and researchers to join our team. Please do not hesitate to contact us at ycao@cs.uml.edu for further information.

Faculty: Dr. Yu Cao

Current Graduate Students: Yii Li (Medical Image Retrieval), Swapna Philip (Data Analysis for BodyNets), Julie Pena (Medical Image Retrieval), Satmeet Ubhi (Motion Tracking and Visualization)

Current Undergraduate Students: Ronald John Lugge III (3D Motion Analysis and Game Development)

Graduate Alumni: Sung Baang, Sachin Raka, Mohamed Ali, Rehana Ferwin

Undergraduate Alumni: Matthew Calderaz, Thell Smith, Matthew Daniel Mclaughlin, Brandon Joseph Wilson

 

 

 

6. Sponsors

 

We are active researchers who explore creative solutions to several fundamental issues in building intelligent information systems and biomedical information systems. From an application point of view, such knowledge promises to make an impact in the areas of biological science, medical science, health care, education, homeland security, public safety, etc. We are grateful to the following organizations for their generous support of our past and current research projects. If you share our vision and are interested in funding our research to address the grand challenges facing society, please contact us at ycao@cs.uml.edu.

 

US National Science Foundation

Mayo Clinic 

 

 

 

7. References (please refer to my publication page for more details)

[1]           Y. Cao, W. Tavanapong, K. Kim, J. Wong, J. Oh, and P. C. d. Groen, "A framework for parsing colonoscopy videos for semantic units," in Proceedings of the IEEE International Conference on Multimedia and Expo, Taipei, Taiwan, 2004.

[2]           Y. Cao, W. Tavanapong, D. Li, J. Oh, P. C. d. Groen, and J. Wong, "A Visual model approach for parsing colonoscopy videos," in Proceedings of the International Conference on Image and Video Retrieval, Dublin, Ireland, 2004.

[3]           Y. Cao, D. Li, W. Tavanapong, J. Oh, J. Wong, and P. C. d. Groen, "Parsing and browsing tools for colonoscopy videos," in Proc. of ACM Multimedia, New York, NY, USA, 2004.

[4]           S. Hwang, J. Oh, J. Lee, Y. Cao, W. Tavanapong, D. Liu, J. Wong, and P. C. d. Groen, "Automatic measurement of quality metrics for colonoscopy videos," in Proceedings of the Annual ACM International Conference on Multimedia, Singapore, 2005.

[5]           Y. Cao, D. Liu, W. Tavanapong, J.-H. Oh, J. Wong, and P.-C. Groen, "Automatic classification of image with appendiceal orifice in colonoscopy videos," in Proceedings of the IEEE International Conference of the Engineering in Medicine and Biology Society, New York City, NY, USA, 2006.

[6]           Y. Cao, D. Liu, W. Tavanapong, J. Wong, J. Oh, and P. C. d. Groen, "Computer-aided Detection of Diagnostic and Therapeutic Operations in Colonoscopy Videos," IEEE Transactions on Biomedical Engineering, vol. 54, pp. 1268-1279, 2007.

[7]           Y. Cao, S. Baang, S. Liu, M. Li, and S. Hu, "Audio-Visual Event Classification via Spatial-Temporal-Audio Words," in Proc. of IAPR/IEEE International Conference on Pattern Recognition (ICPR), Tampa, FL, USA, 2008.

[8]           Y. Cao, S. Liu, M. Li, S. Hu, and S. Baang, "Medical Video Event Classification Using Shared Features," in Proc. of IEEE International Symposium on Multimedia (ISM), Berkeley, CA, USA, 2008.

[9]           J. Oh, S. Hwang, Y. Cao, W. Tavanapong, D. Liu, J. Wong, and P. C. d. Groen, "Measuring objective quality of colonoscopy," IEEE Transactions on Biomedical Engineering, vol. 56, pp. 2190-2196, 2009.

[10]         "ARRS GoldMiner," in http://goldminer.arrs.org, 2009.

[11]         C. E. Kahn Jr. and C. Thao, "GoldMiner: a radiology image search engine," American Journal of Roentgenology, vol. 188, pp. 1475-1478, 2007.

[12]         C. E. Kahn Jr. and D. L. Rubin, "Automated semantic indexing of figure captions to improve radiology image retrieval," Journal of the American Medical Informatics Association, vol. 16, pp. 380-386, 2009.

[13]         H. Müller, J. Kalpathy-Cramer, C. E. Kahn Jr., W. Hatt, S. Bedrick, and W. Hersh, "Overview of the ImageCLEFmed 2008 Medical Image Retrieval Task," in 9th Workshop of the Cross-Language Evaluation Forum, 2008.

[14]         H. Müller, J. Kalpathy-Cramer, I. Eggel, S. Bedrick, R. Said, B. Bakke, C. E. Kahn Jr., and W. Hersh, "Overview of the 2009 Medical Image Retrieval Task," in Working Notes of CLEF 2009 (Cross Language Evaluation Forum), 2009.

[15]         Y. Cao, R. Troncy, B. Prabhakaran, and J. Gao, "Data Semantics for Multimedia Systems and Applications," in Proc. of IEEE International Symposium on Multimedia (ISM), San Diego, CA, USA, 2009 (To Appear, Invited).

[16]         M.-L. Shyu, Y. Cao, J. Kong, M. Li, M. Lux, and J. Bao, "Introduction to the special issue on "data semantics for multimedia systems"," Multimedia Tools and Applications, An International Journal from Springer, 2010 (Guest Editorial).

[17]         S. Liu, Y. Cao, M. Li, P. Kilaru, T. Smith, and S. Toner, "A Semantics- and Data-Driven SOA for Biomedical Multimedia Systems," in Proc. of IEEE International Workshop on Data Semantics for Multimedia Systems and Applications (DSMSA), Berkeley, CA, USA, 2008.

[18]         M.-L. Shyu, Y. Cao, M. Li, J. Kong, and J. Bao, "Introduction to the Special Issue on Data Semantics and Multimedia Information Management," Journal of Multimedia, 2010 (Guest Editorial).

[19]         S.-H. Liu, Y. Cao, M. Li, T. Smith, J. Harris, J. Bao, B. R. Bryant, and J. Gray, "A SOA-Based Functional and QoS Semantics-Driven Biomedical Multimedia Processing," in Methodologies for Non-Functional Requirements in Service Oriented Architecture, J. Suzuki, Ed. Hershey, PA, USA: IGI Global (formerly Idea Group), 2010 (Accepted).

[20]         J. Bao, Y. Cao, W. Tavanapong, and V. Honavar, "Integration of Domain-Specific and Domain-Independent Ontologies for Colonoscopy Video Database Annotation," in Proc. of the International Conference on Information and Knowledge Engineering, Las Vegas, Nevada, 2004.

[21]         D. Liu, Y. Cao, W. Tavanapong, J. Oh, J. Wong, and P. C. d. Groen, "Arthemis: A Case Study of Annotation Software in an Integrated Capturing and Analysis System for Colonoscopy," Computer Methods and Programs in Biomedicine, vol. 88, pp. 152-163, 2007.

[22]         H. T. Kim, C. Saito, N. T. Mekdara, S. Choudhury, A. Goodarzi, F. Mazloomi, T. Sakha, M. Soltani, S. Ubhi, Y. Cao, J. Goto, and U. K. Muller, "The effects of the glutamate agonist BMAA on the walking behavior of adult fruit flies," in Annual Meeting of the Society for Integrative and Comparative Biology. Seattle, WA, USA, 2010 (Poster, Accepted).

[23]         E. I. Fontaine, F. Zabala, M. H. Dickinson, and J. W. Burdick, "Wing and body motion during flight initiation in Drosophila revealed by automated visual tracking," Journal of Experimental Biology, vol. 212, pp. 1307-1323, 2009.  

[24]         E. I. Fontaine, D. Lentink, S. Kranenbarg, U. Müller, J. van Leeuwen, A. H. Barr, and J. W. Burdick, "Automated visual tracking for studying the ontogeny of zebrafish swimming," The Journal of Experimental Biology, pp. 1305-1316, 2008.

[25]         Y. Cao, S. Read, S. Raka, and R. Nandamuri, "A Theoretic Framework for Object Class Tracking," in Proc. of IEEE International Conference on Networking, Sensing and Control (ICNSC), Hainan, China, 2008.

[26]         M. Chen, M. Li, V. Leung, S. Prasad, S.-H. Liu, and Y. Cao, "Recent Advances in Body Sensor Networks: A Survey," Computer Communications, The International Journal for the Computer and Telecommunications Industry (Elsevier), 2010 (Accepted with major revision).

[27]         M. Li, Y. Xiao, S.-H. Liu, Y. Cao, and W. Zhang, "Body Sensor Networks: Applications, Architectures, and Communication Protocols," in Emerging Wireless Networks: Concepts, Techniques and Applications, C. Makaya and S. Pierre, Eds. London, UK: Auerbach Publications, Taylor & Francis Group, 2010 (Submitted).

[28]         S. P. Mielke and V. V. Krishnan, "Protein structural class identification directly from NMR spectra using averaged chemical shifts," Bioinformatics, vol. 19, pp. 2054-64, 2003.

[29]         S. P. Mielke and V. V. Krishnan, "An evaluation of chemical shift index-based secondary structure determination in proteins: influence of random coil chemical shifts," Journal of Biomolecular NMR, vol. 30, pp. 143-53, 2004.

[30]         S. P. Mielke and V. V. Krishnan, "Estimation of protein secondary structure content directly from NMR spectra using an improved empirical correlation with averaged chemical shift," Journal of Structural and Functional Genomics, vol. 6, pp. 281-285, 2005.

[31]         S. P. Mielke and V. V. Krishnan, "Characterization of protein secondary structure from NMR chemical shifts," Progress in Nuclear Magnetic Resonance Spectroscopy, vol. 54, pp. 141-165, 2009.

[32]  Buchheit RC, Fesmire FM, Cao Y, et al. Nuclear stress testing in the emergency department chest pain patients with suspected acute coronary syndrome: who should we stress? Ann Emerg Med 2011;58:S209.

[33] Fesmire FM, Hughes AD, Stout PK, et al: Selective dual nuclear scanning in low risk patients with chest pain to reliably identify and exclude acute coronary syndromes. Ann Emerg Med 2001;38:207-215.

[34] Fesmire FM, Hughes AD, Fody EP, Stout PK, et al: The Erlanger Protocol: A one year experience with serial 12-lead ECG monitoring, 2-hour delta serum marker measurements, and selective nuclear stress test to identify and exclude acute coronary syndromes. Ann Emerg Med 2002; 40: 584-594.