The University of Georgia (UGA) Data Science and AI Seminars are monthly online seminars that cover interdisciplinary research topics in data science (DS), artificial intelligence (AI), statistics, engineering, biomedical informatics, and public health. We aim to bring together researchers from these fields to discuss exciting topics on DS/AI with interdisciplinary applications.

Upcoming Talks
  • Speaker: Gari Clifford (Professor of Biomedical Informatics and Biomedical Engineering, Emory University and Georgia Institute of Technology)
  • Title: Boosting the performance of a deep learner using realistic models – an application to cardiology and post-traumatic stress disorder
  • Date/Time: Friday, December 3, 2021, 12:00PM – 1:00PM
  • Zoom Link: https://zoom.us/j/98339758184?pwd=QlZFQUpzZFhHclBtaGJyS1N0cE5QZz09
  • Abstract: In this talk I will discuss the concept of leveraging large databases to improve training in smaller databases, with a particular focus on using realistic models, rather than synthetic data. Notably, as databases increase in size, the quality of data labels drops. Often, the data become noisier with rising levels of non-random missingness. Increasingly, transfer learning is being leveraged to mitigate these problems, allowing algorithms to tune on smaller (or rarer) populations while leveraging information from much larger datasets. I’ll present an emerging paradigm in which we insert an extensive model-generated database in the transfer learning process to help a deep learner explore a much larger and denser data distribution. Since a model allows the generation of realistic data beyond the boundaries of the real data, the model can help train the deep learner to extrapolate beyond the observable collection of samples. Using cardiac time series data, I’ll demonstrate that this technique provides a significant performance boost, and discuss some possible extensions and consequences.
  • Bio: Gari Clifford is a tenured Professor of Biomedical Informatics and Biomedical Engineering at Emory University and the Georgia Institute of Technology, and the Chair of the Department of Biomedical Informatics at Emory. His research team applies signal processing and machine learning to medicine to classify, track and predict health and illness. His research focuses on critical care, digital psychiatry, global health, mHealth, neuroinformatics and perinatal health, particularly in low- and middle-income country (LMIC) settings. After training in Theoretical Physics, he transitioned to Machine Learning and Engineering for his doctoral work at the University of Oxford in the 1990s. He subsequently joined MIT as a postdoctoral fellow, then a Principal Research Scientist, where he managed the creation of the MIMIC II database, the largest open-access critical care database in the world. He later returned to Oxford as an Associate Professor of Biomedical Engineering, where he helped found its Sleep & Circadian Neuroscience Institute and served as Director of the Centre for Doctoral Training in Healthcare Innovation at the Oxford Institute of Biomedical Engineering. Gari is a strong supporter of commercial translation, working closely with industry as an advisor to multiple companies, co-founding and serving as CTO of an MIT spin-out (MindChild Medical) since 2009, and co-founding and serving as CSO for Lifebell AI since 2020. Gari is a champion for open-access data and open-source software in medicine, particularly through his leadership of the PhysioNet/CinC Challenges and contributions to the PhysioNet Resource. He is committed to developing sustainable solutions to healthcare problems in resource-poor locations, with much of his work focused in Guatemala.
  • Speaker: Vikas Singh (Professor of Biostatistics and Computer Sciences, University of Wisconsin-Madison)
  • Title: TBA
  • Date/Time: Friday, April 8, 2022
  • Zoom Link: TBA
  • Abstract: TBA
  • Bio: TBA
Past Talks
  • Speaker: Yiran Chen (Professor of Electrical and Computer Engineering, Duke University)
  • Title: Scalable, Heterogeneity-Aware and Privacy-Enhancing Federated Learning
  • Date/Time: Friday, November 12, 2021, 10:00AM – 11:00AM
  • Zoom Link: https://zoom.us/j/98339758184?pwd=QlZFQUpzZFhHclBtaGJyS1N0cE5QZz09
  • Abstract: Federated learning has become a popular distributed machine learning paradigm for developing on-device AI applications. However, the data residing across devices is intrinsically statistically heterogeneous (i.e., following a non-IID data distribution), and mobile devices usually have limited communication bandwidth to transfer local updates. Such statistical heterogeneity and communication limitations are two major bottlenecks that hinder the application of federated learning. In addition, recent work has demonstrated that sharing model updates makes federated learning vulnerable to inference attacks. In this talk, we will present our recent work on federated learning frameworks that address the scalability and heterogeneity issues simultaneously. We will also reveal the essential reason for privacy leakage in federated learning and provide a privacy-enhancing defense mechanism accordingly.
  • Bio: Yiran Chen received his B.S. (1998) and M.S. (2001) from Tsinghua University and his Ph.D. (2005) from Purdue University. After five years in industry, he joined the University of Pittsburgh in 2010 as an Assistant Professor and was promoted to Associate Professor with tenure in 2014, holding the title of Bicentennial Alumni Faculty Fellow. He is now a Professor in the Department of Electrical and Computer Engineering at Duke University, serving as the director of the NSF AI Institute for Edge Computing Leveraging the Next-generation Networks (Athena) and the NSF Industry–University Cooperative Research Center (IUCRC) for Alternative Sustainable and Intelligent Computing (ASIC), and as the co-director of the Duke Center for Computational Evolutionary Intelligence (CEI). His group focuses on research into new memory and storage systems, machine learning and neuromorphic computing, and mobile computing systems. Dr. Chen has published 1 book and about 500 technical publications and has been granted 96 US patents. He has served as an associate editor of a dozen international academic transactions/journals and served on the technical and organization committees of more than 60 international conferences. He is now serving as the Editor-in-Chief of the IEEE Circuits and Systems Magazine. He has received seven best paper awards, one best poster award, and fifteen best paper nominations from international conferences and workshops. He has received many professional awards and is a distinguished lecturer of IEEE CEDA (2018-2021). He is a Fellow of the ACM and IEEE and now serves as the chair of ACM SIGDA.
  • Speaker: Jun Liu (Professor of Statistics, Harvard University)
  • Title: Data Splitting for Graphical Model Selection With FDR Control
  • Date/Time: Thursday, October 21, 2021, 3:50PM – 4:50PM
  • Zoom Link: https://zoom.us/j/99986325350?pwd=QUVqdldrMm1OMVNaNzJEai9jZkVTUT09
  • Abstract: Simultaneously finding multiple influential variables and controlling the false discovery rate (FDR) for statistical and machine learning models is a problem that has recently attracted renewed interest. A classical statistical idea is to introduce perturbations and examine their impacts on a statistical procedure. We here explore the use of data splitting (DS) for controlling FDR in learning linear, generalized linear, and graphical models. Our proposed DS procedure simply splits the data into two halves at random, and computes a statistic reflecting the consistency of the two sets of parameter estimates (e.g., regression coefficients). The FDR control can be achieved by taking advantage of such a statistic, which possesses the property that, for any null feature, its sampling distribution is symmetric about 0. Furthermore, by repeated sample splitting, we propose Multiple Data Splitting (MDS) to stabilize the selection result and boost the power. Interestingly, MDS not only helps overcome the power loss caused by DS with the FDR still under control, but also results in a lower variance for the estimated FDR compared with all other considered methods. DS and MDS are straightforward conceptually, easy to implement algorithmically, and efficient computationally. Simulation results as well as a real data application show that both DS and MDS control the FDR well and that MDS is often the most powerful method among all in consideration, especially when the signals are weak and correlations or partial correlations are high among the features. Our preliminary tests on nonlinear models such as generalized linear models and neural networks also show promise. The presentation is based on joint work with Chenguang Dai, Buyu Lin, and Xin Xing.
  • Bio: Dr. Jun Liu is a Professor of Statistics at Harvard University, with a joint appointment in the Harvard School of Public Health. Dr. Liu received his BS degree in mathematics in 1985 from Peking University and his Ph.D. in statistics in 1991 from the University of Chicago. He held Assistant, Associate, and full professor positions at Stanford University from 1994 to 2003. Dr. Liu won the NSF CAREER Award and the Stanford Terman fellowship in 1995, and the Mitchell Award for the best statistics application paper in 2000. In 2002, he received the prestigious COPSS Presidents’ Award. He was a Medallion Lecturer of the Institute of Mathematical Statistics (IMS), a Bernoulli Lecturer in 2004, and a Kuwait Lecturer of Cambridge University in 2008. He was elected a Fellow of the IMS in 2004 and a Fellow of the American Statistical Association in 2005. He has served on numerous grant review panels of the NSF and NIH and on the editorial boards of numerous leading statistical journals. He was a co-editor of the Journal of the American Statistical Association. Dr. Liu and his collaborators introduced the statistical missing data formulation and Gibbs sampling strategies for biological sequence analysis in the early 1990s. The resulting algorithms for protein sequence analysis, gene regulation analysis, and genetic studies have been adopted by many research groups and have become standard tools for computational biologists. Dr. Liu has made fundamental contributions to statistical computing and Bayesian modeling. He pioneered sequential Monte Carlo (SMC) methods and invented a few novel Markov chain Monte Carlo (MCMC) techniques. His studies of SMC and MCMC algorithms have had a broad impact on both theoretical understanding and practical applications. Dr. Liu has also pioneered novel Bayesian modeling techniques for discovering subtle interactions and nonlinear relationships in high-dimensional data. Dr. Liu has published one research monograph and more than 200 research articles in leading scientific journals and is one of the ISI Highly Cited mathematicians.
  • Speaker: Xia Hu (Associate Professor of Computer Science, Rice University)
  • Title: Towards Effective Interpretation of Deep Neural Networks: Algorithms and Applications
  • Date/Time: Friday, October 15, 2021, 9:30AM – 10:30AM
  • Zoom Link: https://zoom.us/j/97005929961?pwd=bXN6MVo0bmlhN3BMRDE4SFFqYitjUT09
  • Abstract: While deep neural networks (DNNs) have achieved superior performance in many downstream applications, they are often regarded as black boxes and are criticized for their lack of interpretability, since these models cannot provide meaningful explanations of how a certain prediction is made. Without explanations to enhance the transparency of DNN models, it would become difficult to build up trust among end-users. In this talk, I will present a systematic framework from modeling and application perspectives for generating DNN interpretability, aiming at dealing with two main technical challenges in interpretable machine learning, i.e., faithfulness and understandability. Specifically, to tackle the faithfulness challenge of post-hoc interpretation, I will introduce how to make use of feature inversion and additive decomposition techniques to explain predictions made by two classical DNN architectures, i.e., Convolutional Neural Networks and Recurrent Neural Networks. In addition, to develop DNNs that can generate interpretations that are more understandable to human beings, I will present a novel training method to regularize the interpretations of a DNN with domain knowledge.
  • Bio: Dr. Xia “Ben” Hu is an Associate Professor at Rice University in the Department of Computer Science. Dr. Hu has published over 100 papers in several major academic venues, including NeurIPS, ICLR, KDD, WWW, IJCAI, and AAAI. An open-source package developed by his group, namely AutoKeras, has become the most used automated deep learning system on GitHub (with over 8,000 stars and 1,000 forks). Also, his work on deep collaborative filtering, anomaly detection and knowledge graphs has been included in the TensorFlow package, Apple production system and Bing production system, respectively. His papers have received several Best Paper (Candidate) awards from venues such as WWW, WSDM and ICDM. He is the recipient of an NSF CAREER Award. His work has been cited more than 10,000 times with an h-index of 44. He was the conference General Co-Chair for WSDM 2020.
  • Speaker: Christos Davatzikos (Wallace T. Miller Sr. Professor of Radiology, University of Pennsylvania)
  • Title: Machine Learning in Neuroimaging: applications to brain aging, Alzheimer’s Disease, and Schizophrenia
  • Date/Time: Friday, September 24, 2021, 10:00AM – 11:00AM
  • Zoom Link: https://zoom.us/j/94740164574?pwd=TDQzcW5VUndieWtwY2MyT1FrcVpHdz09
  • Abstract: Machine learning has deeply penetrated the neuroimaging field in the past 15 years, by providing a means to construct imaging signatures of normal and pathologic brain states on an individual person basis. In this talk, I will discuss examples from our laboratory’s work on imaging signatures of brain aging and early stages of neurodegenerative diseases, brain development and neuropsychiatric disorders. I will discuss some challenges, such as disease heterogeneity and integration of data from multiple sites in order to achieve sample sizes required by deep learning studies. I will discuss the integration of these methods and results in the context of a dimensional neuroimaging system and its contribution to integrated, precision diagnostics.
  • Bio: Christos Davatzikos is the Wallace T. Miller Sr. Professor of Radiology at the University of Pennsylvania, and Director of the Center for Biomedical Image Computing and Analytics. He holds a secondary appointment in Electrical and Systems Engineering at Penn as well as in the Bioengineering and Applied Mathematics graduate groups. He obtained his undergraduate degree from the National Technical University of Athens, Greece in 1989, and his Ph.D. degree from Johns Hopkins in 1994, on a Fulbright scholarship. He then joined the faculty in Radiology and later in Computer Science, where he founded and directed the Neuroimaging Laboratory. In 2002 he moved to Penn, where he founded and directed the section of biomedical image analysis. Dr. Davatzikos’ interests are in medical image analysis. He oversees a diverse research program ranging from basic problems of imaging pattern analysis and machine learning to a variety of clinical studies of aging and Alzheimer’s Disease, schizophrenia, brain cancer, and brain development. Dr. Davatzikos has served on a variety of scientific journal editorial boards and grant review committees. He is an IEEE fellow, a fellow of the American Institute for Medical and Biological Engineering, and a member of the council of distinguished investigators of the US Academy of Radiology and Biomedical Imaging Research.
Organizers
  • Tianming Liu, Distinguished Research Professor, Department of Computer Science, UGA
  • Ping Ma, Distinguished Research Professor, Department of Statistics, UGA
  • WenZhan Song, Georgia Power Mickey A. Brown Professor, College of Engineering, UGA
  • Changying “Charlie” Li, Professor, College of Engineering, UGA
  • Yuan Ke, Assistant Professor, Department of Statistics, UGA
  • Zhong-Ru (Paul) Xie, Assistant Professor, College of Engineering, UGA
  • Sheng Li, Assistant Professor, Department of Computer Science, UGA