Research

September 16, 2020

Modeling Graphene Squeeze-Film Pressure Sensors

Completed as part of Lawrence Berkeley National Lab’s Science Undergraduate Laboratory Internship program with the Center for Computational Sciences and Engineering (CCSE) group
Mentors: Daniel Ladiges, Andy Nonaka, Ann Almgren, John Bell
June 2019 to December 2020
Coded in C and Fortran on AMReX in Linux

Pressure sensors are used in a vast range of applications, including small electronics and automobiles, with future uses anticipated in bio-skin and surgery. Because of this multitude of applications, there is considerable interest in improving these nanosensors. However, such improvements often leave small pressure sensors with lifetimes of only a few hours. Recently, Dolleman et al. designed a new version of these sensors that is expected to remain stable over a longer lifespan. We create a model of an open squeeze-film pressure sensor. This model will enable simulations of these devices, aiding in both device design and interpretation of measurements.
This project was coded within the AMReX framework (developed by the CCSE group at LBNL). This framework, together with direct mentorship from many members of the group, allowed me to take the lead on this project. Some of the coding was beyond my ability, so members of the CCSE helped code the hardest parts, which let me keep working.

Joint Sequence Analysis: How to Handle Missing Values and Mixed Variable Types

Completed as part of Lawrence Berkeley National Lab’s Science Undergraduate Laboratory Internship program with the Scientific Data Management group
Mentors: Alina Lazar, Kesheng Wu, Alexander Sim
June 2018 to December 2019
Coded in R with data from a survey of the Bay Area

This study focuses on developing methodologies to minimize the effects of incomplete data. Specifically, it aims to reduce the noise and bias that data gaps cause in categorical sequence data. Strategies investigated include choosing a substitution “cost” to replace missing values and deleting the missing values at the end of a sequence. Cluster validity metrics are used to determine the accuracy of the unsupervised clustering algorithms, and t-SNE is employed to visualize clusters and age biases. Deleting missing values provided the best results here, but all data sets are different. Thus, this study recommends employing the studied procedures before conducting analysis on longitudinal sequence data to ensure the results are unbiased. After these tests optimize the data, clustering is conducted to understand the correlation between a person’s state in life and their travel.
After initial algorithm testing, mentors gave direction, but I took ownership of the project.
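
The two strategies named above can be illustrated with a minimal Python sketch. This is not the project's R code: the `MISSING` marker, the function names, and the cost values (1.0 for an insertion or deletion, 2.0 for substituting two different states) are assumptions chosen only for illustration. The sketch trims missing values from the end of a sequence and computes an edit distance in which a substitution involving a missing value has its own tunable cost.

```python
MISSING = None  # hypothetical marker for a gap in a categorical sequence


def trim_trailing_missing(seq):
    """Return a copy of seq with missing values removed from the end only."""
    end = len(seq)
    while end > 0 and seq[end - 1] is MISSING:
        end -= 1
    return seq[:end]


def substitution_cost(a, b, missing_cost=1.0):
    """Cost of replacing state a with state b.

    Assumed values: identical states cost 0, two different observed states
    cost 2.0, and any pairing with a missing value costs `missing_cost`.
    """
    if a is MISSING or b is MISSING:
        return 0.0 if a is b else missing_cost
    return 0.0 if a == b else 2.0


def edit_distance(s, t, indel=1.0, missing_cost=1.0):
    """Dynamic-programming edit distance between two state sequences."""
    prev = [j * indel for j in range(len(t) + 1)]
    for i, a in enumerate(s, 1):
        curr = [i * indel]
        for j, b in enumerate(t, 1):
            curr.append(min(
                prev[j] + indel,                                   # delete a
                curr[j - 1] + indel,                               # insert b
                prev[j - 1] + substitution_cost(a, b, missing_cost),
            ))
        prev = curr
    return prev[-1]


observed = ["home", "work", MISSING, MISSING]
complete = ["home", "work", "home", "home"]
print(trim_trailing_missing(observed))
print(edit_distance(observed, complete, missing_cost=0.5))
```

Raising or lowering `missing_cost` changes how strongly gaps pull sequences apart, which is the kind of choice that cluster validity metrics can then be used to compare.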

Infections Modeled using a Cellular Automata Structure

Mentor: Alicia Prieto
Fellow student: Lindsey Chludzinski
August 2017 to March 2020
Coded in Matlab
Presented at Mathfest August 2018
Used as an example of a successful undergraduate research project in Agent-Based Modeling in Mathematical Biology: A Few Examples

The treatment of infections is an important focus for many medical professionals and mathematicians. Innate immunity is the body’s first defense against invading pathogens. However, if an infection occurs, the immune system’s main way of destroying invading pathogens is by using phagocytes to engulf and kill the microorganisms. In the process of destroying the microorganisms, neutrophils arrive in what is called the inflammatory response. Then, through extravasation, neutrophils migrate through the blood vessel wall into the infected tissue. Neutrophils then bind, engulf, and destroy bacteria. Finally, a neutrophil dies by apoptosis and leaves pus at the infected site. The purpose of our study is to create an interactive mathematical simulation of a variety of infections using cellular automata modeling.
The project idea was provided by our mentor; the rest has been the work of another student and me. Our mentor still provides occasional guidance but follows a hands-off approach.
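
To show what a cellular automaton of this kind looks like, here is a minimal Python sketch (the project itself was written in Matlab). The rules are simplified assumptions and not the model from the study: a bacterium adjacent to a neutrophil is engulfed, and each surviving bacterium divides into one empty neighboring cell with probability `growth_prob`. The state names, the four-cell neighborhood, and the parameter are all hypothetical.

```python
import random

EMPTY, BACTERIUM, NEUTROPHIL = 0, 1, 2  # hypothetical cell states


def neighbors(r, c, rows, cols):
    """Yield the in-bounds up/down/left/right neighbors of cell (r, c)."""
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < rows and 0 <= nc < cols:
            yield nr, nc


def step(grid, growth_prob, rng):
    """Advance the grid by one time step and return the new grid."""
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]

    # Phagocytosis: a bacterium next to a neutrophil is engulfed.
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == BACTERIUM and any(
                grid[nr][nc] == NEUTROPHIL
                for nr, nc in neighbors(r, c, rows, cols)
            ):
                new[r][c] = EMPTY

    # Growth: each surviving bacterium from the old grid may divide
    # into one randomly chosen empty neighbor.
    for r in range(rows):
        for c in range(cols):
            if (grid[r][c] == BACTERIUM and new[r][c] == BACTERIUM
                    and rng.random() < growth_prob):
                empty = [(nr, nc) for nr, nc in neighbors(r, c, rows, cols)
                         if new[nr][nc] == EMPTY]
                if empty:
                    nr, nc = rng.choice(empty)
                    new[nr][nc] = BACTERIUM
    return new


rng = random.Random(0)  # fixed seed so a run is repeatable
grid = [
    [BACTERIUM, EMPTY, EMPTY],
    [EMPTY, EMPTY, NEUTROPHIL],
    [EMPTY, BACTERIUM, EMPTY],
]
for _ in range(3):
    grid = step(grid, growth_prob=0.5, rng=rng)
print(grid)
```

Neutrophil migration, apoptosis, and pus formation are left out of this sketch; each would be an additional state or rule in the same update loop.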

Performance Evaluation of Calorimeter Clustering Algorithms for Particle Tracking

Completed as a part of XSEDE’s EMPOWER program (Summer 2020)
Mentors: Alina Lazar, Daniel Murnane, Xiangyang Ju, Paolo Calafiura
December 2019 to present
Coded in Python
Ran using Cori, the supercomputer at Lawrence Berkeley National Laboratory

The challenge of reconstructing tracks of particles produced in high energy collisions is mainly computational. With the ever-growing data from scientific experiments, it is imperative to have automatic ways to analyze that data. Combinatorial approaches currently used to track particles will become inadequate as the number of simultaneous collisions increases in the next phase of the High-Luminosity Large Hadron Collider (HL-LHC). To reduce the complexity of combinatorial approaches, we evaluate several iterative algorithms based on clustering to reconstruct particle trajectories. Specifically, we analyze clustering algorithms based on sparse binning and DBSCAN. The sparse binning algorithm separates the detector space into bins before performing the grouping step. This idea speeds up the algorithm but affects the accuracy. We ran a high performance computing implementation of the proposed clustering approaches on a public dataset containing a large set of simulated collision events. The performance evaluation is done for three different clustering implementations in terms of average accuracy and computational speed.
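
The binning idea can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not the evaluated implementation: points are 2D, only occupied bins are stored (which is what makes the binning sparse), and the grouping step is assumed to merge occupied bins that touch, including diagonally. The function names and the `width` parameter are hypothetical.

```python
from collections import defaultdict
from math import floor


def sparse_bins(points, width):
    """Map each occupied (bin_x, bin_y) cell to the indices of its points."""
    bins = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        bins[(floor(x / width), floor(y / width))].append(idx)
    return bins


def cluster_bins(bins):
    """Give every point a cluster label by merging adjacent occupied bins."""
    labels = {}
    unvisited = set(bins)
    cluster = 0
    while unvisited:
        stack = [unvisited.pop()]
        while stack:
            bx, by = stack.pop()
            for idx in bins[(bx, by)]:
                labels[idx] = cluster
            # Visit the eight surrounding bins that are occupied and unseen.
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    nb = (bx + dx, by + dy)
                    if nb in unvisited:
                        unvisited.remove(nb)
                        stack.append(nb)
        cluster += 1
    return labels


points = [(0.1, 0.1), (0.4, 0.2), (1.2, 0.3), (5.0, 5.0)]
labels = cluster_bins(sparse_bins(points, width=1.0))
print(labels)
```

Grouping works on bins instead of on individual point pairs, which is where the speedup comes from. The cost is resolution: a larger `width` merges points that a distance-based method such as DBSCAN might keep apart.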

Automatic Tagging of Stack Overflow Data Using BERT Word Embeddings and Deep Learning

Completed as a part of CRA-WP’s Collaborative Research Experiences for Undergraduates (CREU) Program (2018)
Mentors: Alina Lazar and Bonita Sharif
Coded in Python and R
August 2018 to December 2019
Presented at Grace Hopper Celebration 2019

Question-and-answer (QA) websites like Stack Overflow require users to attach up to five tags when they submit a question. However, users may assign tags that are not relevant to the question. A better approach would be to recommend to users the most appropriate tags for their question and let them choose. The goal of this project is to combine newly developed natural language representations with deep learning algorithms to improve the prediction accuracy of tags for Stack Overflow questions. We used word representations generated by word2vec and a Convolutional Neural Network (CNN).
The initial project idea was the work of our mentors. The rest of the project work, from initial research to completion of posters, was split between me and another student, Hannah Senediak. Hannah focused mostly on testing various internet algorithms and traveling to give presentations, while I found statistics of the dataset, layered the algorithms Hannah was testing, and created posters.