Sitemap

A list of all the posts and pages found on the site. For you robots out there, there is an XML version available for digesting as well.

Pages

Posts

Future Blog Post

less than 1 minute read

Published:

This post will show up by default. To disable scheduling of future posts, edit _config.yml and set future: false.

Blog Post number 4

less than 1 minute read

Published:

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 3

less than 1 minute read

Published:

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 2

less than 1 minute read

Published:

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 1

less than 1 minute read

Published:

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Portfolio

Publications

Links do Matter: Understanding the Drivers of Developer Interactions in Software Ecosystems

Abstract: Studies of collaborating individuals engaged in collective enterprises usually focus on the individuals rather than on the links supporting their interaction. Accordingly, large-scale software development ecosystems have also been examined primarily in terms of developer engagement. We posit that communication links between developers play a central role in the sustenance and effectiveness of such ecosystems. In this paper, we investigate whether and how developer attributes relate to the importance of the communication channels between them. We present a technique that uses second-order Markov models to extract features of interest from the links, and we apply it to data from a real-world project. Our statistical models, developed on records involving 900+ software developers exchanging 20,000+ comments across 500 units of work, offer surprising insights into the factors associated with link importance, even after controlling for known effects. These results inform a deeper appreciation of the importance of links in large-scale software development and carry a number of practical implications.

Interactional Motifs: Leveraging Risks in Large and Distributed Software Development Teams

Excerpt: …In this chapter, we present a perspective on developer interaction through the lens of motifs. Through a case study using development data from a large real-world system involving 2000+ individuals and 150,000+ units of work, we demonstrate how a motif-based view can provide a deeper understanding of two of the critical drivers of software development risk: workload and task completion time.

Challenges in Combating COVID-19 Infodemic – Data, Tools, and Ethics

Abstract: While the COVID-19 pandemic continues its global devastation, numerous accompanying challenges emerge. One important challenge is to use recently gathered data efficiently and effectively, and to find computational tools to combat the COVID-19 infodemic, a typical information overload problem. The novel coronavirus presents many questions without ready answers; its uncertainty and our eagerness to find solutions create a fertile environment for an infodemic. It is thus necessary to combat the infodemic and make a concerted effort to confront COVID-19 and mitigate its negative impact on all walks of life, while saving lives and maintaining normal order during trying times. In this position paper on combating the COVID-19 infodemic, we illustrate the need by providing real-world examples of rampant conspiracy theories, misinformation, and various types of scams that take advantage of human kindness, fear, and ignorance. We present three key challenges in this fight against the COVID-19 infodemic where researchers and practitioners instinctively want to contribute and help. We demonstrate that these three challenges can and will be effectively addressed by collective wisdom, crowdsourcing, and collaborative research.

Combating Disinformation in a Social Media Age

Abstract: The creation, dissemination, and consumption of disinformation and fabricated content on social media is a growing concern, especially given the ease of access to such sources and the lack of awareness that such false information exists. In this article, we present an overview of the techniques explored to date for combating disinformation in its various forms. We introduce different forms of disinformation, discuss factors related to its spread, elaborate on the inherent challenges in detecting it, and show some approaches to mitigating it via education, research, and collaboration. Looking ahead, we present some promising future research directions on disinformation.

Disinformation in the Online Information Ecosystem: Detection, Mitigation and Challenges

Abstract: With the rapid increase in access to the internet and the subsequent growth in the population of online social media users, the quality of information posted, disseminated, and consumed via these platforms is an issue of growing concern. A large fraction of the public turns to social media platforms, and to the internet in general, for news and even for information on highly concerning issues such as COVID-19 symptoms. Given that the online information ecosystem is extremely noisy, fraught with misinformation and disinformation, and often contaminated by malicious agents spreading propaganda, distinguishing genuine, good-quality information from disinformation is a challenging task for humans. In this regard, there is a significant amount of ongoing research on disinformation detection and mitigation. In this survey, we discuss the online disinformation problem, focusing on the recent ‘infodemic’ in the wake of the coronavirus pandemic. We then discuss the inherent challenges in disinformation research and, after a short overview of the various directions explored in detection efforts, elaborate on computational and interdisciplinary approaches to mitigating disinformation.

Data Generation for Neural Disinformation Detection

Abstract: Incorporating large language models for various domain-specific NLP tasks has become prevalent due to the easy availability of pre-trained model checkpoints. However, fine-tuning these pre-trained models is necessary to improve performance on domain-specific tasks. Neural fake news detection is one such domain-specific task where the large language model needs to detect machine-generated fake news. Fine-tuning for a neural fake news detection task can be challenging since it requires collecting actual news articles and generating neural fake news counterparts. Therefore, in this paper, we explore the characteristics of the underlying data generation process of fine-tuning large language models for neural fake news detection. We present experiments to develop a deeper understanding of the fundamental properties of data generation. Some interesting findings have the potential to guide future research on neural fake news detection and to determine the quantity and variability of data required for fine-tuning large language models.

(under review at EMNLP’22) Characterizing Harmful Agendas in News Articles

Abstract: Manipulated news online is a growing problem that necessitates the use of automated systems to curtail its spread. However, these systems must be interpretable given the sensitivity of related issues like censorship. Automatically characterizing news articles requires detection of the article’s factuality, any authorial deception, and its agenda. We argue that while misinformation and disinformation detection have been studied, there has been a lack of investment in the important open challenge of detecting harmful agendas in news articles; identifying harmful agendas is critical in order to flag the most insidious manipulated news campaigns online. In this work, we propose this new task and release an initial dataset, NewsAgendas, of annotated news articles for agenda identification. We show how interpretable systems can be effective on this task and demonstrate that they can perform comparably to black-box models.

Text Transformations in Contrastive Self-Supervised Learning: A Review

IJCAI-ECAI 2022, The 31st International Joint Conference on Artificial Intelligence

Contrastive self-supervised learning has become a prominent technique in representation learning. The main step in these methods is to contrast semantically similar and dissimilar pairs of samples. However, in Natural Language Processing (NLP), designing augmentation methods that create similar pairs consistent with contrastive learning (CL) assumptions is challenging. This is because even modifying a single word in the input might change the semantic meaning of the sentence and hence violate the distributional hypothesis. In this review paper, we formalize the contrastive learning framework, emphasize the considerations that need to be addressed in the data transformation step, and review the state-of-the-art methods and evaluations for contrastive representation learning in NLP. Finally, we describe some challenges and potential directions for learning better text representations using contrastive methods.

Recommended citation:

Talks

Teaching

CIS Research Aide

Undergraduate and Graduate Courses, W.P. Carey School of Business, ASU

I was appointed as a CIS Research Aide for Fall 2019 and Spring 2020 under the supervision of Prof. Daniel Mazzola at the W.P. Carey School of Business, ASU. I served as an assistant for the courses CIS 430: Mobile Platforms for Business, CIS 505: Introduction to Enterprise Analytics, and CIS 506: Information Management. My duties included assisting students with assignments and projects, and grading assignments and quizzes.

Teaching Assistant: CSE 110 - Principles of Programming

Undergraduate course, School of CS & AI, ASU

I was appointed as the TA for the undergraduate course Principles of Programming (Java), offered in Fall 2019. My responsibilities included leading the recitations (labs), grading, and holding office hours to help with programming assignments and course content.

Project Coordinator: CSE 472 - Social Media Mining

Undergraduate course, School of CS & AI, ASU

I served as the Project Coordinator for the course CSE 472: Social Media Mining, offered by my Ph.D. advisor Dr. Huan Liu, in Fall 2020. I was in charge of designing a course project, mentoring and assisting students, and grading the final projects. The project topic I designed was Machine-Generated Text Detection.

Reviewer

Journal, JNCA: Journal of Network and Computer Applications, SNAM: Social Network Analysis and Mining, IPM: Information Processing and Management, CogSys: Cognitive Systems Research