AI Detects Drug Dealers on Instagram With About 95% Accuracy.
Researchers in the US have developed a multimodal machine learning system that can analyze various content, including images, to identify Instagram drug dealer accounts and messages.
The study, "Identifying Illicit Drug Dealers on Instagram with Large-scale Multimodal Data Fusion," was a joint effort by three researchers from West Virginia University and one from Case Western Reserve University. For the project, the researchers created a dataset called Identifying Drug Dealers on Instagram (IDDIG) containing 4,000 user accounts, of which 1,400 are drug dealer accounts and the rest serve as a control group for testing the identification process.
Initial testing of the method reported nearly 95% accuracy in identifying drug dealers advertising on Instagram. The framework also led to a hashtag-based community detection project designed to detect changing patterns in illicit drug-related activity, including geographic factors and specific types of drugs. Because the dataset developed for the project required manual labeling, the framework includes a user-friendly annotation system that uses text classification based on Google's Bidirectional Encoder Representations from Transformers (BERT) and image classification based on ResNet.
Dealer identification in drug conversations
Recreational drugs are discussed in a variety of contexts on social media platforms such as Instagram. Many of the people who post are consumers, not sellers. Depending on local regulations and whether a drug can be legally prescribed, they may be legitimate consumers even where drug laws apply.
The behavior of drug dealers on Instagram is also not always straightforward. Dealers often advertise through comments and hashtags rather than in image posts, which tend to be easier for both human and automated moderation systems to identify as drug-related. The new system therefore includes hashtags and comments among the signals it uses for identification.
In addition to BERT-based text analysis and ResNet-based image analysis, the work draws on the feature-level multimodal data fusion proposed in the 2016 IEEE paper "Discriminant Correlation Analysis: Real-Time Feature Level Fusion for Multimodal Biometric Recognition."
Hashtags as input to the database
The project's web crawler begins identifying drug-dealing accounts by using a hashtag search API to track 200 drug-related hashtags identified by subject matter experts. Images in posts that use those hashtags are classified with a binary classification model based on VGG16. When an image matches known drug imagery, it is stored in the system and the post is converted to a JSON object for later retrieval.
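The filtering and serialization step described above can be sketched roughly as follows. This is a minimal illustration, not the researchers' crawler: the tracked hashtags, the field names, and the `image_flagged` argument (standing in for the output of the VGG16-based classifier) are all hypothetical.

```python
import json
import re
from typing import Optional, Set

# Hypothetical stand-in for the 200 expert-curated hashtags; not the real list.
TRACKED_HASHTAGS = {"#exampletag1", "#exampletag2"}

HASHTAG_RE = re.compile(r"#\w+")


def extract_hashtags(caption: str) -> Set[str]:
    """Return the lowercase hashtags found in a caption."""
    return {tag.lower() for tag in HASHTAG_RE.findall(caption)}


def post_to_json(post_id: str, username: str, caption: str,
                 image_flagged: bool) -> Optional[str]:
    """Serialize a post to JSON if it qualifies, otherwise return None.

    A post qualifies when it uses at least one tracked hashtag and its
    image was flagged as drug-related (image_flagged is an assumed input).
    """
    hashtags = extract_hashtags(caption)
    if not image_flagged or not (hashtags & TRACKED_HASHTAGS):
        return None
    record = {
        "post_id": post_id,
        "username": username,
        "caption": caption,
        "hashtags": sorted(hashtags),
        # Hashtags outside the tracked set, kept so they can be reviewed later.
        "novel_hashtags": sorted(hashtags - TRACKED_HASHTAGS),
    }
    return json.dumps(record)
```

Storing each qualifying post as a flat JSON record keeps the later steps (adding comments, home page data, and labels) independent of how the post was originally crawled.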
The dataset is then expanded with the associated comments and with information (both text and images) from the home pages of posters who use the hashtags and whose content is flagged as drug-related. In this way, 10,000 candidate posts and 23,034 user home pages were added to the dataset.
Because drug-related hashtags continually evolve, both as a matter of fashion and to evade scrutiny by authorities, any new hashtag in a flagged post that is not part of the original hashtag collection is noted and logged for future use. Labeling is carried out in a web interface.
Multimodal data fusion must account for the fact that not every post contains all four possible data types. The algorithm therefore accommodates 9 of the 16 possible combinations of the four data types using concatenation and merge functions. A missing element is treated as zero in the calculation.
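The idea of zero-filling a missing data type before concatenation can be illustrated with a short sketch. The source names and feature sizes below are assumptions made for the example, and the sketch does not attempt to reproduce which 9 combinations the researchers' algorithm accepts.

```python
from typing import Dict, List, Optional

# Hypothetical source names and feature sizes; the article does not give them.
FEATURE_DIMS = {
    "post_text": 2,
    "post_image": 2,
    "homepage_text": 2,
    "homepage_image": 2,
}


def fuse(features: Dict[str, Optional[List[float]]]) -> List[float]:
    """Concatenate per-source feature vectors in a fixed order.

    A source that is absent (or None) contributes a block of zeros, so the
    fused vector always has the same length.
    """
    fused: List[float] = []
    for name, dim in FEATURE_DIMS.items():
        vec = features.get(name)
        if vec is None:
            vec = [0.0] * dim  # missing element corresponds to zero
        if len(vec) != dim:
            raise ValueError(f"{name}: expected {dim} values, got {len(vec)}")
        fused.extend(vec)
    return fused


# Example: a post with caption text and an image, but no home page data.
example = fuse({"post_text": [0.5, 0.1], "post_image": [0.9, 0.3]})
```

Keeping the fused vector at a fixed length means a single downstream classifier can handle posts with different subsets of data types.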
NetworkX
Finally, the dataset is processed with the NetworkX Python package, which originated in 2008 at Los Alamos National Laboratory in New Mexico. NetworkX can handle large-scale operations involving graphs with more than 10 million nodes. By treating hashtags in the dataset as related when they appear together in a single post, the researchers were able to generate undirected drug-related graphs for analysis.
The IDDIG dataset was tested under a variety of configurations, including multimodal data fusion, multi-source fusion, and four-source fusion, achieving up to 95% accuracy in identifying drug-related posts and users in comparison with other identification methods. With a human in the loop, it was also possible to create a "sunburst chart" representing the overall geographic distribution of drug-related activity on Instagram, and the work points to other possible directions for future research in similar projects.
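A hashtag co-occurrence graph of this kind might be built as in the sketch below. The hashtags are placeholders, the edge weighting is an assumption, and connected components are used only as a simple stand-in for grouping, since the article does not describe the researchers' community detection method in detail.

```python
from itertools import combinations

import networkx as nx


def build_hashtag_graph(posts):
    """Build an undirected graph linking hashtags that share a post.

    `posts` is an iterable of hashtag lists, one list per post. The edge
    weight counts how many posts contain both hashtags.
    """
    graph = nx.Graph()
    for hashtags in posts:
        unique = sorted(set(hashtags))
        graph.add_nodes_from(unique)
        for a, b in combinations(unique, 2):
            if graph.has_edge(a, b):
                graph[a][b]["weight"] += 1
            else:
                graph.add_edge(a, b, weight=1)
    return graph


# Placeholder hashtags, one list per post.
posts = [["#a", "#b", "#c"], ["#a", "#b"], ["#d"]]
graph = build_hashtag_graph(posts)

# Groups of hashtags that are linked, directly or indirectly, by shared posts.
components = list(nx.connected_components(graph))
```

On real data, the resulting graph could be passed to any of NetworkX's graph analysis routines in place of the connected-components call.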