
# 🔥 CyberSentinel AI - Automated Security Monitoring and AI Analysis System

[![License](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)

CyberSentinel AI is an **automated security monitoring and AI analysis system** designed to **track the latest security vulnerabilities (CVEs)** and **security-related repositories on GitHub** in real time. It uses **Artificial Intelligence (AI)** for in-depth analysis and **automatically publishes valuable security intelligence to a blog platform**.

## 🚀 Key Features

*   **Multi-Source Data Monitoring**:
    *   **CVE Monitoring**: Real-time scraping of the latest CVE-related information from GitHub, enabling rapid discovery and tracking of the latest vulnerability trends.
    *   **GitHub Repository Monitoring**: Comprehensive monitoring of security-related open-source projects on GitHub through keyword searches and predefined watch lists.
*   **Intelligent AI Analysis**:
    *   **OpenAI & Gemini Dual Engine**: Integrated with OpenAI and Gemini AI models, providing powerful natural language processing capabilities for in-depth security data analysis.
    *   **Multi-Dimensional Security Assessment**: Evaluation of CVEs and repositories from multiple dimensions, including **vulnerability principles**, **exploitation methods**, **risk levels**, and **impact scope**, ensuring depth and breadth of analysis.
    *   **Value Judgment and Filtering**: Intelligent AI-driven judgment of security information value, automatically filtering out low-value information and focusing on truly noteworthy security threats and technologies.
*   **Automated Workflow**:
    *   **Fully Automated Monitoring**: The system runs **24/7** unattended, automating security information collection, analysis, and report generation.
    *   **Daily Security Briefing**: Generates daily security briefing reports on a schedule, summarizing the latest CVE vulnerabilities and GitHub security repository dynamics, and pushing them to a blog platform.
    *   **Dynamic Blacklisting**: Automatically updates blacklists based on AI analysis results, reducing interference from invalid information and improving monitoring efficiency.
*   **Flexible Configuration and Management**:
    *   **Multi-GitHub Token Support**: Supports configuration of multiple GitHub Tokens, intelligently rotating usage to effectively avoid API rate limits.
    *   **Configurable Monitoring Parameters**: Keywords, watch repository lists, blacklists, etc., can be flexibly adjusted through configuration files to meet different monitoring needs.
    *   **Detailed Logging**: Detailed logs are recorded for all critical steps of system operation, facilitating troubleshooting and system monitoring.
*   **Automated Blog Publishing**:
    *   **Integrated Blog Platform**: Integrated with a blog platform API to automatically publish daily security briefing reports, quickly sharing security intelligence.
    *   **Markdown Reports**: Analysis results and security briefings are generated in **Markdown format**, making them easy to read and edit.
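
The multi-token rotation listed under "Flexible Configuration and Management" could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `TokenRotator` class and its method names are hypothetical, and it assumes the caller knows when a rate-limited token becomes usable again.

```python
import itertools
import time
from typing import Dict, List, Optional


class TokenRotator:
    """Round-robin over GitHub tokens, skipping ones that are rate-limited.

    Hypothetical helper for illustration; not the project's real class.
    """

    def __init__(self, tokens: List[str]) -> None:
        if not tokens:
            raise ValueError("at least one token is required")
        self._tokens = list(tokens)
        self._cycle = itertools.cycle(self._tokens)
        # token -> Unix time at which its rate limit resets
        self._blocked_until: Dict[str, float] = {}

    def mark_limited(self, token: str, reset_at: float) -> None:
        """Record that `token` is exhausted until `reset_at` (Unix seconds)."""
        self._blocked_until[token] = reset_at

    def next_token(self, now: Optional[float] = None) -> Optional[str]:
        """Return the next usable token, or None if all are rate-limited."""
        now = time.time() if now is None else now
        for _ in range(len(self._tokens)):
            token = next(self._cycle)
            if self._blocked_until.get(token, 0.0) <= now:
                return token
        return None
```

In practice, `reset_at` could be taken from the `X-RateLimit-Reset` header that the GitHub REST API returns, which carries the reset time as Unix epoch seconds.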

## 🛠️ Technical Implementation

### 1. Monitoring Modules (Monitors)

*   **`cve_monitor.py`**: **CVE Monitor**
    *   **GitHub API Interaction**: Uses the GitHub API to search for CVE-related repositories with the keyword `CVE-202+` and sorts the results by `updated` time.
    *   **CVE Information Extraction**: Extracts CVE numbers from repository names and descriptions using regular expressions.
    *   **Repository Information Crawling**: Retrieves repository descriptions, star counts, update times, recent commits, and other information.
    *   **Blacklist Filtering**: Supports **user blacklists** and **repository blacklists** to filter out invalid information sources.
    *   **File Content Analysis**: Clones repositories locally and **intelligently analyzes** **README.md** and other **high-priority files**, calculates file **relevance scores**, and initially filters high-value repositories.
    *   **Intelligent Token Management**: Implements automatic rotation and status checking of GitHub Tokens, dynamically switching available tokens to ensure the continuity of monitoring tasks.
    *   **Database Storage**: Uses the **SQLite** database **`database/cve_record.db`** to store CVE records, including CVE numbers, descriptions, publication dates, last modified dates, repository URLs, and other information.

*   **`github_monitor.py`**: **GitHub Repository Monitor**
    *   **Keyword Search**: Periodically searches GitHub repositories based on the `GITHUB_KEYWORDS` list defined in the configuration file **`config.py`**.
    *   **Watch List**: Supports the `WATCHED_REPOSITORIES` list in the configuration file **`config.py`** to **focus monitoring** on predefined security repositories.
    *   **Repository Information Crawling**: Retrieves detailed repository information, including descriptions, star counts, last update times, recent commit records, and more.
    *   **Commit Record Analysis**: Crawls the **recent commit records** of repositories, **intelligently analyzes** commit information and file changes, and initially judges the **security relevance** of repositories.
    *   **Blacklist Filtering**: Supports **user blacklists** and **repository blacklists** to filter out invalid information sources.
    *   **Intelligent Token Management**: Shares the Token management mechanism with the CVE monitor.
    *   **Database Storage**: Uses the **SQLite** database **`database/github_repo.db`** to store GitHub repository records, including repository names, URLs, descriptions, last update times, star counts, whether they are high-value repositories, and other information.
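
As a rough illustration of the CVE number extraction and blacklist filtering steps described for both monitors, the sketch below uses a regular expression for the standard `CVE-YYYY-NNNN` identifier format. The function names and the `owner/repo` input shape are assumptions made for this example, not the project's actual interface.

```python
import re
from typing import Iterable, List, Optional, Set

# CVE IDs have the form "CVE-<4-digit year>-<4 or more digits>".
CVE_PATTERN = re.compile(r"CVE-\d{4}-\d{4,}", re.IGNORECASE)


def extract_cve_ids(*texts: Optional[str]) -> List[str]:
    """Return unique, upper-cased CVE IDs in order of first appearance."""
    seen: Set[str] = set()
    ids: List[str] = []
    for text in texts:
        for match in CVE_PATTERN.findall(text or ""):
            cve = match.upper()
            if cve not in seen:
                seen.add(cve)
                ids.append(cve)
    return ids


def is_blacklisted(full_name: str, users: Iterable[str],
                   repos: Iterable[str]) -> bool:
    """Check an "owner/repo" name against user and repository blacklists."""
    owner = full_name.split("/", 1)[0].lower()
    blocked_users = {u.lower() for u in users}
    blocked_repos = {r.lower() for r in repos}
    return owner in blocked_users or full_name.lower() in blocked_repos
```

Passing both the repository name and its description to `extract_cve_ids` covers the case where only one of them mentions the CVE number.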

### 2. AI Analysis Module (AI)

*   **`analyzer.py`**: **AI Analyzer**
    *   **OpenAI & Gemini API**: Integrates the **OpenAI API** (primary) and the **Gemini API** (backup), and supports switching between **multiple models**, with `gpt-4o-mini-2024-07-18` as the fallback model.
    *   **Prompt Engineering**: Uses **different prompt templates** for **different analysis scenarios** (CVE analysis, new repository analysis, repository update analysis, specific watch repository analysis) to optimize AI analysis results.
    *   **JSON Format Output**: Requires AI to **strictly output analysis results in JSON format** for easy program parsing and data processing.
    *   **Multi-Dimensional Security Analysis**: AI analysis results include rich information such as **brief descriptions of vulnerabilities/repositories**, **detailed summaries**, **risk levels**, **key points**, **technical details**, **affected components**, **value assessments**, **security types**, **update types**, and **vulnerability exploitation status**.
    *   **Result Validation and Standardization**: Performs **strict format validation** and **content standardization** on the JSON results returned by AI to ensure the accuracy and usability of the data.
    *   **Dynamic Blacklist Update**: Based on AI analysis results, **automatically judges** whether repositories or users should be added to the blacklist and **dynamically updates the blacklist file**.
    *   **Analysis Result Persistence**: **Saves** AI analysis results as **JSON files** and **updates** corresponding records in the **database**.
    *   **Article Title Classification**: Supports **AI classification** of security article titles for generating security briefing reports.
    *   **API Failover**: When OpenAI API calls fail, **automatically switches** to **backup OpenAI API** or **Gemini API** to improve system **stability and availability**.
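
The result validation and API failover steps above could be combined along these lines. This is a sketch under stated assumptions: the field names in `REQUIRED_FIELDS`, the allowed risk levels, and the idea of passing each backend as a plain callable are all hypothetical and chosen only to keep the example self-contained.

```python
import json
from typing import Any, Callable, Dict, List

# Hypothetical schema: field names and allowed levels are assumptions.
REQUIRED_FIELDS = ("summary", "risk_level", "is_valuable")
RISK_LEVELS = {"CRITICAL", "HIGH", "MEDIUM", "LOW"}


def validate_result(raw: str) -> Dict[str, Any]:
    """Parse a model reply as JSON and normalize it; raise ValueError if unusable."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [f for f in REQUIRED_FIELDS if f not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    level = str(data["risk_level"]).strip().upper()
    if level not in RISK_LEVELS:
        raise ValueError(f"unknown risk level: {level!r}")
    if not isinstance(data["is_valuable"], bool):
        raise ValueError("is_valuable must be true or false")
    data["risk_level"] = level
    return data


def analyze_with_failover(prompt: str,
                          backends: List[Callable[[str], str]]) -> Dict[str, Any]:
    """Try each backend in order and return the first reply that validates."""
    errors: List[str] = []
    for call in backends:
        try:
            return validate_result(call(prompt))
        except Exception as exc:  # network error, bad JSON, failed validation
            errors.append(f"{getattr(call, '__name__', 'backend')}: {exc}")
    raise RuntimeError("all AI backends failed: " + "; ".join(errors))
```

Treating an invalid reply the same way as a failed request means a backend that answers with malformed JSON also triggers the switch to the next one.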

### 3. Data Processing and Management (Utils)

*   **`logger.py`**: **Logger**
    *   Uses the `logging` module to provide **complete logging** functionality, covering **DEBUG**, **INFO**, **WARNING**, **ERROR**, and other levels.
    *   Log information is **detailed** and **structured**, making it easy to troubleshoot and monitor the system.
    *   Logs are output to the file **`logs/security_monitor.log`** and rolled over daily.

*   **`csv_writer.py`**: **CSV Result Writer** (currently unused; available for extension)
    *   **Exports analysis results to CSV files** for easy data analysis and sharing.

*   **`article_fetcher.py`**: **Article Fetcher**
    *   **Multi-Source Fetching**: Currently supports fetching security articles from **BruceFeIix** and **D洞见 (doonsec)** WeChat official accounts.
    *   **Article Title and URL Extraction**: Uses regular expressions to extract article titles and URLs from web page content.
    *   **Retry Mechanism**: Retries failed requests with a **backoff strategy** to improve the **stability and success rate** of article fetching.
    *   **Article Title Cleaning**: **Standardizes** and **cleans** article titles, removing redundant markers and formats.

*   **`article_manager.py`**: **Article Manager**
    *   **Article De-duplication**: **Automatically filters** processed article URLs to avoid duplicate analysis and pushing.
    *   **AI Classification Result Processing**: Processes AI article title classification results and **organizes article lists by category**.
    *   **Daily Security Briefing Report Generation**: Generates **Markdown-format** daily security briefing reports **on a schedule**, summarizing the latest security articles and AI analysis results.
    *   **Automated Blog Publishing**: Calls the `blog_manager.py` module to **automatically publish daily security briefing reports to a blog platform**.
    *   **Article Data Persistence**: **Saves** processed URLs and classified articles as **JSON files** for easy subsequent use and management.

*   **`blog_manager.py`**: **Blog Manager**
    *   **Blog Platform API Interaction**: Encapsulates **common functions** for interacting with blog platform APIs, such as **creating articles** and **updating articles**.
    *   **Article ID Mapping Management**: **Records** the **article IDs** of daily security briefing reports on the blog platform for easy subsequent updates and management.
    *   **Automated Blog Publishing**: **Automatically publishes daily security briefing reports to a blog platform**.
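
Two of the utilities described above, the backoff-based retry in the article fetcher and the URL de-duplication in the article manager, might look roughly like this. The function names, the exponential doubling schedule, and the JSON file layout (a flat list of URLs) are assumptions for illustration only.

```python
import json
import time
from pathlib import Path
from typing import Callable, Iterable, List, Set, TypeVar

T = TypeVar("T")


def retry_with_backoff(func: Callable[[], T], attempts: int = 3,
                       base_delay: float = 1.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `func`, doubling the wait after each failure (1s, 2s, 4s, ...)."""
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")


def filter_new_urls(urls: Iterable[str], store: Path) -> List[str]:
    """Return URLs not yet recorded in `store`, then record them."""
    seen: Set[str] = set()
    if store.exists():
        seen = set(json.loads(store.read_text(encoding="utf-8")))
    fresh: List[str] = []
    for url in urls:
        if url not in seen:
            seen.add(url)
            fresh.append(url)
    store.write_text(json.dumps(sorted(seen)), encoding="utf-8")
    return fresh
```

The `sleep` parameter exists only so the delay can be replaced in tests; normal callers can leave it at its default.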

### 4. Database (Database)

*   **`database/models.py`**: **Database Model Definition**
    *   Uses **SQLAlchemy** to define two data models, **`CVERecord`** (CVE record) and **`Repository`** (GitHub repository record), to facilitate data storage and querying.
    *   The database uses **SQLite**, and the file paths are **`database/cve_record.db`** and **`database/github_repo.db`**.
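
To give a feel for what a CVE record table might contain, here is a small sketch. Note that the project defines its models with SQLAlchemy; to stay dependency-free, this example swaps in the standard-library `sqlite3` module instead. The table and column names are assumptions derived from the fields listed for `cve_monitor.py`, not the actual `CVERecord` definition.

```python
import sqlite3

# Hypothetical schema; column names are assumptions, not the real CVERecord model.
SCHEMA = """
CREATE TABLE IF NOT EXISTS cve_records (
    cve_id       TEXT PRIMARY KEY,
    description  TEXT,
    published_at TEXT,
    modified_at  TEXT,
    repo_url     TEXT
)
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def upsert_cve(conn: sqlite3.Connection, cve_id: str, description: str,
               published_at: str, modified_at: str, repo_url: str) -> None:
    """Insert a record, or replace the stored row if the CVE ID already exists."""
    conn.execute(
        "INSERT OR REPLACE INTO cve_records VALUES (?, ?, ?, ?, ?)",
        (cve_id, description, published_at, modified_at, repo_url),
    )
    conn.commit()
```

Using the CVE number as the primary key means a repeated sighting of the same CVE updates the existing row rather than creating a duplicate.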

### 5. Configuration File (Config)

*   **`config.py`**: **System Configuration File**
    *   Centrally manages the system's **configuration parameters**, such as database paths, API keys, monitoring intervals, keyword lists, blacklists, etc.
    *   Lets users **customize** and **adjust** system behavior.
    *   Includes the following main configuration items:
        *   `DATABASE_PATH`: Database file path
        *   `MONITOR_INTERVAL`: Monitoring cycle interval (seconds)
        *   `GITHUB_TOKEN`: GitHub API Token (supports list `GITHUB_TOKENS`)
        *   `GITHUB_KEYWORDS`: List of GitHub repository search keywords
        *   `WATCHED_REPOSITORIES`: List of GitHub repositories to focus on monitoring
        *   `BLACKLIST_USERS`: User blacklist
        *   `BLACKLIST_REPOSITORIES`: Repository blacklist
        *   `PRIMARY_AI_CONFIG`: Primary AI service (OpenAI) configuration
        *   `BACKUP_AI_CONFIGS`: List of backup AI service (OpenAI) configurations
        *   `GEMINI_AI_CONFIG`: Gemini AI service configuration
        *   `BLOG_TOKEN`: Blog platform API Token
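
A `config.py` covering the items above might be laid out as in the following fragment. Only the top-level names come from the list; every value is a placeholder, and the keys inside the AI configuration dictionaries (`api_key`, `base_url`, `model`) are assumptions, so check the real file before copying anything.

```python
# config.py (illustrative placeholders only; not the project's defaults)

DATABASE_PATH = "database"   # assumed: directory holding the SQLite files
MONITOR_INTERVAL = 3600      # seconds between monitoring cycles (assumed value)

# One token, or several to rotate between
GITHUB_TOKENS = ["<github-token-1>", "<github-token-2>"]

GITHUB_KEYWORDS = ["exploit", "poc"]          # example keywords
WATCHED_REPOSITORIES = ["<owner>/<repo>"]     # repositories to focus on
BLACKLIST_USERS = []                          # owners to ignore
BLACKLIST_REPOSITORIES = []                   # "owner/repo" names to ignore

# The dictionary keys below are hypothetical.
PRIMARY_AI_CONFIG = {
    "api_key": "<openai-api-key>",
    "base_url": "<openai-base-url>",
    "model": "<model-name>",
}
BACKUP_AI_CONFIGS = [
    {"api_key": "<backup-api-key>", "base_url": "<backup-base-url>", "model": "<model-name>"},
]
GEMINI_AI_CONFIG = {
    "api_key": "<gemini-api-key>",
    "base_url": "<gemini-base-url>",
    "model": "<model-name>",
}

BLOG_TOKEN = "<blog-api-token>"  # only needed for automatic blog publishing
```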

### 6. Main Program (Main)

*   **`main.py`**: **System Main Program**
    *   **Initializes** each module (monitors, AI analyzer, article manager, etc.).
    *   **Starts** the monitoring cycle, **periodically** executing CVE monitoring, GitHub repository monitoring, AI analysis, article crawling, and blog publishing tasks.
    *   Uses **multi-threading** to achieve **concurrent monitoring** and **AI analysis**, improving system efficiency.
    *   **Exception handling** and **retry mechanisms** ensure stable system operation.
    *   **Status Monitoring Thread**: **Periodically checks** system operating status and records logs.
    *   **Daily Blog Publishing**: **Automatically** publishes daily security briefing reports to a blog platform on a schedule.
    *   **Command-Line Startup**: Users can start and stop the monitoring system via the command line.
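
The threaded monitoring cycle described above can be pictured with a sketch like the one below. It is a simplified illustration, not the actual `main.py`: the helper names are hypothetical, and it assumes each monitoring task is a plain function that takes no arguments.

```python
import logging
import threading
from typing import Callable, List

log = logging.getLogger(__name__)


def run_periodically(task: Callable[[], None], interval: float,
                     stop: threading.Event) -> None:
    """Run `task` until `stop` is set, waiting `interval` seconds between runs."""
    while not stop.is_set():
        try:
            task()
        except Exception:  # keep the loop alive if one cycle fails
            log.exception("%s failed", getattr(task, "__name__", "task"))
        stop.wait(interval)  # returns early once stop is set


def start_workers(tasks: List[Callable[[], None]], interval: float,
                  stop: threading.Event) -> List[threading.Thread]:
    """Start one daemon thread per task and return the threads."""
    threads = []
    for task in tasks:
        thread = threading.Thread(
            target=run_periodically, args=(task, interval, stop), daemon=True
        )
        thread.start()
        threads.append(thread)
    return threads
```

Waiting on the `threading.Event` instead of calling `time.sleep` lets a shutdown request interrupt the pause between cycles, so the threads stop promptly.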

## ⚙️ Running Environment

*   Python 3.8+
*   Dependencies (see `requirements.txt`)

## 📦 Installation Steps

1.  **Clone the code repository**

    ```bash
    git clone [Project Repository Address]
    cd [Project Directory]
    ```

2.  **Install dependencies**

    ```bash
    pip install -r requirements.txt
    ```

3.  **Configure the `config.py` file**

    *   Configure GitHub API Token (`GITHUB_TOKEN` or `GITHUB_TOKENS`)
    *   Configure OpenAI API key and Base URL (`PRIMARY_AI_CONFIG`, `BACKUP_AI_CONFIGS`)
    *   Configure Gemini API key and Base URL (`GEMINI_AI_CONFIG`)
    *   Configure Blog platform API Token (`BLOG_TOKEN`) (if you need to automatically publish to a blog)
    *   Modify other configuration items as needed, such as monitoring interval, keyword list, blacklist, etc.

4.  **Run the system**

    ```bash
    python main.py
    ```

## 📝 Future Plans

*   **More Data Source Support**: Expand support for more security information sources, such as security communities, vulnerability platforms, etc.
*   **More Refined AI Analysis**: Continuously optimize Prompt engineering to improve the accuracy and depth of AI analysis.
*   **Richer Features**: Add features such as vulnerability early warning, threat intelligence visualization, and custom reports.
*   **Web UI Management Interface**: Develop a Web UI management interface so users can configure and manage the monitoring system.

## 🤝 Contribution

Contributions are welcome! If you have any suggestions or bug reports, please submit an Issue or Pull Request.

## 📜 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

---

**Thank you for your attention!** ⭐ **Star** this project to support our work!