CyberSentinel AI is an automated security monitoring and AI analysis system that tracks newly disclosed vulnerabilities (CVEs) and security-related repositories on GitHub in real time. It uses AI models to analyze what it finds in depth and automatically publishes the valuable security intelligence to a blog platform.

🚀 Key Features

Multi-Source Data Monitoring:
- CVE Monitoring: Real-time scraping of the latest CVE-related information from GitHub, enabling rapid discovery and tracking of the latest vulnerability trends.
- GitHub Repository Monitoring: Comprehensive monitoring of security-related open-source projects on GitHub through keyword searches and predefined watch lists.
Intelligent AI Analysis:
- OpenAI & Gemini Dual Engine: Integrated with OpenAI and Gemini AI models, providing powerful natural language processing capabilities for in-depth security data analysis.
- Multi-Dimensional Security Assessment: Evaluation of CVEs and repositories from multiple dimensions, including vulnerability principles, exploitation methods, risk levels, and impact scope, ensuring depth and breadth of analysis.
- Value Judgment and Filtering: Intelligent AI-driven judgment of security information value, automatically filtering out low-value information and focusing on truly noteworthy security threats and technologies.
Automated Workflow:
- Fully Automated Monitoring: The system runs 24/7 unattended, automating security information collection, analysis, and report generation.
- Daily Security Briefing: Generates a security briefing report every day on a schedule, summarizing the latest CVEs and recent activity in GitHub security repositories, and pushes it to a blog platform.
- Dynamic Blacklisting: Automatically updates blacklists based on AI analysis results, reducing interference from invalid information and improving monitoring efficiency.
Flexible Configuration and Management:
- Multi-GitHub Token Support: Supports configuration of multiple GitHub Tokens, intelligently rotating usage to effectively avoid API rate limits.
- Configurable Monitoring Parameters: Keywords, watch repository lists, blacklists, etc., can be flexibly adjusted through configuration files to meet different monitoring needs.
- Detailed Logging: Detailed logs are recorded for all critical steps of system operation, facilitating troubleshooting and system monitoring.
Automated Blog Publishing:
- Integrated Blog Platform: Integrated with a blog platform API to automatically publish daily security briefing reports, quickly sharing security intelligence.
- Markdown Reports: Analysis results and security briefings are generated in Markdown format, making them easy to read and edit.

🛠️ Technical Implementation

1. Monitoring Modules (Monitors)

cve_monitor.py: CVE Monitor
- GitHub API Interaction: Uses the GitHub API to search for CVE-related repositories (keyword CVE-202+) and sorts the results by update time.
- CVE Information Extraction: Extracts CVE numbers from repository names and descriptions using regular expressions.
- Repository Information Crawling: Retrieves repository descriptions, star counts, update times, recent commits, and other information.
- Blacklist Filtering: Supports user blacklists and repository blacklists to filter out invalid information sources.
- File Content Analysis: Clones repositories locally, analyzes README.md and other high-priority files, calculates file relevance scores, and performs a first-pass filter for high-value repositories.
- Intelligent Token Management: Implements automatic rotation and status checking of GitHub Tokens, dynamically switching available tokens to ensure the continuity of monitoring tasks.
- Database Storage: Uses the SQLite database database/cve_record.db to store CVE records, including CVE numbers, descriptions, publication dates, last modified dates, repository URLs, and other information.
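
The CVE extraction step above could look roughly like the following. This is a minimal sketch, not the project's actual code: the pattern and the `extract_cve_ids` helper are assumptions made for illustration.

```python
import re
from typing import List, Optional

# Assumed pattern: "CVE-", a four-digit year, then a sequence number of 4+ digits.
CVE_PATTERN = re.compile(r"CVE-\d{4}-\d{4,}", re.IGNORECASE)


def extract_cve_ids(name: str, description: Optional[str] = "") -> List[str]:
    """Return unique CVE IDs (upper-cased) in order of first appearance."""
    seen = {}  # dict keys keep insertion order, which gives ordered de-duplication
    for match in CVE_PATTERN.findall(f"{name} {description or ''}"):
        seen.setdefault(match.upper(), None)
    return list(seen)
```

Upper-casing before de-duplication means a repository named `cve-2024-1234-poc` and a description mentioning `CVE-2024-1234` yield a single ID.
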
github_monitor.py: GitHub Repository Monitor
- Keyword Search: Periodically searches GitHub repositories based on the GITHUB_KEYWORDS list defined in the configuration file config.py.
- Watch List: Supports the WATCHED_REPOSITORIES list in the configuration file config.py to focus monitoring on predefined security repositories.
- Repository Information Crawling: Retrieves detailed repository information, including descriptions, star counts, last update times, recent commit records, and more.
- Commit Record Analysis: Crawls the recent commit records of repositories, analyzes commit messages and file changes, and makes a first-pass judgment of each repository's security relevance.
- Blacklist Filtering: Supports user blacklists and repository blacklists to filter out invalid information sources.
- Intelligent Token Management: Shares the Token management mechanism with the CVE monitor.
- Database Storage: Uses the SQLite database database/github_repo.db to store GitHub repository records, including repository names, URLs, descriptions, last update times, star counts, whether they are high-value repositories, and other information.
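
The token management shared by both monitors can be pictured as a round-robin over the configured tokens that skips any token currently rate-limited. The sketch below is an assumption about how such rotation might work; the `TokenRotator` class and its method names are hypothetical and do not come from the project.

```python
import itertools
import time
from typing import Dict, List, Optional


class TokenRotator:
    """Round-robin over GitHub tokens, skipping ones marked as rate-limited."""

    def __init__(self, tokens: List[str]) -> None:
        if not tokens:
            raise ValueError("at least one token is required")
        self._tokens = list(tokens)
        self._cycle = itertools.cycle(self._tokens)
        # token -> Unix time (seconds) at which it may be used again
        self._blocked_until: Dict[str, float] = {}

    def mark_rate_limited(self, token: str, reset_at: float) -> None:
        """Record that `token` is exhausted until `reset_at` (Unix seconds)."""
        self._blocked_until[token] = reset_at

    def next_token(self, now: Optional[float] = None) -> Optional[str]:
        """Return the next usable token, or None if every token is blocked."""
        now = time.time() if now is None else now
        # Look at each token at most once per call.
        for _ in range(len(self._tokens)):
            token = next(self._cycle)
            if self._blocked_until.get(token, 0.0) <= now:
                return token
        return None
```

In practice the `reset_at` value could be taken from the `X-RateLimit-Reset` header that the GitHub REST API returns, which is expressed in UTC epoch seconds.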

2. AI Analysis Module (AI)

analyzer.py: AI Analyzer
- OpenAI & Gemini API: Integrates the OpenAI API (primary) and the Gemini API (backup) and supports switching between models, for example to gpt-4o-mini-2024-07-18 as a fallback model.
- Prompt Engineering: Uses a separate Prompt template for each analysis scenario (CVE analysis, new repository analysis, repository update analysis, specific watch repository analysis) to improve the quality of AI analysis results.
- JSON Format Output: Requires AI to strictly output analysis results in JSON format for easy program parsing and data processing.
- Multi-Dimensional Security Analysis: AI analysis results include rich information such as brief descriptions of vulnerabilities/repositories, detailed summaries, risk levels, key points, technical details, affected components, value assessments, security types, update types, and vulnerability exploitation status.
- Result Validation and Standardization: Performs strict format validation and content standardization on the JSON results returned by AI to ensure the accuracy and usability of the data.
- Dynamic Blacklist Update: Based on AI analysis results, automatically judges whether repositories or users should be added to the blacklist and dynamically updates the blacklist file.
- Analysis Result Persistence: Saves AI analysis results as JSON files and updates corresponding records in the database.
- Article Title Classification: Supports AI classification of security article titles for generating security briefing reports.
- API Failover: When OpenAI API calls fail, automatically switches to backup OpenAI API or Gemini API to improve system stability and availability.
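
Result validation and API failover fit together naturally: a reply that is not usable JSON can be treated like a failed call, and the next provider is tried. The sketch below illustrates that idea under stated assumptions. The field names (`summary`, `risk_level`), the risk scale, and both function names are hypothetical; each provider is modeled as a plain callable so no real API client is implied.

```python
import json
from typing import Callable, Dict, Iterable, Optional

REQUIRED_KEYS = {"summary", "risk_level"}              # assumed field names
ALLOWED_RISK = {"LOW", "MEDIUM", "HIGH", "CRITICAL"}   # assumed risk scale


def validate_result(raw: str) -> Dict[str, object]:
    """Parse a model reply as JSON and normalize it; raise ValueError if unusable."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")
    missing = REQUIRED_KEYS - set(data)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    risk = str(data["risk_level"]).strip().upper()
    if risk not in ALLOWED_RISK:
        raise ValueError(f"unknown risk level: {risk!r}")
    data["risk_level"] = risk  # standardized spelling
    return data


def analyze_with_failover(
    prompt: str, providers: Iterable[Callable[[str], str]]
) -> Optional[Dict[str, object]]:
    """Try each provider in order; return the first reply that validates."""
    for call in providers:
        try:
            return validate_result(call(prompt))
        except Exception:
            # Network error, API error, or invalid JSON: move on to the next provider.
            continue
    return None
```

Ordering the `providers` list as primary OpenAI, backup OpenAI, then Gemini would mirror the failover order described above.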

3. Data Processing and Management (Utils)

logger.py: Logger
- Uses the logging module to provide complete logging functionality, covering DEBUG, INFO, WARNING, ERROR, and other levels.
- Log information is detailed and structured, making it easy to troubleshoot and monitor the system.
- Logs are output to the file logs/security_monitor.log and rolled over daily.
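
A daily-rotating file log of this kind can be set up with the standard library alone. The following is a minimal sketch, assuming rotation at midnight and seven retained files; the `build_logger` helper, the logger name, and the message format are illustrative choices, not the project's actual settings.

```python
import logging
from logging.handlers import TimedRotatingFileHandler
from pathlib import Path


def build_logger(log_file: str = "logs/security_monitor.log") -> logging.Logger:
    """Create (or reuse) a logger that writes to `log_file` and rotates daily."""
    Path(log_file).parent.mkdir(parents=True, exist_ok=True)
    logger = logging.getLogger("security_monitor")  # logger name is an assumption
    logger.setLevel(logging.DEBUG)
    if not logger.handlers:  # avoid attaching duplicate handlers on repeated calls
        handler = TimedRotatingFileHandler(
            log_file, when="midnight", backupCount=7, encoding="utf-8"
        )
        handler.setFormatter(
            logging.Formatter("%(asctime)s [%(levelname)s] %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger
```
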
csv_writer.py: CSV Result Writer (currently not used, can be extended)
- Exports analysis results to CSV files for easy data analysis and sharing.
article_fetcher.py: Article Fetcher
- Multi-Source Fetching: Currently supports fetching security articles from BruceFeIix and D洞见 (doonsec) WeChat official accounts.
- Article Title and URL Extraction: Uses regular expressions to extract article titles and URLs from web page content.
- Retry Mechanism: Retries failed requests with a backoff strategy to improve the stability and success rate of article fetching.
- Article Title Cleaning: Standardizes and cleans article titles, removing redundant markers and formats.
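
The README does not say which backoff strategy the fetcher uses. The sketch below shows one common choice, exponential backoff with a little jitter, purely as an illustration. `retry_with_backoff` and all of its parameters are hypothetical, and the `sleep` argument exists only so the delay can be replaced in tests.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `func` until it succeeds, doubling the wait after each failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Up to 10% jitter so parallel fetchers do not retry in lockstep.
            sleep(delay + random.uniform(0, delay / 10))
    raise AssertionError("unreachable")
```
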
article_manager.py: Article Manager
- Article De-duplication: Automatically filters processed article URLs to avoid duplicate analysis and pushing.
- AI Classification Result Processing: Processes AI article title classification results and organizes article lists by category.
- Daily Security Briefing Report Generation: Regularly generates Markdown format daily security briefing reports, summarizing the latest security articles and AI analysis results.
- Automated Blog Publishing: Calls the blog_manager.py module to automatically publish daily security briefing reports to a blog platform.
- Article Data Persistence: Saves processed URLs and classified articles as JSON files for easy subsequent use and management.
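
De-duplication backed by a JSON file can be sketched as follows. The `SeenUrlStore` class, its method names, and the file layout (a JSON array of URLs) are assumptions for illustration; the project's actual storage format is not documented here.

```python
import json
from pathlib import Path
from typing import Iterable, List, Set


class SeenUrlStore:
    """Remember processed article URLs across runs using a JSON file."""

    def __init__(self, path: str) -> None:
        self._path = Path(path)
        self._seen: Set[str] = set()
        if self._path.exists():
            self._seen = set(json.loads(self._path.read_text(encoding="utf-8")))

    def filter_new(self, urls: Iterable[str]) -> List[str]:
        """Return only URLs not seen before, and mark them as seen."""
        fresh = []
        for url in urls:
            if url not in self._seen:
                self._seen.add(url)
                fresh.append(url)
        return fresh

    def save(self) -> None:
        """Write the seen set back to disk as a sorted JSON array."""
        self._path.parent.mkdir(parents=True, exist_ok=True)
        self._path.write_text(
            json.dumps(sorted(self._seen), indent=2), encoding="utf-8"
        )
```
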
blog_manager.py: Blog Manager
- Blog Platform API Interaction: Encapsulates common functions for interacting with blog platform APIs, such as creating articles and updating articles.
- Article ID Mapping Management: Records the article IDs of daily security briefing reports on the blog platform for easy subsequent updates and management.
- Automated Blog Publishing: Publishes daily security briefing reports to a blog platform automatically.

4. Database (Database)

database/models.py: Database Model Definition
- Uses SQLAlchemy to define two data models, CVERecord (CVE record) and Repository (GitHub repository record), to facilitate data storage and querying.
- The database uses SQLite, and the file paths are database/cve_record.db and database/github_repo.db.

5. Configuration File (Config)

config.py: System Configuration File
- Centrally manages the system's configuration parameters, such as database paths, API keys, monitoring intervals, keyword lists, blacklists, etc.
- Lets users customize and adjust system behavior.
- Includes the following main configuration items:
  - DATABASE_PATH: Database file path
  - MONITOR_INTERVAL: Monitoring cycle interval (seconds)
  - GITHUB_TOKEN: GitHub API Token (a list of tokens can be supplied via GITHUB_TOKENS instead)
  - GITHUB_KEYWORDS: List of GitHub repository search keywords
  - WATCHED_REPOSITORIES: List of GitHub repositories to focus on monitoring
  - BLACKLIST_USERS: User blacklist
  - BLACKLIST_REPOSITORIES: Repository blacklist
  - PRIMARY_AI_CONFIG: Primary AI service (OpenAI) configuration
  - BACKUP_AI_CONFIGS: List of backup AI service (OpenAI) configurations
  - GEMINI_AI_CONFIG: Gemini AI service configuration
  - BLOG_TOKEN: Blog platform API Token
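
For orientation, a config.py using the items above might look like the fragment below. Only the top-level names come from the list; every value is a placeholder, and the dictionary keys (`api_key`, `base_url`, `model`), the `owner/name` repository format, and the meaning of DATABASE_PATH as a directory are assumptions that should be checked against the real file.

```python
# config.py (illustrative placeholders only; replace every value before use)

DATABASE_PATH = "database/"   # assumed: directory holding the SQLite files
MONITOR_INTERVAL = 3600       # seconds between monitoring cycles (placeholder)

GITHUB_TOKENS = ["<github-token-1>", "<github-token-2>"]
GITHUB_KEYWORDS = ["exploit", "poc"]                 # placeholder keywords
WATCHED_REPOSITORIES = ["<owner>/<repository>"]      # assumed "owner/name" format

BLACKLIST_USERS = []
BLACKLIST_REPOSITORIES = []

# Key names inside the AI config dictionaries are assumptions.
PRIMARY_AI_CONFIG = {
    "api_key": "<openai-api-key>",
    "base_url": "<openai-base-url>",
    "model": "<model-name>",
}
BACKUP_AI_CONFIGS = [
    {
        "api_key": "<backup-api-key>",
        "base_url": "<backup-base-url>",
        "model": "gpt-4o-mini-2024-07-18",
    },
]
GEMINI_AI_CONFIG = {"api_key": "<gemini-api-key>", "base_url": "<gemini-base-url>"}

BLOG_TOKEN = "<blog-api-token>"  # only needed for automatic blog publishing
```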

6. Main Program (Main)

main.py: System Main Program
- Initializes each module (monitors, AI analyzer, article manager, etc.).
- Starts the monitoring cycle, regularly executing CVE monitoring, GitHub repository monitoring, AI analysis, article crawling, and blog publishing tasks.
- Uses multi-threading to achieve concurrent monitoring and AI analysis, improving system efficiency.
- Exception handling and retry mechanisms ensure stable system operation.
- Status Monitoring Thread: Regularly checks system operating status and records logs.
- Daily Blog Publishing: Publishes the daily security briefing report to a blog platform automatically on a schedule.
- Command-Line Startup: Users can start and stop the monitoring system via the command line.
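
The combination of a periodic monitoring cycle, multi-threading, and exception handling described above can be sketched with one worker thread per task and a shared stop event. This is an illustrative outline only; `run_periodically` and `start_workers` are hypothetical names, and the real main.py may be organized differently.

```python
import logging
import threading
from typing import Callable, Dict, List

log = logging.getLogger(__name__)


def run_periodically(
    task: Callable[[], None], interval: float, stop: threading.Event
) -> None:
    """Run `task` every `interval` seconds until `stop` is set."""
    while not stop.is_set():
        try:
            task()
        except Exception:
            # Log and keep going so one failed cycle does not kill the worker.
            log.exception("task failed; retrying on the next cycle")
        stop.wait(interval)  # sleeps, but wakes immediately when stop is set


def start_workers(
    tasks: Dict[str, Callable[[], None]], interval: float, stop: threading.Event
) -> List[threading.Thread]:
    """Start one daemon thread per named task and return the threads."""
    threads = []
    for name, task in tasks.items():
        thread = threading.Thread(
            target=run_periodically,
            args=(task, interval, stop),
            name=name,
            daemon=True,
        )
        thread.start()
        threads.append(thread)
    return threads
```

A command-line entry point could call `start_workers` with the CVE and GitHub monitoring tasks, wait in the main thread, and call `stop.set()` when it catches `KeyboardInterrupt`.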

⚙️ Running Environment

Python 3.8+
Dependencies (see requirements.txt)

📦 Installation Steps

Clone the code repository

```
git clone [Project Repository Address]
cd [Project Directory]
```

Install dependencies
```
pip install -r requirements.txt
```
Configure the config.py file
- Configure GitHub API Token (GITHUB_TOKEN or GITHUB_TOKENS)
- Configure OpenAI API key and Base URL (PRIMARY_AI_CONFIG, BACKUP_AI_CONFIGS)
- Configure Gemini API key and Base URL (GEMINI_AI_CONFIG)
- Configure Blog platform API Token (BLOG_TOKEN) (if you need to automatically publish to a blog)
- Modify other configuration items as needed, such as monitoring interval, keyword list, blacklist, etc.
Run the system
```
python main.py
```

📝 Future Plans

More Data Source Support: Expand support for more security information sources, such as security communities, vulnerability platforms, etc.
More Refined AI Analysis: Continuously optimize Prompt engineering to improve the accuracy and depth of AI analysis.
Richer Features: Such as vulnerability early warning, threat intelligence visualization, custom reports, etc.
Web UI Management Interface: Develop a Web UI so that users can configure and manage the monitoring system more easily.

🤝 Contribution

Contributions are welcome! If you have any suggestions or bug reports, please submit an Issue or Pull Request.

📜 License

This project is licensed under the MIT License - see the LICENSE file for details.

Thank you for your attention! ⭐ Star this project to support our work!