Creating posts from RSS feeds in Flarum


    For a few months now, one of my projects has been successfully querying RSS feeds, then using the Flarum API to create a discussion from each new item

    Want this? Sure you do! Below are the steps, including all the scripts needed to make this work

    Firstly, you will need the Flarum API client, which is installed with Composer

    Installation

    composer require maicol07/flarum-api-client

    Configuration

    To start working with the client you may need a Flarum master key:

    • Generate a random, hard-to-guess string of 40 characters - this is the token needed for this package (a password generator works well for this)
    • Manually add it to the api_keys table using phpMyAdmin, Adminer or a similar tool.

    The master key is required to access non-public discussions and to run actions otherwise reserved for Flarum administrators.
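
    If you would rather not rely on an online generator, the token can also be produced locally. The snippet below is a minimal sketch of that idea: the function name generateApiToken is my own, not part of Flarum or the API client, and it simply turns 20 random bytes into the 40 hexadecimal characters described above.

    ```php
    <?php
    // Hypothetical helper (not part of Flarum): returns a 40-character token.
    // random_bytes() draws from the operating system's secure random source,
    // and bin2hex() turns each of the 20 bytes into two hexadecimal characters.
    function generateApiToken(): string
    {
        return bin2hex(random_bytes(20));
    }

    // Print a fresh token that can be pasted into the api_keys table
    echo generateApiToken() . "\n";
    ```

    Run it once from the CLI and keep the output somewhere safe, as the same value is needed later in details.php.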

    Install SimplePie

    Next, install SimplePie to parse the RSS feeds

    composer require simplepie/simplepie

    Create storage DB

    Now access your database using phpmyadmin (or something similar) and create a new database called “feed”

    With the database created, run the following script which will create a table called “queue” with a few simple columns

    CREATE TABLE `queue` (
      `id` bigint(20) NOT NULL,
      `url` varchar(500) NOT NULL,
      `title` varchar(500) NOT NULL,
      `seen` int(1) NOT NULL DEFAULT 0
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
    

    As the “queue” table in your “feed” database gets bigger, it’ll need indexes to keep lookups fast. Create them as follows in phpMyAdmin

    ALTER TABLE `queue`
      ADD PRIMARY KEY (`id`),
      ADD KEY `title` (`title`),
      ADD KEY `url` (`url`);
    

    Finally, we’ll set AUTO_INCREMENT on the id column of the table

    ALTER TABLE `queue`
      MODIFY `id` bigint(20) NOT NULL AUTO_INCREMENT, AUTO_INCREMENT=1;
    COMMIT;
    

    Create credentials file

    For security reasons, we “include” a details.php file that lives outside of the web root (you can call this file whatever you like - just remember to reflect any change of name in the main script below). We are going to be running this from PHP-CLI anyway, so it shouldn’t be exposed

    In my case, details.php is included as shown below - it’s located at the root of my domain, but outside of the web root

    include("/var/www/vhosts/metabullet.com/details.php");
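
    As a rough sketch of how the “CLI only, outside the web root” idea could be enforced, the snippet below refuses to run through a web server and checks that the credentials file is readable before including it. The helper name credentialsPath is my own invention, and the require line is left commented out because the path will differ on your server.

    ```php
    <?php
    // Hypothetical helper: returns the path if the file is readable, throws otherwise.
    function credentialsPath(string $path): string
    {
        if (!is_readable($path)) {
            throw new RuntimeException("Cannot read credentials file: $path");
        }
        return $path;
    }

    // Stop immediately if the script is reached through a web server instead of PHP-CLI
    if (PHP_SAPI !== 'cli') {
        exit("This script must be run from the command line\n");
    }

    // Keep the require at file scope so that variables defined in details.php stay global.
    // Replace the path with the location of your own file:
    // require credentialsPath('/var/www/vhosts/metabullet.com/details.php');
    ```

    The require is deliberately not wrapped inside a function, because variables set in an included file would then be local to that function.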

    Your details.php file should contain this

    <?php
    
    // Variables for posting to Twitter (only needed if you enable the commented-out Twitter lines in the main script)
    define('CONSUMER_KEY', 'YOUR_KEY');
    define('CONSUMER_SECRET', 'YOUR_SECRET');
    define('ACCESS_TOKEN', 'YOUR_ACCESS_TOKEN');
    define('ACCESS_TOKEN_SECRET', 'YOUR_ACCESS_TOKEN_SECRET');
    
    $header = array(
        "Authorization: Token THE_TOKEN_YOU_GENERATED_EARLIER",
        "Content-Type: application/json",
    );
    
    // Create DB connection
    $servername = "localhost";
    $login = "YOUR_DB_USER";
    $dbpw = "YOUR_DB_PASSWORD";
    $dbname = "feed";
    $conn = new mysqli($servername, $login, $dbpw, $dbname);
    
    // Check connection
    if ($conn->connect_error) {
        die("Connection failed: " . $conn->connect_error);
    } else {
        echo "Connected to database\n";
    }
    ?>
    

    Create the RSS parser script

    Create a new PHP file called rssparser.php - again, located outside of the web root

    <?php
    //use Abraham\TwitterOAuth\TwitterOAuth;
    @$url = $argv[1];
    @$max = $argv[2];
    if (!$url) {
        die("\n ***** You must provide a URL to process *****\n");
    }
    if (!$max) {
        die("\n ***** You must provide a quantity to process *****\n");
    }
    include "details.php";
    require 'vendor/autoload.php';
    $feed = new SimplePie();
    $feed->enable_cache();
    $feed->set_cache_location("/home/phenomlab/system/.cache");
    $feed->force_feed();
    $feed->set_timeout(30);
    $feed->set_feed_url("$url");
    $feed->init();
    $feed->handle_content_type();
    $feed->enable_order_by_date(true);
    $number = $feed->get_item_quantity($max);
    foreach ($feed->get_items(0, $number) as $items) {
        echo "\033[32m\nProcessing story | " . $items->get_title() . "\n\033[0m";
        // Clean up the description step by step - each call works on the result of the previous one
        $description = str_replace("View Entire Post &rsaquo;", "", $items->get_description());
        $description = str_replace("<img", "\n\n<img", $description);
        $description = str_replace('<img src="', '', $description);
        $description = str_replace('" />', '', $description);
        $description = strip_tags(html_entity_decode($description), "<img>") . "\n";
        $description .= "\n" . '[Link to original article](' . $items->get_link() . ')' . "\n\n";
        //echo 'Description: ' . $description . "\n";
        $content = $items->get_content(true);
        //echo '[Link to original article](' .$item->get_link() . ')'."\n";
        // Define variables for use later on in the script
        $subject = $items->get_title();
        $body = trim($description);
        $link = $items->get_link();
        // Query the database for each item. Perform action based on results
        $stmt = $conn->prepare('SELECT url, seen FROM queue WHERE url = ?');
        $stmt->bind_param('s', $link);
        $stmt->execute();
        $stmt->store_result();
        $stmt->bind_result($checklink, $seen);
        $stmt->fetch();
        // Test to see if we have processed these before. If we have, skip them to avoid duplicates
        if (!$checklink || !$seen) {
            echo "Checking " . $link . " \nLine item does not exist - \033[32m[Processing]\n\033[0m ";
            // Processing new items. Insert record into database to prevent duplication on subsequent processing runs
            $seen = 1;
            $stmt = $conn->prepare('INSERT INTO queue (url, title, seen) VALUES(?, ?, ?)');
            $stmt->bind_param("ssi", $link, $subject, $seen);
            $stmt->execute();
            // Process each newly identified unique post into Flarum using the API
            $ch = curl_init();
            curl_setopt($ch, CURLOPT_URL, 'https://hub.phenomlab.net/api/discussions');
            curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
            curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
            curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
            curl_setopt($ch, CURLOPT_POST, true); // Send the request as a POST
            curl_setopt($ch, CURLOPT_POSTFIELDS, json_encode((array(
                'data' => array(
                    'type' => "discussions",
                    'attributes' => array(
                        'title' => "$subject",
                        'content' => "$body",
                    ),
                    'relationships' => array(
                        'tags' => array(
                            'data' => array(
                                array(
                                    'type' => 'tags',
                                    'id' => "23",
                                ),
                            ),
                        ),
                    ),
                ),
            ))));
            // This turns off TLS certificate verification - remove the line if your forum has a valid certificate
            curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
            $result = curl_exec($ch);
            echo $result;
            //$connection = new TwitterOAuth(CONSUMER_KEY, CONSUMER_SECRET, ACCESS_TOKEN, ACCESS_TOKEN_SECRET);
            //$status = $subject . ' ' . $link . ' #infosec #security #technology #phenomlab';
            //$post_tweets = $connection->post("statuses/update", ["status" => $status]);
        }
        // Item has already been processed. Continue loop until count exhausted
        else {
            echo "Checking " . $checklink . "\nLine item already processed - \033[33m[Ignored]\n\033[0m";
        }
    }
    

    Important notes

    @$max = $argv[2]; is the number of RSS items that the script will parse for each resource URL

    curl_setopt($ch, CURLOPT_POST, true); only tells cURL to send a POST request. CURLOPT_POST is an on/off switch, so any non-zero value (such as 22) is treated as “on” and does not choose the user to post as. Flarum takes the posting user from the API key: either set the user_id column on the key’s row in the api_keys table, or append ; userId=22 to the token in the Authorization header in details.php, where “22” is the ID of the user I want to post as. This user needs admin rights.

    array(
    'type' => 'tags',
     'id' => "23"
    )
    

    This array tells the Flarum API in which tag to post. In this case, “23” is the ID of the “news” tag.
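
    To make the shape of the request body easier to see, here is a small sketch that builds the same JSON outside of the cURL call. The function name buildDiscussionPayload and the example title and body are placeholders of my own; the structure mirrors the array used in rssparser.php, with the tag ID passed in as a parameter.

    ```php
    <?php
    // Hypothetical helper: builds the JSON body sent to /api/discussions.
    // $tagId is the ID of the Flarum tag the discussion should be posted in.
    function buildDiscussionPayload(string $title, string $content, int $tagId): string
    {
        return json_encode([
            'data' => [
                'type' => 'discussions',
                'attributes' => [
                    'title' => $title,
                    'content' => $content,
                ],
                'relationships' => [
                    'tags' => [
                        'data' => [
                            // The tag ID is sent as a string, as in the main script
                            ['type' => 'tags', 'id' => (string) $tagId],
                        ],
                    ],
                ],
            ],
        ], JSON_THROW_ON_ERROR);
    }

    // Example: a discussion in the tag with ID 23
    echo buildDiscussionPayload('Example headline', 'Example body text', 23) . "\n";
    ```

    Printing the payload like this before sending it is a quick way to check that the tag ID and content look right.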

    Test it!

    To check that your script is working, run it from the CLI in the directory where your files are located. Note that the RSS URL will need to change to the one you’re interested in targeting, and the number afterwards is the number of articles you want to pull at once.

    php rssparser.php http://feeds.bbci.co.uk/news/rss.xml 10

    Watch for the output on the screen. The first time this is run, the script will create posts for every RSS item it has no record of. As each post is created, its URL is recorded in the “feed” database so that subsequent runs do not create duplicates.

    Now what?

    I have this rssparser.php scheduled to run every hour.

    Enjoy - let me know if you have any issues getting this to work.
