Season of KDE 2026: Week 1 Progress for Automating Promo Data Collection
Hi all! I'm CJ, and I'm participating in Season of KDE 2026 by automating portions of the data collection for the KDE promo team. This post is an update on the work I've done in the first week of SoK.
My mentor gave me a light task to help me get set up and familiarize myself with the tools I'll be using for the rest of the project. The task was to automate the population of a spreadsheet that tracks follower and post counts for X (formerly known as Twitter), Mastodon, BlueSky, and Threads.
The spreadsheet takes the follower and post counts of some of KDE's social media platforms and makes calculations based off that data. Important things to note:
- data from the sites is entered manually
- there are a lot of styles and formulas in the sheet
Fetching Account Data
Grabbing data was mostly no trouble. Mastodon and BlueSky were especially easy to work with. They have a public and well documented API that lets people collect all kinds of data in human-readable formats. One particular endpoint from both sources output account information, including follower and post counts, for a given account in neat JSON files (BlueSky, Mastodon). All it took were GET requests to these endpoints and it was smooth sailing.
X and Threads proved a bit more finnicky. Both of their APIs limit access to much of their functionality usually meaning webscraping methods are the most accessible for grabbing public account data. Threads shows users' follower counts out directly on an account's landing page, so processing a GET request to the URL of KDE's Threads account made it easy to grab. The problem is that there seems to be no direct way to grab the post count either through their API or with webscraping methods. For now, we've chosen to leave that be and circle back when I explore Threads more in the future. X presents a similar problem but there is an open-source frontend alternative named Nitter, instances of which lay all the stats information out in the open. The reliability of this method depends on public Nitter instances being available so it may be worth coming back around to this in a later part of the project, but for now it's a viable solution for getting follower and post counts.
Inputting the Data Into the Spreadsheet
With the data all fetched, all that was left is to add that data to the ODF spreadsheet. I had this down as the easy part of the task but in the end it wasn't so simple. The two major Python packages I found that can interpret and write ODF files: Pandas and pyexcel. Both of these have no problem reading data from the files, but when it comes to saving they don't preserve some elements of the spreadsheet. In the end we went the simple route which is to save the data to a separate ODF file using one of the Python-ODF interfaces and import that into the data sheet. This took a little finagling with formulas to get things working without popping errors into cells the sheet, but in the end we have an output ODF spreadsheet file containing the required data and the original spreadsheet with all the calculations pulling that data into its formulas, removing any requirement of a human interfacing with this portion of data collection.
Learned Lessons
I feel like this week's task was a great first step into data collection automation. It was challenging without being too difficult to make progress on and forced me to explore different avenues for gathering data. On the confidence side, getting a (mostly) successful task out the gate helped me feel more comfortable with the tools and processes that will likely appear throughout the entirety of my SoK experience. Things will scale up from here on out though so I'm also keeping myself in check.
From what I understand some of the most difficult parts of automated data collection come through having to interface with Javascript and not getting banned, both of which I've yet to come face-to-face with in any substantial capacity so far. Along with that I've face unexpected problems, such as the issue with modifying ODF files and that some websites don't play as well with certain browsers, which I don't have an easy way to test for yet. With these in mind I'm trying to tread lightly and be diligent with research and good practice as I continue on.