If people are resorting to scraping then clearly the official API isn't fit for their purposes.
As an example, here are two key features that are missing from the Twitter API at the moment:
- Bookmarks. You can privately bookmark tweets on the Twitter website and apps. There is no way to access the list of tweets you have bookmarked in the API.
- Threads. The concept of threads - where a tweet's replies from the same author get special treatment in terms of display - is key to how Twitter is used today. The official API doesn't support them, in that there is no way to look at a tweet and see that a threaded reply to it exists.
There is no good commercial reason for excluding either of these features from the public API, other than that Twitter have made a strategic decision not to invest resources in expanding the API to keep up with new features they are adding to the platform.
Given that, is it any surprise that people are resorting to scraping?
You say "maybe they have a reason for not providing everything" - I cannot think of a reason not to provide me with API access to my own private bookmarks other than "we decided to invest our engineering resources elsewhere".
Which isn't a bad reason! But it's not a good argument for people not to scrape their own data.
Everything Twitter used to offer and no longer does enabled third parties to profit off "their valuable" data. It's economically/strategically unsustainable, or so they think.
I doubt it. People who write scrapers are talented engineers. Talented engineers understand the limitations of scraping compared to using an official API, and know to try to get things done with the official API first.
The Twint README calls out reasons for going beyond the API at the start - things like Twitter's increasingly strict rate limits and the limit of only 3,200 historic tweets for a user account.
I've been maintaining a browser extension[1] that has used various versions of the unofficial API over a period of six years -- from literally navigating the site as the user and grabbing HTML, to the various incarnations of JSON and HTML hybrid (SSR) APIs that their web and mobile clients have used internally over that time.
Believe me, I would LOVE it if the official API supported threads. I have tried several times to make it work, but the official APIs are stuck in a circa-2012 idea of how Twitter works. Replies just aren't a thing to them.
I totally agree, writing scrapers doesn't necessarily make one a talented engineer. You may often find it easier to write a scraper than to go through the API documentation or a series of authentication procedures.