If people are resorting to scraping then clearly the official API isn't fit for ...

simonw · on April 13, 2020

You say "maybe they have a reason for not providing everything" - I cannot think of a reason not to provide me with API access to my own private bookmarks other than "we decided to invest our engineering resources elsewhere".

Which isn't a bad reason! But it's not a good argument for people not to scrape their own data.

dewey · on April 13, 2020

> But it's not a good argument for people not to scrape their own data.

Why would you want to scrape your own data if you can already request all your data and get a whole archive? https://help.twitter.com/en/managing-your-account/how-to-dow...

simonw · on April 13, 2020

Because then you have to trigger and download a GB+ file every time you want to programmatically access your latest bookmarks.

numpad0 · on April 13, 2020

Everything Twitter used to offer and no longer does enabled third parties to profit off “their valuable” data. It’s economically/strategically unsustainable, or so they think.

fireattack · on April 13, 2020

The fact that the API can only get the 3,200 most recent tweets from a user is also a dealbreaker.

scared2 · on April 13, 2020

Or they don't know how to effectively use the API, maybe.

simonw · on April 13, 2020

I doubt it. People who write scrapers are talented engineers. Talented engineers understand the limitations of scraping compared to using an official API, and know to try to get things done with the official API first.

The Twint README calls out reasons for going beyond the API at the start - things like Twitter's increasingly strict rate limits and the limit of only 3,200 historic tweets for a user account.

paulgb · on April 13, 2020

I've been maintaining a browser extension[1] that has used various versions of the unofficial API over a period of six years -- from literally navigating the site as the user and grabbing HTML, to the various incarnations of JSON and HTML hybrid (SSR) APIs that their web and mobile clients have used internally over that time.

Believe me, I would LOVE it if the official API supported threads. I have tried several times to make it work, but the official APIs are stuck in a circa-2012 idea of how Twitter works. Replies just aren't a thing to them.

[1] https://github.com/paulgb/Treeverse

scared2 · on April 15, 2020

I don't think this program actually solves the 3,200-tweet limit you mentioned, though I'm not sure.

armitron · on April 13, 2020

People who write robust, performant, maintainable systems that don’t collapse under their own weight are talented engineers.

Most scrapers, including this one, lack these characteristics.

scared2 · on April 15, 2020

I totally agree, writing scrapers doesn't necessarily make one a talented engineer. It is often easier to write a scraper than to go through the API documentation or a series of authentication procedures.