
How to Crawl Twitter (GraphQL)

2021-05-12

#Twitter
#Twitter Monitor
#Twitter Graphql
#Twitter Api

Continued from How to Crawl Twitter

Current status of commonly used Twitter endpoints

Name       Restful   Graphql   Notes
UserInfo   o         o
Search     o         ?         I recall Search briefly used Graphql, but I am not sure
Timeline   x         o         Restful returns 429 indefinitely
Status     o         o

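Twitter has been mixing its Graphql api (graphql below) with its Restful api (restful or rest below) for quite a while. When I first wrote this post only the timeline had been switched over, but Twitter has since been tinkering with conversation threads, user info and... NFT avatar info as well, so I think graphql will replace restful sooner or later. I recently re-crawled all the tweet data for Twitter Monitor and fixed quite a few old bugs along the way. Meanwhile the restful timeline started returning 429 indefinitely, and none of the issues I dug through had an answer, so I figure it is time to prepare for migration.

So I started putting this post together.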

RATE LIMIT

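Type               Count   Notes
UserByRestId       500
UserByScreenName   500
UserTweets         500
TweetDetail        500     conversation; poll results require this endpoint
AudioSpaceById     500
BroadCast          187     such an odd number
Search             250     * the search endpoint does not use graphql
Recommendation     60      the "You might like" panel

It looks like every graphql endpoint is limited to 500 requests, and the validity period has been cut from 3 hours to 15 minutes.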

UserInfo

The helper functions here already come with multi_curl, so I kept the script simple.

<?php
require(__DIR__ . '/init.php');

$fetch = new Tmv2\Fetch\Fetch();
$token = $fetch->tw_get_token();
$count = 0;
$change_count = 0;

$tmp = array_fill(0, 50, 783214);//"twitter"
$end = false;
for(;;) {
    $users = $fetch->tw_get_userinfo($tmp, $token);
    foreach ($users as $user) {
        if ($user === NULL || isset($user["errors"])) {
            $token = $fetch->tw_get_token();
            $change_count++;
            echo "change token $change_count: -->" . $token[1] . "<--\n";
            break;
        }
        $count++;
        $tmpInfo = path_to_array("user_info_legacy", $user);
        echo "-->" . $count . ' '. $tmpInfo["name"] .' ('. $tmpInfo["screen_name"] .")<--\n";
    }
}
//UserByRestId 
//-->498 Twitter (Twitter)<--
//-->499 Twitter (Twitter)<--
//-->500 Twitter (Twitter)<--
//change token 1: -->1495801929439535111<--

//UserByScreenName
//-->998 Twitter (Twitter)<--
//-->999 Twitter (Twitter)<--
//-->1000 Twitter (Twitter)<--
//change token 1: -->1495802088294690816<--

TimeLine

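The Graphql endpoint is rather slow (my guess is that whatever optimization the generation step has cannot keep up with mixing that much data), so a single-threaded loop takes a long time. I wrote a script that requests the latest 100 tweets with 10 concurrent requests to try to find the limit.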

<?php
require(__DIR__ . '/init.php');//this is Twitter Monitor's init.php
$fetch = new Tmv2\Fetch\Fetch();
$token = $fetch->tw_get_token();
$count = 0;
$tweet_count = 0;
$graphqlObject = [
    "userId" => 783214,
    "count" => 100,
    "withHighlightedLabel" => true,
    "withTweetQuoteCount" => true,
    "withQuickPromoteEligibilityTweetFields" => true,
    "withSuperFollowsUserFields" => true,
    "withSuperFollowsTweetFields" => true,
    "withDownvotePerspective" => false,
    "withReactionsMetadata" => false,
    "includePromotedContent" => true,
    "withReactionsPerspective" => false,
    "withTweetResult" => false,
    "withReactions" => false,
    "withUserResults" => false,
    "withVoice" => true,
    "withNonLegacyCard" => true,
    "withBirdwatchPivots" => false,
    "withV2Timeline" => false
];
$tmp = array_fill(0, 9, "https://twitter.com/i/api/graphql/" . queryhqlQueryIdList["UserTweets"]["queryId"] . "/UserTweets?variables=" . urlencode(json_encode($graphqlObject)));
$end = false;
for(;;) {
    $tweets = $fetch->tw_fetch_multi($tmp, $token);
    foreach ($tweets as $tweet) {
        $generateTweetData = new Tmv2\Core\Core($tweet, true, [], false);
        echo "-->" . $count .'-' . $tweet_count . ' '. $generateTweetData->cursor["top"] .' '. $generateTweetData->cursor["bottom"] ."<--\n";
        if ($generateTweetData->errors[0] !== 0) {
            $end = true;
            break;
        }
        $tweet_count += 100;
        $count++;
    }
    if ($end) {
        break;
    }
}
//throwaway code: no need for performance or elegance as long as it runs
//output
//...
//-->997-99700 HCaAgIDEm/+RuSkAAA== HBaQgLnJntzU4CUAAA==<--
//-->998-99800 HCaAgICkmv+RuSkAAA== HBaQgLnJntzU4CUAAA==<--
//-->999-99900  <--

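The last request that still showed a cursor was 998; at 999 it was gone. Since the displayed counter starts at 0, I will tentatively take the Timeline limit to be 999 requests. Beyond that you need a new guest-token, and changing the guest-token too often may itself lead to 429, at which point you should consider a proxy pool, multiple IPs, or a distributed setup.

Token pool

A guest-token is limited both in number of uses and in lifetime (10800s), so building a token pool is feasible. I am working on one and will fill in this section when it is done.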
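
Until that section is written, here is a rough sketch of what such a pool might look like. It is only an illustration: the class name, the `fetchToken` callback, and the default budget of 500 uses are assumptions made for this example and are not taken from Twitter Monitor. Only the 10800 s lifetime comes from the paragraph above.

```javascript
// Hypothetical guest-token pool. A token is dropped once it has been used
// `maxUses` times or is older than `ttlMs`, whichever comes first.
class GuestTokenPool {
  constructor(fetchToken, { maxUses = 500, ttlMs = 10800 * 1000, now = Date.now } = {}) {
    this.fetchToken = fetchToken; // async () => string, supplied by the caller
    this.maxUses = maxUses;
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock, handy for testing
    this.tokens = [];
  }

  // Adds an already obtained token to the pool.
  add(value) {
    this.tokens.push({ value, uses: 0, createdAt: this.now() });
  }

  // Returns a usable token, fetching a fresh one when the pool is empty.
  async acquire() {
    this.tokens = this.tokens.filter(
      (t) => t.uses < this.maxUses && this.now() - t.createdAt < this.ttlMs
    );
    if (this.tokens.length === 0) {
      this.add(await this.fetchToken());
    }
    const token = this.tokens[0];
    token.uses += 1;
    return token.value;
  }

  // Call this when a request made with `value` comes back with errors.
  invalidate(value) {
    this.tokens = this.tokens.filter((t) => t.value !== value);
  }
}
```

The clock is injected so that expiry can be exercised without waiting three hours; in real use the default `Date.now` is enough.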

queryId

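These ids still live in the main file, and can be obtained with the script referenced below:

The earlier script no longer works. For the new method see BANKA2017/twitter-monitor ~/apps/scripts/updateQueryIdList.mjs. If you port it to another language, note the following:

  • You must set a reasonable User-Agent; requesting directly with curl or axios returns the wrong content
  • The script is fragile and may break again in the future, so keep an eye on it

The list is long, so I only show the few that Twitter Monitor needs; find uses for the rest yourself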

{
  "UsersByRestIds": {
    "queryId": "I5nvpI91ljifos1Y3Lltyg",
    "operationName": "UserByRestId",
    "operationType": "query"
  },
  "UserByScreenName": {
    "queryId": "7mjxD3-C6BxitPMVQ6w0-Q",
    "operationName": "UserByScreenName",
    "operationType": "query"
  },
  "UserTweets": {
    "queryId": "LNhjy8t3XpIrBYM-ms7sPQ",
    "operationName": "UserTweets",
    "operationType": "query"
  },
  "UserTweetsAndReplies": {
    "queryId": "Vg5aF036K40ST3FWvnvRGA",
    "operationName": "UserTweetsAndReplies",
    "operationType": "query"
  },
  "TweetDetail": {
    "queryId": "bRL1YYMraLIBpo1PGLeFcw",
    "operationName": "TweetDetail",
    "operationType": "query"
  }
}

The URL is assembled like this

let url = `https://twitter.com/i/api/graphql/${queryId}/${operationName}?variables=` + encodeURIComponent(JSON.stringify(variables))

These queryIds may be updated or removed, but so far I have not seen any ill effect from using an old queryId
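
As a small illustration of that URL format, the helper below takes one entry of the queryId list and builds the request URL. The function name is made up for this example, and the optional `features` argument follows the `&features=` query parameter used by the request descriptions in this post.

```javascript
// Base path as it appears in the URLs in this post.
const GRAPHQL_BASE = 'https://twitter.com/i/api/graphql';

// Builds a GET URL from a queryId list entry plus the request variables.
// `features` is optional and is only appended when given.
function buildGraphqlUrl({ queryId, operationName }, variables, features) {
  let url =
    `${GRAPHQL_BASE}/${queryId}/${operationName}?variables=` +
    encodeURIComponent(JSON.stringify(variables));
  if (features !== undefined) {
    url += '&features=' + encodeURIComponent(JSON.stringify(features));
  }
  return url;
}

// Usage with the UserByScreenName entry from the list above.
const exampleUrl = buildGraphqlUrl(
  { queryId: '7mjxD3-C6BxitPMVQ6w0-Q', operationName: 'UserByScreenName' },
  { screen_name: 'Twitter' }
);
```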

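Update 2022.09.06

These queryIds are tied to the features parameter of the request. Do not update them unless you have to, and after updating promptly add the features parameters that the request needs. If a parameter is missing, the response looks like this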

{
  "errors": [
    {
      "message": "The following features cannot be null: responsive_web_enhance_cards_enabled",
      "extensions": {
        "name": "BadRequestError",
        "source": "Client",
        "code": 336,
        "kind": "Validation",
        "tracing": {
          "trace_id": "eeeeeeeeeeeeeeee"
        }
      },
      "code": 336,
      "kind": "Validation",
      "name": "BadRequestError",
      "source": "Client",
      "tracing": {
        "trace_id": "eeeeeeeeeeeeeeee"
      }
    }
  ]
}
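
One way to react to this error is to read the missing feature names out of the message and add them to the next request. The sketch below does only the parsing. The function name is hypothetical, and the assumption that several names arrive comma-separated is mine; the example above only shows a single name.

```javascript
// Extracts feature names from messages of the form
// "The following features cannot be null: name_a, name_b".
// The comma-separated form for multiple names is an assumption.
function missingFeatures(response) {
  const prefix = 'The following features cannot be null: ';
  const names = [];
  for (const error of response.errors ?? []) {
    if (typeof error.message !== 'string' || !error.message.startsWith(prefix)) {
      continue;
    }
    for (const name of error.message.slice(prefix.length).split(',')) {
      if (name.trim() !== '') names.push(name.trim());
    }
  }
  return names;
}
```

A caller could then set each returned name to `false` in its features object and retry, which is a design choice rather than anything Twitter documents.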

Items marked with * are optional

* csrf-token

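To begin with, I honestly do not know under what circumstances this is enforced; probably only after logging in. It is not required, and it is generated locally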

//ct0 in cookie
//x-csrf-token in header
const t = (() => {
  const e = window.crypto || window.msCrypto;
  if (!e) return;
  const t = new Uint8Array(32);
  e.getRandomValues(t);
  let n = "";
  for (let e = 0; e < t.length; e++) n +=
    t[e].toString(16).substr(-1);
  return n
})();

Judging from the result... isn't it just a 32-character random string? So I simply use

echo md5(time());
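
For a Node.js client, a comparable sketch is below. It assumes only what the browser snippet shows, namely that the value is 32 hexadecimal characters placed in both the `ct0` cookie and the `x-csrf-token` header. Using `randomBytes` instead of `md5(time())` is a swapped-in choice for this example, and the function names are made up.

```javascript
const { randomBytes } = require('node:crypto');

// 16 random bytes encode to 32 hexadecimal characters.
function generateCsrfToken() {
  return randomBytes(16).toString('hex');
}

// The same value goes into the ct0 cookie and the x-csrf-token header.
function csrfHeaders(token = generateCsrfToken()) {
  return { cookie: `ct0=${token}`, 'x-csrf-token': token };
}
```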

Cookies: yes, they are set on the first visit, but none of them are required. I will just leave a pattern here: /set-cookie: ([^;]+);/

guest_id_marketing: v1%3A164301325110776087
guest_id_ads: v1%3A164301325110776087
personalization_id: "v1_FBBNMaLDB1sdu2yWcCdHIQ=="
guest_id: v1%3A164301325110776087
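
To show the pattern in use, here is a minimal sketch that applies it to raw response header text and collects the name=value pairs. The function name is hypothetical, and the `g` and `i` flags are my addition so that every header line is matched regardless of case.

```javascript
// Collects cookies from raw response header text using the pattern above.
function parseSetCookies(rawHeaders) {
  const cookies = {};
  for (const match of rawHeaders.matchAll(/set-cookie: ([^;]+);/gi)) {
    const pair = match[1]; // e.g. guest_id=v1%3A164301325110776087
    const eq = pair.indexOf('=');
    if (eq > 0) {
      cookies[pair.slice(0, eq)] = pair.slice(eq + 1);
    }
  }
  return cookies;
}
```

Values are kept exactly as sent, including the surrounding quotes of `personalization_id`.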

guest-token

  • Via
    curl 'https://twitter.com' --compressed
    

    The page returned contains the following lines, which assign the guest-token (the gt value)
    <script nonce="MDRjZmJlNWItYWNmOC00MTdiLWIxYjUtYTFhZTUyYTc2ODg4">
      document.cookie = decodeURIComponent("gt=1232704521454999999; Max-Age=10800;  Domain=.twitter.com; Path=/; Secure");
    </script>
    
  • curl 'https://api.twitter.com/1.1/guest/activate.json' \
    -X 'POST' \
    -H 'authorization: Bearer AAAAAAAAAAAAAAAAAAAAANRILgAAAAAAnNwIzUejRCOuH5E6I8xnZz4puTs%3D1Zv7ttfk8LF81IUq16cHjhLTvJu4FA33AGWWjCpTnA' \
    --compressed
    

    This way you also get the cookies listed above
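
For the first method, the token has to be pulled out of the returned HTML. A minimal sketch, assuming the inline script keeps the `gt=<digits>;` shape shown above (the function name is made up for this example):

```javascript
// Pulls the guest token out of the inline script that sets the gt cookie.
// The token is returned as a string: it is too large for a JavaScript number.
function extractGuestToken(html) {
  const match = /gt=(\d+);/.exec(html);
  return match ? match[1] : null;
}
```

Returning `null` when nothing matches lets the caller fall back to the activate.json request.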

authorization

Bearer AAAAAAAAAAAAAAAAAAAAANRILgAAAAAAnNwIzUejRCOuH5E6I8xnZz4puTs%3D1Zv7ttfk8LF81IUq16cHjhLTvJu4FA33AGWWjCpTnA

I have never seen this one change

User info

Request

  • Method: GET
  • URL:
    • by screen name https://twitter.com/i/api/graphql/7mjxD3-C6BxitPMVQ6w0-Q/UserByScreenName?variables={VARIABLES}
    • by user id https://twitter.com/i/api/graphql/I5nvpI91ljifos1Y3Lltyg/UserByRestId?variables={VARIABLES}
      • VARIABLES:
          {
            "screen_name": "USER_SCREEN_NAME",//by screen name
            "withSafetyModeUserFields": true,
            "withSuperFollowsUserFields": true
          }
        
          {
            "userId": "USER_ID",//by user id
            "withSafetyModeUserFields": true,
            "withSuperFollowsUserFields": true
          }
        
  • Headers:
    • Content-Type: application/json
    • x-guest-token: 1232704521454999999
    • authorization: Bearer AAAAAAAAAAAAAAAAAAAAANRILgAAAAAAnNwIzUejRCOuH5E6I8xnZz4puTs%3D1Zv7ttfk8LF81IUq16cHjhLTvJu4FA33AGWWjCpTnA

Response

  • Body
    • success
    {
      "data": {
        "user": {
          "result": {
            "__typename": "User",
            "id": "VXNlcjo3ODMyMTQ=",
            "rest_id": "783214",
            "affiliates_highlighted_label": {},
            "has_nft_avatar": true,//NFT avatars get a hexagonal frame
            "legacy": {
              "blocked_by": false,
              "blocking": false,
              "can_dm": false,
              "can_media_tag": true,
              "created_at": "Tue Feb 20 14:35:54 +0000 2007",
              "default_profile": false,
              "default_profile_image": false,
              "description": "What's happening?!",
              "entities": {
                "description": { "urls": []},
                "url": {
                  "urls": [
                    {
                      "display_url": "about.twitter.com",
                      "expanded_url": "https://about.twitter.com/",
                      "url": "https://t.co/DAtOo6uuHk",
                      "indices": [0, 23]
                    }
                  ]
                }
              },
              "fast_followers_count": 0,
              "favourites_count": 6292,
              "follow_request_sent": false,
              "followed_by": false,
              "followers_count": 60784817,
              "following": false,
              "friends_count": 12,
              "has_custom_timelines": true,
              "is_translator": false,
              "listed_count": 87616,
              "location": "everywhere",
              "media_count": 2439,
              "muting": false,
              "name": "Twitter",
              "normal_followers_count": 60784817,
              "notifications": false,
              "pinned_tweet_ids_str": [],
              "profile_banner_extensions": {
                "mediaColor": {
                  "r": {
                    "ok": {
                      "palette": [
                        { "percentage": 65.52, "rgb": { "blue": 0, "green": 0, "red": 0 }},
                        { "percentage": 18.59, "rgb": { "blue": 221, "green": 144, "red": 6 }},
                        { "percentage": 10.43, "rgb": { "blue": 124, "green": 58, "red": 252 }},
                        { "percentage": 3.27, "rgb": { "blue": 105, "green": 69, "red": 1 }},
                        { "percentage": 0.69,"rgb": { "blue": 89, "green": 44, "red": 153}}
                      ]
                    }
                  }
                }
              },
              "profile_banner_url": "https://pbs.twimg.com/profile_banners/783214/1642704439",
              "profile_image_extensions": {
                "mediaColor": {
                  "r": {
                    "ok": {
                      "palette": [
                        { "percentage": 71.78,"rgb": { "blue": 255, "green": 227, "red": 182}},
                        { "percentage": 11.06,"rgb": { "blue": 255, "green": 192, "red": 90}},
                        { "percentage": 7.59,"rgb": { "blue": 252, "green": 249, "red": 218}},
                        { "percentage": 6.51,"rgb": { "blue": 25, "green": 23, "red": 16}},
                        { "percentage": 0.35,"rgb": { "blue": 254, "green": 204, "red": 1}}
                      ]
                    }
                  }
                }
              },
              "profile_image_url_https": "https://pbs.twimg.com/profile_images/1486805599367180290/Lp3amoqK_normal.jpg",
              "profile_interstitial_type": "",
              "protected": false,
              "screen_name": "Twitter",
              "statuses_count": 14967,
              "translator_type": "regular",
              "url": "https://t.co/DAtOo6uuHk",
              "verified": true,
              "want_retweets": false,
              "withheld_in_countries": []
            },
            "professional": {
              "rest_id": "1420110046596374541",
              "professional_type": "Business",
              "category": []
            },
            "smart_blocked_by": false,
            "smart_blocking": false,
            "super_follow_eligible": false,
            "super_followed_by": false,
            "super_following": false,
            "legacy_extended_profile": {
              "birthdate": { "day": 21, "month": 3, "visibility": "Public", "year_visibility": "Self"}
            },
            "is_profile_translatable": false
          }
        }
      }
    }
    
    • failure
      • The suspended @realDonaldTrump
        {
          "data": {
            "user": {
              "result": {
                "__typename": "UserUnavailable",
                "unavailable_message": {
                  "entities": [
                    {
                      "fromIndex": 28,
                      "toIndex": 32,
                      "ref": {
                        "type": "TimelineUrl",
                        "url": "https://help.twitter.com/rules-and-policies/twitter-rules",
                        "urlType": "ExternalUrl"
                      }
                    }
                  ],
                  "rtl": false,
                  "text": "Twitter 会冻结违反 Twitter 规则的账号。了解更多"
                },
                "reason": "Suspended"
              }
            }
          }
        }
        
      • A nonexistent account (typed by mashing the keyboard, so I will not say who)
        { "data": {}}//nothing at all is returned for a nonexistent user
        
  • Compared with the old version almost nothing changes; only two things need adjusting. Before and after:
    //rest api
    const userInfo = ...//fetch the data
    let id_str = userInfo.id_str
    let user_info = userInfo
    
    //GraphQL
    const userInfo = ...//fetch the data as described above
    let id_str = userInfo.data.user.result.rest_id
    let user_info = userInfo.data.user.result.legacy
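
Putting the success and failure bodies together, a response handler might look like the sketch below. The function name and the `status` labels are made up for this example; the field paths are the ones shown in the responses above.

```javascript
// Maps a UserByScreenName / UserByRestId response onto one of three outcomes.
function parseUserResponse(response) {
  const result = response?.data?.user?.result;
  if (!result) {
    // { "data": {} } is what a nonexistent account returns.
    return { status: 'not_found' };
  }
  if (result.__typename !== 'User') {
    // For example __typename "UserUnavailable" with reason "Suspended".
    return { status: 'unavailable', reason: result.reason ?? null };
  }
  return { status: 'ok', id_str: result.rest_id, user_info: result.legacy };
}
```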
    

Followers and following

Followers

  • Method: GET
  • URL: https://twitter.com/i/api/graphql/neVf0YKN1h09TFZr4D43MA/Followers?variables={VARIABLES}
    • VARIABLES:
      {
        "userId": "USER_ID",
        "count": 20,
        "includePromotedContent": false,
        "withSuperFollowsUserFields": true,
        "withDownvotePerspective": false,
        "withReactionsMetadata": false,
        "withReactionsPerspective": false,
        "withSuperFollowsTweetFields": true,
        "__fs_interactive_text": false,
        "__fs_responsive_web_uc_gql_enabled": false,
        "__fs_dont_mention_me_view_api_enabled": false
      }
      
  • Headers:
    • Content-Type: application/json
    • x-guest-token: 1232704521454999999
    • authorization: Bearer AAAAAAAAAAAAAAAAAAAAANRILgAAAAAAnNwIzUejRCOuH5E6I8xnZz4puTs%3D1Zv7ttfk8LF81IUq16cHjhLTvJu4FA33AGWWjCpTnA

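Verified users

Verified originally referred to users with a small blue check after their display name (name), usually governments, companies, or celebrities, verified by Twitter itself

After Musk acquired Twitter in 2022, Twitter Blue subscribers started receiving the blue check too. Twitter calls this Blue Verified; the field used for it is is_blue_verified, which can be read from JSON.data.user.result.is_blue_verified

So now a user gets the blue check by meeting either Blue Verified or the original Verified

Twitter then added a gold badge, shown whenever the field ext_verified_type has the value Business. Before that, Twitter had reused the slot for labelling state-affiliated media to mark such accounts. I do not yet know what other values this field can take

In addition, querying the UsersVerifiedAvatars endpoint with the new GraphQL QueryID lets you check in bulk whether users are Blue Verified. The endpoint was originally used to check whether users have an NFT avatar

Someone has also written a browser extension for quickly checking which kind of check an account has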

  • Method: GET
  • URL: https://twitter.com/i/api/graphql/AkfLpq1RURPtDOcd56qyCg/UsersVerifiedAvatars?variables={VARIABLES}&features={FEATURES}
    • VARIABLES:
        {
          "userIds": ["uid1", "uid2", "uid3"]//and more...
        }
      
    • FEATURES:
        {
          "responsive_web_twitter_blue_verified_badge_is_enabled": true
        }
      
  • Headers:
    • Content-Type: application/json
    • x-guest-token: 1232704521454999999
    • authorization: Bearer AAAAAAAAAAAAAAAAAAAAANRILgAAAAAAnNwIzUejRCOuH5E6I8xnZz4puTs%3D1Zv7ttfk8LF81IUq16cHjhLTvJu4FA33AGWWjCpTnA

Response

  • Body
    • success
     {
        "result": {
          "__typename": "User",
          "is_blue_verified": true,
          "has_nft_avatar": false,
          "rest_id": "1511811738076856322"
        }
      }
    
    • failure
    { "code": 366, "message": "NumericString value expected. Received " }
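
The badge rules described in this section can be condensed into a small function. This is only a sketch: the function and parameter names are hypothetical, the caller is expected to pass in the three fields from wherever it found them, and giving the gold badge precedence over the blue check is my assumption.

```javascript
// Decides which badge to show from the three fields discussed above.
//   verified        -> legacy.verified (original Verified)
//   isBlueVerified  -> is_blue_verified (Blue Verified)
//   extVerifiedType -> ext_verified_type ("Business" means gold)
function badgeFor({ verified = false, isBlueVerified = false, extVerifiedType = null } = {}) {
  if (extVerifiedType === 'Business') return 'gold'; // precedence is an assumption
  if (verified || isBlueVerified) return 'blue';
  return 'none';
}
```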
    

Tweet content

Timeline

  • Method: GET
  • URL: https://twitter.com/i/api/graphql/LNhjy8t3XpIrBYM-ms7sPQ/UserTweets?variables={VARIABLES}&features={FEATURES}
    • VARIABLES:
        {
          "userId": "USER_ID",
          "count": 20,//keep this value modest, too large causes a 503; Twitter Monitor's default maximum is 500
          "withHighlightedLabel": true,
          "withTweetQuoteCount": true,
          "includePromotedContent": true,
          "withTweetResult": false,
          "withReactions": false,
          "withUserResults": false,
          "withVoice": false,
          "withNonLegacyCard": true,
          "withBirdwatchPivots": false,
          "cursor": "CURSOR"
        }//TODO timeline_v2
      
    • FEATURES:
        {
          "dont_mention_me_view_api_enabled": true,
          "interactive_text_enabled": true,
          "responsive_web_uc_gql_enabled": false,
          "vibe_tweet_context_enabled": false,
          "responsive_web_edit_tweet_api_enabled": false,
          "standardized_nudges_misinfo": false,
          "responsive_web_enhance_cards_enabled": false
        }
      
  • Headers:
    • Content-Type: application/json
    • x-guest-token: 1232704521454999999
    • authorization: Bearer AAAAAAAAAAAAAAAAAAAAANRILgAAAAAAnNwIzUejRCOuH5E6I8xnZz4puTs%3D1Zv7ttfk8LF81IUq16cHjhLTvJu4FA33AGWWjCpTnA

Response

  • Body
    • success
    //too long, omitted
    
    • failure
    {
      "data": {
        "user": {
          "result": {
            "__typename": "UserUnavailable"
          }
        }
      }
    }
    
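  • count defaults to 20, with an upper limit of 100
  • cursor: how to obtain it is covered later; it is optional, and when omitted the most recent count tweets are returned
  • userId is the user's numeric UID, i.e. the rest_id above
  • The features in the request have actually existed for a long time, but the newest endpoints return nothing unless they are supplied...
  • At most 850 tweets can be retrieved in total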

Tweets

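There are a lot of structural changes here. Handling them the first time is tedious, but you only have to do it once

That sentence turned out to be nonsense; in practice there have been even more silent changes

The most obvious feature of this update is that globalObjects and timeline were merged

In the new version all timeline data is in JSON.data.user.result.timeline.timeline.instructions[0].entries or JSON.data.user.result.timeline.timeline.instructions[1].entries, depending on whether TimelineClearCache is present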

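{
  "instructions": [
    { "type": "TimelineClearCache" },
    { "type": "TimelineAddEntries", "entries": [...] }
  ]
}
  • *TimelineClearCache is probably for clearing nodes that are no longer needed, e.g. deleted tweets could be cleaned up through it. This is a guess; I have not verified it
  • TimelineAddEntries: everything on the timeline is inside this node's entries

Below, NODE stands for one element of JSON.data.user.result.timeline.timeline.instructions[1].entries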

cursor

The cursors for refreshing upward and downward are still in the last two NODE entries.
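
A rough sketch of reading those two entries follows. The post does not show what a cursor entry looks like, so the field names used here (`entryType` equal to `TimelineTimelineCursor`, `cursorType` of `Top` or `Bottom`, and `value`) are assumptions to check against a real response, and the function name is made up.

```javascript
// Reads the top and bottom cursors from the last two timeline entries.
// The cursor entry shape is assumed, not taken from this post.
function extractCursors(entries) {
  const cursors = { top: null, bottom: null };
  for (const entry of entries.slice(-2)) {
    const content = entry.content ?? {};
    if (content.entryType !== 'TimelineTimelineCursor') continue;
    if (content.cursorType === 'Top') cursors.top = content.value;
    if (content.cursorType === 'Bottom') cursors.bottom = content.value;
  }
  return cursors;
}
```

The `bottom` value would then be passed as `cursor` in the next request's variables to page further back.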

tweets

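Because everything has been merged, an entry is no longer necessarily a tweet. Check whether NODE.content.entryType is TimelineTimelineItem; if it is not, the entry is probably some kind of user recommendation or ad

Here is where some commonly used pieces have moved

  • Everything that used to be in a globalObjects.tweets node has moved to NODE.content.itemContent.tweet_results.result.legacy, but tweet_id moved to NODE.content.itemContent.tweet_results.result.rest_id
  • User info, which used to be looked up in globalObjects.users, has moved to NODE.content.itemContent.tweet.core.user_results.result.legacy, and the user id moved to NODE.content.itemContent.tweet.core.user_results.result.rest_id
  • Retweets, previously treated as independent tweets (in globalObjects.tweets), have moved to NODE.content.itemContent.tweet_results.result.legacy.retweeted_status_result.result
  • Quoted tweets moved from globalObjects.tweets to NODE.content.itemContent.tweet.quoted_status_result.result
  • The original tweet of a retweet has moved to NODE.content.itemContent.tweet.legacy.retweeted_status.legacy. If you do not use the original tweet you lose everything in extended_entities, and the replacement of hashtags, urls and similar text ends up at the wrong offsets (hold on, let me get some reading glasses and work out how this relates to the one above)
  • Media of a retweet moved to NODE.content.itemContent.tweet.legacy.retweeted_status.legacy.extended_entities.media (what a mess, let me sort this out)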
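
As an illustration of the first and third bullets, the sketch below walks the entries and keeps only real tweets. It uses the `tweet_results.result` paths from those bullets; the function name and the shape of the returned objects are made up for this example.

```javascript
// Collects tweets from timeline entries, skipping recommendations, ads,
// cursors, and items that carry no tweet data.
function extractTweets(entries) {
  const tweets = [];
  for (const node of entries) {
    if (node?.content?.entryType !== 'TimelineTimelineItem') continue;
    const result = node.content.itemContent?.tweet_results?.result;
    if (!result?.legacy) continue;
    tweets.push({
      tweet_id: result.rest_id,
      tweet: result.legacy,
      // Present only when the entry is a retweet.
      retweeted: result.legacy.retweeted_status_result?.result ?? null,
    });
  }
  return tweets;
}
```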

Cards

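Cards moved to NODE.content.itemContent.tweet_results.result.card.legacy

I expected this to be complicated, but it does not need many changes. If you have handled cards before, you will notice the card content has simply moved into legacy, so you can convert binding_values back to the old key-value form:

//convert the Array back into an Object (key => value)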
$tmpBindingValueList = [];
foreach ($cardInfo["binding_values"] as $bindingValue) {
    $tmpBindingValueList[$bindingValue["key"]] = $bindingValue["value"];
}
$cardInfo["binding_values"] = $tmpBindingValueList;

//the reverse direction: convert to the graphql form
//$tmpList = [];
//foreach ($cardInfo["binding_values"] as $key => $value) {
//    $tmpList[] = ["key" => $key, "value" => $value];
//}
//$cardInfo["binding_values"] = $tmpList;

NSFW

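This usually only shows up as a warning on images, but in some regions (Japan, for example) certain tweets are restricted entirely. According to Notices on Twitter and what they mean, tweets marked as adult content are restricted. Why different regions apply different standards I do not yet understand. Here is an example; tweets like this generally cannot be fetched without logging in

Update 2022-11-11

What these results have in common is that they were obtained with the new Bearer Token. For the differences between the old and new Bearer Token, see my other post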

{
  "entryId": "tombstone-1469626851568271362",
  "sortIndex": "1469626851568271362",
  "content": {
    "entryType": "TimelineTimelineItem",
    "itemContent": {
      "itemType": "TimelineTombstone",
      "tombstoneDisplayType": "Inline",
      "tombstoneInfo": {
        "text": "",
        "richText": {
          "rtl": false,
          "text": "年齢制限のある成人向けコンテンツです。このコンテンツは、18歳未満のユーザーには適切でない可能性があります。このメディアを表示するには、Twitterにログインしてください。詳細はこちら",
          "entities": [
            {
              "fromIndex": 76,
              "toIndex": 80,
              "ref": {"type": "TimelineUrl","url": "https://twitter.com","urlType": "ExternalUrl"}
            },
            {
              "fromIndex": 87,
              "toIndex": 93,
              "ref": {"type": "TimelineUrl","url": "https://help.twitter.com/rules-and-policies/notices-on-twitter","urlType": "ExternalUrl"}
            }
          ]
        }
      }
    }
  }
}

NSFW content in media, on the other hand, is flagged by the uploader; the available categories are nudity, violence, and sensitive content
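
Restricted tweets of the kind shown in the example can be recognised by their `itemType`. A minimal sketch based only on the fields in that example (the function name is hypothetical):

```javascript
// Returns the notice text of a tombstone entry, or null for any other entry.
function tombstoneText(node) {
  const item = node?.content?.itemContent;
  if (item?.itemType !== 'TimelineTombstone') return null;
  const info = item.tombstoneInfo ?? {};
  // In the example above `text` is empty and the notice is in richText.text.
  return info.richText?.text ?? info.text ?? '';
}
```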

Errors

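TODO: this section is pending an update

This used to be easy: you only had to check for errors. Now you have to check whether data.user.result.timeline is missing, and the reason is in data.user.result.__typename

twitter gets lazy, and these days the error reason is almost always Something went wrong...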
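
Until this section is updated, the check described above could be sketched as follows. The function name and the fallback strings are made up for this example; the field paths are the ones named in this section and in the failure body of the timeline request.

```javascript
// Returns a short error reason for a UserTweets response, or null on success.
function timelineError(response) {
  // Old-style check: a top-level errors array.
  if (Array.isArray(response?.errors) && response.errors.length > 0) {
    return response.errors[0].message ?? 'unknown error';
  }
  const result = response?.data?.user?.result;
  if (!result) return 'empty result';
  // New-style check: no timeline, reason in __typename.
  if (!result.timeline) return result.__typename ?? 'unknown error';
  return null;
}
```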

Acknowledgements

  • Juicpt, for pointing out many of the newer changes
  • Everyone in the comments

Reference

How to Crawl Twitter (GraphQL)

https://blog.nest.moe/posts/how-to-crawl-twitter-with-graphql

When reposting or quoting this post, please follow the Creative Commons Attribution license

