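Today Twitter started using GraphQL for user info and timelines (search and single-tweet status still use the REST API), and the REST API endpoints used before stopped working. This post records how I dealt with it; for the overall crawler approach, "How to crawl Twitter" remains the main reference.

2021-07-01: The REST API works again.

2021-08-15: This thing is still being changed silently and is very unstable (a result field has been added in quite a few places).

A warning up front

According to Twitter's rate limit, GraphQL allows very few requests: each guest token gets only 150 per period, and all requests share that quota.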

"graphql": {
"/graphql": {
"limit": 150,
"remaining": 150,
"reset": 1625068800
}
}
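
To stay inside that quota, a client can look at the remaining count and the reset time before each request. The following is a minimal sketch, assuming the rate-limit object has the shape shown above and that reset is a Unix timestamp in seconds; the helper name msUntilAllowed is hypothetical.

```javascript
// Hypothetical helper: how long to wait before the next GraphQL call.
// `limitInfo` is assumed to look like {limit, remaining, reset}, with
// `reset` as a Unix timestamp in seconds.
function msUntilAllowed(limitInfo, nowMs = Date.now()) {
  if (limitInfo.remaining > 0) {
    return 0; // quota left, the request can go out immediately
  }
  // Quota exhausted: wait until the reset time (never a negative delay).
  return Math.max(0, limitInfo.reset * 1000 - nowMs);
}

const sampleLimit = { limit: 150, remaining: 0, reset: 1625068800 };
// One minute before the reset time the helper asks for a 60000 ms wait.
console.log(msUntilAllowed(sampleLimit, (1625068800 - 60) * 1000)); // 60000
```

Because every GraphQL request shares the same 150, a single counter per guest token is enough; there is no need to track each operation separately.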

Getting the queryId

These IDs still live in the main.*.js file. The following script can be used as a reference for fetching them:

<?php
// Find the URL of the main.*.js bundle in the Twitter home page.
preg_match('/https:\/\/abs\.twimg\.com\/responsive-web\/client-web([^\/]+|)\/main\.[^.]+\.js/', file_get_contents("https://twitter.com/"), $link);

// URL of the bundle, or an empty string when nothing matched
$jsString = ($link[0] ?? "");

if ($jsString != "") {
    // Collect every {queryId, operationName, operationType} triple in the bundle.
    preg_match_all('/{queryId:"([^"]+)",operationName:"([^"]+)",operationType:"([^"]+)"}/', file_get_contents($jsString), $queryIdList);
    $list = [];
    for ($x = 0; $x < count($queryIdList[0]); $x++) {
        // Index the result by operationName.
        $list[$queryIdList[2][$x]] = [
            "queryId" => $queryIdList[1][$x],
            "operationName" => $queryIdList[2][$x],
            "operationType" => $queryIdList[3][$x],
        ];
    }

    file_put_contents(__DIR__ . '/graphqlQueryIdList.json', json_encode($list));
}
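
For readers working in JavaScript, the same extraction can be sketched as below. It applies the regular expression from the PHP script to a string. The sample bundle text is made up for illustration (only the queryId and operationName values are taken from this post), and the function name extractQueryIds is hypothetical.

```javascript
// Hypothetical helper: pull {queryId, operationName, operationType} triples
// out of the text of the main.*.js bundle.
function extractQueryIds(jsText) {
  const pattern = /\{queryId:"([^"]+)",operationName:"([^"]+)",operationType:"([^"]+)"\}/g;
  const list = {};
  for (const [, queryId, operationName, operationType] of jsText.matchAll(pattern)) {
    // Key by operationName, mirroring the PHP script.
    list[operationName] = { queryId, operationName, operationType };
  }
  return list;
}

// Made-up fragment standing in for the real bundle contents.
const sampleBundle =
  'e.exports={queryId:"FdxZdF5OKM29Chu3szqvYQ",operationName:"UserTweets",operationType:"query"}' +
  ';t.exports={queryId:"2Kp5fEiA-6QtZoCKRCcGKg",operationName:"UserTweetsAndReplies",operationType:"query"}';

console.log(Object.keys(extractQueryIds(sampleBundle)));
```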

The list is quite long, so I only show the few that Twitter Monitor needs. Work out what the others are for yourself.

{
"UserByRestIdWithoutResults": {
"queryId": "WN6Hck-Pwm-YP0uxVj1oMQ",
"operationName": "UserByRestIdWithoutResults",
"operationType": "query"
},
"UserByScreenNameWithoutResults": {
"queryId": "Vf8si2dfZ1zmah8ePYPjDQ",
"operationName": "UserByScreenNameWithoutResults",
"operationType": "query"
},
"UserTweets": {
"queryId": "FdxZdF5OKM29Chu3szqvYQ",
"operationName": "UserTweets",
"operationType": "query"
},
"UserTweetsAndReplies": {
"queryId": "2Kp5fEiA-6QtZoCKRCcGKg",
"operationName": "UserTweetsAndReplies",
"operationType": "query"
}
}

The URL is assembled in this format:

let url = `https://twitter.com/i/api/graphql/${queryId}/${operationName}/?variables=` + encodeURIComponent(JSON.stringify(Variables))
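
As a worked example of that format, the sketch below wraps the concatenation in a function and shows that the variables survive the round trip through JSON and percent-encoding. The function name buildGraphqlUrl is hypothetical; the queryId is the UserTweets one from the list above and may no longer be current.

```javascript
// Hypothetical wrapper around the URL format described above.
function buildGraphqlUrl(queryId, operationName, variables) {
  return (
    `https://twitter.com/i/api/graphql/${queryId}/${operationName}/?variables=` +
    encodeURIComponent(JSON.stringify(variables))
  );
}

const timelineUrl = buildGraphqlUrl("FdxZdF5OKM29Chu3szqvYQ", "UserTweets", {
  userId: "783214",
  count: 20,
});
console.log(timelineUrl);

// Decoding the query string gives back the original variables object.
const decoded = JSON.parse(decodeURIComponent(timelineUrl.split("?variables=")[1]));
console.log(decoded.userId); // 783214
```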

User info

Request

  • Method: GET

  • URL: https://mobile.twitter.com/i/api/graphql/WN6Hck-Pwm-YP0uxVj1oMQ/UserByRestIdWithoutResults

  • Headers:

    • Content-Type: application/json
    • x-guest-token: 1232704521454999999
    • authorization: Bearer AAAAAAAAAAAAAAAAAAAAANRILgAAAAAAnNwIzUejRCOuH5E6I8xnZz4puTs%3D1Zv7ttfk8LF81IUq16cHjhLTvJu4FA33AGWWjCpTnA
  • Variables:

    {
    "userId": "USER_ID",
    "withHighlightedLabel": true
    }
  • Method: GET

  • URL: https://mobile.twitter.com/i/api/graphql/Vf8si2dfZ1zmah8ePYPjDQ/UserByScreenNameWithoutResults

  • Headers:

    • Content-Type: application/json
    • x-guest-token: 1232704521454999999
    • authorization: Bearer AAAAAAAAAAAAAAAAAAAAANRILgAAAAAAnNwIzUejRCOuH5E6I8xnZz4puTs%3D1Zv7ttfk8LF81IUq16cHjhLTvJu4FA33AGWWjCpTnA
  • Variables:

    {
    "screen_name": "USER_SCREEN_NAME",
    "withHighlightedLabel": true
    }
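
Putting the two request variants together, the sketch below builds the URL and headers for either lookup. The function name buildUserInfoRequest, the shape of its lookup parameter, and the placeholder token values are assumptions for illustration. The queryIds are the ones from the list earlier in this post and may have changed since.

```javascript
// queryIds as listed in this post; Twitter may rotate them at any time.
const USER_OPERATIONS = {
  UserByRestIdWithoutResults: "WN6Hck-Pwm-YP0uxVj1oMQ",
  UserByScreenNameWithoutResults: "Vf8si2dfZ1zmah8ePYPjDQ",
};

// Hypothetical helper: `lookup` is either {userId} or {screenName}.
function buildUserInfoRequest(lookup, guestToken, bearerToken) {
  const byId = lookup.userId !== undefined;
  const operationName = byId
    ? "UserByRestIdWithoutResults"
    : "UserByScreenNameWithoutResults";
  const variables = byId
    ? { userId: String(lookup.userId), withHighlightedLabel: true }
    : { screen_name: lookup.screenName, withHighlightedLabel: true };
  const url =
    `https://mobile.twitter.com/i/api/graphql/${USER_OPERATIONS[operationName]}/${operationName}/?variables=` +
    encodeURIComponent(JSON.stringify(variables));
  return {
    url,
    method: "GET",
    headers: {
      "Content-Type": "application/json",
      "x-guest-token": guestToken,
      authorization: `Bearer ${bearerToken}`,
    },
  };
}

// Placeholder credentials, not real tokens.
const userRequest = buildUserInfoRequest({ screenName: "Twitter" }, "GUEST_TOKEN", "BEARER_TOKEN");
console.log(userRequest.url);
```

The returned object can be handed to whatever HTTP client is in use; nothing here sends a request.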

Response

  • Body

    • success
    {
    "data": {
    "user": {
    "id": "VXNlcjo3ODMyMTQ=",
    "rest_id": "783214",
    "affiliates_highlighted_label": {},
    "legacy": {
    "blocked_by": false,
    "blocking": false,
    "can_dm": false,
    "can_media_tag": true,
    "created_at": "Tue Feb 20 14:35:54 +0000 2007",
    "default_profile": false,
    "default_profile_image": false,
    "description": "What's happening?!",
    "entities": {
    "description": {
    "urls": []
    },
    "url": {
    "urls": [{
    "display_url": "about.twitter.com",
    "expanded_url": "https://about.twitter.com/",
    "url": "https://t.co/TAXQpsHa5X",
    "indices": [0, 23]
    }
    ]
    }
    },
    "fast_followers_count": 0,
    "favourites_count": 6320,
    "follow_request_sent": false,
    "followed_by": false,
    "followers_count": 59602983,
    "following": false,
    "friends_count": 35,
    "has_custom_timelines": true,
    "is_translator": false,
    "listed_count": 87347,
    "location": "everywhere",
    "media_count": 2254,
    "muting": false,
    "name": "Twitter",
    "normal_followers_count": 59602983,
    "notifications": false,
    "pinned_tweet_ids_str": [],
    "profile_banner_extensions": {
    "mediaColor": {
    "r": {
    "ok": {
    "palette": [{
    "percentage": 55.4,
    "rgb": {
    "blue": 247,
    "green": 161,
    "red": 17
    }
    }, {
    "percentage": 25.44,
    "rgb": {
    "blue": 42,
    "green": 32,
    "red": 22
    }
    }, {
    "percentage": 13.86,
    "rgb": {
    "blue": 161,
    "green": 165,
    "red": 165
    }
    }, {
    "percentage": 3.27,
    "rgb": {
    "blue": 93,
    "green": 103,
    "red": 103
    }
    }, {
    "percentage": 0.96,
    "rgb": {
    "blue": 192,
    "green": 170,
    "red": 107
    }
    }
    ]
    }
    }
    }
    },
    "profile_banner_url": "https://pbs.twimg.com/profile_banners/783214/1619544410",
    "profile_image_extensions": {
    "mediaColor": {
    "r": {
    "ok": {
    "palette": [{
    "percentage": 84.2,
    "rgb": {
    "blue": 240,
    "green": 155,
    "red": 30
    }
    }, {
    "percentage": 14.53,
    "rgb": {
    "blue": 255,
    "green": 255,
    "red": 255
    }
    }, {
    "percentage": 1.4,
    "rgb": {
    "blue": 240,
    "green": 198,
    "red": 130
    }
    }
    ]
    }
    }
    }
    },
    "profile_image_url_https": "https://pbs.twimg.com/profile_images/1354479643882004483/Btnfm47p_normal.jpg",
    "profile_interstitial_type": "",
    "protected": false,
    "screen_name": "Twitter",
    "statuses_count": 14285,
    "translator_type": "regular",
    "url": "https://t.co/TAXQpsHa5X",
    "verified": true,
    "want_retweets": false,
    "withheld_in_countries": []
    }
    }
    }
    }
    • failure
    {
    "data": {
    "user": {
    "id": "VXNlcjoyNTA3Mzg3Nw==",
    "rest_id": "25073877",
    "affiliates_highlighted_label": {},
    "legacy_extended_profile": {},
    "is_profile_translatable": true
    }
    },
    "errors": [{
    "message": "Authorization: User has been suspended. (63)",
    "path": ["user", "legacy"],
    "locations": [{
    "line": 18,
    "column": 3
    }
    ],
    "source": "Client",
    "code": 63,
    "kind": "Permissions",
    "tracing": {
    "trace_id": "000d5049004b1788"
    },
    "extensions": {
    "source": "Client",
    "code": 63,
    "kind": "Permissions",
    "tracing": {
    "trace_id": "000d5049004b1788"
    }
    }
    }
    ]
    }
    • failure (user not found)
    {
    "data": {
    "user": {
    "id": "VXNlcjo3",
    "rest_id": "7",
    "affiliates_highlighted_label": {}
    }
    },
    "errors": [{
    "message": "_Missing: User not found.",
    "path": ["user", "legacy"],
    "locations": [{
    "line": 18,
    "column": 3
    }
    ],
    "source": "Server",
    "code": 50,
    "kind": "NonFatal",
    "tracing": {
    "trace_id": "00cefc2a00e86206"
    },
    "extensions": {
    "source": "Server",
    "code": 50,
    "kind": "NonFatal",
    "tracing": {
    "trace_id": "00cefc2a00e86206"
    }
    }
    }
    ]
    }
  • Compared with the old version almost nothing has changed; only two things need to be modified. Before and after:

    // REST API
    const userInfo = ...; // fetch the info
    let id_str = userInfo.id_str;
    let user_info = userInfo;

    // GraphQL
    const userInfo = ...; // fetch the info as described above
    let id_str = userInfo.data.user.rest_id;
    let user_info = userInfo.data.user.legacy;
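
The failure bodies above still carry a data.user object, so checking for legacy is a simple way to tell a usable profile from a suspended or missing one. A minimal sketch, assuming the response shapes shown above; the function name parseUserResponse and its return shape are hypothetical, and later variants with an extra result field are not handled.

```javascript
// Hypothetical helper: turn a user-info response into either the two values
// the old REST code used, or a short error description.
function parseUserResponse(body) {
  const user = body?.data?.user;
  if (user && user.legacy) {
    return { ok: true, id_str: user.rest_id, user_info: user.legacy };
  }
  // No legacy block: report the first error, if the response carries one.
  const firstError = (body?.errors ?? [])[0];
  return {
    ok: false,
    code: firstError?.code ?? null,
    message: firstError?.message ?? "unknown error",
  };
}

// Trimmed copy of the suspended-user response shown above.
const suspendedBody = {
  data: { user: { id: "VXNlcjoyNTA3Mzg3Nw==", rest_id: "25073877" } },
  errors: [{ message: "Authorization: User has been suspended. (63)", code: 63 }],
};
console.log(parseUserResponse(suspendedBody)); // ok is false, code is 63
```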

Tweet content

Timeline

  • Method: GET

  • URL: https://mobile.twitter.com/i/api/graphql/FdxZdF5OKM29Chu3szqvYQ/UserTweets

  • Headers:

    • Content-Type: application/json
    • x-guest-token: 1232704521454999999
    • authorization: Bearer AAAAAAAAAAAAAAAAAAAAANRILgAAAAAAnNwIzUejRCOuH5E6I8xnZz4puTs%3D1Zv7ttfk8LF81IUq16cHjhLTvJu4FA33AGWWjCpTnA
  • Variables:

    {
    "userId": "USER_ID",
    "count": 20,
    "withHighlightedLabel": true,
    "withTweetQuoteCount": true,
    "includePromotedContent": true,
    "withTweetResult": false,
    "withReactions": false,
    "withUserResults": false,
    "withVoice": false,
    "withNonLegacyCard": true,
    "withBirdwatchPivots": false,
    "cursor": "CURSOR"
    }

Response

  • Body

    • success
    {
    "data": {
    "user": {
    "result": {
    "__typename": "User",
    "timeline": {
    "timeline": {
    "instructions": [{
    "type": "TimelineAddEntries",
    "entries": [{
    "entryId": "tweet-1391857183801974794",
    "sortIndex": "1391857183801974794",
    "content": {
    "entryType": "TimelineTimelineItem",
    "itemContent": {
    "itemType": "TimelineTweet",
    "tweet": {
    "rest_id": "1391857183801974794",
    "core": {
    "user": {
    "id": "VXNlcjo3ODMyMTQ=",
    "rest_id": "783214",
    "affiliates_highlighted_label": {},
    "legacy": {
    "blocked_by": false,
    "blocking": false,
    "can_dm": false,
    "can_media_tag": true,
    "created_at": "Tue Feb 20 14:35:54 +0000 2007",
    "default_profile": false,
    "default_profile_image": false,
    "description": "What's happening?!",
    "entities": {
    "description": {
    "urls": []
    },
    "url": {
    "urls": [{
    "display_url": "about.twitter.com",
    "expanded_url": "https://about.twitter.com/",
    "url": "https://t.co/TAXQpsHa5X",
    "indices": [
    0,
    23
    ]
    }
    ]
    }
    },
    "fast_followers_count": 0,
    "favourites_count": 6318,
    "follow_request_sent": false,
    "followed_by": false,
    "followers_count": 59500431,
    "following": false,
    "friends_count": 35,
    "has_custom_timelines": true,
    "is_translator": false,
    "listed_count": 87330,
    "location": "everywhere",
    "media_count": 2254,
    "muting": false,
    "name": "Twitter",
    "normal_followers_count": 59500431,
    "notifications": false,
    "pinned_tweet_ids_str": [],
    "profile_banner_extensions": {
    "mediaColor": {
    "r": {
    "ok": {
    "palette": [{
    "percentage": 55.4,
    "rgb": {
    "blue": 247,
    "green": 161,
    "red": 17
    }
    }, {
    "percentage": 25.44,
    "rgb": {
    "blue": 42,
    "green": 32,
    "red": 22
    }
    }, {
    "percentage": 13.86,
    "rgb": {
    "blue": 161,
    "green": 165,
    "red": 165
    }
    }, {
    "percentage": 3.27,
    "rgb": {
    "blue": 93,
    "green": 103,
    "red": 103
    }
    }, {
    "percentage": 0.96,
    "rgb": {
    "blue": 192,
    "green": 170,
    "red": 107
    }
    }
    ]
    }
    }
    }
    },
    "profile_banner_url": "https://pbs.twimg.com/profile_banners/783214/1619544410",
    "profile_image_extensions": {
    "mediaColor": {
    "r": {
    "ok": {
    "palette": [{
    "percentage": 84.2,
    "rgb": {
    "blue": 240,
    "green": 155,
    "red": 30
    }
    }, {
    "percentage": 14.53,
    "rgb": {
    "blue": 255,
    "green": 255,
    "red": 255
    }
    }, {
    "percentage": 1.4,
    "rgb": {
    "blue": 240,
    "green": 198,
    "red": 130
    }
    }
    ]
    }
    }
    }
    },
    "profile_image_url_https": "https://pbs.twimg.com/profile_images/1354479643882004483/Btnfm47p_normal.jpg",
    "profile_interstitial_type": "",
    "protected": false,
    "screen_name": "Twitter",
    "statuses_count": 14285,
    "translator_type": "regular",
    "url": "https://t.co/TAXQpsHa5X",
    "verified": true,
    "want_retweets": false,
    "withheld_in_countries": []
    }
    }
    },
    "legacy": {
    "created_at": "Mon May 10 20:46:26 +0000 2021",
    "conversation_id_str": "1391857183801974794",
    "display_text_range": [
    0,
    35
    ],
    "entities": {
    "user_mentions": [],
    "urls": [],
    "hashtags": [],
    "symbols": []
    },
    "favorite_count": 60375,
    "favorited": false,
    "full_text": "your Twitter personality in one pic",
    "is_quote_status": false,
    "lang": "en",
    "quote_count": 39412,
    "reply_count": 14356,
    "retweet_count": 3895,
    "retweeted": false,
    "source": "<a href=\"https://www.sprinklr.com\" rel=\"nofollow\">Sprinklr</a>",
    "user_id_str": "783214",
    "id_str": "1391857183801974794"
    }
    },
    "tweetDisplayType": "Tweet"
    }
    }
    }, {
    "entryId": "cursor-top-1394557381009997825",
    "sortIndex": "1394557381009997825",
    "content": {
    "entryType": "TimelineTimelineCursor",
    "value": "HCaAgICglKO72iYAAA==",
    "cursorType": "Top"
    }
    }, {
    "entryId": "cursor-bottom-1372263069125111809",
    "sortIndex": "1372263069125111809",
    "content": {
    "entryType": "TimelineTimelineCursor",
    "value": "HBaEwLfZtP+giyYAAA==",
    "cursorType": "Bottom",
    "stopOnEmptyResponse": true
    }
    }
    ]
    }
    ],
    "responseObjects": {
    "feedbackActions": []
    }
    }
    }
    }
    }
    },
    "errors": [{
    "message": "Authorization: User has been suspended. (63)",
    "path": ["user","result","timeline","timeline","instructions",0,"entries",14,"content","itemContent","tweet","core"],
    "locations": [{"line": 980,"column": 3}],
    "source": "Client",
    "code": 63,
    "kind": "Permissions",
    "tracing": {"trace_id": "009692e800adbf03"},
    "extensions": {
    "source": "Client",
    "code": 63,
    "kind": "Permissions",
    "tracing": {"trace_id": "009692e800adbf03"}
    }
    }
    ]
    }
    • failure
    {
    "data": {
    "user": {
    "result": {
    "__typename": "UserUnavailable"
    }
    }
    }
    }
  • count defaults to 20
  • cursor is optional (how to obtain it is covered later); when omitted, the 20 most recent tweets are returned
  • userId is the user's numeric UID
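
As a sketch of how these three parameters fit together, the hypothetical helper below fills in the variables object from the request above and only adds cursor when one is given. The flag values are copied from the example request; whether Twitter accepts other combinations is not something this post verifies.

```javascript
// Hypothetical helper: variables for the UserTweets operation.
function buildUserTweetsVariables(userId, { count = 20, cursor } = {}) {
  const variables = {
    userId: String(userId),
    count,
    withHighlightedLabel: true,
    withTweetQuoteCount: true,
    includePromotedContent: true,
    withTweetResult: false,
    withReactions: false,
    withUserResults: false,
    withVoice: false,
    withNonLegacyCard: true,
    withBirdwatchPivots: false,
  };
  if (cursor) {
    variables.cursor = cursor; // left out entirely for the first page
  }
  return variables;
}

const firstPage = buildUserTweetsVariables("783214");
const nextPage = buildUserTweetsVariables("783214", { cursor: "HBaEwLfZtP+giyYAAA==" });
console.log("cursor" in firstPage, nextPage.cursor);
```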

Tweets

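A lot of structural changes show up here. They are annoying to handle the first time, but once done they stay done.

The most obvious feature of this update is that globalObjects and timeline have been merged. In the new version all timeline information is in JSON.data.user.result.timeline.timeline.instructions[0].entries, and the cursors used for refreshing upward and downward are still the last two nodes.

Below, NODE stands for one node of JSON.data.user.result.timeline.timeline.instructions[0].entries.

Because everything has been merged, a node is no longer guaranteed to be a plain tweet. Check whether NODE.content.entryType is TimelineTimelineItem; if it is not, the node may be a user recommendation, an ad, or some other miscellaneous item.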
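
The filtering described above can be sketched as follows. The function name splitTimelineEntries and its return shape are hypothetical, and the sample object is a trimmed copy of the success response shown earlier.

```javascript
// Hypothetical helper: separate tweet nodes from cursors in a UserTweets response.
function splitTimelineEntries(json) {
  const entries =
    json?.data?.user?.result?.timeline?.timeline?.instructions?.[0]?.entries ?? [];
  const tweets = [];
  const cursors = {};
  for (const node of entries) {
    const content = node.content ?? {};
    if (content.entryType === "TimelineTimelineItem") {
      tweets.push(node);
    } else if (content.entryType === "TimelineTimelineCursor") {
      cursors[content.cursorType] = content.value; // "Top" or "Bottom"
    }
    // Any other entry type is skipped.
  }
  return { tweets, cursors };
}

const sampleTimeline = {
  data: { user: { result: { __typename: "User", timeline: { timeline: { instructions: [{
    type: "TimelineAddEntries",
    entries: [
      { entryId: "tweet-1391857183801974794", content: { entryType: "TimelineTimelineItem", itemContent: { itemType: "TimelineTweet" } } },
      { entryId: "cursor-top-1394557381009997825", content: { entryType: "TimelineTimelineCursor", value: "HCaAgICglKO72iYAAA==", cursorType: "Top" } },
      { entryId: "cursor-bottom-1372263069125111809", content: { entryType: "TimelineTimelineCursor", value: "HBaEwLfZtP+giyYAAA==", cursorType: "Bottom" } },
    ],
  }] } } } } },
};

const split = splitTimelineEntries(sampleTimeline);
console.log(split.tweets.length, split.cursors.Bottom);
```

The Bottom value is what goes into the cursor variable of the next UserTweets request to page further back.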

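Here is where some commonly used parts have moved:

  • Everything that used to be a node under globalObjects.tweets has moved to NODE.content.itemContent.tweet.legacy
  • User info that used to be looked up in globalObjects.users has moved to NODE.content.itemContent.tweet.core.user.legacy
  • Retweeted tweets, previously treated as independent tweets (in globalObjects.tweets), have moved to NODE.content.itemContent.tweet.legacy.retweeted_status
  • Quoted tweets have moved from globalObjects.tweets to NODE.content.itemContent.tweet.quoted_status
  • For a retweet, the original tweet's data has moved to NODE.content.itemContent.tweet.legacy.retweeted_status.legacy. If you do not use the original tweet you lose all extended_entities content, and text replacements for hashtags, URLs and the like end up at the wrong positions
  • Media of a retweet has moved to NODE.content.itemContent.tweet.legacy.retweeted_status.legacy.extended_entities.media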
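
The paths listed above can be read with a small accessor like the one below. This is a sketch under the assumption that the paths are exactly as listed; the function name readTweetNode and the returned field names are hypothetical, and the retweet sample is made up for illustration.

```javascript
// Hypothetical helper: read the commonly used parts of one NODE.
function readTweetNode(node) {
  const tweet = node.content.itemContent.tweet;
  const retweeted = tweet.legacy.retweeted_status; // present only for retweets
  // For retweets, take text and media from the original tweet.
  const source = retweeted ? retweeted.legacy : tweet.legacy;
  return {
    id_str: tweet.legacy.id_str,
    author: tweet.core.user.legacy.screen_name,
    isRetweet: Boolean(retweeted),
    full_text: source.full_text,
    media: source.extended_entities?.media ?? [],
    quoted: tweet.quoted_status ?? null,
  };
}

// Trimmed from the success response shown earlier.
const plainNode = { content: { itemContent: { tweet: {
  core: { user: { legacy: { screen_name: "Twitter" } } },
  legacy: { id_str: "1391857183801974794", full_text: "your Twitter personality in one pic" },
} } } };

// Made-up retweet node, for illustration only.
const retweetNode = { content: { itemContent: { tweet: {
  core: { user: { legacy: { screen_name: "example_user" } } },
  legacy: {
    id_str: "2",
    full_text: "RT @Twitter: your Twitter personality in...",
    retweeted_status: { legacy: {
      id_str: "1",
      full_text: "your Twitter personality in one pic",
      extended_entities: { media: [{ type: "photo" }] },
    } },
  },
} } } };

console.log(readTweetNode(plainNode).full_text);
console.log(readTweetNode(retweetNode).media.length);
```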

Cards

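Cards have moved to NODE.content.itemContent.tweet.card.legacy

I expected this to be complicated, but it does not need major changes. If you have written card handling before, you will notice that the card content has simply been moved into legacy, so binding_values can be converted back to the old key-value form: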

// Convert binding_values from an Array back into an Object (key => value)
$tmpBindingValueList = [];
foreach ($cardInfo["binding_values"] as $bindingValue) {
    $tmpBindingValueList[$bindingValue["key"]] = $bindingValue["value"];
}
$cardInfo["binding_values"] = $tmpBindingValueList;

// The reverse direction: code that converts to the GraphQL form
//$tmpList = [];
//foreach ($cardInfo["binding_values"] as $key => $value) {
//    $tmpList[] = ["key" => $key, "value" => $value];
//}
//$cardInfo["binding_values"] = $tmpList;
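
The same two conversions can be sketched in JavaScript as follows. The function names are hypothetical, and the sample binding values are made up to show the shape only; they are not taken from a real card.

```javascript
// GraphQL shape (array of {key, value}) -> old key/value object.
function bindingValuesToObject(bindingValues) {
  return Object.fromEntries(bindingValues.map(({ key, value }) => [key, value]));
}

// Old key/value object -> GraphQL shape, the reverse direction.
function bindingValuesToArray(bindingValues) {
  return Object.entries(bindingValues).map(([key, value]) => ({ key, value }));
}

// Made-up card data for illustration.
const graphqlBindingValues = [
  { key: "title", value: { type: "STRING", string_value: "Example title" } },
  { key: "domain", value: { type: "STRING", string_value: "example.com" } },
];

const asObject = bindingValuesToObject(graphqlBindingValues);
console.log(asObject.title.string_value);
console.log(bindingValuesToArray(asObject).length);
```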

Errors

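Error checking used to be easy: you only had to check whether errors was present. Now you have to check whether data.user.result.timeline is missing, and the reason for the failure shows up in data.user.result.__typename.

Twitter cuts corners here: these days the error reason is almost always "Something went wrong"...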
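
That check can be sketched as below, assuming the timeline response shapes shown earlier in this post. The function name checkTimeline and its return shape are hypothetical. A top-level errors array is deliberately not treated as fatal, since the success example above carries one for a single suspended author.

```javascript
// Hypothetical helper: decide whether a UserTweets response is usable.
function checkTimeline(json) {
  const result = json?.data?.user?.result;
  if (result && result.timeline) {
    return { ok: true };
  }
  // No timeline: __typename is the only hint about what went wrong.
  return { ok: false, reason: result?.__typename ?? "unknown" };
}

// The failure body shown earlier in this post.
const unavailableBody = { data: { user: { result: { __typename: "UserUnavailable" } } } };
console.log(checkTimeline(unavailableBody)); // ok is false, reason is "UserUnavailable"
```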

Thoughts

2021-05-12 was supposed to be Twitter Monitor's second anniversary, and Twitter's present to me was a breakage that lasted a whole day.

booooooooooom
