Load Google Search Crawler IP Address Ranges with Refit

Rate limiting is a great tool for protecting your website and content from misuse. In some cases, however, it works against you: for example, when you want Google Search to index your content in order to increase your visibility.

According to the documentation, Google offers several ways to identify its crawler. The most secure is to recognize the search crawler by its IP address, as this is virtually impossible to spoof, or can only be spoofed with a great deal of effort.

Googlebot JSON

Google offers a static JSON file for this purpose, in which it regularly publishes the IP address ranges used by its search crawlers. The JSON has a basic structure that looks like this:

{
    "creationTime": "2024-11-26T15:46:03.000000",
    "prefixes": [
        {
            "ipv6Prefix": "2001:4860:4801:10::/64"
        },
        ...
    ]
}

Each entry in prefixes defines either an ipv6Prefix or an ipv4Prefix.

This JSON maps easily onto C# record classes:

public sealed record class GoogleSearchCrawlerIPAddressResult(
    [property: JsonPropertyName("creationTime")] DateTimeOffset CreationTime,
    [property: JsonPropertyName("prefixes")] List<GoogleSearchCrawlerIPAddressRangeItem> IPRanges);

public sealed record class GoogleSearchCrawlerIPAddressRangeItem(
    [property: JsonPropertyName("ipv6Prefix")] string? IPv6,
    [property: JsonPropertyName("ipv4Prefix")] string? IPv4);
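
To check the mapping without touching the network, the records can be fed a small hand-written payload. The following is a minimal sketch, assuming System.Text.Json as the serializer (the default in current Refit versions). The IPv4 entry is a documentation range from RFC 5737, chosen purely for illustration; it is not a real Googlebot range.

```csharp
using System.Text.Json;
using System.Text.Json.Serialization;

// Hand-written sample in the same shape as the published file.
// "192.0.2.0/24" is an RFC 5737 documentation range, not a Googlebot range.
const string json = """
{
  "creationTime": "2024-11-26T15:46:03.000000",
  "prefixes": [
    { "ipv6Prefix": "2001:4860:4801:10::/64" },
    { "ipv4Prefix": "192.0.2.0/24" }
  ]
}
""";

GoogleSearchCrawlerIPAddressResult? result =
    JsonSerializer.Deserialize<GoogleSearchCrawlerIPAddressResult>(json);

// Each entry carries exactly one of the two properties; the other stays null.
foreach (GoogleSearchCrawlerIPAddressRangeItem entry in result?.IPRanges ?? [])
{
    Console.WriteLine($"IPv4: {entry.IPv4 ?? "(none)"}, IPv6: {entry.IPv6 ?? "(none)"}");
}

// Same records as in the article, repeated so the sketch compiles on its own.
public sealed record class GoogleSearchCrawlerIPAddressResult(
    [property: JsonPropertyName("creationTime")] DateTimeOffset CreationTime,
    [property: JsonPropertyName("prefixes")] List<GoogleSearchCrawlerIPAddressRangeItem> IPRanges);

public sealed record class GoogleSearchCrawlerIPAddressRangeItem(
    [property: JsonPropertyName("ipv6Prefix")] string? IPv6,
    [property: JsonPropertyName("ipv4Prefix")] string? IPv4);
```

Because both prefix properties are nullable, a missing key in the JSON simply leaves the corresponding property null, which is what the later null checks rely on.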

Refit

To make this as easy as possible to consume using C#, Refit is a good option.

To do this, add Refit as a NuGet package reference in the project file:

<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net9.0</TargetFramework>
    <ImplicitUsings>enable</ImplicitUsings>
    <Nullable>enable</Nullable>
  </PropertyGroup>

  <ItemGroup>
    <PackageReference Include="Refit" Version="8.0.0" />
  </ItemGroup>

</Project>

A Refit interface can then be declared for the endpoint:

public interface IGoogleSearchCrawlerIPAddressRangesHttpClient
{
    [Get("/static/search/apis/ipranges/googlebot.json")]
    Task<GoogleSearchCrawlerIPAddressResult> GetRanges(CancellationToken cancellationToken);
}

We can now use the interface to create an instance with Refit and load data:

IGoogleSearchCrawlerIPAddressRangesHttpClient googleSearchBotIPAddressClient = RestService
    .For<IGoogleSearchCrawlerIPAddressRangesHttpClient>("https://developers.google.com");

GoogleSearchCrawlerIPAddressResult result = await googleSearchBotIPAddressClient.GetRanges(CancellationToken.None);
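
The call above uses CancellationToken.None and lets any failure bubble up. The following sketch shows one way the same call might be guarded. The 10-second timeout is an arbitrary value picked for illustration, and it assumes that connection-level failures reach the caller as HttpRequestException; Refit itself throws ApiException when the server answers with a non-success status code.

```csharp
using System.Text.Json.Serialization;
using Refit;

IGoogleSearchCrawlerIPAddressRangesHttpClient client = RestService
    .For<IGoogleSearchCrawlerIPAddressRangesHttpClient>("https://developers.google.com");

// Assumption: 10 seconds is an illustrative budget, not a recommendation.
using CancellationTokenSource timeout = new(TimeSpan.FromSeconds(10));

try
{
    GoogleSearchCrawlerIPAddressResult result = await client.GetRanges(timeout.Token);
    Console.WriteLine($"Loaded {result.IPRanges.Count} ranges.");
}
catch (ApiException ex)
{
    // The server responded, but with a non-success HTTP status code.
    Console.WriteLine($"Request failed with HTTP {(int)ex.StatusCode}.");
}
catch (HttpRequestException ex)
{
    // Assumed to cover DNS, connection, and TLS problems.
    Console.WriteLine($"Network error: {ex.Message}");
}
catch (OperationCanceledException)
{
    // The token was cancelled before the response arrived.
    Console.WriteLine("The request timed out.");
}

// Same definitions as in the article, repeated so the sketch compiles on its own.
public interface IGoogleSearchCrawlerIPAddressRangesHttpClient
{
    [Get("/static/search/apis/ipranges/googlebot.json")]
    Task<GoogleSearchCrawlerIPAddressResult> GetRanges(CancellationToken cancellationToken);
}

public sealed record class GoogleSearchCrawlerIPAddressResult(
    [property: JsonPropertyName("creationTime")] DateTimeOffset CreationTime,
    [property: JsonPropertyName("prefixes")] List<GoogleSearchCrawlerIPAddressRangeItem> IPRanges);

public sealed record class GoogleSearchCrawlerIPAddressRangeItem(
    [property: JsonPropertyName("ipv6Prefix")] string? IPv6,
    [property: JsonPropertyName("ipv4Prefix")] string? IPv4);
```

If the ranges feed a rate limiter, it may be reasonable to keep using the last successfully loaded list when a refresh fails, rather than treating every crawler as unknown.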

You can now work with the data; for example, simply output it:

foreach (GoogleSearchCrawlerIPAddressRangeItem entry in result.IPRanges)
{
    if (entry.IPv4 is not null)
    {
        Console.WriteLine($"> Found IPv4 range: {entry.IPv4}");
    }
    if (entry.IPv6 is not null)
    {
        Console.WriteLine($"> Found IPv6 range: {entry.IPv6}");
    }
}
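
Beyond printing them, the ranges could be used for the rate-limiting exemption mentioned at the start: check whether a client address falls into one of the published prefixes. The following is a minimal sketch, assuming .NET 8 or later, where System.Net.IPNetwork is available. IsInAnyRange is a hypothetical helper name, and the IPv4 prefix is again an RFC 5737 documentation range rather than a real Googlebot range.

```csharp
using System.Net;

// Prefixes in the same CIDR notation as the JSON file.
// "192.0.2.0/24" is an RFC 5737 documentation range used only for illustration.
string[] prefixes = ["2001:4860:4801:10::/64", "192.0.2.0/24"];

Console.WriteLine(IsInAnyRange(IPAddress.Parse("2001:4860:4801:10::1"), prefixes)); // True
Console.WriteLine(IsInAnyRange(IPAddress.Parse("192.0.2.17"), prefixes));           // True
Console.WriteLine(IsInAnyRange(IPAddress.Parse("203.0.113.5"), prefixes));          // False

// Hypothetical helper (not part of the article): returns true when the
// address lies inside at least one of the given CIDR prefixes.
static bool IsInAnyRange(IPAddress address, IEnumerable<string> cidrPrefixes)
{
    // Dual-stack sockets may report IPv4 clients as ::ffff:a.b.c.d;
    // normalize so they can match IPv4 prefixes.
    if (address.IsIPv4MappedToIPv6)
    {
        address = address.MapToIPv4();
    }

    foreach (string prefix in cidrPrefixes)
    {
        // Prefixes that cannot be parsed are skipped instead of throwing.
        if (IPNetwork.TryParse(prefix, out IPNetwork network) && network.Contains(address))
        {
            return true;
        }
    }

    return false;
}
```

With the records from this article, the prefix list could be built with something like result.IPRanges.Select(r => r.IPv4 ?? r.IPv6).OfType<string>(). For frequent lookups it would likely be worth parsing the prefixes into IPNetwork values once and reusing them.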

Full example:

using System.Text.Json.Serialization;
using Refit;

Console.WriteLine("Loading IP Addresses from Google Search Bot Json");

IGoogleSearchCrawlerIPAddressRangesHttpClient googleSearchBotIPAddressClient = RestService
    .For<IGoogleSearchCrawlerIPAddressRangesHttpClient>("https://developers.google.com");

GoogleSearchCrawlerIPAddressResult result = await googleSearchBotIPAddressClient.GetRanges(CancellationToken.None);

Console.WriteLine($"Found {result.IPRanges.Count} entries from {result.CreationTime:o}.");

foreach (GoogleSearchCrawlerIPAddressRangeItem entry in result.IPRanges)
{
    if (entry.IPv4 is not null)
    {
        Console.WriteLine($"> Found IPv4 range: {entry.IPv4}");
    }
    if (entry.IPv6 is not null)
    {
        Console.WriteLine($"> Found IPv6 range: {entry.IPv6}");
    }
}

Console.WriteLine("Finished.");

// Refit definition

public interface IGoogleSearchCrawlerIPAddressRangesHttpClient
{
    [Get("/static/search/apis/ipranges/googlebot.json")]
    Task<GoogleSearchCrawlerIPAddressResult> GetRanges(CancellationToken cancellationToken);
}

public sealed record class GoogleSearchCrawlerIPAddressResult(
    [property: JsonPropertyName("creationTime")] DateTimeOffset CreationTime,
    [property: JsonPropertyName("prefixes")] List<GoogleSearchCrawlerIPAddressRangeItem> IPRanges);

public sealed record class GoogleSearchCrawlerIPAddressRangeItem(
    [property: JsonPropertyName("ipv6Prefix")] string? IPv6,
    [property: JsonPropertyName("ipv4Prefix")] string? IPv4);
