Dissecting Tor Bridges and Pluggable Transport – Part I: Finding the Built-in Tor Bridges and How Tor Browser Works
A FortiGuard Labs Threat Research Report
Background
At the SecureWV 2019 Cybersecurity Conference, held in Charleston, West Virginia, Peixue and I presented our talk “Dissecting Tor Bridges and Pluggable Transport.” We are now sharing more details of this research, with our analysis posted in two blogs. In this first blog, I will explain how I used reverse engineering to find the built-in Tor bridges and how Tor Browser works with bridges enabled.
Tor Browser and Tor Network
Tor Browser is a tool that provides anonymous Internet connectivity combined with layers of encryption through the Tor network. When users explore websites using Tor Browser, their real IP address is hidden by the Tor network so that the destination website never knows what the true source IP address is. Users can also set up their own website in the Tor network with a domain name ending with “.onion”. That way, only Tor Browser can access it and nobody knows what its real IP address is. It’s one of the reasons why ransomware criminals require victims to access the payment page on a .onion website through Tor Browser. The Tor project team is aware of this practice because the Tor project blog clearly states that “Tor is misused by criminals.”
Tor Browser is an open source project with a design based on Mozilla Firefox. You can download the source code from its official website. The Tor network is a worldwide overlay network comprising thousands of volunteer-run relays. It consists of two kinds of relay nodes: normal relay nodes and bridge relay nodes. The normal relay nodes are listed in the main Tor directory, and the connections to them can be easily identified and blocked by censors.
However, the bridge relay nodes are not listed in the main Tor directory, which means that connections to them can’t be easily blocked by censors. In this blog I will be discussing how to find these bridges and relay nodes using functions built into Tor Browser.
The bridge information is defined in the profile file of Firefox, so you can display it by entering “about:config” in the address bar of Tor Browser, as shown in Figure 1.
There are two ways to use a bridge relay in Tor Browser. Tor Browser has some built-in bridges for users to choose from. If the built-in bridges don’t work, users can obtain additional bridges from the Tor Network Settings, by visiting https://bridges.torproject.org/, or by sending an email to bridges@bridges.torproject.org.
Analysis Platform
This analysis was done on the following platform, Tor Browser version, and extensions:
- Windows 7 32-bit SP1
- Tor Browser 8.0
- TorLauncher 0.2.16.3 (one extension)
- Torbutton 2.0.6 (one extension)
Figure 2 shows the version information of Tor Browser that I worked on.
During my analysis, the Tor Project released a new version, Tor Browser 9.0, on October 22, 2019. You can refer to the Appendix of this analysis for more information about it.
Starting Tor Browser with Built-in Bridges
The version of Tor Browser I analyzed provides four kinds of bridges: “obfs4”, “fte”, “meek-azure” and “obfs3”. They are called pluggable transports. You can see the detailed settings in Figure 3.
The obfs4 bridge is strongly recommended on the official Tor website, and all of the analysis below is based on this kind of bridge. I chose the bridge “obfs4” in the list shown in Figure 3 to start my analysis. Looking into the traffic when Tor Browser makes an “obfs4” connection, I found that the TCP sessions are created by obfs4proxy.exe, which is a bridge client process.
Figure 4 is a screenshot of the process tree when starting Tor Browser with “obfs4”. As you can see, “firefox.exe” starts “tor.exe”, which then starts “obfs4proxy.exe”. The process “obfs4proxy.exe” is located in “Tor_installation_folder\Browser\TorBrowser\Tor\PluggableTransports”. Originally, I thought the built-in “obfs4” bridges would be hard-coded inside the “obfs4proxy.exe” process.
Tracing and Tracking Within the Bridge Process “obfs4proxy.exe”
I started the debugger and attached it to “obfs4proxy.exe”. I then set a breakpoint on the API “connect”, which is often used to establish TCP connections. A breakpoint on this API usually reveals the IP addresses and ports quickly. However, it was never triggered before the connections to the “obfs4” bridge were established. After further analysis of the process “obfs4proxy.exe”, I learned that it uses another API, “MSAFD_ConnectEx” from mswsock.dll, instead.
Figure 5 shows that “obfs4proxy.exe” is about to call the API “mswsock.MSAFD_ConnectEx()” to make a TCP connection to a built-in “obfs4” bridge, whose IP address and port are “192.95.36.142:443”. The second argument of this function is a pointer to a struct sockaddr_in variable, which holds the IP address and port to connect to. Later on, it calls the APIs “WSASend” and “WSARecv” to communicate with the “obfs4” bridge. As you may have noticed, the debugger OllyDbg could not recognize this API because it is not an exported function of “mswsock.dll”. In IDA Pro’s analysis of “mswsock.dll”, we can see that the address 750A7842 is the address of the API “MSAFD_ConnectEx()”. By the way, “obfs4proxy.exe” uses the instruction “call dword ptr [ebx]” to call almost all the system APIs it needs, which is a way to hide the APIs from analysis.
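To make the role of that second argument concrete, here is a minimal Python sketch that decodes an IPv4 sockaddr_in as it would appear in a memory dump on x86 Windows. The sample bytes are not copied from Figure 5; they are built by hand from the bridge address quoted above, and the function name is only illustrative.

```python
import socket
import struct
from typing import Tuple


def parse_sockaddr_in(raw: bytes) -> Tuple[str, int]:
    """Decode an IPv4 sockaddr_in (16 bytes in memory) into (ip, port)."""
    if len(raw) < 8:
        raise ValueError("sockaddr_in needs at least 8 bytes")
    # sin_family is a 16-bit value in host (little-endian) byte order.
    (family,) = struct.unpack_from("<H", raw, 0)
    if family != socket.AF_INET:
        raise ValueError(f"not an IPv4 address: family={family}")
    # sin_port is stored in network (big-endian) byte order.
    (port,) = struct.unpack_from(">H", raw, 2)
    # sin_addr is four bytes in network order; the trailing 8 bytes are padding.
    ip = socket.inet_ntoa(raw[4:8])
    return ip, port


# Hypothetical dump, assembled by hand for the bridge named in the text.
SAMPLE = bytes.fromhex("0200" "01bb" "c05f248e" "0000000000000000")

if __name__ == "__main__":
    print(parse_sockaddr_in(SAMPLE))
```

Reading the port as big-endian is the step that is easiest to get wrong when inspecting the structure in a debugger, because the family field next to it is little-endian.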
From my analysis, most of the PE files (exe and dll files, like “obfs4proxy.exe”) used by Tor seem to be compiled with the GCC MinGW-w64 compiler, which passes arguments to functions with “mov [esp], …” instructions instead of “push …” instructions, and this makes static analysis harder. By tracing the call stack flow from “MSAFD_ConnectEx()”, I realized that my original thought was wrong: the built-in IP addresses and ports are not hard-coded in “obfs4proxy.exe”, but are received from the parent process “tor.exe” through a local loopback TCP connection.
Usually, the third packet from “tor.exe” to “obfs4proxy.exe” contains one built-in obfs4 bridge’s IP address and port in binary, as shown in Figure 6. It is a SOCKS5 packet that is 0xA bytes long. “05 01 00 01” is the SOCKS5 header (version 5, CONNECT command, a reserved byte, and the IPv4 address type), and the rest of the data is the IP address and port in binary. The packet asks “obfs4proxy.exe” to make a connection to the bridge at that IP address and port. “obfs4proxy.exe” then parses the packet and converts the binary IP address and port to a string, which in this case is “154.35.22.13:16815”.
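The conversion from the binary packet to the “IP:port” string can be sketched in a few lines of Python. This is not the code inside “obfs4proxy.exe”; it is an illustration of the SOCKS5 request layout, and the sample bytes are reconstructed from the address string above rather than copied from Figure 6.

```python
import ipaddress
import struct


def parse_socks5_connect(packet: bytes) -> str:
    """Parse a 10-byte SOCKS5 CONNECT request that carries an IPv4 address."""
    if len(packet) != 10:
        raise ValueError("expected a 0xA-byte SOCKS5 IPv4 request")
    version, command, reserved, address_type = packet[:4]
    # 05 = SOCKS version, 01 = CONNECT, 00 = reserved, 01 = IPv4 address type
    if (version, command, reserved, address_type) != (5, 1, 0, 1):
        raise ValueError("not a SOCKS5 CONNECT request for an IPv4 address")
    ip = ipaddress.IPv4Address(packet[4:8])
    # The port follows the address in network (big-endian) byte order.
    (port,) = struct.unpack(">H", packet[8:10])
    return f"{ip}:{port}"


# Reconstructed sample: header, then 154.35.22.13, then port 16815 (0x41AF).
SAMPLE = bytes.fromhex("05010001" "9a23160d" "41af")

if __name__ == "__main__":
    print(parse_socks5_connect(SAMPLE))
```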
Moving to Tor.exe
“tor.exe” uses a third-party module named “libevent.dll”, which is from libevent (an event notification library), to drive Tor to perform its tasks. Tor places most of its socket tasks (connect(), send(), recv() and so on) on events that are called automatically by libevent. When tracing the packet with the bridge’s IP address and port in “tor.exe”, you can see in the call stack context that many return addresses are in the module “libevent.dll”. In Figure 7, the debugger is paused where “tor.exe” calls the API “ws2_32.send()” to send the packet containing the bridge’s IP address and port, which is the packet shown as received in Figure 6.
Figure 7 is the “Call stack” window, which shows the return addresses of “libevent.dll”.
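The event-driven pattern described above, where socket work is registered as an event and a library later invokes a callback with a stored argument, can be approximated with Python’s standard selectors module. This is only an analogy and not libevent or Tor code: the function names are hypothetical, the payload is the reconstructed SOCKS5 request from earlier, and the event is removed after firing once, whereas a persistent libevent event would stay registered.

```python
import selectors
import socket


def send_payload(sock, payload):
    """Callback: runs once the socket is writable and sends the stored argument."""
    sock.sendall(payload)


def run_once(selector):
    """Dispatch each ready event to the (callback, argument) pair stored with it."""
    for key, _mask in selector.select(timeout=1.0):
        callback, argument = key.data
        callback(key.fileobj, argument)
        # One-shot behaviour for the demo; a persistent event would stay registered.
        selector.unregister(key.fileobj)


def demo():
    # A connected local socket pair stands in for the loopback TCP connection.
    left, right = socket.socketpair()
    selector = selectors.DefaultSelector()
    payload = bytes.fromhex("050100019a23160d41af")
    # Register interest in "writable" and attach the callback plus its argument.
    selector.register(left, selectors.EVENT_WRITE, data=(send_payload, payload))
    run_once(selector)
    received = right.recv(len(payload))
    selector.close()
    left.close()
    right.close()
    return received


if __name__ == "__main__":
    print(demo().hex())
```

Because the callback is invoked from the dispatch loop rather than from the code that registered it, a debugger paused inside the callback shows the loop’s frames in the call stack, which matches the “libevent.dll” return addresses seen in Figure 7.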
By tracing how “tor.exe” sends out the bridge’s IP address and port, I found a place where it creates a new event with a callback function that then sends the bridge’s IP address and port. The ASM code snippet below shows the context of calling “libevent.event_new()” in “tor.exe”. Its second argument is the socket handle; its third argument is the event action, which is 14h here, standing for EV_WRITE and EV_PERSIST; its fourth argument is a callback function (sub_2833EE in this case); and its fifth argument contains the bridge’s IP address and port, which will be passed to the callback function (sub_2833EE) once it is called by libevent.
The following ASM code snippet is from “tor.exe”, whose base address for this time is 00280000h.
[…]
.text:00281C84 mov edx, eax
.text:00281C86 mov eax, [ebp+var_2C]
.text:00281C89 mov [eax+14h], edx
.text:00281C8C mov eax, [ebp+var_2C]
.text:00281C8F mov ebx, [eax+0Ch] ; socket handle, used as the second argument below
.text:00281C92 call sub_5133E0 ; return value is used as the first argument below
.text:00281C97 mov edx, eax
.text:00281C99 mov eax, [ebp+var_2C]
.text:00281C9C mov [esp+10h], eax ; argument for callback function
.text:00281CA0 mov [esp+0Ch], offset sub_2833EE ; the callback function
.text:00281CA8 mov [esp+8], 14h ; EV_WRITE (0x04) | EV_PERSIST (0x10)
.text:00281CB0 mov [esp+4], ebx ; socket
.text:00281CB4 mov [esp], edx ; event_base
.text:00281CB7 call event_new ; event_new(event_base, socket, events, callback_fn, callback_arg)
.text:00281CBC mov edx, eax
.text:00281CBE mov eax, [ebp+var_2C]
.text:00281CC1 mov [eax+18h], edx
[…]
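The event mask 14h passed to “event_new()” above can be decoded with a small helper. The flag values below are the ones defined in libevent’s public header; the helper itself is a hypothetical convenience for reading such constants during analysis and is not part of libevent or Tor.

```python
# Event flag bits as defined by libevent (event2/event.h).
EVENT_FLAGS = {
    0x01: "EV_TIMEOUT",
    0x02: "EV_READ",
    0x04: "EV_WRITE",
    0x08: "EV_SIGNAL",
    0x10: "EV_PERSIST",
    0x20: "EV_ET",
}


def decode_event_flags(mask):
    """Return the names of the flag bits set in mask, lowest bit first."""
    names = [name for bit, name in sorted(EVENT_FLAGS.items()) if mask & bit]
    # Report any bits this table does not know about instead of dropping them.
    unknown = mask & ~sum(EVENT_FLAGS)
    if unknown:
        names.append(hex(unknown))
    return names


if __name__ == "__main__":
    print(decode_event_flags(0x14))
```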