Reverse engineering the Bandcamp authentication protocol

Did you know that the albums you purchase on Bandcamp can disappear from your collection without notice? This can happen for various reasons. For example, a seller might decide on a whim to remove the album from the platform. Bandcamp apparently allows this in their terms of use:

Content you purchase in a Transaction cannot be guaranteed to be available to you perpetually.

Users bear all risk from the denial of access to any Content purchased through the Service.

The only way to make sure your albums stay in your possession is to download them immediately after purchase. Heck, even Bandcamp officially recommends this:

[…] we encourage you to promptly download any Content you purchase through the Site […]

However, even if the album has been removed and you hadn’t downloaded it, not all is lost. In the Bandcamp mobile app, you can continue to listen to all your albums (but without the option to download them), even after they’ve been removed from the platform. This obviously means that Bandcamp doesn’t delete the actual content from their servers. And if the app can still access the lost albums, so can anyone who is patient enough to reverse engineer the app. Surprisingly, no one seems to have done this so far. Could it be that it’s impossible? Let’s dive in and see what’s going on inside the Bandcamp app!

Inspecting the network traffic

As always, my first step was to inspect the network traffic between the Bandcamp app and their backend servers. My favorite tool for this purpose has always been Burp Suite Community Edition. After setting up the proxy and opening the collection page in the app, I quickly noticed the following HTTP request in the proxy logs:

GET /api/collectionsync/1/collection?page_size=200 HTTP/2
Host: bandcamp.com
Authorization: Bearer MTQ0NjJkZmQ5OTM2NDE1ZTZjNGZmZjI3

This API endpoint returns the information about all your albums (things like album name, band info, release date, purchase date, etc.). Not only that, but it also lists all the album tracks, together with something that looks like high-quality audio URLs:

{
  "token": "1:1700775127:355751800:a",
  "tralbum_id": 355751800,
  "title": "Pinnacle Of Bedlam",
  "tracks": [
    {
      "track_id": 3770803404,
      "title": "Cycles Of Suffering",
      "audio_url": "https://t4.bcbits.com/stream/b32687/mp3-128/3770803404",
      "hq_audio_url": "https://t4.bcbits.com/stream/fc3538/mp3-v0/3770803404",
      "track_number": 1
    },
    {
      "track_id": 2590214273,
      "title": "Purgatorical Punishment",
      "audio_url": "https://t4.bcbits.com/stream/2b6cad/mp3-128/2590214273",
      "hq_audio_url": "https://t4.bcbits.com/stream/a37aa9/mp3-v0/2590214273",
      "track_number": 2
    }
  ]
}

Unfortunately, hq_audio_url is a bit of a misnomer. High quality in this context refers to MP3 V0, which is a lossy format (unlike regular Bandcamp downloads on the web page, where you can choose from a selection of lossless formats). The good news is that you are very unlikely to hear the difference between lossless formats and high-bitrate lossy formats. In any case, it’s better to have your music than not to have it at all, so I’ll happily take MP3 V0 over nothing any day. Anyway, I tried one of the download links from the JSON response and it worked.
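Pulling the audio URLs out of such a response is easy to script. Here is a minimal Python sketch that works on the album entry shown above (trimmed to one track). The function name track_urls is made up for illustration, and the sketch assumes only the fields visible in the sample; the full response may be wrapped in additional structure that isn't shown here.

```python
import json

# Album entry copied from the response above, trimmed to a single track.
SAMPLE = """
{
  "tralbum_id": 355751800,
  "title": "Pinnacle Of Bedlam",
  "tracks": [
    {
      "track_id": 3770803404,
      "title": "Cycles Of Suffering",
      "audio_url": "https://t4.bcbits.com/stream/b32687/mp3-128/3770803404",
      "hq_audio_url": "https://t4.bcbits.com/stream/fc3538/mp3-v0/3770803404",
      "track_number": 1
    }
  ]
}
"""


def track_urls(album):
    """Return (track_number, title, url) tuples, preferring hq_audio_url."""
    result = []
    for track in album.get("tracks", []):
        # Fall back to the 128 kbps stream if no MP3 V0 URL is present.
        url = track.get("hq_audio_url") or track.get("audio_url")
        result.append((track["track_number"], track["title"], url))
    return sorted(result)


album = json.loads(SAMPLE)
for number, title, url in track_urls(album):
    print(f"{number:02d} {title} -> {url}")
```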

The first obstacle had been conquered. It was an important milestone for me, because at this moment I knew that even if I couldn’t figure out how to get the authentication token programmatically, I would still be able to manually download the missing albums from my collection: I could just save all HTTP responses and extract the audio URLs by hand. It wouldn’t be the most exciting job ever, but it would do the trick. Ah, who am I kidding? Of course I would not be happy with such a half-assed solution. I obviously had to automate this process, which meant I needed to figure out how to get the authentication token.

Authentication protocol

Logins are typically very simple: you send a POST request with your username and password, and you get an authentication token in return. Bandcamp’s login protocol is much more convoluted. Here is the high-level description of the login flow:

1. The app sends the login request to the /oauth_login endpoint.
2. The server returns a 418 status code and a hex-encoded, random-looking X-Bandcamp-Dm header.
3. The app resends the login request with its own X-Bandcamp-Dm value.
4. The server returns a 451 status code and introduces a new X-Bandcamp-Pow header.
5. The app sends the final login request with its own X-Bandcamp-Pow value.
6. The server returns a 200 status code and an authentication token.

Based on this entire exchange, it appeared that X-Bandcamp-Dm and X-Bandcamp-Pow response headers served as some sort of challenge. Correct outgoing header values were necessary for successful authentication, and were in some way dependent on the incoming values.
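The shape of this exchange can be sketched as a small retry loop. The Python below is only an illustration under several assumptions: send, solve_dm and solve_pow are hypothetical callables, fake_server is a stand-in that imitates the three status codes, and the header values and token are dummies. Whether the final request still has to carry the X-Bandcamp-Dm header is not stated above; this sketch simply keeps it.

```python
def login(send, body, solve_dm, solve_pow, max_rounds=5):
    """Repeat the login request, answering each challenge, until a 200 arrives.

    send(body, headers) must return (status, response_headers, payload).
    """
    headers = {}
    for _ in range(max_rounds):
        status, challenge, payload = send(body, dict(headers))
        if status == 200:
            return payload
        if status == 418:
            headers["X-Bandcamp-Dm"] = solve_dm(challenge["X-Bandcamp-Dm"], body)
        elif status == 451:
            headers["X-Bandcamp-Pow"] = solve_pow(challenge["X-Bandcamp-Pow"], body)
        else:
            raise RuntimeError(f"unexpected status {status}")
    raise RuntimeError("login did not complete")


seen = []  # status codes observed during the demo run


def fake_server(body, headers):
    # Stand-in for the real backend: it only checks that the headers exist.
    if "X-Bandcamp-Dm" not in headers:
        reply = (418, {"X-Bandcamp-Dm": "c0ffee"}, None)
    elif "X-Bandcamp-Pow" not in headers:
        reply = (451, {"X-Bandcamp-Pow": "1:10:f6e592b662b3"}, None)
    else:
        reply = (200, {}, "dummy-token")
    seen.append(reply[0])
    return reply


token = login(
    fake_server,
    "hypothetical-body",
    solve_dm=lambda value, body: "dm-answer",    # placeholder answer
    solve_pow=lambda value, body: value + ":0",  # placeholder answer
)
print(seen, token)
```

Keeping the two header calculations behind plain callables means the loop itself never needs to know how the values are produced.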

Figuring out how the correct header values are generated just by looking at network traffic was clearly impossible; the answer to this question could only be found in the client application code. On the off chance that someone else had already figured out the algorithm, I did a quick Google and GitHub search for X-Bandcamp-Dm. I got a couple of hits, but all of them were just documenting the struggles of other people:

ok how they handle the new DM is, basically put, a pain in the ass, we’ll see if I get around figuring out wtf is happening (MITM Android Bandcamp app)

I’d love to support that if someone wants to reverse-engineer the X-Bandcamp-DM and X-Bandcamp-PoW headers. (Mopidy-Bandcamp)

Unfortunately I cannot recreate x-bandcamp-dm value in headers. (Python Bandcamp scraper)

Apparently, X-Bandcamp-Dm and X-Bandcamp-Pow headers were the secret ingredient that made it difficult to reverse engineer the login API. It was time to decompile the mobile app and find the answer to how these secret values are generated.

Decompiling the Android app

Since reversing managed code is much easier than reversing native code, I chose to decompile the Android mobile app. I’ve always used JADX for this purpose and it has always served me well (its GUI features would turn out to be particularly useful). After downloading the Bandcamp application package from APKMirror and opening it with JADX, I found out that the app was obfuscated.

I had never reverse engineered obfuscated code before. To determine how difficult the process would be, I searched for all occurrences of the string X-Bandcamp-Dm. Number of results: zero. So, not only was the code obfuscated, but the string values were obfuscated as well. That meant the job of figuring out how the mysterious header values were calculated was not going to be easy. In fact, I wasn’t sure if it was going to be possible at all, since I didn’t have any clue where to start. I had doubts whether I even wanted to embark on this journey, but ultimately, I decided to do it, even if it took me half a year (luckily, I only needed three weeks).

Obfuscation techniques

The app was using many different obfuscation techniques, and since I was a complete newbie in this area, I had to learn from scratch how to defeat each one. I’m going to show you some of the techniques I’ve seen, along with the tips on how to fight them. Reverse engineering veterans among you probably know all of them already, but if you are a beginner, I hope that you’ll learn something new and see that obfuscation is not as intimidating as it might appear.

Renaming

Renaming is an obfuscation method where identifiers (variable, class, field, and method names) are replaced with random gibberish. This is probably the most well-known type of obfuscation, so it’s not surprising that it’s frequently used in the Bandcamp mobile app. For example, a typical method call might look something like this:

dVar.F(f17618q, b(dVar.i()));

When I first looked at this code, I had no idea what methods F, b and i were doing. Luckily, JADX is almost a full-blown IDE, so it contains features such as “Find usage” and “Go to declaration”. These two made the analysis much easier, because I was able to traverse the call chains until I reached some method with a normal, unobfuscated name, such as this one:

public void setHeaders(s7.d dVar) {
  this.f4578a.d(dVar);
}

When you repeat this process for all unknown method names, you will eventually discover that F sets the request header value in the HTTP client, b calculates the value of the header, and i composes the body of the outgoing HTTP request. The obfuscated code then becomes something that you can easily reason about:

request.setHeaders("X-Bandcamp-Dm", calculateHash(request.getParams()));

After spending a lot of time with the obfuscated code, you become so familiar with it that you start noticing things that were impossible to see before. For example, after a while it became obvious to me that these two methods were aliases for HMAC SHA-256 and HMAC SHA-512 cryptographic hash functions, respectively:

public static String e(String str, String str2, int i10, float f10) {
  return com.bandcamp.shared.platform.a.d().h(str, null, str2, i10, f10);
}

public static String d(String str, String str2, int i10) {
  return com.bandcamp.shared.platform.a.d().B(str, null, str2, i10);
}

Once you discover the real purpose of a method or a variable, you can also rename it in JADX. I didn’t use this feature, though. After finally understanding the meaning of the code, I didn’t feel the need to rename anything, because I had already formed a mental map, and the obfuscated code began to look just like regular code to me. This would probably be more challenging for larger apps or if I had to deobfuscate multiple features, not just header calculation.

String obfuscation

Even when all method and variable names are random nonsense, you still expect to at least be able to search for string constants. Since the app sends and receives the header X-Bandcamp-Dm, that string has to be somewhere in the code, right? But as I mentioned earlier, that header name was nowhere to be found. How do you even proceed from here? I started looking for string fragments. How about the string "X", the first character of the header name? There were dozens of occurrences of this value across the codebase, and most of them led nowhere, but there was also this one:

public static String f17618q = "X";
public static String f17620s = "pmac";
public static String f17621t = "D";
public static String f17622u = "M";
public static String f17619r = "nab";

static {
  f17620s += "d";
}

public <T> void d(s7.d<T> dVar) {
  char[] charArray = f17619r.toCharArray();
  for (int i10 = 0; i10 < charArray.length / 2; i10++) {
    char c10 = charArray[i10];
    charArray[i10] = charArray[(charArray.length - i10) - 1];
    charArray[(charArray.length - i10) - 1] = c10;
  }
  String str = new String(charArray);
  char[] charArray2 = f17620s.toCharArray();
  for (int i11 = 0; i11 < charArray2.length / 2; i11++) {
    char c11 = charArray2[i11];
    charArray2[i11] = charArray2[(charArray2.length - i11) - 1];
    charArray2[(charArray2.length - i11) - 1] = c11;
  }
  String str2 = new String(charArray2);
  dVar.F(f17618q + "-" + str + str2 + "-" + f17621t + f17622u, b(dVar.i()));
}

X, pmac, D, M, nab, d—what a weird bunch. Hm, but doesn’t it look an awful lot like something we are looking for? If your first thought was “this seems to be a permutation of the X-Bandcamp-Dm header name”, you were 100% right! The sole purpose of this class is to hide the well-known string by constructing it using string concatenation and reversing. All these shenanigans can be replaced with a single line of code:

public <T> void d(s7.d<T> dVar) {
  dVar.F("X-Bandcamp-Dm", b(dVar.i()));
}

Searching for string fragments has served me well multiple times, so I hereby officially declare it to be a very useful method for finding obfuscated string values.
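To double-check what the obfuscated method really assembles, the same steps can be replayed outside the app. This Python sketch mirrors the field values and the two reversal loops from the decompiled class; nothing else is assumed.

```python
def reverse(text):
    # Same effect as the in-place swap loops in the decompiled method.
    return text[::-1]


f17618q = "X"
f17620s = "pmac" + "d"  # the static initializer appends "d"
f17621t = "D"
f17622u = "M"
f17619r = "nab"

header = (
    f17618q + "-" + reverse(f17619r) + reverse(f17620s) + "-" + f17621t + f17622u
)
print(header)  # X-bandcamp-DM
```

The casing comes out as X-bandcamp-DM rather than X-Bandcamp-Dm, which makes no difference on the wire because HTTP header names are case-insensitive.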

Reflection

This is where analyzing the obfuscated code becomes much more difficult. I’ve mentioned earlier that even if a method has been renamed, you can still learn something about it by following its call chain. But in some cases, you don’t have this luxury: if a method is invoked using reflection, you can’t track its usage directly anymore. Take a look at this simple example:

obj.getClass()
  .getMethod("v0".replace("0", "alue"), new Class[0])
  .invoke(obj, new Object[0]);

If you searched for all usages of the value() method defined in the CacheListenerEvent class, you wouldn’t find anything. But if you knew that it might be called via reflection, you could search for value, val, or lue, and you would eventually find this call. It’s not always that easy, though. In some cases, even a string search won’t help you:

Class.forName(sb2.toString())
  .getMethod(x7.d.c("lmrgdw", 2), Object.class)
  .invoke(cls, "2" + obj.toString().replaceAll("3", "5"));

Which method is being called here? You can’t easily discover that using only static analysis—you must directly invoke x7.d.c("lmrgdw", 2) to determine the result of the call.
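When running the app's own helper is inconvenient, it can be worth trying a few simple transformations on the argument by hand. The body of x7.d.c isn't shown here, so what it really does remains unknown; the Python sketch below is purely a guess that treats the second argument as a letter shift. The function name shift_letters is made up for illustration.

```python
def shift_letters(text, amount):
    """Shift lowercase ASCII letters forward by `amount`, wrapping after z."""
    out = []
    for ch in text:
        if "a" <= ch <= "z":
            out.append(chr((ord(ch) - ord("a") + amount) % 26 + ord("a")))
        else:
            out.append(ch)
    return "".join(out)


print(shift_letters("lmrgdw", 2))  # notify
```

A readable result like this is a strong hint, but it is not proof that the helper works this way for other inputs.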

In the end, there is no guaranteed way to defeat this obfuscation technique. You just need to be patient, and in the worst case, be ready to search for all usages of reflection to find that single call you need.
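Searching for every use of reflection can at least be partly automated once the decompiled sources are exported to text. Here is a rough Python sketch: the regular expression covers only three common lookup calls (getMethod, getDeclaredMethod, forName), the function name find_reflection_sites is made up, and the sample text reuses snippets from this post.

```python
import re

# Matches the reflective lookups used in the examples above, plus one sibling.
REFLECTION_CALL = re.compile(r"\.(getMethod|getDeclaredMethod|forName)\s*\(")


def find_reflection_sites(source):
    """Yield (line_number, stripped_line) for lines with a reflective lookup."""
    for number, line in enumerate(source.splitlines(), start=1):
        if REFLECTION_CALL.search(line):
            yield number, line.strip()


SAMPLE = '''Class.forName(sb2.toString())
  .getMethod(x7.d.c("lmrgdw", 2), Object.class)
  .invoke(cls, "2" + obj.toString().replaceAll("3", "5"));
request.setHeaders("X-Bandcamp-Dm", value);'''

sites = list(find_reflection_sites(SAMPLE))
for number, line in sites:
    print(number, line)
```

This only narrows down the candidates; each hit still has to be read by hand to see which method name it resolves to.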

X-Bandcamp-Dm

I showed you all these obfuscation techniques because all of them were used in some form in the X-Bandcamp-Dm calculation. As I was learning more about deobfuscation, I was also slowly piecing together the X-Bandcamp-Dm algorithm. One day, I would learn how the header name was being constructed and where it was used. The next day, I would learn that the final header value is an output of the HMAC function and what its inputs are. After that, I would reverse engineer the weird, home-made key derivation function used to generate the keys for the HMAC calculation. Ultimately, I realized that X-Bandcamp-Dm is an HMAC SHA-256 hash of the incoming X-Bandcamp-Dm value, the outgoing HTTP request body, and one more value that I couldn’t yet identify. That unidentified value was being initialized in the following method:

@Override // java.util.Observer
public void update(Observable observable, Object obj) {
  if ((obj instanceof String) && f17625n + 48 == ((String) obj).charAt(0)) {
    f17624m = obj.toString().substring(1).getBytes("utf-8");
  }
}

Of course, there was a catch—I couldn’t find any direct callers of this method. It could mean only one thing: the call sites were obfuscated to use reflection. I tried brute-forcing my way out of this problem by inspecting every observer chain in the code, but there were hundreds of them. Most of them were obfuscated, so this approach wasn’t going to work in a reasonable timeframe.

I was stuck on this for almost one entire week. After many unsuccessful attempts to find the caller of this method, it finally dawned on me. The condition before the assignment was checking if the string obj starts with the character "3" (the value of the field f17625n was always 3, and 48 is the ASCII code of the character 0, so f17625n + 48 is the code of the character 3). This meant that obj didn’t start with 3 randomly, but on purpose. Otherwise, the code couldn’t possibly work. And what’s the way to ensure that something starts with 3? Well, prepend "3" to it, of course! I searched for "3" + and found this:

@Override // java.util.Observer
public void update(Observable observable, Object obj) {
  ((Class) ((Object[]) obj)[2]).getMethod(
    a.this.f18941o.substring(0, 5) + a.this.f18942p.substring(0, 1),
    Object.class
  ).invoke(
    (Class) ((Object[]) obj)[2],
    "3" + x7.h.d(
      (String) ((Object[]) obj)[0],
      (String) ((Object[]) obj)[1],
      0
    )
  );
}

This was the most heavily obfuscated piece of code I had encountered. It was similar to a final boss fight, because it was using all obfuscation methods that had been bothering me previously (renaming, string obfuscation, reflection). However, by this point, all of this had become standard procedure for me. I quickly discovered that the reflection call was invoking the method notify, which was then notifying the observer I was interested in. This was the final piece of the puzzle! I deobfuscated the remaining parameters and updated my API client code. The moment I ran it and finally received the HTTP status code 451 instead of 418 from Bandcamp servers will forever remain one of the happiest moments in my hacking history.
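Stripped of the reflection, the handshake between the two update methods boils down to a one-character tag. Here is a small Python sketch of that idea; the names TAG, send and receive are made up, and the empty-string guard is an addition that the decompiled check doesn't have.

```python
TAG = 3  # value of the field f17625n, as described above


def send(payload):
    # Sender side: prepend the tag digit, like the "3" + x7.h.d(...) expression.
    return "3" + payload


def receive(message):
    # Receiver side: same comparison as f17625n + 48 == charAt(0),
    # plus a guard against empty strings.
    if isinstance(message, str) and message and TAG + 48 == ord(message[0]):
        return message[1:].encode("utf-8")
    return None


print(receive(send("secret")))  # b'secret'
print(receive("4secret"))       # None
```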

Looking back, it’s so funny that the X-Bandcamp-Dm calculation algorithm is so simple and clean and so easy to describe, yet it took me weeks to recreate it from thousands of lines of obfuscated code.

var input = response.Headers["X-Bandcamp-Dm"];

var key1 = FunkyKdf1(input, staticKey1);
var key2 = FunkyKdf2(input, staticKey2);

var output = HmacSha256(key2 + request.Body, key1);

request.Headers["X-Bandcamp-Dm"] = output;
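For readers who prefer something they can run, here is a rough Python rendering of the same outline. It is not a working implementation: placeholder_kdf is a made-up stand-in because the two key derivation functions aren't described in this post, the static keys and the request body are dummy values, and the sketch assumes that the second argument of HmacSha256 is the key and that the result is sent hex-encoded.

```python
import hashlib
import hmac


def placeholder_kdf(challenge: str, static_key: bytes) -> bytes:
    # NOT the real key derivation; it only gives the sketch something to run.
    return hashlib.sha256(static_key + challenge.encode("utf-8")).digest()


def compute_dm(challenge: str, body: bytes, static_key1: bytes, static_key2: bytes) -> str:
    key1 = placeholder_kdf(challenge, static_key1)
    key2 = placeholder_kdf(challenge, static_key2)
    # Assumed argument order: message = key2 + body, HMAC key = key1.
    return hmac.new(key1, key2 + body, hashlib.sha256).hexdigest()


# Dummy inputs, for illustration only.
answer = compute_dm("c0ffee", b"hypothetical-body", b"static-1", b"static-2")
print(answer)
```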

X-Bandcamp-Pow

With X-Bandcamp-Dm out of the way, it was time to figure out the meaning of the X-Bandcamp-Pow header. Compared to the time I had spent on X-Bandcamp-Dm, reversing the calculation of X-Bandcamp-Pow was a breeze. It turned out to be a proof-of-work scheme that closely resembles Hashcash, the scheme that inspired Bitcoin’s own proof-of-work implementation. Bandcamp’s version concatenates the request body with the incoming X-Bandcamp-Pow value and a counter. It then keeps incrementing the counter and recalculating the SHA-1 hash of the combined string until the output has the desired number of leading zero bits. The final counter value is then encoded using Base36 and appended to the original X-Bandcamp-Pow value.

For example, if X-Bandcamp-Pow is 1:10:f6e592b662b3, it means we need to find a hash with 10 leading zero bits. If we find it in 760 iterations, then the outgoing X-Bandcamp-Pow value will be 1:10:f6e592b662b3:l4 (760 is l4 in Base36).
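The search loop can be sketched in a few lines of Python. The Base36 encoding and the leading-zero-bits check follow the description above, but the exact layout of the hashed string (order of the parts, separators, how the counter is written) is a guess, so this illustrates the technique rather than Bandcamp's exact algorithm. The function names and the request body are made up.

```python
import hashlib

DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def to_base36(n: int) -> str:
    """Encode a non-negative integer using digits 0-9 and a-z."""
    if n == 0:
        return "0"
    out = ""
    while n:
        n, remainder = divmod(n, 36)
        out = DIGITS[remainder] + out
    return out


def has_leading_zero_bits(digest: bytes, bits: int) -> bool:
    return int.from_bytes(digest, "big") >> (len(digest) * 8 - bits) == 0


def hash_candidate(body: str, challenge: str, counter: int) -> bytes:
    # Assumed layout: body + challenge + decimal counter, UTF-8 encoded.
    return hashlib.sha1(f"{body}{challenge}{counter}".encode("utf-8")).digest()


def solve_pow(challenge: str, body: str) -> str:
    # In "1:10:f6e592b662b3" the second field is the required number of zero bits.
    bits = int(challenge.split(":")[1])
    counter = 0
    while not has_leading_zero_bits(hash_candidate(body, challenge, counter), bits):
        counter += 1
    return f"{challenge}:{to_base36(counter)}"


print(to_base36(760))  # l4
print(solve_pow("1:10:f6e592b662b3", "hypothetical-body"))
```

With 10 required zero bits the loop needs about 1,024 attempts on average, so it finishes almost instantly.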

It seems to me that the only reason for the introduction of this header was that everyone wanted to be a part of the blockchain craze at that time (X-Bandcamp-Pow was first introduced in December 2019, a year and a half after X-Bandcamp-Dm). I don’t see any other explanation, because X-Bandcamp-Pow doesn’t offer any additional advantages over X-Bandcamp-Dm (which can’t be brute-forced anyway).

But I digress. The moment of truth had arrived. I implemented proof-of-work calculation in my API client, ran it, and got the following output:

HTTP/2 418 I'm a teapot
HTTP/2 451 Unavailable For Legal Reasons
HTTP/2 200 OK

My first login request was successful, and the authentication token was finally mine! After this, implementing the rest of the API for downloading the albums from the collection was trivial.

Bandcamp downloader

The command line tool I wrote is available here. It has an absolutely minimal set of features: you can list all your purchased albums and you can download a specific album from your collection in MP3 V0 format. Here is one usage example:

# List all albums in your Bandcamp collection
$ dotnet run --username $USERNAME --password $PASSWORD
870109722 Bolt Thrower — Realm of Chaos
910230745 Cannibal Corpse — Evisceration Plague
157725502 Cryptopsy — None So Vile
388372040 Incantation — Onward to Golgotha
212824804 Archspire — Relentless Mutation

# Download the album with the specified ID
$ dotnet run --username $USERNAME --password $PASSWORD --album 870109722

I don’t plan to extend it with more features, since my main goal in this quest was to enable Bandcamp users to download the albums they can’t download in any other way. Also, there are already many feature-rich Bandcamp downloaders around, and it would make more sense to extend them with proper authentication than to reimplement all their features from scratch in my repo. If you are a maintainer of one such downloader, feel free to reuse the authentication code that I have implemented.

Enjoy downloading your lost albums and listening to them once again!