Java: Find heap objects without a reference (sun.misc.Unsafe)

Java is generally a safe language and, unlike in languages that treat memory as a big byte array, such as C and C++, you can't access an object unless you have a reference to it. Even Java reflection is unable to do that. There is, however, a class in the standard library that provides access to native memory management – the infamous sun.misc.Unsafe. If you're interested in various hacky ways to improve performance, you've probably used it before.

Obtaining the sun.misc.Unsafe

The constructor of this class is private, and the getUnsafe() method will not work from your code: it throws a SecurityException when called from application classes. On my platform, the class keeps its singleton instance in a static field called "theUnsafe". The field may have a different name on, for example, Android or other JDK implementations. Using simple reflection we can read this field:

	Field theUnsafe = Unsafe.class.getDeclaredField("theUnsafe");
	theUnsafe.setAccessible(true);
	// Field.get() returns Object, so an explicit cast is required
	Unsafe unsafe = (Unsafe) theUnsafe.get(null);
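
Since the field name may differ between platforms, one possible workaround is to look the singleton up by type rather than by name. The following is only a sketch: the class name UnsafeLocator and the method find() are made up for this illustration, and it assumes the platform keeps the instance in a static field of type Unsafe at all.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import sun.misc.Unsafe;

public class UnsafeLocator {
    /**
     * Scans the static fields of sun.misc.Unsafe for one whose declared
     * type is Unsafe itself and returns its value.
     * Assumption: the singleton lives in such a field, whatever its name.
     */
    public static Unsafe find() {
        for (Field field : Unsafe.class.getDeclaredFields()) {
            boolean isStatic = Modifier.isStatic(field.getModifiers());
            if (!isStatic || field.getType() != Unsafe.class) {
                continue;
            }
            try {
                field.setAccessible(true);
                Object candidate = field.get(null);
                if (candidate != null) {
                    return (Unsafe) candidate;
                }
            } catch (ReflectiveOperationException | RuntimeException e) {
                // This field is not readable; try the next candidate.
            }
        }
        throw new IllegalStateException("No static Unsafe field found");
    }

    public static void main(String[] args) {
        Unsafe unsafe = find();
        // addressSize() reports the size of a native pointer in bytes.
        System.out.println("address size: " + unsafe.addressSize());
    }
}
```

On a standard JDK this should end up reading the same "theUnsafe" field as the snippet above.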

Here is a demonstration of how the Unsafe can be used to obtain information from an object that is no longer referenced anywhere. If the object were stored in a field, we could, of course, use simple reflection, but this is not the case. This is the setup we're going to try to break:

public class Main {
	private static final Unsafe unsafe;
	private static final Scanner stdin = new Scanner(System.in);
	private static int passwordHash;

	static {
		// initialize unsafe
	}

	public static void main(String[] args) throws Exception {
		readPassword();
		doUnsafeStuff();
	}

	private static void readPassword() {
		passwordHash = stdin.nextLine().hashCode();
	}

	private static void doUnsafeStuff() throws Exception {
		// this is where we need to steal the password contents
	}
}

As you can see, only a hash of the password is stored after reading it. To steal the actual password text, we take advantage of the fact that the Java garbage collector won't collect the password string immediately, and even when it does, it isn't required to fill that memory with zeros (which is why you should store passwords in a mutable container and clear it when you're done using it). Immediately after readPassword() we create another String. Since we hold a reference to that string, we can read the address of its char array. We then assume that, because the two strings were created one after the other, the char array of the password string is located at a nearby address. Simply traverse the nearby bytes to find the password! Here is how doUnsafeStuff() might look in practice:

private static void doUnsafeStuff() throws Exception {
	String dummy = stdin.nextLine();
	Field value = String.class.getDeclaredField("value");
	value.setAccessible(true);
	long offset = unsafe.objectFieldOffset(value);
	// read the address of char array as a long (x64 mode)
	long address = unsafe.getLong(dummy, offset);
	for (int i = -250; i < -150; i++) {
		// range where the password chars might be
		char ch = (char) unsafe.getByte(address + i);
		System.out.print(ch);
	}
}

There are a few things to mention about this method. First, I'm creating the 'dummy' string in the same way the password is obtained. During my tests, if the dummy was obtained another way, I couldn't locate the char array. I'm sure there is a way around that. If you've found such a method, feel free to contact me. Second, I've used a very specific range (-250...-150) to locate the characters. It is likely to vary across JVM implementations, so you might need to tune it yourself. Also, if you're using a 32-bit system, you probably need to read the char array reference as an int instead of a long.
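
Rather than guessing whether a reference should be read as an int or a long, the width can be queried at run time. The sketch below assumes that arrayIndexScale(Object[].class) reflects the size of a reference stored on the heap: 4 bytes on a 32-bit JVM or with compressed object pointers, 8 bytes otherwise. The class RawReference and its Holder are hypothetical names used only for this illustration. Note that with compressed pointers the raw value may still need a shift and a base added before it is a usable address; that step is not shown here.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

public class RawReference {
    /** Hypothetical holder class, used only for this illustration. */
    public static class Holder {
        Object target = new byte[] {1, 2, 3};
    }

    public static Unsafe unsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Size in bytes of one element of an Object[], i.e. of one reference. */
    public static int referenceSize(Unsafe unsafe) {
        return unsafe.arrayIndexScale(Object[].class);
    }

    /** Reads the raw bits of a reference field using the matching width. */
    public static long rawReference(Unsafe unsafe, Object owner, String fieldName) {
        try {
            Field field = owner.getClass().getDeclaredField(fieldName);
            long offset = unsafe.objectFieldOffset(field);
            if (referenceSize(unsafe) == 4) {
                // Mask the int so that the value is not sign-extended.
                return unsafe.getInt(owner, offset) & 0xFFFFFFFFL;
            }
            return unsafe.getLong(owner, offset);
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        Unsafe unsafe = unsafe();
        System.out.println("reference size: " + referenceSize(unsafe));
        System.out.println("raw reference: 0x"
                + Long.toHexString(rawReference(unsafe, new Holder(), "target")));
    }
}
```

The printed value is whatever the JVM stores in the field at that moment; the garbage collector is free to move the object and change it later.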

Results

Examples of running the above program:

input:
SecretPassword
dummy
output:
;ᅥ				￵	  	  SecretPassword
dummy
   	   Y7          │ᅩ;ᅥネᅩ;ᅥ	   m     

Java 9 stores strings that contain only Latin-1 characters in a compact form, one byte per character, which is why the strings look correct when viewed as a byte array. Here is an example using a string that has to be stored uncompressed, as UTF-16:

input:
UTF8Passwordﮝ
dummy
output:
  U T F 8 P a s s w o r d ン
 d u m m y
	   	   ᄁ7 	 	 	 ￈ヌ9ᅥ(ヌ9ᅥ	   m     

Note the intentional non-Latin-1 character at the end of the password, which forces the string to be stored uncompressed. If you're running the program on Java 8 or earlier, you will probably see this kind of output for all strings. Interestingly, the dummy string is stored uncompressed as well; the exact reason for this behaviour is unknown to me.
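
The difference between the two dumps can be reproduced without touching the heap at all. The sketch below encodes the same text once as Latin-1 and once as UTF-16 and prints each byte as a character, roughly the way the loop in doUnsafeStuff() does. It assumes little-endian byte order for the UTF-16 case, and the class name StringEncodings and the method asDump() are invented for this illustration.

```java
import java.nio.charset.StandardCharsets;

public class StringEncodings {
    /** Renders raw bytes one character per byte, like the memory dump. */
    public static String asDump(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            // Show zero bytes as spaces so that the output stays readable.
            sb.append(b == 0 ? ' ' : (char) (b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // One byte per character, as in a compact (Latin-1) string.
        byte[] latin1 = "dummy".getBytes(StandardCharsets.ISO_8859_1);
        // Two bytes per character, as in an uncompressed string.
        byte[] utf16 = "dummy".getBytes(StandardCharsets.UTF_16LE);
        System.out.println(latin1.length + " bytes: " + asDump(latin1));
        System.out.println(utf16.length + " bytes: " + asDump(utf16));
    }
}
```

For ASCII text the second byte of every UTF-16 code unit is zero, which is where the gaps between the letters in the second dump come from.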

Conclusion

The sun.misc.Unsafe is an excellent tool for analysing memory. Always use mutable containers that you can zero-out afterwards for sensitive data. You can never know how long the actual data remains in memory. Thankfully, at least under Java 9, the example generates an "Illegal reflective access" warning.
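
As a rough sketch of the mutable-container advice, the password can be read into a char[] with Console.readPassword() and wiped as soon as the hash has been computed. The class name PasswordHasher and the method hashAndWipe() are hypothetical. The hash loop follows the formula documented for String.hashCode(), so the stored value matches what the Main class above would have kept. This only shortens the time the text stays on the heap; copies may still exist elsewhere, for example in I/O buffers.

```java
import java.io.Console;
import java.util.Arrays;

public class PasswordHasher {
    /**
     * Computes the same value String.hashCode() would return for these
     * characters, then overwrites the array with zeros.
     */
    public static int hashAndWipe(char[] password) {
        try {
            int hash = 0;
            for (char c : password) {
                hash = 31 * hash + c;
            }
            return hash;
        } finally {
            // Wipe the characters even if something above fails.
            Arrays.fill(password, '\0');
        }
    }

    public static void main(String[] args) {
        Console console = System.console();
        if (console == null) {
            System.err.println("No interactive console available");
            return;
        }
        // readPassword() returns a char[] and never creates a String.
        char[] password = console.readPassword("Password: ");
        if (password == null) {
            return; // end of input
        }
        int passwordHash = hashAndWipe(password);
        System.out.println("hash: " + passwordHash);
    }
}
```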