CREST CTF - Read Between The Lines
Challenge
We are given a memo file:
misc/challenge_memo.txt
Prompt summary:
- The memo looks normal.
- No malicious links or attachments.
- Ghost Mantis is known for hiding signals in plain sight.
- We need to recover the hidden communication.
- Flag format: CREST{}
Initial thought process
Since the challenge title is Read Between The Lines, I immediately assumed this was not going to be a normal visible-text challenge. The most likely possibilities were:
- Zero-width Unicode characters
- Homoglyphs / mixed scripts
- Whitespace stego
- Line/word positional encoding
- A decoy visible layer plus a second real layer
So I started by checking the file type and printing the contents in a way that would expose invisible characters.
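The first three candidates in the list above can be checked mechanically. The sketch below is only an illustration: the function name triage and the ZERO_WIDTH set are assumptions, not part of the challenge, and positional encodings or decoy layers cannot be detected this way.

```python
import unicodedata

# Common zero-width code points (an illustrative, non-exhaustive set).
ZERO_WIDTH = {'\u200b', '\u200c', '\u200d', '\u2060', '\ufeff'}

def triage(text: str) -> dict:
    """Count a few cheap stego indicators in a piece of text."""
    scripts = set()
    for ch in text:
        if ch.isalpha():
            # The first word of the Unicode name is usually the script.
            scripts.add(unicodedata.name(ch, 'UNKNOWN').split()[0])
    return {
        'zero_width': sum(ch in ZERO_WIDTH for ch in text),
        'scripts': sorted(scripts),
        'trailing_ws_lines': sum(
            line != line.rstrip(' \t') for line in text.split('\n')
        ),
    }

# Toy input: one Cyrillic letter, one zero-width space, one trailing space.
print(triage('R\u0435s\u200bearch \nok'))
```

More than one script name in an English memo is a strong hint that homoglyphs are in play.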
Step 1: Basic inspection
I first checked the file size and type:
$ wc -c misc/challenge_memo.txt
1654 misc/challenge_memo.txt
$ file misc/challenge_memo.txt
misc/challenge_memo.txt: Unicode text, UTF-8 text
That already matters because if the file is UTF-8 text, weird Unicode tricks are very possible.
Then I printed the file normally:
$ sed -n '1,220p' misc/challenge_memo.txt
Subject: Faculty Resеarch coordination
Generally speaking, as we prеpаre for the upcoming intеrdisсiplinаry rеview,
all documеntation must bе finаlized and аrchivеd beforе the deаdline.
сооrdination with depаrtment heаds is expеctеd withоut еxcеptiоn.
Rеsearсh summaries are tо be submitted bеfоrе thе end оf the month.
all tеams must cоnfirm participatiоn and еnsure аccurасy of reсоrds.
Revisions aftеr submission will not bе асceptеd undеr nоrmal cirсumstanсеs.
In light оf reсеnt schedule сhanges, pleasе accоunt for аdditiоnal review timе.
соntaсt your depаrtmеnt сoordinаtоr if any issues аrise during preparatiоn.
careful аttentiоn tо formatting guidelinеs will be apprеciаted аnd nоtеd.
Many оf you have alrеady completed initial drаfts — thаnk yоu fоr your еffort.
additionаl rеsources are available on the shared faculty portal if needed.
No extensions will be granted except in cases of documented emergencies.
Regards,
Office of Academic Affairs
Even from the raw view, two things looked suspicious:
- There were clearly invisible separators between letters in the first few lines.
- Some letters looked normal visually but were probably different Unicode code points later in the file.
Step 2: Make invisible characters visible
I used cat -A to force weird bytes to show up:
$ sed -n '1,220p' misc/challenge_memo.txt | cat -A
SM-bM-^@M-^LuM-bM-^@M-^KbM-bM-^@M-^LjM-bM-^@M-^LeM-bM-^@M-^LcM-bM-^@M-^LtM-bM-^@M-^K:M-bM-^@M-^K FM-bM-^@M-^LaM-bM-^@M-^KcM-bM-^@M-^LuM-bM-^@M-^KlM-bM-^@M-^LtM-bM-^@M-^LyM-bM-^@M-^K RM-bM-^@M-^LeM-bM-^@M-^LsM-bM-^@M-^KM-PM-5M-bM-^@M-^LaM-bM-^@M-^LrM-bM-^@M-^LcM-bM-^@M-^KhM-bM-^@M-^L cM-bM-^@M-^KoM-bM-^@M-^LoM-bM-^@M-^KrM-bM-^@M-^LdM-bM-^@M-^KiM-bM-^@M-^LnM-bM-^@M-^LaM-bM-^@M-^KtM-bM-^@M-^KiM-bM-^@M-^LoM-bM-^@M-^KnM-bM-^@M-^L$
...
This confirmed that hidden Unicode bytes were all over the text.
At that point I wanted exact code points, not mangled terminal escapes.
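The escapes in that dump follow the cat -v convention: M- means the high bit of the byte is set, and ^X means a control character. A minimal decoder sketch, assuming well-formed input with the trailing $ already stripped (the name decode_cat_v is hypothetical, and a literal "M-" or "^" in the text would be misread):

```python
def decode_cat_v(s: str) -> bytes:
    """Turn cat -v style escapes back into raw bytes (best effort)."""
    out = bytearray()
    i = 0
    while i < len(s):
        high = 0
        if s.startswith('M-', i):
            high = 0x80          # meta prefix: byte value is 128 or more
            i += 2
            if i >= len(s):
                break
        if s[i] == '^' and i + 1 < len(s):
            # ^@ -> 0x00, ^K -> 0x0B, ^? -> 0x7F
            out.append((ord(s[i + 1]) ^ 0x40) | high)
            i += 2
        else:
            out.append(ord(s[i]) | high)
            i += 1
    return bytes(out)

for token in ('M-bM-^@M-^K', 'M-bM-^@M-^L', 'M-PM-5'):
    raw = decode_cat_v(token)
    ch = raw.decode('utf-8')
    print(token, raw.hex(), f'U+{ord(ch):04X}')
```

The three tokens are the ones that repeat in the first line of the dump: they decode to the byte sequences e2 80 8b, e2 80 8c and d0 b5, which are the UTF-8 encodings of U+200B, U+200C and U+0435.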
Step 3: Count the non-ASCII characters
I ran a short Python script to count non-ASCII characters:
$ python3 - <<'PY'
from pathlib import Path
from collections import Counter
text = Path('misc/challenge_memo.txt').read_text('utf-8')
nonascii = Counter(ch for ch in text if ord(ch) > 127)
for ch, n in nonascii.most_common():
    print(f'U+{ord(ch):04X} {ch!r} {n}')
PY
U+200B '\u200b' 101
U+200C '\u200c' 99
U+0435 'е' 33
U+043E 'о' 23
U+0430 'а' 20
U+0441 'с' 13
U+2014 '—' 1
This was the big turning point.
The file contains:
- U+200B ZERO WIDTH SPACE
- U+200C ZERO WIDTH NON-JOINER
- Cyrillic е, о, а, с
That means the file has two different hidden channels:
- Zero-width binary-looking data
- Mixed-script homoglyph data
That screamed decoy + real payload.
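The official names and UTF-8 widths of those code points can be looked up with the standard unicodedata module. The sketch below hard-codes the counts printed in Step 3 (so it does not need the memo file) and adds up how many of the file's 1654 bytes belong to non-ASCII characters.

```python
import unicodedata

# Counts copied from the Step 3 output; nothing here reads the memo itself.
counts = {
    '\u200b': 101,
    '\u200c': 99,
    '\u0435': 33,
    '\u043e': 23,
    '\u0430': 20,
    '\u0441': 13,
    '\u2014': 1,
}

non_ascii_bytes = 0
for ch, n in counts.items():
    width = len(ch.encode('utf-8'))   # bytes per character in UTF-8
    non_ascii_bytes += width * n
    print(f'U+{ord(ch):04X} {unicodedata.name(ch):<26} '
          f'{unicodedata.category(ch)} {width} bytes x {n}')

print('non-ASCII bytes:', non_ascii_bytes, 'of 1654')
```

With these counts the non-ASCII characters account for 781 bytes, so nearly half of the file is hidden-channel material rather than visible memo text.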
Step 4: Confirm where the weird characters are
I printed each line with non-ASCII characters annotated:
$ python3 - <<'PY'
from pathlib import Path
text = Path('misc/challenge_memo.txt').read_text('utf-8')
for i, line in enumerate(text.splitlines(), 1):
    if any(ord(ch) > 127 for ch in line):
        print('LINE', i)
        print(''.join(f'{ch}(U+{ord(ch):04X}) ' if ord(ch) > 127 else ch for ch in line))
        print()
PY
Important observations:
- Lines 1, 3, 4, 5 are full of U+200B and U+200C.
- Lines 7 onward are full of Cyrillic homoglyphs like:
  - е instead of Latin e
  - о instead of Latin o
  - а instead of Latin a
  - с instead of Latin c
So the memo absolutely had layered hiding.
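A compact way to see which channel lives on which line is to count both kinds of character per line. This is a sketch on a made-up three-line sample, not the memo; channel_map is a hypothetical helper name.

```python
def channel_map(text: str) -> dict[int, tuple[int, int]]:
    """Map 1-based line number -> (zero-width count, Cyrillic count)."""
    result = {}
    for number, line in enumerate(text.splitlines(), 1):
        zero_width = sum(ch in '\u200b\u200c' for ch in line)
        # U+0400..U+04FF is the basic Cyrillic block.
        cyrillic = sum('\u0400' <= ch <= '\u04ff' for ch in line)
        if zero_width or cyrillic:
            result[number] = (zero_width, cyrillic)
    return result

# Toy sample: line 1 carries zero-width marks, line 3 carries homoglyphs.
sample = 'S\u200cu\u200bb\nplain line\nr\u0435view \u043ef'
print(channel_map(sample))
```

Running the same helper on the real memo text would give the exact per-line split between the two channels.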
Step 5: Decode the zero-width layer first
The zero-width characters are the easiest thing to try first.
I mapped:
- U+200B -> 1
- U+200C -> 0
Actually I tested both directions because either mapping could be correct.
This script was enough:
$ python3 - <<'PY'
from pathlib import Path
text = Path('misc/challenge_memo.txt').read_text('utf-8')
seq = ''.join('0' if ch == '\u200b' else '1' for ch in text if ch in '\u200b\u200c')
print('len bits', len(seq))
for name, bits in [
    ('200b=0,200c=1', seq),
    ('200b=1,200c=0', ''.join('1' if b == '0' else '0' for b in seq)),
]:
    print('\n', name)
    for off in range(8):
        s = bits[off:]
        n = len(s) // 8 * 8
        by = bytes(int(s[i:i+8], 2) for i in range(0, n, 8))
        printable = ''.join(chr(c) if 32 <= c < 127 else '.' for c in by)
        print('offset', off, printable)
PY
len bits 200
200b=0,200c=1
offset 0 .........................
offset 1 y[uYW.3.)5A....'A#../.#1
...
200b=1,200c=0
offset 0 CREST{f4ke_tr41l_n0th1ng}
...
So the zero-width channel decodes perfectly to:
CREST{f4ke_tr41l_n0th1ng}
At first glance that looks like a flag, but it literally says:
fake_trail_n0th1ng
So this is obviously a trap.
That matches the challenge story too: Ghost Mantis is subtle, and this is exactly the kind of bait I would expect in a layered challenge.
So I discarded that as the final answer and moved on.
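For reference, the decoy layer can be reproduced as a round trip. The sketch below uses the mapping that decoded above (U+200B = 1, U+200C = 0); the helper names and the placement of one mark after each cover character are assumptions, since only the order of the marks matters to the decoder.

```python
ZW_ONE = '\u200b'   # ZERO WIDTH SPACE      -> bit 1
ZW_ZERO = '\u200c'  # ZERO WIDTH NON-JOINER -> bit 0

def to_bits(data: bytes) -> str:
    return ''.join(f'{b:08b}' for b in data)

def hide(cover: str, secret: bytes) -> str:
    """Insert one zero-width mark after each of the first len(bits) characters."""
    bits = to_bits(secret)
    if len(bits) > len(cover):
        raise ValueError('cover text too short')
    out = []
    for i, ch in enumerate(cover):
        out.append(ch)
        if i < len(bits):
            out.append(ZW_ONE if bits[i] == '1' else ZW_ZERO)
    return ''.join(out)

def reveal(stego: str) -> bytes:
    """Collect the marks in order and pack them into bytes, MSB first."""
    bits = ''.join('1' if ch == ZW_ONE else '0'
                   for ch in stego if ch in (ZW_ONE, ZW_ZERO))
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits) // 8 * 8, 8))

fake = b'CREST{f4ke_tr41l_n0th1ng}'
cover = 'Subject: Faculty Research coordination. ' * 6   # 240 characters
stego = hide(cover, fake)
print(reveal(stego))
print(stego.count(ZW_ONE), stego.count(ZW_ZERO))
```

The fake flag has 101 one-bits and 99 zero-bits, which matches the U+200B and U+200C counts from Step 3 and is a quick independent check on the mapping direction.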
Step 6: Focus only on the homoglyph layer
Now I needed to inspect the second hidden channel.
I normalized the text by replacing the Cyrillic lookalikes with visible tags:
$ python3 - <<'PY'
from pathlib import Path
text = Path('misc/challenge_memo.txt').read_text('utf-8')
mapc = {'а':'[a]','е':'[e]','о':'[o]','с':'[c]'}
for i, line in enumerate(text.splitlines(), 1):
    cleaned = ''.join(ch for ch in line if ch not in '\u200b\u200c')
    marked = ''.join(mapc.get(ch, ch) for ch in cleaned)
    print(f'{i:02}: {marked}')
PY
01: Subject: Faculty Res[e]arch coordination
02:
03: Generally speaking, as we pr[e]p[a]re for the upcoming int[e]rdis[c]iplin[a]ry r[e]view,
04: all docum[e]ntation must b[e] fin[a]lized and [a]rchiv[e]d befor[e] the de[a]dline.
05: [c][o][o]rdination with dep[a]rtment he[a]ds is exp[e]ct[e]d with[o]ut [e]xc[e]pti[o]n.
06:
07: R[e]sear[c]h summaries are t[o] be submitted b[e]f[o]r[e] th[e] end [o]f the month.
08: all t[e]ams must c[o]nfirm participati[o]n and [e]nsure [a]ccur[a][c]y of re[c][o]rds.
09: Revisions aft[e]r submission will not b[e] [a][c]cept[e]d und[e]r n[o]rmal cir[c]umstan[c][e]s.
10:
11: In light [o]f re[c][e]nt schedule [c]hanges, pleas[e] acc[o]unt for [a]dditi[o]nal review tim[e].
12: [c][o]nta[c]t your dep[a]rtm[e]nt [c]oordin[a]t[o]r if any issues [a]rise during preparati[o]n.
13: careful [a]ttenti[o]n t[o] formatting guidelin[e]s will be appr[e]ci[a]ted [a]nd n[o]t[e]d.
14:
15: Many [o]f you have alr[e]ady completed initial dr[a]fts — th[a]nk y[o]u f[o]r your [e]ffort.
16: addition[a]l r[e]sources are available on the shared faculty portal if needed.
17: No extensions will be granted except in cases of documented emergencies.
18:
19: Regards,
20: Office of Academic Affairs
So the second layer is not random at all. It is systematically replacing letters in a/c/e/o.
The obvious interpretation is:
- normal Latin letter = 0
- Cyrillic lookalike = 1
But the question is: which characters are carriers?
That part matters a lot.
Step 7: The wrong way that almost works
My first attempt was to only look at the altered letters themselves and flatten them into bits.
That gave structured data, but not a clean decode.
I also tested:
- byte grouping
- 7-bit ASCII
- Baconian / 5-bit grouping
- line-wise grouping
- grouping only the later paragraph
- grouping only altered words
- symbol-identity encodings using a/c/e/o
All of those produced either noise or misleading almost-readable garbage.
That told me the real carrier selection was broader than “only the visibly modified letters”.
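One way to see why carrier selection matters is a toy comparison. If only the swapped letters are treated as carriers and each is read as 1, the zeros are lost entirely; the unmodified Latin letters have to be counted too. The helper names and the sample phrase below are made up for illustration.

```python
CYR_TO_LAT = {'\u0430': 'a', '\u0441': 'c', '\u0435': 'e', '\u043e': 'o'}

def bits_all_carriers(text: str) -> str:
    """One bit per a/c/e/o-shaped letter: 1 if Cyrillic, 0 if Latin."""
    out = []
    for ch in text:
        if ch in CYR_TO_LAT:
            out.append('1')
        elif ch.lower() in ('a', 'c', 'e', 'o'):
            out.append('0')
    return ''.join(out)

def bits_modified_only(text: str) -> str:
    """Only the swapped letters count, so every extracted bit is 1."""
    return ''.join('1' for ch in text if ch in CYR_TO_LAT)

# "ocean coat" with the first o, the e and the last a swapped for Cyrillic.
phrase = '\u043ec\u0435an co\u0430t'
print(bits_all_carriers(phrase))    # 1010001
print(bits_modified_only(phrase))   # 111
```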
Step 8: The actual carrier set
The thing that finally worked was this:
Use every ambiguous a/c/e/o in the whole memo as a carrier, not just the Cyrillic ones.
Meaning:
- For every occurrence of a, c, e, o (or their Cyrillic lookalikes),
- write 0 if the character is the normal Latin version,
- write 1 if the character is the Cyrillic homoglyph.
That gives one long bitstream.
This exact script extracts it:
$ python3 - <<'PY'
from pathlib import Path
text = Path('misc/challenge_memo.txt').read_text('utf-8')
trans = {'а':'a','с':'c','е':'e','о':'o','А':'A','С':'C','Е':'E','О':'O'}
bits = []
for ch in text:
    base = trans.get(ch, ch)
    if base.lower() in 'aceo':
        bits.append('1' if ch != base else '0')
bitseq = ''.join(bits)
print('bitlen', len(bitseq))
print(bitseq[:64])
PY
bitlen 276
0000010000000000000011000001111000010011001010010010111000100101
Then I grouped those bits into bytes:
$ python3 - <<'PY'
from pathlib import Path
text = Path('misc/challenge_memo.txt').read_text('utf-8')
trans = {'а':'a','с':'c','е':'e','о':'o','А':'A','С':'C','Е':'E','О':'O'}
bits = []
for ch in text:
    base = trans.get(ch, ch)
    if base.lower() in 'aceo':
        bits.append('1' if ch != base else '0')
bitseq = ''.join(bits)
by = bytes(int(bitseq[i:i+8], 2) for i in range(0, len(bitseq)//8*8, 8))
print(by)
PY
b"\x04\x00\x0c\x1e\x13).%w!=\x12*f'9v!\x16:s!\x16%t z0\x00\x00\x00\x00\x00\x00"
This is not printable, but it is very structured:
- sensible byte length
- clear padding zeros at the end
- not random garbage
So I knew I was close.
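Those observations can be made concrete with a few lines of arithmetic. The sketch below hard-codes the byte string, the bit length and the Cyrillic counts reported above, so it is a consistency check on the published numbers rather than a fresh extraction.

```python
# Values copied from the outputs of Step 3 and Step 8.
cipher = b"\x04\x00\x0c\x1e\x13).%w!=\x12*f'9v!\x16:s!\x16%t z0\x00\x00\x00\x00\x00\x00"
bitlen = 276
cyrillic_total = 33 + 23 + 20 + 13   # е + о + а + с

whole_bytes, leftover_bits = divmod(bitlen, 8)
ones = sum(bin(b).count('1') for b in cipher)
trailing_zero_bytes = len(cipher) - len(cipher.rstrip(b'\x00'))

print('whole bytes:', whole_bytes, 'leftover bits:', leftover_bits)
print('one-bits in byte string:', ones, 'Cyrillic letters:', cyrillic_total)
print('trailing zero bytes:', trailing_zero_bytes)
```

The byte string contains exactly as many one-bits as the memo has Cyrillic letters (89), so every homoglyph is accounted for and the 4 bits dropped by the byte grouping must all be 0.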
Step 9: Recover the XOR key using the known flag prefix
Since the flag format is known, I used the classic known-plaintext trick:
The decoded text should start with:
CREST{
So I XORed the first few ciphertext bytes against that prefix.
$ python3 - <<'PY'
from pathlib import Path
import re
text = Path('misc/challenge_memo.txt').read_text('utf-8')
trans = {'а':'a','с':'c','е':'e','о':'o','А':'A','С':'C','Е':'E','О':'O'}
bits = ''.join(
    '1' if ch != trans.get(ch, ch) else '0'
    for ch in text
    if trans.get(ch, ch).lower() in 'aceo'
)
by = bytes(int(bits[i:i+8], 2) for i in range(0, len(bits)//8*8, 8))
crib = b'CREST{'
for klen in range(1, 9):
    key = [None] * klen
    ok = True
    for i, ch in enumerate(crib):
        kval = by[i] ^ ch
        idx = i % klen
        if key[idx] is None:
            key[idx] = kval
        elif key[idx] != kval:
            ok = False
            break
    print('klen', klen, 'ok', ok, 'key', key)
    if ok:
        keybytes = bytes(k if k is not None else 0 for k in key)
        dec = bytes(by[i] ^ keybytes[i % klen] for i in range(len(by)))
        print('dec', dec[:40])
PY
klen 1 ok False key [71]
klen 2 ok False key [71, 82]
klen 3 ok False key [71, 82, 73]
klen 4 ok True key [71, 82, 73, 77]
dec b'CREST{gh0st_m4nt1s_w4s_h3r3}GRIMGR'
...
The key bytes [71, 82, 73, 77] are ASCII:
GRIM
So the hidden byte stream is XORed with the repeating key GRIM, and the plaintext becomes:
CREST{gh0st_m4nt1s_w4s_h3r3}GRIMGR
The trailing GRIMGR is not part of the message. The last six ciphertext bytes are 0x00 padding, because the final lines of the memo contain only Latin carriers and therefore extract as 0 bits, and XORing zero bytes with the repeating key simply reproduces the key.
The actual flag is the flag-shaped substring:
CREST{gh0st_m4nt1s_w4s_h3r3}
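The known-plaintext step can also be replayed without the memo file, using only the byte string printed in Step 8. This is a minimal sketch: shortest_period is a hypothetical helper, and a six-byte crib can only confirm key periods shorter than six, so readable plaintext remains the real confirmation.

```python
# Ciphertext copied from the Step 8 output.
cipher = b"\x04\x00\x0c\x1e\x13).%w!=\x12*f'9v!\x16:s!\x16%t z0\x00\x00\x00\x00\x00\x00"
crib = b'CREST{'

# XOR of ciphertext and known plaintext exposes the key stream.
keystream = bytes(c ^ p for c, p in zip(cipher, crib))

def shortest_period(data: bytes) -> int:
    """Smallest p such that data repeats with period p."""
    for period in range(1, len(data) + 1):
        if all(data[i] == data[i % period] for i in range(len(data))):
            return period
    return len(data)

key = keystream[:shortest_period(keystream)]
plain = bytes(c ^ key[i % len(key)] for i, c in enumerate(cipher))

print(keystream)   # b'GRIMGR'
print(key)         # b'GRIM'
print(plain)       # b'CREST{gh0st_m4nt1s_w4s_h3r3}GRIMGR'
```

Since x XOR 0 = x, the six zero bytes at the end of the ciphertext decrypt to the key stream itself, which is where the GRIMGR tail comes from.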
Step 10: Final confirmation
I extracted the flag cleanly with one final script:
$ python3 - <<'PY'
import re
from pathlib import Path
text = Path('misc/challenge_memo.txt').read_text('utf-8')
trans = {'а':'a','с':'c','е':'e','о':'o','А':'A','С':'C','Е':'E','О':'O'}
bits = ''.join(
    '1' if ch != trans.get(ch, ch) else '0'
    for ch in text
    if trans.get(ch, ch).lower() in 'aceo'
)
by = bytes(int(bits[i:i+8], 2) for i in range(0, len(bits)//8*8, 8))
key = b'GRIM'
dec = bytes(b ^ key[i % len(key)] for i, b in enumerate(by))
print(dec)
print(re.search(rb'CREST\{[^}]+\}', dec).group(0).decode())
PY
b'CREST{gh0st_m4nt1s_w4s_h3r3}GRIMGR'
CREST{gh0st_m4nt1s_w4s_h3r3}
Why the challenge is nice
What makes this challenge good is that it is layered on purpose:
- The first hidden channel is easy to find.
- That first channel gives a believable-looking but obviously fake flag:
CREST{f4ke_tr41l_n0th1ng}
- The second hidden channel is harder because it uses:
  - Unicode homoglyphs
  - a wider carrier set than the obvious modified letters
  - XOR after the bit extraction
So the solve is not just “spot invisible chars and decode”.
It is:
- detect the bait
- refuse to stop at the bait
- identify the second Unicode channel
- choose the correct carrier set
- extract bits
- recover XOR key from known flag prefix
That is why the fake flag is actually a clue, not just trolling.
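The second channel can be sketched end to end on a synthetic cover text: XOR the secret with the key, write the bits into the a/c/e/o carriers, then reverse the process. All names, the cover sentence and the short secret are made up for illustration, and unlike the solve script this simplified version treats only lowercase letters as carriers.

```python
LAT_TO_CYR = {'a': '\u0430', 'c': '\u0441', 'e': '\u0435', 'o': '\u043e'}
CYR_TO_LAT = {v: k for k, v in LAT_TO_CYR.items()}

def xor(data: bytes, key: bytes) -> bytes:
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def embed(cover: str, secret: bytes, key: bytes) -> str:
    """Swap a carrier for its Cyrillic twin wherever the next bit is 1."""
    bits = ''.join(f'{b:08b}' for b in xor(secret, key))
    out, used = [], 0
    for ch in cover:
        if ch in LAT_TO_CYR and used < len(bits):
            out.append(LAT_TO_CYR[ch] if bits[used] == '1' else ch)
            used += 1
        else:
            out.append(ch)
    if used < len(bits):
        raise ValueError('cover text has too few a/c/e/o carriers')
    return ''.join(out)

def extract(stego: str, key: bytes) -> bytes:
    """Read every carrier (Latin = 0, Cyrillic = 1), pack bytes, undo the XOR."""
    bits = ''.join('1' if ch in CYR_TO_LAT else '0'
                   for ch in stego
                   if ch in CYR_TO_LAT or ch in LAT_TO_CYR)
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits) // 8 * 8, 8))
    return xor(data, key)

cover = 'coordinate each research record once a week, ' * 4   # 84 carriers
stego = embed(cover, b'CREST{x}', b'GRIM')
print(extract(stego, b'GRIM'))   # b'CREST{x}GR'
```

The unused carriers after the 64 message bits extract as zeros, so the decoded output ends with a fragment of the key, the same effect that produced the GRIMGR tail in the real memo.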
Final flag
CREST{gh0st_m4nt1s_w4s_h3r3}