How Taleo actually parses your resume (and why fancy quotes can sink your application)

Taleo (now Oracle Taleo Cloud) is the oldest of the major enterprise ATS platforms and still dominates finance, government, healthcare, and large legacy corporations. If you've applied to roles at JPMorgan, Boeing, BP, Walmart, the US federal government, or a major hospital system, there is a good chance your resume was parsed by Taleo.

Taleo's parser is significantly older than Workday's, Greenhouse's, or Lever's, and its weak points are correspondingly different. Where modern engines fail on layout, Taleo fails most often on character encoding — fancy quotes, em-dashes, certain Unicode characters and non-Latin scripts get mangled or stripped. A resume that parses cleanly in every other engine can produce garbled output in Taleo.

This post covers what Taleo actually does with your PDF, the character-encoding problem in detail, and the three formatting problems that most often break Taleo parsing, each with a fix.

How Taleo differs from the modern engines

If you've read the prior posts on Workday, Greenhouse, and Lever, the high-level pipeline is similar — extract → structure → match. But Taleo has unique tendencies that come from its older parsing layer:

| Behavior | Workday | Greenhouse | Lever | Taleo |
|----------|---------|------------|-------|-------|
| Multi-column layouts | Catastrophic | Imperfect | Better | Catastrophic |
| Tables | Flattens | Mostly handled | Handled | Often mangled |
| Section anchoring | Loose | Strict | Strictest | Loose but inconsistent |
| Character encoding | Modern (handles most) | Modern | Modern | Aggressive stripping |
| Font-fallback handling | Modern | Modern | Modern | Fragile |
| Hyperlinks | Extracted as text | Extracted | Extracted | Often dropped entirely |
| PDF version sensitivity | Low | Low | Low | Higher (older PDFs preferred) |
| User experience | Web-modern | Web-modern | Web-modern | Dated (often slow forms) |

The biggest practical difference: Taleo strips or mangles characters that the modern engines handle without trouble.

The character-encoding problem in detail

Taleo's text extraction layer was built before Unicode was universal in resume PDFs. It handles ASCII reliably but struggles with:

Em-dashes (—) and en-dashes (–) — Often replaced with ? or ?? in the parsed output. A resume with "Senior PM — Stripe" might parse as "Senior PM ? Stripe", where the stray ? is visible noise that fragments the role title.

Curly / "smart" quotes (“ ” ‘ ’) — Often stripped entirely, sometimes replaced with ?. A resume containing “strong leadership” (with smart quotes from Word's autocorrect) might parse as strong leadership (with the quotes gone) or as ?strong leadership?.

Fancy bullet glyphs (• ● ◦ ▪ ► ⁃) — The most common bullet glyphs work, but stylized variants (▶ ➤ ❯ ➔) can confuse the parser, sometimes prepending the glyph as garbage to each bullet's text.

Accented characters (é, ñ, ü, ö, ç, ã) — Generally preserved but sometimes mangled in older Taleo deployments. A name like "José" can become "Jose?" in the parsed candidate profile.

Non-Latin scripts (Chinese, Arabic, Hindi, Cyrillic) — Frequently lost entirely, so a name or institution written in its original script can vanish from the parsed profile. Even if your resume is primarily English, accented Latin names in your education section ("Université de Montréal") or company names ("Société Générale") might lose their accents.

Mathematical / typographic Unicode (×, ÷, ±, ≈, ©, ®, ™) — Often stripped. A bullet mentioning "10× growth" might become "10x growth" (acceptable) or "10 growth" (lost).

The visible symptom: the candidate's Taleo profile shows ? characters in random places, missing punctuation, broken bullet formatting, or sections that simply lose their content.
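To see which of these characters your own resume contains, a short script can list every non-ASCII character in the plain text along with its official Unicode name. This is only a rough sketch: the function name `find_non_ascii` and the sample text are made up for illustration, and it reports everything outside ASCII rather than guessing what a given Taleo deployment will actually mangle.

```python
import unicodedata


def find_non_ascii(text):
    """Return (line, column, character, unicode_name) for each non-ASCII character."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:  # anything outside the 7-bit ASCII range
                hits.append((line_no, col, ch, unicodedata.name(ch, "UNKNOWN")))
    return hits


# Illustrative sample: an em-dash, a multiplication sign, and an accented e.
sample = "Senior PM \u2014 Stripe\nDrove 10\u00d7 growth for Jos\u00e9's team"

for line_no, col, ch, name in find_non_ascii(sample):
    print(f"line {line_no}, col {col}: {ch!r} ({name})")
```

Paste your resume text in place of `sample`. Anything reported is a candidate for replacement with an ASCII equivalent, or at least for a test submission.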

The Taleo parsing pipeline

Taleo uses Resumix-derived text extraction (Resumix was acquired by Taleo in the early 2000s; the underlying parser has been incrementally updated but never rebuilt from scratch). The flow:

1. Text extraction. PDF → plain text. Reading order is left-to-right, top-to-bottom, so multi-column layouts scramble the text just as they do in Workday.

2. Character normalization. This is where Taleo's age shows. Non-ASCII characters are aggressively normalized or stripped. PDF-embedded fonts that weren't embedded with full character maps lose their non-ASCII glyphs.

3. Section identification. Standard regex anchors against recognized section headers. Less strict than Greenhouse/Lever — some orphan content survives.

4. Entity extraction. Within Experience: company, title, dates, location, bullet text. Within Education: institution, degree, date.

5. Match scoring. Against required skills entered by the recruiter. Lighter than modern engines.

6. Recruiter UI. Older, list-based, less visual than Greenhouse / Lever. Recruiters see parsed fields directly; mangled characters in those fields are very visible.
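The character-normalization step (step 2) is the easiest one to imitate at home. The sketch below is not Taleo's code. It assumes, purely for illustration, that a legacy normalizer behaves like "decompose the character, then force everything into ASCII", which happens to reproduce the symptoms described earlier. The function name `simulate_lossy_extraction` is hypothetical.

```python
import unicodedata


def simulate_lossy_extraction(text):
    """Imitate an ASCII-only normalizer (an assumption, not Taleo's real logic)."""
    # NFD splits an accented letter into base letter + combining mark.
    decomposed = unicodedata.normalize("NFD", text)
    # Every character that cannot be encoded as ASCII becomes a literal "?".
    return decomposed.encode("ascii", errors="replace").decode("ascii")


print(simulate_lossy_extraction("Senior PM \u2014 Stripe"))           # em-dash
print(simulate_lossy_extraction("Jos\u00e9"))                         # accented e
print(simulate_lossy_extraction("\u201cstrong leadership\u201d"))     # smart quotes
```

Under this assumption the em-dash turns into a lone ?, "José" comes out as "Jose?", and the smart quotes become ? on both sides of the phrase. Real deployments may strip characters silently instead, so treat the output as a worst-case preview.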

The three things that break Taleo parsing

1. Smart quotes and em-dashes from Word

Word's autocorrect replaces straight quotes (") with smart quotes (“ ”) and double hyphens (--) with em-dashes (—). These look better on the page, but Taleo strips or mangles them.

The visible symptom: random ? characters in the parsed profile, especially around quoted text and dashes.

The fix.

In Word: File → Options → Proofing → AutoCorrect Options → AutoFormat As You Type → uncheck "Straight quotes with smart quotes" and "Hyphens (--) with dash (—)"
For existing documents: Find/Replace → find “ or ” (smart) → replace with " (straight), and do the same for ‘ and ’. For em-dashes, replace each one with a spaced hyphen ( - ) or just a hyphen.
Use straight ASCII quotes (") and standard hyphens (-) throughout.
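If your resume text lives in a plain-text or Markdown source, the same cleanup can be scripted. The following is a minimal sketch: the replacement table and the name `to_plain_punctuation` are my own choices, and the table only covers the punctuation discussed in this post, so extend it if your document uses other typographic characters.

```python
# Map typographic punctuation to plain ASCII equivalents.
ASCII_REPLACEMENTS = str.maketrans({
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark / curly apostrophe
    "\u2014": "-",    # em-dash
    "\u2013": "-",    # en-dash
    "\u2026": "...",  # horizontal ellipsis
    "\u00a0": " ",    # non-breaking space
})


def to_plain_punctuation(text):
    """Replace smart quotes, dashes, and similar characters with ASCII."""
    return text.translate(ASCII_REPLACEMENTS)


print(to_plain_punctuation("Senior PM \u2014 Stripe"))
print(to_plain_punctuation("Known for \u201cstrong leadership\u201d"))
```

An em-dash that already has spaces around it comes out as a spaced hyphen, and an unspaced one (as in a date range) becomes a plain hyphen.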

The visual difference is minor. The parsing difference is significant.

2. Fancy fonts and unembedded glyphs

Taleo's parser is more sensitive to PDF font embedding than the modern engines. PDFs with unusual fonts where some glyphs aren't embedded sometimes lose those glyphs entirely.

Common case: a designer-built resume in a font like "Avenir Next Heavy Italic" where the PDF export only embedded the standard variant. Taleo extracts text using the embedded glyphs and renders question marks for everything that wasn't embedded.

The fix.

Use standard fonts available on every system: Arial, Helvetica, Times New Roman, Calibri, Georgia.
If using a custom font, embed the FULL font (all variants and glyphs) when exporting to PDF.
Test extraction: open your PDF, copy-paste the entire body text into a plain-text editor. If you see ? characters or missing letters, the font wasn't fully embedded.
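The copy-paste test can be partly automated once the text is in an editor. The sketch below flags any pasted line that contains a question mark or the Unicode replacement character (U+FFFD), ignoring question marks inside URLs. It assumes your resume contains no genuine questions. The function name and sample text are illustrative only.

```python
import re

# Question marks inside URLs (query strings) are expected, so strip URLs first.
URL = re.compile(r"https?://\S+")


def suspicious_lines(pasted_text):
    """Return (line_number, line) pairs that contain '?' or U+FFFD outside URLs."""
    flagged = []
    for line_no, line in enumerate(pasted_text.splitlines(), start=1):
        without_urls = URL.sub("", line)
        if "?" in without_urls or "\ufffd" in without_urls:
            flagged.append((line_no, line))
    return flagged


pasted = (
    "Jos? Garcia\n"
    "Senior PM at Stripe\n"
    "https://example.com/profile?id=1\n"
    "Led a team of \ufffd 10 people"
)

for line_no, line in suspicious_lines(pasted):
    print(f"check line {line_no}: {line}")
```

This catches substituted characters but not silently dropped ones, so still skim the pasted text for missing letters.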

3. Tables, especially nested or merged cells

Taleo's table handling is the weakest among the major engines. Simple two-column tables sometimes parse acceptably; tables with merged cells, nested tables, or tables containing significant formatting almost always lose their structure.

The visible symptom: skills sections laid out as tables become single space-separated strings with category labels and skill values mashed together. Education tables might show degree + institution + year crushed into one field.

The fix.

Use comma-separated lists with category labels as plain text:

Languages: English (native), Spanish (professional), Mandarin (basic)

Instead of:

| Language | Proficiency |
|----------|-------------|
| English  | Native       |
| Spanish  | Professional |

For Skills: comma-separated under each category header (covered in detail in the Greenhouse parsing post).
For Education: standard text with degree, institution, year on a single line each.
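If your resume source already contains a simple two-column table, converting it to the comma-separated form is mechanical. Here is a small sketch assuming a pipe-delimited table like the one above. Both function names are hypothetical, and tables with more than two columns would need their own formatting rule.

```python
def parse_pipe_table(table_text):
    """Split a pipe-delimited table into (header_cells, body_rows)."""
    rows = []
    for line in table_text.strip().splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        # Skip the |-----|-----| separator row.
        if all(set(cell) <= set("-: ") for cell in cells):
            continue
        rows.append(cells)
    return rows[0], rows[1:]


def flatten_to_line(label, body_rows):
    """Render two-column rows as 'Label: name (detail), name (detail)'."""
    items = ", ".join(f"{name} ({detail.lower()})" for name, detail in body_rows)
    return f"{label}: {items}"


table = """
| Language | Proficiency |
|----------|-------------|
| English  | Native       |
| Spanish  | Professional |
"""

header, body = parse_pipe_table(table)
print(flatten_to_line("Languages", body))
```

The result is a single line of plain text that keeps the category label next to its values, which is the property a flattening parser cannot destroy.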

How to test your resume against Taleo

The free LSI Resume Analyzer runs a Taleo-style parser simulation alongside Workday, Greenhouse, Lever, and plain-text. The Taleo simulation specifically checks for character-encoding hazards (em-dashes, smart quotes, fancy bullets, mathematical Unicode) and flags lines that would be mangled in a Taleo extraction.

If you want to test against an actual Taleo instance, the technique is:

Find a job at a large legacy enterprise (F500 industrials, banks, hospital systems, government). The careers URL often contains taleo.net or oraclecloud.com.
Apply with your resume.
Note what fields auto-populate from your resume on the application form. Watch for visible ? characters or missing punctuation in the parsed fields.
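If you keep a list of application links, the URL check in the first step is easy to script. The sketch below only tests the hostname against the two domains mentioned above. The example URLs are invented, and a match is a hint rather than proof, since oraclecloud.com hosts other Oracle products as well.

```python
from urllib.parse import urlparse

# Domains mentioned in this post; a match suggests, but does not prove, Taleo.
TALEO_HOST_SUFFIXES = ("taleo.net", "oraclecloud.com")


def looks_like_taleo(url):
    """Return True if the URL's hostname is on, or under, a known Taleo-related domain."""
    host = (urlparse(url).hostname or "").lower()
    return any(
        host == suffix or host.endswith("." + suffix)
        for suffix in TALEO_HOST_SUFFIXES
    )


# Hypothetical careers-page URLs for illustration.
for url in [
    "https://acme.taleo.net/careersection/jobdetail.ftl",
    "https://boards.greenhouse.io/acme",
]:
    print(url, "->", looks_like_taleo(url))
```

Matching on the hostname suffix rather than searching the whole URL avoids false positives from paths or lookalike domains.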

Taleo applications are notoriously slow to fill out (15-30 minutes is typical), so the analyzer approach is much faster.

Quick checklist for Taleo survival

Single-column layout
Standard fonts (Arial, Helvetica, Times New Roman, Calibri)
Straight quotes (") and standard hyphens (-) — no smart quotes, no em-dashes
Standard bullet glyphs (• or -) — no fancy variants
ASCII text where possible (accented characters acceptable but test)
No tables for Skills or Education
Real text PDF (verify by copy-pasting from your PDF)
Hyperlinks as plain URL text (Taleo often drops link metadata)

When you're applying to a Taleo company specifically

For applications to companies you know use Taleo, consider creating a Taleo-specific version of your resume:

All ASCII characters (no em-dashes, no smart quotes, no fancy glyphs)
Single column, plain formatting
Standard fonts
Simple bullet glyphs
All hyperlinks written as full URLs

Yes, this version will be visually plainer than your "main" resume. The trade-off: it parses cleanly in Taleo, where the visual design is going to get destroyed by the parser anyway. The goal is to make sure the recruiter sees clean text, not a beautifully-designed PDF with ? characters scattered through the parsed profile.

For more on the broader ATS landscape, see How an ATS Reads Your Resume. For the engine-specific deep-dives on the modern engines, see Workday, Greenhouse, and Lever.
