September 22, 2026·7 min read

How Taleo actually parses your resume (and why fancy quotes can sink your application)

Taleo is the legacy enterprise ATS still dominant in finance, government, and healthcare. Its parser is the most fragile of the major engines on character encoding and font fallback. Here are the three things that break Taleo specifically, and why your em-dashes might be costing you interviews.


Taleo (now Oracle Taleo Cloud) is the oldest of the major enterprise ATS systems and still dominates finance, government, healthcare, and large legacy corporations. If you've applied to roles at JPMorgan, Boeing, BP, Walmart, the US federal government, or most major hospital systems, your resume has been parsed by Taleo.

Taleo's parser is significantly older than Workday's, Greenhouse's, or Lever's, and its weak points are correspondingly different. Where modern engines fail on layout, Taleo fails most often on character encoding — fancy quotes, em-dashes, certain Unicode characters and non-Latin scripts get mangled or stripped. A resume that parses cleanly in every other engine can produce garbled output in Taleo.

This post covers what Taleo actually does with your PDF, the character-encoding problem in detail, and the three formatting choices that survive Taleo specifically.

How Taleo differs from the modern engines

If you've read the prior posts on Workday, Greenhouse, and Lever, the high-level pipeline is similar — extract → structure → match. But Taleo has unique tendencies that come from its older parsing layer:

| Behavior | Workday | Greenhouse | Lever | Taleo |
|----------|---------|------------|-------|-------|
| Multi-column layouts | Catastrophic | Imperfect | Better | Catastrophic |
| Tables | Flattens | Mostly handled | Handled | Often mangled |
| Section anchoring | Loose | Strict | Strictest | Loose but inconsistent |
| Character encoding | Modern (handles most) | Modern | Modern | Aggressive stripping |
| Font-fallback handling | Modern | Modern | Modern | Fragile |
| Hyperlinks | Extracted as text | Extracted | Extracted | Often dropped entirely |
| PDF version sensitivity | Low | Low | Low | Higher (older PDFs preferred) |
| User experience | Web-modern | Web-modern | Web-modern | Dated (often slow forms) |

The biggest practical difference: Taleo strips or mangles characters that the modern engines handle without trouble.

The character-encoding problem in detail

Taleo's text extraction layer was built before Unicode was universal in resume PDFs. It handles ASCII reliably but struggles with:

Em-dashes (—) and en-dashes (–): Often replaced with ? or ?? in the parsed output. A resume with "Senior PM — Stripe" might parse as "Senior PM ? Stripe", where the stray ? splits the role title from the company.

Curly / "smart" quotes (“ ” ‘ ’): Often stripped entirely, sometimes replaced with ?. A resume containing “strong leadership” (with smart quotes from Word's autocorrect) might parse as strong leadership (with the quotes gone) or as ?strong leadership?.

Bullet glyphs: The most common ones (• ● ◦ ▪ ► ⁃) work, but stylized variants (▶ ➤ ❯ ➔) can confuse the parser, sometimes prepending the glyph as garbage to each bullet's text.

Accented characters (é, ñ, ü, ö, ç, ã): Generally preserved but sometimes mangled in older Taleo deployments. A name like "José" can become "Jose?" in the parsed candidate profile. This matters even if your resume is primarily English: names in your education section ("Université de Montréal") or company names ("Société Générale") might lose their accents.

Non-Latin scripts (Chinese, Arabic, Hindi, Cyrillic): Frequently lost entirely.

Mathematical / typographic Unicode (×, ÷, ±, ≈, ©, ®, ™): Often stripped. A bullet mentioning "10× growth" might become "10x growth" (acceptable) or "10 growth" (lost).

The visible symptom: the candidate's Taleo profile has visible ? characters in random places, missing punctuation, broken bullet formatting, or sections that simply lose their content.
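To make the list above concrete, here is a minimal sketch of a scanner that flags the characters described in this section. The category names, the `HAZARDS` and `TOLERATED` tables, and the `find_hazards` function are illustrative assumptions drawn from the list above; they are not Taleo's actual rules.

```python
import unicodedata

# Hypothetical hazard categories based on the list in this post.
HAZARDS = {
    "dash": "\u2014\u2013",                       # em dash, en dash
    "smart quote": "\u201c\u201d\u2018\u2019",    # curly double and single quotes
    "stylized bullet": "\u25b6\u27a4\u276f\u2794",
    "math or typographic symbol": "\u00d7\u00f7\u00b1\u2248\u00a9\u00ae\u2122",
}

# Common bullet glyphs that the post says usually survive.
TOLERATED = "\u2022\u25cf\u25e6\u25aa\u25ba\u2043"


def find_hazards(text):
    """Return (line_number, character, label) for each risky character."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for ch in line:
            if ord(ch) < 128 or ch in TOLERATED:
                continue  # plain ASCII and common bullets are treated as safe
            label = next(
                (name for name, chars in HAZARDS.items() if ch in chars),
                None,
            )
            if label is None:
                # Any other non-ASCII character: report its Unicode name.
                label = unicodedata.name(ch, "unnamed character").lower()
            findings.append((line_no, ch, label))
    return findings
```

Run it on the plain text you get from copy-pasting your PDF. Accented letters are reported by their Unicode name rather than a category, since the post treats them as "test before trusting" rather than as certain failures.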

The Taleo parsing pipeline

Taleo uses Resumix-derived text extraction (Resumix was acquired by Taleo in the early 2000s; the underlying parser has been incrementally updated but never rebuilt from scratch). The flow:

1. Text extraction. PDF → plain text. Reading order is left-to-right, top-to-bottom, so multi-column layouts get interleaved just as they do in Workday.

2. Character normalization. This is where Taleo's age shows. Non-ASCII characters are aggressively normalized or stripped. Fonts embedded in the PDF without full character maps lose their non-ASCII glyphs.

3. Section identification. Standard regex anchors against recognized section headers. Less strict than Greenhouse/Lever — some orphan content survives.

4. Entity extraction. Within Experience: company, title, dates, location, bullet text. Within Education: institution, degree, date.

5. Match scoring. Against required skills entered by the recruiter. Lighter than modern engines.

6. Recruiter UI. Older, list-based, less visual than Greenhouse / Lever. Recruiters see parsed fields directly; mangled characters in those fields are very visible.
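Steps 2 and 3 can be approximated in a few lines to see why a single stray character is so visible downstream. This is a rough simulation, not Taleo's code: `lossy_ascii` assumes the worst case where every non-ASCII character becomes a ?, and the header list in `SECTION_HEADER` is a made-up example.

```python
import re

# Assumed set of recognized headers; a real deployment may differ.
SECTION_HEADER = re.compile(
    r"^(experience|work experience|education|skills|summary)\s*:?\s*$",
    re.IGNORECASE,
)


def lossy_ascii(text):
    """Stand-in for step 2: replace every non-ASCII character with '?'."""
    return text.encode("ascii", errors="replace").decode("ascii")


def split_sections(text):
    """Stand-in for step 3: group lines under the most recent header."""
    sections = {"(orphan)": []}
    current = "(orphan)"
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        match = SECTION_HEADER.match(stripped)
        if match:
            current = match.group(1).lower()
            sections.setdefault(current, [])
        else:
            # Content before any header lands in the orphan bucket.
            sections[current].append(stripped)
    return sections
```

Feeding it a resume whose Experience section contains "Senior PM \u2014 Stripe" yields "Senior PM ? Stripe" under the "experience" key, which is the kind of string a recruiter would then see in the parsed fields.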

The three things that break Taleo parsing

1. Smart quotes and em-dashes from Word

Word's autocorrect replaces straight quotes (") with smart quotes (“ ”) and double hyphens (--) with em-dashes (—). These look better, but Taleo strips or mangles them.

The visible symptom: random ? characters in the parsed profile, especially around quoted text and dashes.

The fix.

  • In Word: File → Options → Proofing → AutoCorrect Options → AutoFormat As You Type → uncheck "Straight quotes with smart quotes" and "Hyphens (--) with dash (—)"
  • For existing documents: Find/Replace → find “ and ” (smart) → replace with " (straight). Do the same for curly apostrophes, and replace each em-dash with space-hyphen-space ( - ) or just a hyphen.
  • Use straight ASCII quotes (") and standard hyphens (-) throughout.

The visual difference is minor. The parsing difference is significant.
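If you keep your resume text in a plain-text or Markdown source, the same cleanup can be scripted. The sketch below is one possible mapping, assuming you want em-dashes turned into a spaced hyphen and en-dashes into a bare hyphen; the `REPLACEMENTS` table and the function name are illustrative choices, not a standard.

```python
import re

# Assumed mapping from Word-style punctuation to plain ASCII.
REPLACEMENTS = {
    "\u201c": '"',    # left double quote
    "\u201d": '"',    # right double quote
    "\u2018": "'",    # left single quote
    "\u2019": "'",    # right single quote / apostrophe
    "\u2014": " - ",  # em dash becomes a spaced hyphen
    "\u2013": "-",    # en dash (date ranges) becomes a hyphen
}
TABLE = str.maketrans(REPLACEMENTS)


def to_plain_punctuation(text):
    """Swap smart quotes and dashes for ASCII equivalents."""
    replaced = text.translate(TABLE)
    # An em dash that already had spaces around it leaves doubled
    # spaces behind; collapse those runs to a single space.
    return re.sub(r" {2,}", " ", replaced)
```

Note that collapsing runs of spaces will also flatten any alignment done with multiple spaces, which is usually fine for a resume body but worth checking.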

2. Fancy fonts and unembedded glyphs

Taleo's parser is more sensitive to PDF font embedding than the modern engines. PDFs with unusual fonts where some glyphs aren't embedded sometimes lose those glyphs entirely.

Common case: a designer-built resume in a font like "Avenir Next Heavy Italic" where the PDF export only embedded the standard variant. Taleo extracts text using the embedded glyphs and renders question marks for everything that wasn't embedded.

The fix.

  • Use standard fonts available on every system: Arial, Helvetica, Times New Roman, Calibri, Georgia.
  • If using a custom font, embed the FULL font (all variants and glyphs) when exporting to PDF.
  • Test extraction: open your PDF, copy-paste the entire body text into a plain-text editor. If you see ? characters or missing letters, the font wasn't fully embedded.
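The copy-paste test can be partly automated once you have saved the pasted text to a file. The following sketch flags lines that contain the Unicode replacement character or a ? in a position where ordinary punctuation would not put one. The `SUSPECT` pattern is a heuristic of my own, so expect occasional false positives.

```python
import re

# Heuristic: U+FFFD, a '?' wedged between word characters,
# or a '?' standing alone between spaces.
SUSPECT = re.compile(r"\ufffd|(?<=\w)\?(?=\w)|(?<!\S)\?(?!\S)")


def suspicious_lines(extracted_text):
    """Return (line_number, line) pairs that look like lost glyphs."""
    return [
        (line_no, line)
        for line_no, line in enumerate(extracted_text.splitlines(), start=1)
        if SUSPECT.search(line)
    ]
```

A genuine question such as "Why did growth stall?" is not flagged, because its ? follows a word and ends the sentence. Missing letters with no placeholder at all will not be caught this way, so a visual read-through is still needed.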

3. Tables, especially nested or merged cells

Taleo's table handling is the weakest among the major engines. Simple two-column tables sometimes parse acceptably; tables with merged cells, nested tables, or tables containing significant formatting almost always lose their structure.

The visible symptom: skills sections laid out as tables become single space-separated strings with category labels and skill values mashed together. Education tables might show degree + institution + year crushed into one field.

The fix.

  • Use comma-separated lists with category labels as plain text:

Languages: English (native), Spanish (professional), Mandarin (basic)

Instead of:

| Language | Proficiency  |
|----------|--------------|
| English  | Native       |
| Spanish  | Professional |

  • For Skills: comma-separated under each category header (covered in detail in the Greenhouse parsing post).
  • For Education: standard text with degree, institution, year on a single line each.
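If your skills or languages currently live in a table, flattening them is mechanical. As a small sketch, assuming each table row is a (name, detail) pair, a helper like the hypothetical `rows_to_line` below produces the labeled, comma-separated form recommended above.

```python
def rows_to_line(label, rows):
    """Flatten (name, detail) table rows into 'Label: a (x), b (y)'."""
    items = [
        f"{name} ({detail.lower()})" if detail else name
        for name, detail in rows
    ]
    return f"{label}: " + ", ".join(items)


languages = [("English", "Native"), ("Spanish", "Professional")]
line = rows_to_line("Languages", languages)
```

Rows with an empty detail are emitted as the bare name, so the same helper works for a plain skills list.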

How to test your resume against Taleo

The free LSI Resume Analyzer runs a Taleo-style parser simulation alongside Workday, Greenhouse, Lever, and plain-text. Taleo's simulator specifically checks for character-encoding hazards — em-dashes, smart quotes, fancy bullets, mathematical Unicode — and flags lines that would mangle in a Taleo extraction.

If you want to test against an actual Taleo instance, the technique is:

  1. Find a job at a large legacy enterprise (F500 industrials, banks, hospital systems, government). The careers URL often contains taleo.net or oraclecloud.com.
  2. Apply with your resume.
  3. Note what fields auto-populate from your resume on the application form. Watch for visible ? characters or missing punctuation in the parsed fields.
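For step 1, a quick way to triage a list of careers links is to check the hostname against the two domains mentioned above. This is only a hint: the domain list comes from this post, the function name is made up, and oraclecloud.com also hosts other Oracle products, so a match does not prove the form is Taleo.

```python
from urllib.parse import urlparse

# Domains this post associates with Taleo-hosted career sites.
TALEO_HINTS = ("taleo.net", "oraclecloud.com")


def looks_like_taleo(url):
    """Return True if the URL's host is one of the hint domains or a subdomain."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == hint or host.endswith("." + hint) for hint in TALEO_HINTS)
```

Matching on the parsed hostname rather than on the raw string avoids false hits when "taleo.net" merely appears in a path or query parameter.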

Taleo applications are notoriously slow to fill out (15-30 minutes is typical), so the analyzer approach is much faster.

Quick checklist for Taleo survival

  • Single-column layout
  • Standard fonts (Arial, Helvetica, Times New Roman, Calibri)
  • Straight quotes (") and standard hyphens (-) — no smart quotes, no em-dashes
  • Standard bullet glyphs (• or -) — no fancy variants
  • ASCII text where possible (accented characters acceptable but test)
  • No tables for Skills or Education
  • Real text PDF (verify by copy-pasting from your PDF)
  • Hyperlinks as plain URL text (Taleo often drops link metadata)

When you're applying to a Taleo company specifically

For applications to companies you know use Taleo, consider creating a Taleo-specific version of your resume:

  • All ASCII characters (no em-dashes, no smart quotes, no fancy glyphs)
  • Single column, plain formatting
  • Standard fonts
  • Simple bullet glyphs
  • All hyperlinks written as full URLs

Yes, this version will be visually plainer than your "main" resume. The trade-off: it parses cleanly in Taleo, where the visual design is going to get destroyed by the parser anyway. The goal is to make sure the recruiter sees clean text, not a beautifully-designed PDF with ? characters scattered through the parsed profile.

For more on the broader ATS landscape, see How an ATS Reads Your Resume. For the engine-specific deep-dives on the modern engines, see Workday, Greenhouse, and Lever.

