Who is this for
Breakdance pages store their content in WordPress post meta as _breakdance_data (with breakdance_data as a fallback key). When Dynamic Fields or ACF fields are used on a page, only the variable reference is stored in that meta, not the resolved value.
This means that to get an accurate word count, image count, or link count — or to send your page content to AI for analysis, repurposing, or SEO assistance — the tool needs to read what actually renders on the page, not what’s stored in the database.
The PHP lives in E:\GIT_REPOS\wp-scos-strategic-content-operating-system, not in the SEO Command Center repo. There are two layers: a legacy extractor in brighter-core, and a newer CA module in site-essentials that delegates to it.
Core content extraction — BW_Content_Analysis
File: brighter-core/includes/class-content-analysis.php
This is the single source of truth for “full page content” (Breakdance + editor + ACF).
| Method | Visibility | Role |
|---|---|---|
| `get_aggregated_content($post_id)` | public static | Entry point; call this from WP-CLI, REST, Make.com, etc. |
| `aggregate_content($post_id, $post)` | private | Builds the combined string |
| `get_breakdance_content($post_id)` | private | Reads `_breakdance_data` meta |
| `parse_breakdance_structure($data)` | private | Decodes outer JSON → `tree_json_string` |
| `extract_breakdance_tree_to_html($node)` | private | Walks the Breakdance tree |
| `scan_content_props($data, $parent_key)` | private | Generic property scanner (headings, rich text, links, images) |
| `extract_acf_content($fields, $content)` | private | Recursively flattens ACF values to text |
| `analyze_content($post_id, $post, $update)` | public | Runs on `save_post` (priority 20); writes `bw_*` meta |
| `calculate_stats($html)` | private | Word count, H2 count, image count |
| `on_breakdance_data_saved()` | public | Re-runs analysis when `_breakdance_data` is written |
Aggregation priority
```php
private static function aggregate_content($post_id, $post) {
    $bd_content = self::get_breakdance_content($post_id);
    if ($bd_content !== '') {
        // Breakdance data exists: use it as primary content (do not double-count with post_content)
        $content = $bd_content;
    } else {
        $content = $post->post_content;
    }
    // ACF fields (if ACF is active)
    if (function_exists('get_fields')) {
        $fields = get_fields($post_id);
        if ($fields) {
            $content .= ' ' . self::extract_acf_content($fields);
        }
    }
    return $content;
}
```

Breakdance: reads `_breakdance_data` (fallback `breakdance_data`), decodes `tree_json_string`, and walks the tree via `scan_content_props()` with no per-element whitelist. It handles headings (text + tags), rich HTML, links, and image keys, but not dynamic elements (Query Loop, Post Repeater).
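The decode-then-walk step described above can be sketched roughly as follows. This is a minimal illustration, not the actual `parse_breakdance_structure()` / `scan_content_props()` code: the function names, the `text`/`content` key list, and the sample tree shape are all assumptions made for the example.

```php
<?php
// Hypothetical sketch: names, key list, and JSON shape are assumptions,
// not the real BW_Content_Analysis implementation.

/** Decode the outer meta JSON, then the inner tree_json_string. */
function sketch_decode_breakdance_meta(string $meta): array {
    $outer = json_decode($meta, true);
    if (!is_array($outer) || !isset($outer['tree_json_string'])) {
        return [];
    }
    $tree = json_decode((string) $outer['tree_json_string'], true);
    return is_array($tree) ? $tree : [];
}

/** Recursively collect string values stored under text-like keys. */
function sketch_collect_text($node, array $text_keys = ['text', 'content']): array {
    $found = [];
    if (!is_array($node)) {
        return $found;
    }
    foreach ($node as $key => $value) {
        if (is_string($value) && in_array($key, $text_keys, true)) {
            $found[] = $value;
        } elseif (is_array($value)) {
            // No per-element whitelist: descend into every nested array.
            $found = array_merge($found, sketch_collect_text($value, $text_keys));
        }
    }
    return $found;
}

// Assumed sample: one static heading and one dynamic element with no static text.
$inner = json_encode(['root' => ['id' => 1, 'data' => [], 'children' => [
    ['id' => 2, 'data' => ['properties' => ['content' => ['content' => ['text' => 'Hello world']]]], 'children' => []],
    ['id' => 3, 'data' => ['properties' => ['query' => ['post_type' => 'post']]], 'children' => []],
]]]);
$meta = json_encode(['tree_json_string' => $inner]);

echo implode(' | ', sketch_collect_text(sketch_decode_breakdance_meta($meta))), PHP_EOL;
// prints: Hello world
```

The second child in the sample contributes nothing, which mirrors why pages built mostly from dynamic elements yield little or no extracted text.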
ACF: uses `get_fields($post_id)` for all fields on the post. `extract_acf_content()`:
- Skips keys starting with `_` (ACF internal refs)
- Concatenates string values
- Recurses arrays (repeaters, flexible content, groups)
- Does not resolve relationship/post object fields to linked post content; only stored scalar/meta values are included
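The flattening rules in the list above could look something like the sketch below. It is an assumption-laden illustration rather than the real `extract_acf_content()`: the function name is hypothetical, and for simplicity it keeps only string values and ignores other scalars and objects.

```php
<?php
// Hypothetical sketch of the ACF flattening rules; not the actual method.
function sketch_flatten_acf(array $fields): string {
    $parts = [];
    foreach ($fields as $key => $value) {
        // Keys starting with "_" hold ACF internal field references.
        if (is_string($key) && strncmp($key, '_', 1) === 0) {
            continue;
        }
        if (is_string($value)) {
            $value = trim($value);
            if ($value !== '') {
                $parts[] = $value;
            }
        } elseif (is_array($value)) {
            // Repeaters, flexible content, and groups arrive as nested arrays.
            $nested = sketch_flatten_acf($value);
            if ($nested !== '') {
                $parts[] = $nested;
            }
        }
        // Simplification: integers (e.g. related post IDs) and objects are skipped,
        // so linked post content is never pulled in.
    }
    return implode(' ', $parts);
}

$fields = [
    'intro'   => 'Hello',
    '_intro'  => 'field_abc123',
    'rows'    => [['title' => 'One', '_title' => 'field_x'], ['title' => 'Two']],
    'related' => 42,
];
echo sketch_flatten_acf($fields), PHP_EOL; // prints: Hello One Two
```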
Post types are gated by bw_cs_post_types() in brighter-core/includes/bw-content-strategy.php (all public CPTs except attachments, nav items, WooCommerce types, etc.).
Word count / CAR analysis — Content_Analysis (site-essentials)
File: site-essentials/Modules/ContentArchitecture/Content_Analysis.php
| Method | Role |
|---|---|
| `analyze($post_id, $post, $update)` | Main engine; writes `scos_ca_*` meta on `save_post` priority 25 |
| `get_content($post_id, $post)` | Delegates to `BW_Content_Analysis::get_aggregated_content()`; falls back to `post_content` |
| `calculate_stats($html)` | Word count via `wp_strip_all_tags` + `str_word_count`; H2/img via regex |
| `ajax_run_batch()` | Admin “Run All” backfill (`wp_ajax_scos_run_analysis_batch`) |
| `on_breakdance_data_saved()` | Re-analyze when Breakdance meta is saved |
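As a rough picture of what the `calculate_stats($html)` row describes, the sketch below counts words, H2s, and images in an HTML string. It is not the module's code: the function name and regex patterns are assumptions, and PHP's `strip_tags()` stands in for `wp_strip_all_tags()` so the example runs outside WordPress.

```php
<?php
// Hypothetical sketch of the stats step; patterns and names are assumptions.
function sketch_calculate_stats(string $html): array {
    // strip_tags() is a stand-in for wp_strip_all_tags() here.
    $text = trim(strip_tags($html));
    return [
        'word_count' => str_word_count($text),
        'h2_count'   => (int) preg_match_all('/<h2\b[^>]*>/i', $html),
        'img_count'  => (int) preg_match_all('/<img\b[^>]*>/i', $html),
    ];
}

$html = '<h2>Pricing plans</h2> <p>Three simple tiers.</p> <img src="a.png"> <h2>FAQ</h2>';
print_r(sketch_calculate_stats($html));
// word_count => 6, h2_count => 2, img_count => 1
```

Two caveats follow from how these PHP functions behave: `strip_tags()` does not insert whitespace, so adjacent elements with no space between them fuse into one word, and `str_word_count()` counts only alphabetic runs, so purely numeric tokens are not counted.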
```php
private static function get_content( $post_id, $post ) {
    if ( class_exists( '\BW_Content_Analysis' )
        && method_exists( '\BW_Content_Analysis', 'get_aggregated_content' ) ) {
        $content = \BW_Content_Analysis::get_aggregated_content( $post_id );
        if ( $content ) {
            return $content;
        }
    }
    return $post->post_content;
}
```

So both `bw_word_count` and `scos_ca_word_count` should come from the same aggregated source, but they are written by separate save hooks and can drift if one path fails (permissions, skip-on-unchanged, etc.).
Important: gather_content_inventory.py in the command center only reads pre-computed scos_ca_word_count meta — it does not call get_aggregated_content() live.
Social amplification — how content gets to Make.com
Two-step flow:
1. Webhook trigger (metadata only)
File: brighter-core/includes/social-amplification/class-webhook-trigger.php
manual_trigger($post_id) POSTs JSON to Make.com with post ID, URL, title, excerpt, dates, breadcrumb, content_type, featured image URLs — no body content in the manual payload.
Automatic trigger (currently disabled) would include image_optimization with full content via get_image_optimization_data().
2. REST callback (full content for AI)
File: brighter-core/includes/social-amplification/class-social-amplification-api.php
| Endpoint | Handler method | Content source |
|---|---|---|
| `/social-amplification/generate-prompt` | `get_prompt_data()` | `BW_Content_Analysis::get_aggregated_content()` → `sanitize_content_for_prompt()` |
| `/social-amplification/talking-points` | `get_talking_points()` | Post Framing templates (not page body) |
| `/social-amplification/inventory` | `get_content_inventory()` | Post list metadata |
| `/image-optimization/get-data` | `get_image_optimization_data()` | Same aggregated content + images |
The Make.com scenario is designed to call generate-prompt after the webhook fires — that’s where Breakdance + ACF body text becomes source_material.
```php
// Use same aggregated content as Content Analysis (post_content + ACF + Breakdance)
$raw_content = class_exists('BW_Content_Analysis') ? BW_Content_Analysis::get_aggregated_content($post_id) : $post->post_content;
$source_material = self::sanitize_content_for_prompt($raw_content);
```

Known bug vectors (matches your “content still buggy” note)
- Breakdance images often count as 0: BD JSON stores `from/id`, not resolved URLs; `<img src="">` tags may be empty at analysis time (confirmed on some production site pages).
- Dynamic BD elements invisible: Query Loops / Post Repeaters have no static text in the tree; word count will be low or zero on pages that are mostly dynamic (e.g. `word_count: "7"` on the QR generator post in your inventory).
- ACF relationship fields: only raw stored values are included, not linked post content.
- Skip-on-unchanged: both engines skip if `last_analyzed === post_modified`. Breakdance saves via REST may update `_breakdance_data` without changing `post_modified`; the `on_breakdance_data_saved()` hooks exist specifically to fix that, but anything that bypasses those hooks will leave stale counts.
- Dual meta prefixes: legacy `bw_*` vs `scos_ca_*`; inventory uses whichever prefix it detects first.
- Social webhook vs generate-prompt split: if Make.com isn’t calling `generate-prompt`, the automation only gets excerpt/title, not aggregated body content.
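To make the skip-on-unchanged failure mode concrete, here is a minimal sketch of that guard. The function name and the `$force` parameter are hypothetical; the real engines may structure the check differently, and `$force` only stands in for whatever the `on_breakdance_data_saved()` hooks do to bypass it.

```php
<?php
// Hypothetical sketch of the skip-on-unchanged guard; not the engines' code.
function sketch_should_skip(string $last_analyzed, string $post_modified, bool $force = false): bool {
    if ($force) {
        return false; // assumed bypass for the Breakdance meta-save path
    }
    return $last_analyzed !== '' && $last_analyzed === $post_modified;
}

// Breakdance meta changed via REST, but post_modified did not move:
$post_modified = '2024-05-01 10:00:00';
$last_analyzed = '2024-05-01 10:00:00';

var_dump(sketch_should_skip($last_analyzed, $post_modified));       // bool(true): stale counts stay
var_dump(sketch_should_skip($last_analyzed, $post_modified, true)); // bool(false): re-analysis runs
```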
Quick reference — what to call when
```bash
# Live aggregated content (Breakdance + ACF + editor); replace POST_ID with the numeric post ID
wp eval 'echo BW_Content_Analysis::get_aggregated_content(POST_ID);'
# Pre-computed word count (may be stale)
wp post meta get POST_ID scos_ca_word_count
# Force re-analysis batch:
# Admin → CA Overview → Run All, or trigger the wp_ajax_scos_run_analysis_batch AJAX action
```