[Chat Agent] Data Sampling vs. Complete Dataset Analysis

Question:

I’ve encountered situations where Quick chat agents appear to sample data rather than analyze complete datasets, particularly for large analyses.

Current Behavior:

  • The system sometimes samples data due to API response limits or presentation decisions (showing subsets for readability)
  • Not always clear upfront whether analysis covers complete dataset or sampled data
  • Can lead to incomplete insights for critical business decisions

Current Workarounds:

  1. Pre-configure response templates to ensure these elements are included upfront:
  • Metric Definition
  • Data Source
  • Analysis Time Period
  • Completeness Level (total records analyzed vs. available)
  2. Use explicit prompts to request complete dataset analysis:
  • “Analyze the complete dataset”
  • “Show all results without sampling”
  • “Verify total data points available vs. analyzed”

Questions for the Community:

  1. Has anyone developed effective prompt templates that consistently return complete dataset analysis?
  2. Are there specific dashboard or topic configurations that help avoid sampling issues?
  3. What’s the best way to verify data completeness in agent responses?
  4. Is there a roadmap for improved handling of pagination/truncation in large datasets?

Would appreciate any tips on ensuring comprehensive data analysis from chat agents.

Hello luyujun,

I think Quick Topics have a known row limit (typically around 20 rows) when returning results to chat agents. Yes, this can lead to:

  • Incomplete analysis: Agents analyze only the truncated subset of data
  • Potentially incorrect conclusions: Decisions based on partial data may not represent the full picture
  • No clear indication: The system doesn’t always explicitly warn that results are truncated

Understanding the Root Cause

Dashboard vs. Topic Data Access

When chat agents query data sources, the behavior differs significantly:

  • Dashboards: Agents see only the currently rendered/filtered/paginated view—not the full underlying dataset
  • Topics: While Topics query the full dataset and maintain an index of all data, they may still have output limits when returning results to the agent (around 20 rows)

This means even when using the recommended approach (Topics instead of Dashboards), you could still hit sampling limitations for queries returning large result sets.
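To make the risk concrete, here is a small illustrative Python sketch (all numbers are invented; this is not how the agent works internally) showing how an aggregate computed over a truncated 20-row subset can diverge badly from the true total:

```python
# Hypothetical illustration: how a 20-row truncation can skew conclusions.
# The dataset and values here are invented for demonstration only.
import random

random.seed(42)
# Full dataset: 1,000 order amounts
full = [random.uniform(10, 500) for _ in range(1000)]

# What an agent effectively sees if results are truncated to 20 rows
truncated = full[:20]

full_total = sum(full)
truncated_total = sum(truncated)

print(f"Full total:      {full_total:,.2f}  ({len(full)} rows)")
print(f"Truncated total: {truncated_total:,.2f}  ({len(truncated)} rows)")
print(f"Missing share:   {1 - truncated_total / full_total:.1%}")
```

Any conclusion drawn from the truncated total (top customers, revenue trends, and so on) would miss the vast majority of the data, without the user necessarily being told.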

Potential Approaches to Ensure Complete Data Analysis

1. Use Topics Instead of Dashboards

This is the foundational best practice:

  • Create a Topic on top of your underlying dataset
  • Topics maintain an index of the complete dataset
  • They provide more reliable access than Dashboard views
  • Configure fields to be natural-language-friendly

However, be aware that Topics may still have the 20-row output limitation for large result sets.

2. Structure Your Prompts for Aggregated Analysis

Instead of asking for row-level details, request aggregated insights:

Less effective: “Show me all customer orders”
More effective: “What is the total count of customer orders, average order value, and breakdown by region?”

Aggregated queries are less likely to hit row limits since they return summary statistics rather than individual records.
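As a rough sketch of why this works (field names and values are invented for illustration), an aggregated request collapses hundreds of records into a single summary row, which comfortably fits under any row limit:

```python
# Sketch: why aggregated queries avoid row limits.
# Hypothetical order data; the fields are invented for illustration.
import collections
import statistics

orders = [
    {"region": r, "amount": a}
    for r, a in zip(
        ["NA", "EU", "APAC"] * 100,                 # 300 row-level records
        [round(50 + i * 1.5, 2) for i in range(300)],
    )
]

# Row-level request: 300 rows, so it would be truncated at ~20
print(len(orders), "row-level records")

# Aggregated request: total count, average value, breakdown by region
by_region = collections.Counter(o["region"] for o in orders)
summary = {
    "total_orders": len(orders),
    "avg_order_value": round(statistics.mean(o["amount"] for o in orders), 2),
    "orders_by_region": dict(by_region),
}
print(summary)  # one summary row, well under any row limit
```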

3. Add Explicit Instructions to Your Agent

You can configure custom agent instructions to handle truncation transparently. Add instructions like:

“When querying a Topic and receiving only 20 rows of truncated data, do not make up additional values to answer the user query fully. Instead, inform the user that you have reached a system limitation of 20 rows and refer them to the associated generated visuals or suggest aggregated analysis approaches.”

4. Request Completeness Metadata in Your Prompts

Include explicit requests if possible in your queries:

  • “Analyze the complete dataset and indicate total records available vs. analyzed”
  • “Provide a count of total data points before performing analysis”
  • “If results are truncated, explicitly state how many records were analyzed out of the total”
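The third request above boils down to a simple coverage calculation. A minimal sketch of the kind of completeness check you could ask the agent (or Code Interpreter) to perform, with a hypothetical helper function:

```python
# Sketch: a completeness check comparing records analyzed vs. available.
# The function name and thresholds are hypothetical, for illustration.

def completeness_report(total_available: int, analyzed: int) -> str:
    """Summarize how much of the dataset the analysis actually covered."""
    if total_available == 0:
        return "No records available."
    pct = analyzed / total_available
    status = "COMPLETE" if analyzed == total_available else "TRUNCATED"
    return f"{status}: analyzed {analyzed} of {total_available} records ({pct:.1%})"

print(completeness_report(12500, 20))     # truncated case
print(completeness_report(12500, 12500))  # complete case
```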

5. Use Code Interpreter for Uploaded Files

Code Interpreter is automatically available when you upload files to chat. It is well suited to tabular data analysis where you need guaranteed complete dataset access:

  • Upload CSV/Excel files directly to chat (up to 20 files, 5MB for Excel/CSV)
  • Use Code Interpreter which can analyze the complete uploaded dataset
  • This may bypass Topic row limitations
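For context, this is the kind of analysis Code Interpreter can run over an uploaded file. A sketch of the idea, using an in-memory stand-in for an uploaded CSV (the file contents and column names are invented):

```python
# Sketch of complete-dataset analysis over an uploaded CSV.
# In chat you would upload the real file; here a StringIO stands in for it.
import csv
import io

uploaded = io.StringIO(
    "order_id,region,amount\n"
    "1,NA,120.50\n"
    "2,EU,89.99\n"
    "3,NA,240.00\n"
    "4,APAC,15.25\n"
)

rows = list(csv.DictReader(uploaded))
total = sum(float(r["amount"]) for r in rows)

# Every row is read and analyzed; no 20-row truncation applies here
print(f"Analyzed {len(rows)} of {len(rows)} records; total amount {total:.2f}")
```

Because the code iterates over every record in the file, the "analyzed vs. available" counts always match, which is exactly the guarantee the Topic path may not give you.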

To your questions

Q: Has anyone developed effective prompt templates that consistently return complete dataset analysis?

The most effective approach is requesting aggregated analysis rather than row-level details, combined with explicit completeness verification in your prompts.

Q: Are there specific dashboard or topic configurations that help avoid sampling issues?

Using Topics instead of Dashboards is my recommendation, though it doesn’t eliminate the 20-row output limit. Ensure Topics have clear field definitions and minimal filters.

Q: What’s the best way to verify data completeness in agent responses?

Include explicit completeness checks in your prompts (total records available vs. analyzed) and cross-reference with dashboard visuals when available.

Q: Is there a roadmap for improved handling of pagination/truncation in large datasets?

I'm not aware of a roadmap yet. However, your feedback is much appreciated, and I will mark this as a feature request.

Hope this gives some insight.

Cheers,

Deep
