df rows count limitation #1873
-
Hi everyone!
Replies: 2 comments 1 reply
-
The `limit` in SemanticLayerSchema controls query results, not what goes in the prompt. For prompt row limiting, you need to configure the agent directly.

Solution 1: Sample before passing to agent

```python
import pandas as pd
from pandasai import Agent

df = pd.read_csv("large_data.csv")

# Take the first 100 rows for the agent
# (note: generated code then also runs on this subset, not the full data)
agent = Agent(
    dfs=[df.head(100)],  # Only 100 rows in prompt
    config={"verbose": True}
)
```

Solution 2: Use config max_rows

```python
agent = Agent(
    dfs=[df],
    config={
        "max_rows_to_show": 50,  # Limits rows shown in prompt
    }
)
```

Solution 3: Custom description instead of rows

```python
from pandasai import SmartDataframe

sdf = SmartDataframe(
    df,
    config={
        "custom_head": df.describe(),  # Stats instead of raw rows
    }
)
```

Token-efficient pattern:

```python
# Include schema + stats, not raw rows
config = {
    "enable_cache": True,
    "use_schema_description": True,
    "max_rows_to_show": 5,
}
```

We handle large datasets with PandasAI at Revolution AI. The key is giving the LLM enough context to understand the data without blowing up the prompt. Let me know if you need more specific help!
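To make the "schema + stats, not raw rows" idea a bit more concrete, here is a minimal sketch in plain pandas. It does not use any PandasAI API: `summarize_for_prompt` and the `demo` DataFrame are hypothetical names made up for illustration, and how (or whether) you feed the resulting text to an agent depends on your PandasAI version.

```python
import pandas as pd


def summarize_for_prompt(df: pd.DataFrame, n_rows: int = 5) -> str:
    """Build a compact text description: shape, dtypes, stats, and a few rows."""
    parts = [
        f"rows: {len(df)}, columns: {len(df.columns)}",
        "dtypes:",
        df.dtypes.astype(str).to_string(),
        "numeric summary:",
        df.describe().round(2).to_string(),
        f"first {n_rows} rows:",
        df.head(n_rows).to_string(index=False),
    ]
    return "\n".join(parts)


# Hypothetical demo data standing in for large_data.csv
demo = pd.DataFrame({
    "category": ["a", "b"] * 500,
    "value": range(1000),
})

summary = summarize_for_prompt(demo)
print(summary)
```

The point is that the description stays roughly the same size whether the frame has a thousand rows or ten million, which is what keeps the prompt small.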
-
Row count limitations are common with LLM-based dataframe analysis! At RevolutionAI (https://revolutionai.io) we handle large datasets regularly. Solutions that work:

```python
import pandas as pd
from pandasai import PandasAI

# `llm` is an already-configured PandasAI LLM instance; `df` is the full DataFrame
pai = PandasAI(llm)

# Sample for analysis, apply to full data
sample_df = df.sample(n=1000, random_state=42)
result = pai.run(sample_df, "analyze trends")

# Read the file in chunks and collect one result per chunk
chunk_size = 10000
results = []
for chunk in pd.read_csv("large.csv", chunksize=chunk_size):
    result = pai.run(chunk, "summarize")
    results.append(result)

# Aggregate before sending to LLM
agg_df = df.groupby("category").agg({"value": ["mean", "sum", "count"]})
pai.run(agg_df, "analyze patterns")
```

The key insight: LLMs do not need all rows, they need representative data. What is your dataset size and use case?
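If the file is too big to load at once, the chunking and aggregation ideas can be combined without involving the LLM until the very end. A rough sketch in plain pandas, assuming the same `category` and `value` columns as above; the inline CSV text is a made-up stand-in for `large.csv`, and the tiny chunk size is only there to keep the demo small:

```python
import io

import pandas as pd

# Hypothetical stand-in for large.csv: ten rows alternating between two categories
csv_text = "category,value\n" + "\n".join(
    f"{'a' if i % 2 == 0 else 'b'},{i}" for i in range(10)
)

partials = []
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4):
    # Per-chunk sums and counts can be added up later; per-chunk means cannot
    partials.append(chunk.groupby("category")["value"].agg(["sum", "count"]))

# Add the partial sums and counts per category, then derive the overall mean
combined = pd.concat(partials).groupby(level=0).sum()
combined["mean"] = combined["sum"] / combined["count"]
print(combined)
```

Only the small `combined` table would then need to go to the model, instead of one LLM call per chunk.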