add a glance to library index #59

Merged
theorm merged 4 commits into main from indexupdate-aglance
Apr 22, 2026

Conversation

@caiocmello
Collaborator

I've added a section, 'a glance', to the Python library index file. It appears here (https://impresso.readthedocs.io/en/latest/). Could you please review it and let me know if any information is missing?

@mduering

Small tweak: "The Impresso Python library designed to.."

@simon-clematide

simon-clematide commented Apr 20, 2026

Sorry to jump in. To me this looks more like a "Quickstart" than a "Glance", which is typically more conceptual (and which we should provide as well). I would not use the word "page" here, because in the context of newspapers "page" has another, dominant meaning.

I would make it even more "quick and condensed".
(Beware of markdown-in-markdown artefacts below.)

## Quickstart

### Create a session

```python
from impresso import connect

client = connect()
```

### Search the archive

```python
results = client.search.find(term="moon landing")
results
```

To view the full DataFrame:

```python
results.df
```

### Retrieve results in batches

Search results are returned in batches. By default, only the first batch is displayed. Use `limit` and `offset` to retrieve additional results.

```python
import pandas as pd

total_results = 2000
limit = 1000
all_results = []

for offset in range(0, total_results, limit):
    results = client.search.find(
        term="Titanic",
        order_by="-date",
        limit=limit,
        offset=offset,
    )
    all_results.append(results.df)

full_results_df = pd.concat(all_results, ignore_index=True)
full_results_df
```

### Get a content item by ID

```python
item = client.content_items.get("NZG-1877-10-20-a-i0024")
item
```

Transcript text is available in `text.content`.

### Open a content item in the web app

https://impresso-project.ch/app/article/{id}
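
The URL pattern above can also be filled in programmatically. This is only a minimal sketch: `web_app_url` is a hypothetical helper (not part of the impresso library), and it assumes the `{id}` placeholder takes the content item ID as-is, percent-encoded for safety.

```python
from urllib.parse import quote

# URL pattern quoted in the quickstart draft above.
WEB_APP_ARTICLE_URL = "https://impresso-project.ch/app/article/{id}"


def web_app_url(item_id: str) -> str:
    """Build a web app link for a content item ID (hypothetical helper)."""
    # Percent-encode the ID so unexpected characters cannot break the path.
    return WEB_APP_ARTICLE_URL.format(id=quote(item_id, safe=""))


print(web_app_url("NZG-1877-10-20-a-i0024"))
```

Keeping the pattern in one constant means only one place changes if the web app route is ever renamed.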

@caiocmello
Collaborator Author

Hi @simon-clematide, thanks very much for your feedback. Your suggestion looks great. I would just avoid the subtitle 'Get a content item by ID', as it says nothing to a new user: 'content item' is not defined here. The whole point of this part is to make it very clear, from the beginning, where the transcripts are hidden. For the rest, it reads well in the more concise version. Thank you!

Regarding 'batches', I understand the point of avoiding the word 'page'. But the term 'pagination' is already used here (https://impresso.readthedocs.io/en/latest/result/#pagination-information). Would you advise that we change the word 'pagination' throughout the entire Python library documentation?

@simon-clematide

@caiocmello I would not change 'pagination' to 'batches' in the technical documentation, but just avoid the bare word "page" (as I think you already have now). Maybe we can call it "pagination batches" in the quickstart. This then connects it to the more technical term.

@caiocmello
Collaborator Author

Comment from Roman:

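Update the pagination code with a better option:

import pandas as pd

# Get the first page, with 50 items per page.
# `impresso` is assumed to be the session object returned by connect() (named `client` in the quickstart draft).
results = impresso.search.find(term="revolution", limit=50)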
df = results.df

# Iterate through all pages
for page in results.pages():
    print(f"Processing page at offset {page.offset}")
    print(f"Contains {page.size} items")
    df = pd.concat([df, page.df])

Add a 'warning banner' informing users of the monthly limit of 200,000 (double-check the limit), e.g. to be careful when using concat...
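
One way to make the warning concrete is to cap how much a paging loop collects. The sketch below is library-agnostic and rests on assumptions: plain Python lists stand in for result pages, `take_within_budget` is a hypothetical helper, and the budget value is illustrative only (the actual monthly limit still needs double-checking, as noted above). It stops iterating as soon as the budget is reached, so no further page is requested.

```python
from typing import Iterable, List, Sequence, TypeVar

T = TypeVar("T")


def take_within_budget(batches: Iterable[Sequence[T]], max_items: int) -> List[T]:
    """Collect items batch by batch, stopping once max_items is reached."""
    collected: List[T] = []
    if max_items <= 0:
        return collected
    for batch in batches:
        remaining = max_items - len(collected)
        collected.extend(batch[:remaining])
        if len(collected) >= max_items:
            break  # stop here so the next batch is never requested
    return collected


# Stand-in for paged results: three batches of 50 items each.
fake_batches = (list(range(start, start + 50)) for start in range(0, 150, 50))
items = take_within_budget(fake_batches, max_items=120)
print(len(items))
```

With real results, the same pattern would apply to per-page DataFrames: gather them in a list and call `pd.concat` once at the end rather than inside the loop.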

@theorm theorm merged commit 192f907 into main Apr 22, 2026
2 checks passed
@theorm theorm deleted the indexupdate-aglance branch April 22, 2026 07:35