An In-House Scientific Article Summarizer I Built at Beijing Nievedor
I built an internal scientific article summarizer at Beijing Nievedor Intelligent Technology Co., Ltd. to make AI papers easier to consume for non-experts.
It’s an in-house tool, so there are no links or code from the project here, just a quick portfolio-style overview.
What I Built
A bilingual (Chinese/English) summarization app that turns papers and articles into readable summaries with user-controlled output.
Key Features
- Multilingual end-to-end: UI, input, and output support Chinese and English
- Flexible input: PDF upload, plain-text files, URL-to-Markdown conversion, or pasted text (multi-file supported)
- Custom summaries: choose length and style (layman-friendly, technical, bullet points)
- Visual understanding: key figures/plots are identified with PDF-Extract-Kit, cropped out, and embedded into the final summary with inline references
- Performance at scale: large PDFs were a bottleneck, so a preprocessing step uses the extracted text to find the pages with the most important information and figures, and only the top 10 pages go to the heavier figure detection models
- Feedback loop: bilingual ratings + comments stored to iterate on quality
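To make the page-selection idea concrete, here is a minimal sketch of how pages could be ranked from their extracted text before figure detection. It is not the tool's actual code: the cue-word list, `score_page`, and `select_top_pages` are hypothetical names, and the real scoring signal is an assumption on my part.

```python
from typing import List

# Hypothetical cue words; the real tool's scoring signal is not shown here.
CUE_WORDS = ("figure", "fig.", "table", "results", "method", "experiment")


def score_page(text: str) -> int:
    """Count cue-word occurrences in one page of extracted text."""
    lowered = text.lower()
    return sum(lowered.count(cue) for cue in CUE_WORDS)


def select_top_pages(page_texts: List[str], k: int = 10) -> List[int]:
    """Return indices of the k highest-scoring pages, in document order."""
    ranked = sorted(
        range(len(page_texts)),
        key=lambda i: score_page(page_texts[i]),
        reverse=True,
    )
    # Re-sort the winners so later stages see pages in reading order.
    return sorted(ranked[:k])


if __name__ == "__main__":
    pages = [
        "intro text",
        "Figure 1 shows results",
        "nothing",
        "Table 2 and Figure 3: experiment results",
    ]
    print(select_top_pages(pages, k=2))
```

Only the returned page indices would then be handed to the heavier figure detection step, which is what keeps the cost roughly constant for very long PDFs.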
Technical Stack (High Level)
- Python + Streamlit UI
- LangChain / LangGraph for pipeline orchestration (stateful, debuggable)
- PDF-Extract-Kit for figure/plot detection and cropping
- HTML parsing to Markdown
- Model API for summarization + multimodal visual descriptions
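For the HTML-to-Markdown step, a rough sketch using only Python's standard library could look like the following. The class and function names are hypothetical, it handles just a small subset of tags, and the in-house tool may well use a dedicated library instead.

```python
from html.parser import HTMLParser


class MarkdownExtractor(HTMLParser):
    """Convert a small subset of HTML (h1-h3, p, li) to Markdown lines.

    Inline markup such as <b> or <a> is dropped; only its text is kept.
    """

    PREFIXES = {"h1": "# ", "h2": "## ", "h3": "### ", "p": "", "li": "- "}

    def __init__(self) -> None:
        super().__init__()
        self.lines = []
        self._tag = None    # block tag currently being read, if any
        self._buffer = []   # text fragments collected for that block

    def handle_starttag(self, tag, attrs):
        if tag in self.PREFIXES:
            self._tag = tag
            self._buffer = []

    def handle_data(self, data):
        if self._tag is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._tag:
            # Collapse runs of whitespace into single spaces.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.lines.append(self.PREFIXES[tag] + text)
            self._tag = None


def html_to_markdown(html: str) -> str:
    """Return the supported blocks of an HTML string as Markdown text."""
    parser = MarkdownExtractor()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.lines)
```

Calling `html_to_markdown("<h1>Title</h1><p>Hello <b>world</b>.</p>")` yields a heading line followed by a plain paragraph, which is enough structure for a summarization prompt.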
Deployment
Runs on a local-network machine in the office with secrets managed via environment variables.
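As an illustration of the environment-variable approach, a small helper along these lines reads a secret at startup and fails early if it is missing. The variable name `SUMMARIZER_API_KEY` and the function `load_api_key` are made up for this sketch and are not the names used in the actual deployment.

```python
import os


def load_api_key(name: str = "SUMMARIZER_API_KEY") -> str:
    """Read a secret from the environment and fail early if it is missing.

    The default variable name is hypothetical.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value
```

Failing at startup rather than on the first model call makes a misconfigured machine obvious right away, and the key never has to appear in the codebase.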
What’s Next
Better audience adaptation, more output formats, and smarter performance handling for very large PDFs.