Commit Graph

10 Commits

Author SHA1 Message Date
Marco Minerva e0cf824dd6 Refactor document processing and embedding generation
- Updated `DocxContentDecoder` to process Word documents as chunks of text, removing page tracking and enhancing content handling.
- Modified `VectorSearchService.ImportAsync` to work with chunks, implementing batching for embedding generation.
- Added `EmbeddingBatchSize` property to `AppSettings` for configurable batch processing.
- Updated `appsettings.json` to include the new `EmbeddingBatchSize` setting for improved control over embedding processes.
2025-06-18 14:45:08 +02:00
Marco Minerva 1975d63189 Improve layout and readability in Documents.razor
- Wrapped checkbox input in a div for better alignment.
- Changed documents initialization from an empty array to a list.
- Updated document addition code for improved readability.
- Modified ConfirmDialogOptions and ToastMessage initializations to use object initializer syntax.
- Translated comment in DocxContentDecoder.cs from Italian to English.
2025-06-11 17:47:44 +02:00
Marco Minerva cdbe2e3a91 Enhance content decoders and update dependencies
- Modified `DocxContentDecoder` to use `IServiceProvider` for text chunking and improved paragraph processing with page break handling.
- Updated `PdfContentDecoder` and `TextContentDecoder` to trim whitespace from text before splitting into paragraphs.
- Reordered service registrations in `Program.cs` while retaining existing functionality.
- Updated `SqlDatabaseVectorSearch.csproj` with new package versions for several dependencies, including `Microsoft.AspNetCore.OpenApi` and `Microsoft.EntityFrameworkCore`.
2025-06-11 17:20:56 +02:00
Marco Minerva dc6bbfde91 Update content decoding and validation logic
- Changed `PageNumber` in `Chunk` to nullable `int?` in `IContentDecoder` and updated related logic in `TextContentDecoder`.
- Revised citation rules in `ChatService` for stricter placement and formatting.
- Introduced `QuestionValidator` class with validation rules for `Question` model's `Text` property.
2025-06-06 10:50:03 +02:00
Marco Minerva 2fc070d0aa Refactor response handling and content decoding
- Updated `TextContentDecoder` to use `ITextChunker` for paragraph splitting and return a list of `Chunk` objects.
- Changed return type of `Stream` method in `AskEndpoints.cs` from `IAsyncEnumerable<QuestionResponse>` to `IAsyncEnumerable<Response>`.
- Removed `QuestionResponse` class and introduced `Response` class to better handle streaming responses.
- Modified `AskQuestionAsync` and `AskStreamingAsync` methods in `VectorSearchService` to return `Response` instead of `QuestionResponse`, and adjusted token count calculation.
- Added namespace declaration in `Response.cs` and defined properties to align with new response structure.
2025-06-04 10:22:15 +02:00
Marco Minerva 1e531e5ad6 Filter out empty paragraphs in PdfContentDecoder
Updated the paragraph processing to exclude empty or whitespace-only entries before creating Chunk objects, ensuring only meaningful text is included.
2025-05-27 17:19:25 +02:00
Marco Minerva fa81f01c27 Refactor content decoders and restructure data layer
Updated `DocxContentDecoder`, `PdfContentDecoder`, and `TextContentDecoder` to return `Task<IEnumerable<Chunk>>` instead of `Task<string>`, introducing a new `Chunk` record for structured output.

Restructured the `ApplicationDbContext`, `Document`, and `DocumentChunk` classes by moving them to the `SqlDatabaseVectorSearch.Data` namespace for better organization.

Updated database migration files to align with the new entity structure and modified references in `Program.cs`, `DocumentService.cs`, and `VectorSearchService.cs` to use the new namespace.
2025-05-27 17:10:17 +02:00
Marco Minerva a0f1755c85 Add CancellationToken support to async methods #9
Introduce support for `CancellationToken` across various methods to allow for task cancellation and improve responsiveness.
- Update `DecodeAsync` method in `DocxContentDecoder.cs`, `PdfContentDecoder.cs`, `TextContentDecoder.cs`, and `IContentDecoder.cs` to include an optional `CancellationToken` parameter.
- Modify endpoint handlers in `Program.cs` to accept and pass `CancellationToken` parameters.
- Update methods in `ChatService.cs` to include `CancellationToken` parameters.
- Update methods in `DocumentService.cs` to include `CancellationToken` parameters.
- Update methods in `VectorSearchService.cs` to include `CancellationToken` parameters.
These changes ensure that long-running operations can be canceled if needed, improving the application's ability to handle cancellation requests gracefully.
2025-02-10 16:20:35 +01:00
Marco Minerva af9158873f Add support for DOCX and TXT files, update error handling
Updated README.md to reflect support for PDF, DOCX, and TXT files.
Removed commented-out code in DocxContentDecoder.cs.
Added TextContentDecoder service in Program.cs and updated exception handling middleware.
Updated document upload endpoint description in Program.cs.
Modified VectorSearchService to throw NotSupportedException for unsupported content types.
Added TextContentDecoder class in TextContentDecoder.cs.
2025-01-29 09:58:22 +01:00
Marco Minerva 110e21e1e0 Add content decoding for PDF and DOCX files
- Added `using` statements in `Program.cs` for new content decoding.
- Registered new content decoder services in `builder.Services`.
- Modified `documentsApiGroup.MapPost` to pass `file.ContentType`.
- Refactored `VectorSearchService` to use `IServiceProvider` and handle content types.
- Added `DocumentFormat.OpenXml` package reference.
- Created `DocxContentDecoder` and `PdfContentDecoder` classes.
- Created `IContentDecoder` interface.
2025-01-29 09:43:22 +01:00