Commit Graph

29 Commits

Author SHA1 Message Date
Marco Minerva f02a1c9b69 Refactor document operations into DocumentService
Refactored Program.cs to use AddAzureSql with new options. Added VectorSearchService and DocumentService as scoped services. Updated documentsApiGroup to use DocumentService for document operations and added a delete document endpoint. Moved document-related methods from VectorSearchService to new DocumentService for better separation of concerns.
2025-02-07 11:30:14 +01:00
Marco Minerva cdd0199e8f Refactor services and update token handling
- Replace `TotalTokenCount` with `EmbeddingTokenCount` in `ImportDocumentResponse`.
- Add `OriginalQuestion` and `ReformulatedQuestion` fields to `QuestionResponse` and a new constructor.
- Add a new constructor to `TokenUsageResponse` to initialize `Question`.
- Add `TextChunkerService` to service collection in `Program.cs`.
- Clarify prompt and update token counting in `ChatService`.
- Differentiate token counting in `TokenizerService` with `CountChatCompletionTokens` and `CountEmbeddingTokens`.
- Update `VectorSearchService` to use `TextChunkerService` and new token counting methods.
- Introduce `TextChunkerService` for text splitting and tokenization.
2025-02-07 10:24:16 +01:00
Marco Minerva 8e06979993 Refactor response types and enhance token usage handling 2025-01-30 12:56:33 +01:00
Marco Minerva af9158873f Add support for DOCX and TXT files, update error handling
Updated README.md to reflect support for PDF, DOCX, and TXT files.
Removed commented-out code in DocxContentDecoder.cs.
Added TextContentDecoder service in Program.cs and updated exception handling middleware.
Updated document upload endpoint description in Program.cs.
Modified VectorSearchService to throw NotSupportedException for unsupported content types.
Added TextContentDecoder class in TextContentDecoder.cs.
2025-01-29 09:58:22 +01:00
Marco Minerva 110e21e1e0 Add content decoding for PDF and DOCX files
- Added `using` statements in `Program.cs` for new content decoding.
- Registered new content decoder services in `builder.Services`.
- Modified `documentsApiGroup.MapPost` to pass `file.ContentType`.
- Refactored `VectorSearchService` to use `IServiceProvider` and handle content types.
- Added `DocumentFormat.OpenXml` package reference.
- Created `DocxContentDecoder` and `PdfContentDecoder` classes.
- Created `IContentDecoder` interface.
2025-01-29 09:43:22 +01:00
Marco Minerva f15f387510 Update VectorSearchService and appsettings.json
- Clarified comment in `ChatService.cs`.
- Added `TokenizerService` and `ILogger` parameters to `VectorSearchService` constructor.
- Updated paragraph splitting to use `tokenizerService.CountTokens`.
- Added logging for token count of each paragraph in `VectorSearchService`.
- Updated `ModelId` comment in `appsettings.json` to include "gpt-4o-mini".
- Changed `MaxTokensPerParagraph` in `appsettings.json` from 1024 to 1000.
2025-01-28 16:09:34 +01:00
Marco Minerva 1ef2d384ec Add streaming support and improve JSON serialization
- Updated `Response` record class to allow nullable `Question` and `Answer` properties; moved `StreamState` enum to a new file.
- Added `JsonStringEnumConverter` in `Program.cs` for better enum serialization.
- Corrected terminology in document upload endpoint description.
- Introduced `/api/ask-streaming` endpoint for streaming question responses.
- Added `AskStreamingAsync` method in `VectorSearchService` for handling streaming logic.
- Created `StreamState.cs` to define `StreamState` enum with `Start`, `Append`, and `End` values.
2025-01-28 11:00:45 +01:00
Marco Minerva 44c6193674 Add streaming and refactor chat/question handling
Updated `Response` record in `Response.cs` to include an optional `StreamState` property, which can be `Start`, `Append`, or `End`. Added a new `StreamState` enum to `Response.cs`.

In `ChatService.cs`, added new methods `AskQuestionAsync` and `AskStreamingAsync` to handle asking questions and streaming responses, respectively. Refactored `CreateChatAsync` to return a `ChatHistory` object.

In `VectorSearchService.cs`, added a new `AskQuestionAsync` method to handle questions using `ChatService`. Updated `CreateContextAsync` to return a tuple with the reformulated question and chunks. Removed the previous implementation of `AskQuestionAsync` and replaced it with the new method utilizing `ChatService`.
2025-01-28 10:14:47 +01:00
Marco Minerva 232be6f083 Update code style, prompt, and dependencies
.editorconfig: Add new code style preferences.
ChatService.cs: Add formatted question to prompt string.
VectorSearchService.cs: Remove TinyHelpers.Extensions using directive.
VectorSearchService.cs: Use paragraphs.Index() in foreach loop.
SqlDatabaseVectorSearch.csproj: Update target framework to net9.0.
SqlDatabaseVectorSearch.csproj: Update package references, remove TinyHelpers.
2024-11-21 17:46:50 +01:00
Marco Minerva 0435f042f1 Refactor to use EF Core for database operations
Refactored the codebase to replace raw SQL connections and Dapper with Entity Framework Core (EF Core). Modified `Program.cs` to configure EF Core services. Refactored `VectorSearchService` to use EF Core for all database operations. Updated project dependencies to remove Dapper and `Microsoft.Data.SqlClient`, and add EF Core packages. Added `ApplicationDbContext` for EF Core context and new `Document` and `DocumentChunk` classes for entity models.
2024-10-31 15:16:38 +01:00
Marco Minerva 1a5542f1d2 Refactor to use Dapper async methods and improve readability
Refactored code to utilize Dapper's `ExecuteAsync` and `ExecuteScalarAsync` methods, reducing boilerplate and simplifying SQL command execution. Updated `DeleteDocumentAsync` to a concise expression-bodied member. Replaced manual SQL parameter addition with anonymous objects for better readability and maintainability. Transaction handling remains unchanged to ensure consistent database operations.
2024-10-16 15:06:34 +02:00
Marco Minerva 8c6cc3c969 Improve README, add comments, and clean up VectorSearchService
Updated README.md for clarity and additional setup instructions:
- Refined repository description to highlight native Vector type.
- Rephrased note on Vector Support feature for readability.
- Removed mention of EFCore.SqlServer.VectorSearch library.
- Added instructions for updating VECTOR column size and setting Dimension property.

Added comment in Scripts.sql to guide vector size setting in Embedding column.

Cleaned up VectorSearchService.cs by removing unused and commented-out SQL command execution code.
2024-10-01 17:35:59 +02:00
Marco Minerva 4355f72dab Refactor DB operations, rename tables, add Dapper
Refactored `VectorSearchService.cs` to use Dapper for DB operations, replacing raw ADO.NET commands. Updated methods for inserting, retrieving, and deleting documents and chunks. Modified vector search query to use Dapper's `QueryAsync`.

Updated `SqlDatabaseVectorSearch.csproj` to include Dapper package reference, version `2.1.35`.
2024-10-01 11:39:21 +02:00
Marco Minerva 2dff0aae55 Add dimensions parameter for embeddings; reformat SQL
Updated Program.cs to include dimensions parameter for AddAzureOpenAITextEmbeddingGeneration sourced from aiSettings.Embedding.Dimensions. Reformatted SQL command texts in VectorSearchService.cs for better readability. Introduced EmbeddingServiceSettings class in AzureOpenAISettings.cs to allow optional dimensions configuration. Updated appsettings.json to include new Dimensions property under Embedding section.
2024-09-30 17:53:59 +02:00
Marco Minerva 3e95251485 Refactor to use the native VECTOR type 2024-09-30 17:08:28 +02:00
Marco Minerva 05d18c5f97 Fix typos 2024-09-02 10:48:23 +02:00
Marco Minerva b6a09d0926 Data access optimizations 2024-09-02 10:42:30 +02:00
Marco Minerva 17eee5f775 Optimize query performance and memory usage 2024-08-02 14:56:19 +02:00
Marco Minerva 6e716f3984 Optimize delete logic 2024-07-19 11:31:54 +02:00
Marco Minerva c1180d3a4f Optimize chunks retrieval 2024-07-19 11:01:44 +02:00
Marco Minerva 45de38d87a Enhanced document chunk handling and API
- Updated `Scripts.sql` to add a new `[Index]` column to `[dbo].[DocumentChunks]` for order tracking.
- Modified `DocumentChunk.cs` to include a new `Index` property, and introduced a new immutable record class for document chunks.
- Introduced new API endpoints in `Program.cs` for document and chunk retrieval, including embedding details, with OpenAPI documentation enhancements.
- Updated an API endpoint description in `Program.cs` to clarify document embedding handling.
- Updated `VectorSearchService.cs` to reflect schema changes in service logic, adding methods for fetching document chunks and specific embeddings.
2024-07-10 11:25:50 +02:00
Marco Minerva f3a0ec7c31 Little refactoring 2024-07-01 09:49:34 +02:00
Marco Minerva 8ef8836075 Minor updates 2024-06-26 11:05:00 +02:00
Marco Minerva 1840a63d75 Better transaction on document save 2024-06-24 10:31:57 +02:00
Marco Minerva 7a97000c10 Update libraries 2024-06-24 09:45:05 +02:00
Marco Minerva fa58e02709 Refactor and enhance config management
Refactored code to centralize configuration access through a single `AppSettings` instance in `ChatService` and `VectorSearchService`, improving maintainability and reducing verbosity. Introduced new configuration settings (`MaxTokensPerLine`, `MaxTokensPerParagraph`, `OverlapTokens`, `MaxChunksCount`) in `AppSettings.cs` and `appsettings.json` for enhanced flexibility in content processing. Adjusted existing settings usage (`MessageLimit`, `MessageExpiration`) to align with the new access method, and removed obsolete settings (`StoragePath`, `VectorDbPath`, `QueuePath`). These changes simplify the codebase, make the application more configurable and adaptable to different content characteristics, and allow for more controlled vector search operations.
2024-06-17 11:58:30 +02:00
Marco Minerva b6c898a3f5 Refactor code and enhance API documentation
- Converted `Question.cs` and `Search.cs` records to `record class` syntax for clarity.
- Organized API endpoints with tags and added new GET and DELETE endpoints in `Program.cs`, including OpenAPI documentation improvements.
- Removed commented-out code in `Program.cs` for a cleaner codebase.
- Introduced `WithTags` for better API operation categorization in Swagger UI.
- Added a TODO comment in `ChatService.cs` for future improvement on chunk length check.
- Clarified `using` directives in `VectorSearchService.cs` with namespace aliasing to improve readability.
- Refactored document deletion in `VectorSearchService.cs` to use a private helper method and expanded service capabilities with a new `GetDocumentsAsync` method.
- Introduced a new `Document` model in the `Models` namespace to support document fetching functionality.
- Simplified `appsettings.json` by removing `MaxTokens` configuration for `ChatCompletion` and `Embedding` services.
2024-06-14 17:20:21 +02:00
Marco Minerva db4646330f Enhanced app with Azure AI and vector search
- Modified `ApplicationDbContext.cs` to correct the `.IsVector()` method placement for `DocumentChunk`.
- Removed `MemoryResponse.cs` class, indicating a move away from this model.
- Enhanced `Program.cs` with Azure AI services integration for text embeddings and chat completions. Updated OpenAPI descriptions and reintroduced `/api/ask` with vector search.
- Adjusted `ChatService.cs` to improve question-asking functionality using document chunks.
- Updated `VectorSearchService.cs` with a new `AskQuestionAsync` method for advanced search and response capabilities. Made `GetContentAsync` static.
- Formatted `SqlDatabaseVectorSearch.csproj` and managed NuGet package inclusions.
- Simplified `appsettings.json` by removing unused keys.
- Added a new `Response` record class for standardized service responses.
2024-06-14 12:59:09 +02:00
Marco Minerva 9284ae5377 Initial commit 2024-06-14 11:47:00 +02:00