Ever wondered how hard it would be to load a CSV file into a SQLite database? I know how I would do it in code, no rocket science needed there! However, in this case I wanted to know how fast SQLite can do this natively, and I really didn’t want to code anything!
So, looking at what sqlite3.exe has to offer, it pretty much supports this out of the box. Very nice.
Requirements:
- Loading speed
- Making the data available to consuming applications as soon as possible
While I love C#, and frankly it’s hard to go back to C or C++, sometimes performance trumps the creature comforts we have become accustomed to.
Note: I did this without circling back to a C# implementation, as I know the data and performance requirements are tight and in this case I wanted max performance with no code! The biggest factor in a successful implementation is using the tools best suited to the job, not just the ones you favor that year.
So first things first – create a table to take the input
DROP TABLE IF EXISTS BookSales;
CREATE TABLE IF NOT EXISTS BookSales (
    Store int
   ,Date varchar
   ,OrderReference varchar
   ,Line int
   ,BookISBN varchar(14)
   ,Quantity int
   ,Price int
   ,Primary Key (OrderReference,Line)
);
Next is the magic. We need to load the CSV into the table:
.separator "|"
.import BookSales.txt BookSales
Wow, that was easy. You can see we set the separator to a pipe rather than a comma in this case, then ran the import. The general form of the import command is:
.import [FileName] [Table]
Now the database is ready to be queried! But if we want to take it just one stage further:
.output SummaryBookSales.csv
SELECT Store, Date, BookISBN, SUM(Quantity), SUM(Price)
FROM BookSales
GROUP BY Store, Date, BookISBN;
Now we output the results of our simple aggregation to a file. Because the separator set earlier also applies to query output, the file is pipe-separated despite its .csv name.
Tying this all together in a single configuration file, which we will call “BookAnalysisLoader.sql”, gives us:
DROP TABLE IF EXISTS BookSales;
CREATE TABLE IF NOT EXISTS BookSales (
    Store int
   ,Date varchar
   ,OrderReference varchar
   ,Line int
   ,BookISBN varchar(14)
   ,Quantity int
   ,Price int
   ,Primary Key (OrderReference,Line)
);
.separator "|"
.import BookSales.txt BookSales
.output SummaryBookSales.csv
SELECT Store, Date, BookISBN, SUM(Quantity), SUM(Price)
FROM BookSales
GROUP BY Store, Date, BookISBN;
.exit
The last piece of the puzzle is the final execution:
sqlite3.exe BookSalesAnalysis.db3 < BookAnalysisLoader.sql
Now we have a newly created database with our analysis data in it, and we have a summary CSV file generated from the output. So we can load the CSV into Excel or another DB, or directly interrogate the DB for more analytical information – and all without coding!
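For the "directly interrogate the DB" route, here is a minimal sketch in Python, using only the standard-library sqlite3 module. Python is my choice for illustration, not something the walkthrough above depends on. The function name top_isbns and the best-seller query are hypothetical; the table and column names come from the BookSales schema above. The demo rows are made up so the example runs on its own.

```python
import sqlite3


def top_isbns(conn, limit=5):
    """Return (BookISBN, total quantity) rows, best sellers first."""
    sql = (
        "SELECT BookISBN, SUM(Quantity) AS TotalQuantity "
        "FROM BookSales "
        "GROUP BY BookISBN "
        "ORDER BY TotalQuantity DESC, BookISBN "
        "LIMIT ?"
    )
    return conn.execute(sql, (limit,)).fetchall()


# Demo against an in-memory database holding made-up rows. To query the
# real data, connect to "BookSalesAnalysis.db3" instead of ":memory:".
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE BookSales (Store int, Date varchar, OrderReference varchar, "
    "Line int, BookISBN varchar(14), Quantity int, Price int, "
    "PRIMARY KEY (OrderReference, Line))"
)
conn.executemany(
    "INSERT INTO BookSales VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "2010-01-01", "ORD1", 1, "ISBN-A", 2, 500),
        (1, "2010-01-01", "ORD1", 2, "ISBN-B", 1, 300),
        (2, "2010-01-02", "ORD2", 1, "ISBN-A", 3, 750),
    ],
)
for isbn, total in top_isbns(conn):
    print(isbn, total)
```

The same query could just as well be typed into the sqlite3 prompt; wrapping it in a function only helps when another application needs the result.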
Related Links:
- SQLite for C# – Part 1 – Am I allowed to use it?
- SQLite for C# – Part 2 – How do I setup a SQLite DB (without coding)
- SQLite for C# – Part 3 – My first C# app using SQLite aka Hello World
- SQLite for C# – Part 4 – So how does SQLite stack up against other DB’s?
- SQLite for C# – Part 5 – SQLite ‘features’, or ‘quirks’
- SQLite for C# – Part 6 – SQLite Connection String Definitions
- SQLite for C# – Part 7 – Building SQLite.Net from source
- SQLite for C# – Part 8 – Loading CSV/Pipe into SQLite via command line
Thanks
You helped me to understand this a little bit better. Too bad there is not much information about C# SQLite around.
Glad to have helped! There is information out there, the trick is finding it!
Thank you for the very informative and useful writeup! Even those heavy and expensive Wrox and Wiley books couldn’t make clear what you have done in a few lines.
Have you played with the .Net port of SQLite? http://code.google.com/p/csharp-sqlite/
I’d be interested in knowing if you noticed any differences…
I’ve only downloaded it enough to see how well the translation went. Got to say I don’t like the generated code too much. Outside of that I’ve not had much to do with it. I would really like to see a native version, as that avoids the whole 64 bit vs 32 bit question, and I would hope it would also help with performance (although csharp-sqlite doesn’t show any performance gains). My hope is that the Mono team includes it, or MS includes it as a native implementation.
A minor update – I haven’t tried it yet, but some recent tweaking in the dev branch is yielding improved perf for the .Net port. They are still trying to fix bugs though….
Just wanted to say thanks for the excellent series. I’ve read it 2 or 3 times now!
Simple, ordered, fully explained. Excellent tutorial, congrats, and thank you. I’ll use it today; you’ve saved me from having to use MS Access!! IIUGGHHH!!! Thanks a lot.
Gab.
Is it possible to implement these special commands (.mode, .separator and .import …) in a C# program?
These are really built into the SQLite3 application rather than the OLE-DB driver. So the simple answer is no; however, if you look at the C# implementation, they have ported the whole thing to C#, including the main program, via translation (so don’t expect the code to be pretty). So you could leverage that, or invoke the native SQLite3 application. It all depends on what you are trying to achieve!
Gareth
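To give a feel for what a hand-rolled equivalent of .separator plus .import involves, here is a minimal sketch in Python (standing in for C#, which the question asks about), using only the standard-library csv and sqlite3 modules. The function name import_delimited is hypothetical, and this is not how the sqlite3 shell implements .import; in particular, Python's csv.reader applies its own quoting rules, which may differ from the shell's. The sample rows are made up.

```python
import csv
import itertools
import os
import sqlite3
import tempfile


def import_delimited(conn, path, table, delimiter="|"):
    """Stream a delimited text file into an existing table.

    Returns the number of rows inserted.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        # Generator, so the file is read row by row rather than all at once.
        rows = (row for row in csv.reader(handle, delimiter=delimiter) if row)
        first = next(rows, None)
        if first is None:
            return 0  # empty file, nothing to load
        placeholders = ", ".join("?" for _ in first)
        # A table name cannot be a bound parameter, so quote it as an identifier.
        quoted = '"' + table.replace('"', '""') + '"'
        sql = f"INSERT INTO {quoted} VALUES ({placeholders})"
        with conn:  # one transaction for the whole load
            cursor = conn.executemany(sql, itertools.chain([first], rows))
        return cursor.rowcount


# Demo: write two made-up pipe-separated rows to a temporary file and load them.
sample = "1|2010-01-01|ORD1|1|ISBN-A|2|500\n1|2010-01-01|ORD1|2|ISBN-B|1|300\n"
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write(sample)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE BookSales (Store int, Date varchar, OrderReference varchar, "
    "Line int, BookISBN varchar(14), Quantity int, Price int, "
    "PRIMARY KEY (OrderReference, Line))"
)
loaded = import_delimited(conn, tmp.name, "BookSales")
os.remove(tmp.name)
print("rows loaded:", loaded)
```

Wrapping the load in a single transaction is the main design choice here: committing per row is usually far slower in SQLite. Whether this comes anywhere near the native shell's speed is not something this sketch measures.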
Have you tried importing a very large file, say on the order of 200MB+? I tried it with the native SQLite3.exe and it failed without any messages. Turns out, after I looked at the C implementation, the developer implemented the parsing of the file such that it never finds the newlines. I am also not sure (I did not spend enough time looking at the code) whether there was a memory or file size issue. It appears it may attempt to load the whole file at once, which defeats one of the needs I have.
I have not tried a very large file (of that size), but I seem to remember looking at the code a while ago and seeing that it loaded the files in chunks, so that shouldn’t be the issue (or perhaps that is my mind playing tricks on me!). However, I’ve been using notepad for the generation of the files and it loads them without issue. Does it work on a smaller file? I presume you are using the .import?
Well, I tried a quick test of this (rather than diving into the code!). My test was to write a C# program to quickly create a 342MB file consisting of 1 int and 3 text fields in a CSV format. It generated 2,000,000 records for the test. This was then loaded using the “.import” approach outlined above and it ran fine (and quickly). It took approximately 20 seconds to generate a 421MB SQLite DB, and the memory size of sqlite3 remained level throughout the load phase.
Hopefully this helps some, if you want my test files I can supply them – but on the surface there doesn’t seem to be an issue here.
Gareth
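For anyone wanting to repeat that kind of load test, here is a rough sketch of a test-file generator in Python. It is not the C# program mentioned above, just a stand-in that writes the same shape of record (1 int and 3 text fields). The function name write_test_file, the file name LoadTest.csv and the field contents are all made up, so the resulting file size will not match the figures quoted.

```python
import csv
import os
import tempfile


def write_test_file(path, record_count, delimiter=","):
    """Write record_count rows of one int and three text fields.

    Returns the size of the file in bytes.
    """
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter=delimiter)
        for i in range(record_count):
            writer.writerow([i, f"first-{i}", f"second-{i}", f"third-{i}"])
    return os.path.getsize(path)


# Demo with a small record count; raise it (e.g. to 2_000_000) for a real test.
with tempfile.TemporaryDirectory() as folder:
    size = write_test_file(os.path.join(folder, "LoadTest.csv"), 1000)
    print(f"wrote {size} bytes")
```

The generated file can then be fed to the .import approach from the post, with the separator set to match the delimiter used here.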