SQLite for C# – Part 8 – Loading CSV/Pipe into SQLite via command line

Ever wondered how hard it would be to load a CSV file into a SQLite database? I know how I would do it in code; no rocket science needed there! In this case, however, I wanted to know how fast it is when done natively, and I really didn’t want to code anything!

Looking at what SQLite3.exe has to offer, it pretty much supports this out of the box. Very nice :-)

Requirements:

  • Loading speed
  • Making the data available to consuming applications as soon as possible

While I love C#, and frankly it’s hard to go back to C or C++, sometimes performance trumps the creature comforts we have become accustomed to.

Note: I did this without circling back to a C# implementation because I know the data and performance requirements are tight, and in this case I wanted maximum performance with no code! The biggest factor in a successful implementation is using the tools best suited to the job, not just the ones you favor in that specific year.

So first things first: create a table to take the input.

DROP TABLE IF EXISTS BookSales;
CREATE TABLE IF NOT EXISTS BookSales
(
   Store          int
  ,Date           varchar
  ,OrderReference varchar
  ,Line           int
  ,BookISBN       varchar(14)
  ,Quantity       int
  ,Price          int
  ,PRIMARY KEY (OrderReference, Line)
);

Next is the magic. We need to load the delimited file into the table:

.separator "|"
.import BookSales.txt BookSales

Wow, that was easy :-) We set the separator to a pipe rather than a comma in this case, then ran the import, which takes the general form:

.import [FileName] [Table]
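
For readers who would like to see roughly what those two commands amount to in code, here is a minimal Python sketch using only the standard library. It is not how SQLite3.exe implements .import internally; the function name import_delimited and the sample rows are made up for illustration, and the table is assumed to exist already.

```python
import csv
import os
import sqlite3
import tempfile


def import_delimited(conn, path, table, separator="|"):
    """Insert each line of a delimited text file as a row of an existing table."""
    # One "?" placeholder per column of the target table.
    column_count = len(conn.execute(f'PRAGMA table_info("{table}")').fetchall())
    placeholders = ",".join("?" * column_count)
    # A table name cannot be bound as a parameter, so it is assumed to be trusted.
    insert_sql = f'INSERT INTO "{table}" VALUES ({placeholders})'

    with open(path, newline="", encoding="utf-8") as handle:
        # QUOTE_NONE: split on the separator only, with no quote handling.
        reader = csv.reader(handle, delimiter=separator, quoting=csv.QUOTE_NONE)
        rows = (row for row in reader if row)  # skip blank lines, stream the rest
        with conn:  # a single transaction for the whole file
            cursor = conn.executemany(insert_sql, rows)
    return cursor.rowcount


if __name__ == "__main__":
    # Hypothetical sample data in the same column order as the BookSales table.
    sample = (
        "1|2010-01-05|ORD-1|1|9780000000001|2|1500\n"
        "1|2010-01-05|ORD-1|2|9780000000002|1|900\n"
    )
    fd, sample_path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w", encoding="utf-8") as out:
        out.write(sample)

    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE BookSales (Store int, Date varchar, OrderReference varchar,"
        " Line int, BookISBN varchar(14), Quantity int, Price int,"
        " PRIMARY KEY (OrderReference, Line))"
    )
    print(import_delimited(db, sample_path, "BookSales"), "rows loaded")
    os.remove(sample_path)
```

The rows are streamed into executemany inside one transaction rather than read into memory first, which is one plausible way to keep both load time and memory use down.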

Now the database is ready to be queried! But if we want to take it just one stage further:

.output SummaryBookSales.csv
SELECT Store, Date, BookISBN, SUM(Quantity), SUM(Price)
FROM BookSales
GROUP BY Store, Date, BookISBN;

This writes the results of our simple aggregation to an output file. Because the separator is still set to a pipe, the file is pipe-separated despite its .csv extension.
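
To make the shape of that output concrete, the following Python sketch runs the same aggregation against a small in-memory database and joins each result row with the separator. The helper names (build_sample_db, summarise) and the three sample rows are hypothetical, not taken from the real BookSales.txt.

```python
import sqlite3

# The same aggregation as in the script above.
SUMMARY_SQL = """
SELECT Store, Date, BookISBN, SUM(Quantity), SUM(Price)
FROM BookSales
GROUP BY Store, Date, BookISBN
"""


def build_sample_db():
    """Create an in-memory BookSales table holding a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE BookSales (Store int, Date varchar, OrderReference varchar,"
        " Line int, BookISBN varchar(14), Quantity int, Price int,"
        " PRIMARY KEY (OrderReference, Line))"
    )
    conn.executemany(
        "INSERT INTO BookSales VALUES (?,?,?,?,?,?,?)",
        [
            (1, "2010-01-05", "ORD-1", 1, "9780000000001", 2, 1500),
            (1, "2010-01-05", "ORD-2", 1, "9780000000001", 3, 1500),
            (2, "2010-01-05", "ORD-3", 1, "9780000000002", 1, 900),
        ],
    )
    return conn


def summarise(conn, separator="|"):
    """Return one separator-joined line per (Store, Date, BookISBN) group."""
    lines = []
    for row in conn.execute(SUMMARY_SQL):
        # Render NULL as an empty field; everything else as plain text.
        lines.append(separator.join("" if v is None else str(v) for v in row))
    return lines


if __name__ == "__main__":
    for line in summarise(build_sample_db()):
        print(line)
```

Without an ORDER BY the order of the groups is not guaranteed, so anything consuming the file should not rely on it.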

Tying this all together in a single configuration file, which we will call “BookAnalysisLoader.sql”, gives us:

DROP TABLE IF EXISTS BookSales;
CREATE TABLE IF NOT EXISTS BookSales
(
   Store          int
  ,Date           varchar
  ,OrderReference varchar
  ,Line           int
  ,BookISBN       varchar(14)
  ,Quantity       int
  ,Price          int
  ,PRIMARY KEY (OrderReference, Line)
);

.separator "|"
.import BookSales.txt BookSales

.output SummaryBookSales.csv
SELECT Store, Date, BookISBN, SUM(Quantity), SUM(Price)
FROM BookSales
GROUP BY Store, Date, BookISBN;
.exit

The last piece of the puzzle is the final execution:

sqlite3.exe BookSalesAnalysis.db3 < BookAnalysisLoader.sql
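
If the load needs to be kicked off from another program rather than from a console, the same redirection can be reproduced by feeding the script to the process on standard input. The sketch below assumes an executable named sqlite3 is on the PATH (pass "sqlite3.exe" or a full path otherwise); build_command and run_loader are hypothetical helper names.

```python
import subprocess


def build_command(database_path, executable="sqlite3"):
    """Return the argument list equivalent to 'sqlite3 <database>'."""
    return [executable, database_path]


def run_loader(database_path, script_path, executable="sqlite3"):
    """Run the SQL script against the database, like 'sqlite3 db < script'."""
    with open(script_path, "rb") as script:
        # The script file becomes the process's stdin, as the '<' redirect does.
        completed = subprocess.run(
            build_command(database_path, executable),
            stdin=script,
            check=True,  # raise CalledProcessError on a non-zero exit code
        )
    return completed.returncode


if __name__ == "__main__":
    # File names taken from the article; both are assumed to be in the
    # current working directory, along with BookSales.txt.
    run_loader("BookSalesAnalysis.db3", "BookAnalysisLoader.sql")
```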

Now we have a newly created database with our analysis data in it, and we have a summary CSV file generated from the output. So we can load the CSV into Excel or another DB, or directly interrogate the DB for more analytical information – and all without coding!

13 Responses to “SQLite for C# – Part 8 – Loading CSV/Pipe into SQLite via command line”

  1. Raimis says:

    Thanks

    You helped me to understand this a little bit better. Too bad there is not much information about C# SQLite around.

  2. Gareth says:

    Glad to have helped! There is information out there, the trick is finding it!

  3. Jenny says:

    Thank you for the very informative and useful writeup! Even those heavy and expensive Wrox and Wiley books couldn’t make clear what you have done in a few lines.

  4. Sean says:

    Have you played with the .Net port of SQLite? http://code.google.com/p/csharp-sqlite/

    I’d be interested in knowing if you noticed any differences…

  5. Gareth says:

    I’ve only downloaded it enough to see how well the translation went. Got to say I don’t like the generated code too much. Outside of that I’ve not had much to do with it. I would really like to see a native version, as that avoids the whole 64 bit vs 32 bit question, and I would hope it would also help with performance (although csharp-sqlite doesn’t reflect any performance gains). My hope is that the Mono team include it, or MS includes it as a native implementation :-)

  6. Sean says:

    A minor update – I haven’t tried it yet, but some recent tweaking in the dev branch is yielding improved perf for the .Net port. They are still trying to fix bugs though….

  7. Sean says:

    Just wanted to say thanks for the excellent series. I’ve read it 2 or 3 times now!

  8. Gab says:

    Simple, ordered, fully explained. Excellent tutorial, congrats, and thank you. I’ll use it today; you’ve saved me from using MS Access!! IIUGGHHH!!! Thanks a lot.

    Gab.

  9. goose1_fr says:

    Is it possible to implement these special commands (.mod, .separator and .import …) in a C# program?

  10. Gareth says:

    These are really built into the SQLite3 application rather than the OLE-DB driver, so the simple answer is no. However, if you look at the C# implementation, they have ported the whole thing to C#, including the main program, via translation (so don’t expect the code to be pretty). So you could leverage that, or invoke the native SQLite3 application. It all depends on what you are trying to achieve!

    Gareth

  11. Breck says:

    Have you tried importing a very large file? Say on the order of 200MB+. I tried it with the native SQLite3.exe and it failed without any messages. It turns out, after I looked at the C implementation, that the developer implemented the parsing of the file such that it never finds the newlines. I am also not sure (I did not spend enough time looking at the code) whether there was a memory or file size issue. It appears it may attempt to load the whole file at once, which defeats one of the needs I have.

  12. Gareth says:

    I have not tried a very large file (of that size), but I seem to remember looking at the code a while ago and seeing that it loaded the files in chunks, so that shouldn’t be the issue (or perhaps that is my mind playing tricks on me!). However, I’ve been using Notepad for the generation of the files and it loads them without issue. Does it work on a smaller file? I presume you are using the .import?

  13. Gareth says:

    Well, I tried a quick test of this (rather than diving into the code!). My test was to write a C# program to quickly create a 342MB file consisting of 1 int and 3 text fields in CSV format. It generated 2,000,000 records for the test. This was then loaded using the “.import” approach outlined above and it ran fine (and quickly). It took approximately 20 seconds to generate a 421MB SQLite DB, and the memory size of sqlite3 remained level throughout the load phase.

    Hopefully this helps some, if you want my test files I can supply them – but on the surface there doesn’t seem to be an issue here.

    Gareth
