Create a SQL database from csv files

Discussion in 'App Development' started by lordoftrades, Aug 21, 2012.

  1. 2rosy

    2rosy

    I have no clue about access but you could loop over each file and import the data into sqlite appending to a table

    http://stackoverflow.com/questions/14431646/how-to-write-pandas-dataframe-to-sqlite-with-index
     
    #11     May 28, 2014
  2. aqtrader

    aqtrader

    Exactly as I do. Your program can read binary in machine float or integer format without translation between text to binary every time to query the database which may consist of thousands of csv equivalent files or combined. Extremely fast.
     
    #12     May 28, 2014
  3. Bob111

    Bob111

    i have no idea about VBA,but i've done this in VB6(which is pretty much same thing).
    i can explain in 'general' what is needed:

    you create your database in access(see my post above)
    now you need an ADO connection between your VBA app and Access.
    it can be done programmatically or manually in windows->control panel->administrative tools->data sources(ODBC)->system DSN
    google for more details

    then you basically go to whatever directory your files are stored,get their names via script(again-google it) and put them into string array

    then you go thru the loop from lbound to ubound of this array

    and each time you do so-you create a table named as a file(or whatever name you like) with columns that are in file(you can pick data type as well)

    then you load this cvs file and after that-go thru each record in the file and insert it into a table with same name.

    you can use same or similar procedure for update those tables or you can store all data in one very long table

    that's pretty much it. it's not hard,but rather time consuming. all automated. one button

    here is some food to chew:

    http://www.mrexcel.com/forum/excel-...ual-basic-applications-connection-access.html
     
    #13     May 28, 2014
  4. gmst

    gmst

    I store my data in ascii format (OHLC data) in notepad in windows. I have one notepad file for 1 day each stock. So, since I store data for 100 stocks, I have 100 notepad files for each day. I read these files into memory in vba, for further calculations in excel.

    Is this same as what you are suggesting using binary files? Do you mean binary files = my notepad files or are they something else?
     
    #14     May 29, 2014
  5. newwurldmn

    newwurldmn

    I did the exact same thing for vol surfaces. 1 file per stock per day. It was faster to run through csv files than through the database. So I bought the data (which came in SQL) and wrote scripts to convert them to CSV files. It's not fast enough for real time, but it was fast enough for morning screens and fast enough for analysis on a trade to put on today (where was the vol over the last 1 month?).

    Apparently, you can optimize SQL so that it could be faster (my db had millions and millions of rows (1 row for every option in OPRA for every day), but it wasn't worth my time to figure it out. My DB got slow based on how the data was stored in memory. Using the file system changed that. When I needed a faster query I would use JAVA instead of VBA and that was almost instantaneous. C would be even faster but my C is very rusty.

    I also bought stock price data and that was fast in the database. Not nearly the number of rows (almost 100x fewer than the options table).
     
    #15     May 29, 2014
  6. Bob111

    Bob111

    how do you read those files? how do you load them?
    if you load ASCII (txt,csv or whatever) file into array-you have to go thru each record (line) in that file, stretching(redim) your array as you go thru each record.. binary file(.bin) loading differently,in one shoot (for simplicity sake). i don't know how much faster,but probably at least 10 times. quite a bit of a difference in performance,specially if you working with large amounts of data(tick data for a day let say 1000 stocks). ASCII would take forever to load this amount of data from each file.

    http://www.vb6.us/tutorials/reading-and-writing-binary-files-visual-basic
     
    #16     May 29, 2014
  7. Bob111

    Bob111

    OP-here is how you create table in ms access using code,after you establish connection between DB and your application (this is VB,but should be same in VBA)
    note that i'm creating table for each ticker that was in arrTicker

    For i = LBound(arrTicker) To UBound(arrTicker)


    bChk = TableExists(arrTicker(i))

    If bChk = False Then

    sSQl = "CREATE TABLE " & arrTicker(i) & "(Rec_ID AutoIncrement,Quote_Date Date,Quote_Time Date," & _
    "Bid_Size long, Bid_Price single, Ask_Price single,Ask_Size long,Last_Time Date,Last_Price single,Last_Size long,Total_Vol long,sChange single,sHigh single,sLow single,Prev_Day_Close single,sOpen single);"

    IQ_DB.Execute (sSQl)


    End If


    Next
     
    #17     May 29, 2014
  8. Bob111

    Bob111

    here is how i convert stored streaming data from data base into binary files:

    you define your array first

    Public Type sBin

    ID_Rec As Long '4
    Quote_Date As Date '8
    Quote_Time As Date '8
    Bid_Price As Single '4
    Bid_Size As Long '4
    Ask_Price As Single '4
    Ask_Size As Long '4
    Last_Time As Date '8
    Last_Price As Single '8
    Last_Size As Long '4
    Tot_Vol As Long '4
    Change As Single '4
    High As Single '4
    Low As Single '4
    PrevDayClose As Single '4
    sOpen As Single

    End Type

    Public arrData() As sBin

    so the size in bytes is known ahead,this is why you don't have to redim each time you go thru each record(as you for example do,while filling up this array)

    sQL = "Select Rec_ID,Quote_Date,Quote_Time,Bid_Size,Bid_Price,Ask_Price,Ask_Size,Last_Time,Last_Price,Last_Size,Total_Vol,sChange,sHigh,sLow,Prev_Day_Close,sOpen from " & s & " Order by Rec_ID asc;"

    RS.Open sQL, IQ_DB

    If RS.RecordCount > 0 Then

    CountRecord = CountRecord + RS.RecordCount

    ReDim arrData(RS.RecordCount + 1)

    Do Until RS.EOF

    If i = 0 Then

    ReDim arrData(i)

    arrData(i).ID_Rec = RS("Rec_ID")
    arrData(i).Quote_Date = RS("Quote_Date")
    arrData(i).Quote_Time = RS("Quote_Time")
    arrData(i).Bid_Size = RS("Bid_Size")
    arrData(i).Bid_Price = RS("Bid_Price")
    arrData(i).Ask_Price = RS("Ask_Price")
    arrData(i).Ask_Size = RS("Ask_Size")
    arrData(i).Last_Time = RS("Last_Time")
    arrData(i).Last_Price = RS("Last_Price")
    arrData(i).Last_Size = RS("Last_Size")
    arrData(i).Tot_Vol = RS("Total_Vol")
    arrData(i).Change = RS("sChange")
    arrData(i).High = RS("sHigh")
    arrData(i).Low = RS("sLow")
    arrData(i).PrevDayClose = RS("Prev_Day_Close")
    arrData(i).sOpen = RS("sOpen")

    i = i + 1

    Else

    ReDim Preserve arrData(i)

    arrData(i).ID_Rec = RS("Rec_ID")
    arrData(i).Quote_Date = RS("Quote_Date")
    arrData(i).Quote_Time = RS("Quote_Time")
    arrData(i).Bid_Size = RS("Bid_Size")
    arrData(i).Bid_Price = RS("Bid_Price")
    arrData(i).Ask_Price = RS("Ask_Price")
    arrData(i).Ask_Size = RS("Ask_Size")
    arrData(i).Last_Time = RS("Last_Time")
    arrData(i).Last_Price = RS("Last_Price")
    arrData(i).Last_Size = RS("Last_Size")
    arrData(i).Tot_Vol = RS("Total_Vol")
    arrData(i).Change = RS("sChange")
    arrData(i).High = RS("sHigh")
    arrData(i).Low = RS("sLow")
    arrData(i).PrevDayClose = RS("Prev_Day_Close")
    arrData(i).sOpen = RS("sOpen")


    i = i + 1

    End If

    RS.MoveNext

    Loop

    End If


    RS.Close

    and then you store it like this :

    sQL = GlobalFilePatch & "\" & s & ".bin"

    FileNum = FreeFile

    Open sQL For Binary Access Write Lock Read Write As #FileNum

    Put #FileNum, , arrData

    Close FileNum



    --------------

    and this is how you fill your array from binary file(note the difference between filling up array from ascii and binary):

    Public Function Fill_Binary_Tick(ByVal sPat As String) As Long
    Dim NumRec As Long
    Dim FileNum As Integer

    If Dir(sPat) = "" Then

    Fill_Binary_Tick = 0

    Exit Function

    End If

    Erase arrData

    FileNum = FreeFile

    NumRec = FileLen(sPat) / 76 ' 76 bytes in one line,defined above,remember? in sBin type

    ReDim arrData(NumRec - 1)

    Open sPat For Binary Access Read As #FileNum

    Get #FileNum, , arrData

    Close #FileNum

    Fill_Binary_Tick = NumRec


    End Function
     
    #18     May 29, 2014
  9. Brighton

    Brighton

    Thanks, Bob (and others). I learned some new terms and have some homework ahead of me, but I'm beginning to understand the looping process for multiple files. I'm going to give it a shot because I have additional underused data that I'd like to get into one dbase so I can do something with it.
     
    #19     May 29, 2014
  10. Checkout Stanford's database course in coursera.
    It'll teach you what you need to know to build it right from the start, and will end up saving you time and work in the long run.
     
    #20     May 31, 2014