I’m using Flask-SQLAlchemy to do a rather large bulk insert of 60k rows. I also have a many-to-many relationship on this table, so I can’t use
db.engine.execute for this. Before inserting, I need to find similar items in the database, and change the insert to an update if a duplicate item is found.
I could do this check beforehand, and then do a bulk insert via
db.engine.execute, but I need the primary key of the row upon insertion.
Currently, I am doing a
db.session.commit() on each insert, and I get a measly 3-4 inserts per second.
I ran a profiler to see where the bottleneck is, and it seems that the
db.session.commit() is taking 60% of the time.
Is there some way to make this operation faster, perhaps by grouping commits, while still getting the primary keys back?
This is what my models look like:
```python
class Item(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    title = db.Column(db.String(1024), nullable=True)
    created = db.Column(db.DateTime())
    tags_relationship = db.relationship(
        'Tag', secondary=tags,
        backref=db.backref('items', lazy='dynamic'))
    tags = association_proxy('tags_relationship', 'text')

class Tag(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    text = db.Column(db.String(255))
```
My insert operation is:
```python
for item in items:
    if duplicate:
        update_existing_item
    else:
        x = Item()
        x.title = "string"
        x.created = datetime.datetime.utcnow()
        for tag in tags:
            if not tag_already_exists:
                y = Tag()
                y.text = "tagtext"
                x.tags_relationship.append(y)
                db.session.add(y)
                db.session.commit()
            else:
                x.tags_relationship.append(existing_tag)
        db.session.add(x)
        db.session.commit()
```
Try calling db.session.flush() instead of committing after each insert. A flush sends the pending INSERT statements to the server, which means the primary keys are generated and populated on your objects, but it does not end the transaction. At the end, call db.session.commit() once to actually commit everything.
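A minimal sketch of the pattern using plain SQLAlchemy with an in-memory SQLite database (Flask-SQLAlchemy's db.session behaves the same way; the Item model here is simplified, and SQLAlchemy 1.4+ is assumed for the declarative_base import):

```python
from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.orm import declarative_base, Session

Base = declarative_base()

class Item(Base):
    __tablename__ = "item"
    id = Column(Integer, primary_key=True)
    title = Column(String(1024))

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = Session(engine)

items = []
for i in range(1000):
    item = Item(title=f"item {i}")
    session.add(item)
    items.append(item)
    # Flush periodically: this sends the pending INSERTs and
    # populates item.id, but keeps one transaction open.
    if i % 100 == 99:
        session.flush()

session.flush()
first_id = items[0].id  # primary key is available before commit
session.commit()        # a single commit for the whole batch
```

Since the transaction stays open between flushes, the duplicate check can also query the session-visible data before deciding between insert and update.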
Answered By – wouter bolsterlee