Recent texture generation methods achieve impressive results due to the powerful generative prior they leverage from large-scale text-to-image diffusion models. However, abstract textual prompts are limited in conveying textural detail or global shape information, which leads these methods to produce blurry or inconsistent patterns. To tackle this, we present FlexiTex, which embeds rich information via visual guidance to generate high-quality textures. The core of FlexiTex is the Visual Guidance Enhancement module, which incorporates more specific information from visual guidance to reduce ambiguity in the text prompt and preserve high-frequency details. To further strengthen the visual guidance, we introduce a Direction-Aware Adaptation module that automatically constructs direction prompts based on different camera poses, avoiding the Janus problem and maintaining global semantic consistency. Benefiting from the visual guidance, FlexiTex produces quantitatively and qualitatively sound results, demonstrating its potential to advance texture generation for real-world applications.
The framework of FlexiTex. Given a mesh with corrected orientation and text/image prompts, the Visual Guidance Enhancement module extracts visual features to provide informative guidance during the denoising steps. The Direction-Aware Adaptation module then injects direction prompts according to camera poses, leveraging diffusion models with stronger geometric awareness. Through iterative texture warping and rasterization, our method generates high-fidelity textures with both rich detail and multi-view consistency.
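As a rough illustration of the Direction-Aware Adaptation idea, the sketch below maps a camera pose to a view-direction keyword appended to the text prompt. The bin boundaries, keyword set, and the helper name `direction_prompt` are our own assumptions for exposition, not the exact scheme used by FlexiTex.

```python
# Minimal sketch: assign a direction keyword to a prompt from the camera pose.
# Assumes azimuth is measured in degrees counter-clockwise from the object's
# front (0 = front); the bins and keywords below are illustrative assumptions.

def direction_prompt(base_prompt: str, azimuth_deg: float,
                     elevation_deg: float = 0.0) -> str:
    """Append a view-direction keyword to the text prompt for a camera pose."""
    azimuth = azimuth_deg % 360.0
    if elevation_deg > 60.0:                       # near-top-down views
        view = "top view"
    elif azimuth < 45.0 or azimuth >= 315.0:       # facing the object's front
        view = "front view"
    elif azimuth < 135.0 or azimuth >= 225.0:      # left or right profile
        view = "side view"
    else:                                          # facing the object's back
        view = "back view"
    return f"{base_prompt}, {view}"

# Example: prompts for four cameras placed around the mesh.
if __name__ == "__main__":
    for az in (0, 90, 180, 270):
        print(direction_prompt("a photo of a wooden chair", az))
```

Conditioning each view's denoising pass on such a pose-specific prompt is one simple way to discourage the Janus problem, since the diffusion model is told explicitly which side of the object it is generating.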